I've done deepfake work for clients who shot their own footage, and my suggestions would be:
Aim for about 15-20 minutes of high-quality video of each person's face, which yields enough images to build a face set for that person. Medium-distance, front-facing views, as in an interview, work best.
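To put the 15-20 minute figure in terms of frames, here is a rough sketch of the arithmetic. The 25 fps frame rate and the target of 5,000 faces are assumptions picked only for illustration, and both function names are hypothetical; use your own camera's frame rate and whatever face set size you are aiming for.

```python
def frames_in_footage(minutes: float, fps: float = 25.0) -> int:
    """Total number of frames in a clip of the given length.

    fps=25.0 is an assumed frame rate, not something from the post.
    """
    return int(minutes * 60 * fps)


def sampling_step(minutes: float, target_faces: int, fps: float = 25.0) -> int:
    """Keep every Nth frame so that roughly target_faces frames remain.

    target_faces is a hypothetical face set size chosen by the user.
    """
    total = frames_in_footage(minutes, fps)
    return max(1, total // target_faces)


if __name__ == "__main__":
    for minutes in (15, 20):
        total = frames_in_footage(minutes)
        step = sampling_step(minutes, target_faces=5000)
        print(f"{minutes} min: {total} frames, keep every {step}th frame")
```

At the assumed 25 fps, 15 to 20 minutes works out to 22,500 to 30,000 frames, so there is plenty of room to skip near-duplicate frames and still end up with a varied face set.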
The background behind the face should be a neutral color or darker than the face to be swapped. Avoid bright light behind or above the face; even lighting is best. Fast head motion and motion blur are problematic.
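If you want to weed out blurred frames before building the face set, one common heuristic (my suggestion here, not something any particular deepfake tool is confirmed to do) is the variance of the Laplacian: sharp images have strong edges and score high, while blurred ones score low. The sketch below is plain Python on a grayscale image given as a list of rows; the function name is hypothetical, and any cutoff between "sharp" and "blurred" has to be found by trying it on your own footage.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray is a 2D list of brightness values (rows of equal length),
    at least 3x3 in size. Higher results mean more edge detail.
    """
    height, width = len(gray), len(gray[0])
    values = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    # A flat patch has no edges; a checkerboard is all edges.
    flat = [[128] * 5 for _ in range(5)]
    checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
    print(laplacian_variance(flat), laplacian_variance(checker))
```

Scoring only the cropped face region rather than the whole frame is probably the better choice, since a sharp background can hide a motion-blurred face.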
As torzdf said, the video of each person's face should contain a variety of face angles, lighting settings, and expressions. Think of the model as trying to re-create a 3D image of the face, or of the head moving 180 degrees from shoulder to shoulder.
Profiles or side views of a face can't be easily deepfaked. Extreme close-ups of the face should also be avoided, because the model's face resolution is limited. The face is typically trained at 128x128 to 256x256 pixels, which is a tiny portion of the whole video frame, so a swapped face that fills the frame has to be scaled up a long way.
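To make the resolution point concrete, here is a small sketch of the numbers. The 1920x1080 frame size and the 900-pixel face height are assumptions for illustration only, and both function names are hypothetical.

```python
def frame_fraction(model_res: int, frame_w: int = 1920, frame_h: int = 1080) -> float:
    """Share of the frame's pixels covered by a model_res x model_res face.

    The 1920x1080 default is an assumed frame size, not from the post.
    """
    return (model_res * model_res) / (frame_w * frame_h)


def upscale_factor(face_px_in_frame: int, model_res: int) -> float:
    """How far the model's output must be stretched to cover the face in the shot."""
    return face_px_in_frame / model_res


if __name__ == "__main__":
    for res in (128, 256):
        print(f"{res}x{res}: {frame_fraction(res):.2%} of a 1080p frame")
    # Hypothetical extreme close-up: the face is 900 pixels tall in the frame.
    print(f"upscale needed at 256: {upscale_factor(900, 256):.2f}x")
```

With these assumed sizes, a 256x256 face is only about 3% of a 1080p frame, and a face that is 900 pixels tall would need the 256-pixel output enlarged roughly 3.5 times. At medium distance the face in the shot is much closer to the trained size, so far less scaling is needed.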