New to the project, so apologies if this is covered somewhere; my searches didn't turn up anything that looked relevant. I want to train a model for a source (aka "A") from a video of a vidconf meeting, then do a faceswap on that participant and give him or her the results later. The vidconf file shows several "talking heads", each fixed in one location in the video. So, for a given file, my "A" will always be inside the rectangle from 0,0 to 320,480 (as an example).
My question is about the overall workflow. Does faceswap have a feature to crop out everything except a fixed rectangle from the source frames or video? (Intentionally not using the word "mask", since it has a different meaning in this project.) Or should I pre-process the input with something like ffmpeg?
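For context, here is a sketch of the ffmpeg pre-processing I have in mind if faceswap can't do it natively. The first command just generates a synthetic test clip as a stand-in for the real recording; the second uses ffmpeg's `crop` filter (arguments are `w:h:x:y`) with my example coordinates. Filenames are made up for illustration.

```shell
# Synthetic 2-second stand-in for the vidconf recording (hypothetical filename)
ffmpeg -y -v error -f lavfi -i testsrc=size=640x480:rate=25 -t 2 meeting.mp4

# Keep only the fixed 320x480 rectangle at the top-left corner,
# i.e. the region where participant "A" always appears
ffmpeg -y -v error -i meeting.mp4 -vf "crop=320:480:0:0" participant_a.mp4
```

The cropped `participant_a.mp4` would then be fed to extraction/training as if it were a single-face source.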