tl;dr: What would the best model/settings be for putting a single face/expression onto a video? I don't care about changing expressions... just rotate and stretch the B photo until the alignment landmarks are "close enough", and render it.
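For the "rotate and stretch B until the landmarks are close enough" part, the standard trick is a least-squares similarity transform (Umeyama-style) fitted between the two landmark sets. A minimal numpy sketch, assuming 2D landmark arrays; the function name and toy points are mine, and it skips the reflection edge case a robust version would handle:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares rotation + uniform scale + translation mapping
    src landmarks onto dst landmarks. src/dst are (N, 2) arrays."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # Cross-covariance between the centered point sets
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    R = U @ Vt                              # 2x2 rotation
    scale = S.sum() / src_c.var(axis=0).sum()
    t = dst_mean - scale * R @ src_mean
    return scale, R, t

# Toy check: dst is src rotated 90 degrees and shifted
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
theta = np.pi / 2
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
dst = src @ R_true.T + np.array([2., 3.])
scale, R, t = similarity_transform(src, dst)
warped = scale * src @ R.T + t   # recovers dst exactly
```

Once you have scale/R/t you'd feed them to whatever warp routine you like (e.g. an affine warp of the B photo) before compositing.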
So, I was reading the "why you need more sources" thread and thought to myself... that 8-photo example actually looks pretty good if you don't care about fluid motion... say, if you're just trying to put someone's face on someone else, and it's meant to be a kind of caricature rather than a convincing fake.
... So I tried training some models with a single face: I have a full-on "A" video, and the "B" video is just 30+ frames of a single photo (in your face, "your model has too few photos" check, hehe).
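In case anyone wants to reproduce the "B video" trick: one way to fake up that B source is to just duplicate the photo into a folder of numbered frames. A throwaway stdlib sketch, with made-up paths, and a stand-in byte-write so it runs end-to-end (in practice `face_b.jpg` would be your real photo):

```python
import shutil
from pathlib import Path

photo = Path("face_b.jpg")            # hypothetical single source photo
photo.write_bytes(b"\xff\xd8fake")    # stand-in bytes so this sketch is runnable as-is
frames_dir = Path("b_frames")
frames_dir.mkdir(exist_ok=True)

# A folder of 32 identical frames is enough to get past a
# "your model has too few photos" style sanity check
for i in range(32):
    shutil.copy(photo, frames_dir / f"frame_{i:04d}.jpg")
```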
Just to get quicker results, I trained these with lightweight at BS=1 (with only 1 photo, I can't imagine batching does anything useful here) for 200K iterations.
It sort of works, but the result comes out surprisingly faint/blurry. I'd have thought it would converge to something sharper more quickly, given how few possible outputs there are. Has anyone experimented with this?
Note: I know there are easier ways of doing something like this... I could just cut out a PNG and use Kdenlive's "auto-motion-tracking mask" feature... but that's nowhere near as detailed as something that tracks the motion frame-by-frame using the alignments/landmarks, rather than just x-y positioning an image...