Normally, a deepfake works like this:
Input: One video of me saying something stupid.
Training data: Many videos of celebrity X.
Output: Celebrity X saying something stupid (with my voice).
But my goal is different:
Input A: One video of me saying something stupid.
Input B: One video of someone sitting on a cliff.
Output: Video of me saying something stupid and sitting on a cliff.
So it's a face swap without transferring facial expressions and the like. The input face should simply appear, unchanged, on the other person's body.
The tool I need is closer to the Snapchat face swap filter. But is there anything more sophisticated?
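For reference, the naive per-frame operation I have in mind is roughly the sketch below: cut the face region out of a frame of input A and blend it into a frame of input B with a soft elliptical mask. This is only an illustration using NumPy. The function name paste_face, the src_box and dst_center arguments, and the feather width are all made up for the example; in practice the box and target position would come from some face detector, and the patch is assumed to fit entirely inside the target frame.

```python
import numpy as np

def paste_face(src_frame, dst_frame, src_box, dst_center, feather=5):
    """Blend the face patch src_box of src_frame into dst_frame around dst_center.

    src_box is (x, y, w, h) and dst_center is (cx, cy); both are assumed to be
    supplied by a face detector, which is outside the scope of this sketch.
    """
    x, y, w, h = src_box
    face = src_frame[y:y + h, x:x + w].astype(np.float32)

    # Normalised distance from the patch centre: about 1.0 on the ellipse edge.
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.sqrt(((xx - (w - 1) / 2) / (w / 2)) ** 2 +
                ((yy - (h - 1) / 2) / (h / 2)) ** 2)

    # Alpha is 1 inside the ellipse and ramps to 0 over roughly `feather` pixels.
    alpha = np.clip((1.0 - d) * min(w, h) / (2 * feather), 0.0, 1.0)[..., None]

    # Top-left corner of the patch in the target frame (assumed to be in bounds).
    cx, cy = dst_center
    x0, y0 = cx - w // 2, cy - h // 2

    out = dst_frame.astype(np.float32).copy()
    region = out[y0:y0 + h, x0:x0 + w]
    out[y0:y0 + h, x0:x0 + w] = alpha * face + (1.0 - alpha) * region
    return out.round().astype(np.uint8)
```

Running this on every frame would keep my own expressions and mouth movement, since the pixels are copied rather than generated, but it does nothing about head pose, lighting, or skin tone, which is why I'm asking for something more sophisticated.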