One of the biggest misconceptions I think the community has is wanting higher quality results without using higher quality data, or getting impatient and not training their models to completion. Instead, they just search for a better model. Sometimes people who are unhappy with their results will take on ambitious projects using large, heavy models with unreasonably large data sets, and the end result is a project they have no hope of ever completing because of how slow those models will be. I know I still do this sometimes, and I assume you probably do as well. So to examine this issue in a different light, I thought I'd show that the "bad" models can give amazing results, and what better way than to take the Original model trainer and try to do a great deepfake. Hopefully the takeaway is that while yes, "good" models can produce better results than "bad" models, "bad" models can still produce good results!
For best results, keep the video in the small window, but make sure to turn on 1080p.
UPDATE VIDEO: I was encouraged in the comments below to release the model, and I decided to do a little more training on it before releasing. Here are the updated results. It's hard to notice the difference unless you overlay the old converted video over the new one, but it is better. Once again, for best appearance, make the video 1080p but don't maximize it.
I understand why people have a hard time with the patience aspect and don't get their models to maximum training: it takes such a long time. And depending on your setup, faceswap may take up so many resources that you won't be able to do much else on your computer while it's training. It's almost like going on a computer fast. One of the biggest things that discourages me from doing this is the possibility of wasting my time. "What if I let it train for several weeks and it hits max training, but even though it can't train any more detail... it still looks off... perhaps because their facial shapes are slightly different and it looks weird." Or perhaps, "I knew their hair color was different when I started, but I didn't think it would make that big of a deal. It clearly does, and this feels off." It's these thoughts of wasting my time during this training computer fast that discourage me from having patience in the training process, and I assume they probably affect you too. I was thinking about ways I could combat this thinking in myself when I had an idea that might help others out.
So here's the idea: if you want to practice, you can practice on my exact deepfake.
I was reading user JansenSensei's post in the general discussion asking for "an online database full of datasets of celebrities known to give good results," basically eliminating the part of the faceswap where you search for data. The idea being that if you wanted to do a Trump faceswap for your next project, you could just go to a website, download a "Trump pack," and you'd be good to go. Sounds like a great idea, but the mods explained why it wouldn't work: essentially, you want your "A" data set to specifically correspond with your "B" data set for the highest quality swap possible. But you know what would work? Giving people an "A" and a "B" data set along with a very good estimate of what the end product, "Converted Video C," should look like. And it occurred to me: this could be an excellent learning experience for those interested in practicing! Someone might want to make a faceswap for the first time, but even though they've memorized the guides by heart, they don't know where to start, what's too ambitious for them, or what results they should expect.
Initially, the idea of someone making the exact same faceswap as one already made seems pointless, but it can properly teach the lesson that high quality data and extensive training are the key ingredients of a quality product, because it eliminates the fear that the training will be wasted because the end result will somehow turn out bad. How does it eliminate this fear, you ask? Well, if you recreate my swap as practice, you've already seen my final result, so you know exactly what you are going to get. So basically, I'm going to give you both of my data sets (in video form) and show you what they're capable of producing if you have the patience. That way, you know your time won't be wasted.
**Before you start trying it on your own, though, you should know there's always some variability in the way the model learns.** So even though the end destination of maximum training is always the same, the model will take a different route to get there, every single time.
If you want a YouTube video downloader, my favorite trick is to type the two letters "pp" into the YouTube URL of the specific video you want, right after the word "tube" and before ".com", like this: "https://www.youtubePP.com/watch?v=iEY07Ut..." It will take you to a download page for that exact video.
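If it helps to see the trick spelled out, here's a tiny sketch of the URL rewrite as a string operation (the function name is just mine, and the video ID shown is a placeholder):

```python
def to_download_url(url: str) -> str:
    """Insert 'pp' after 'youtube' in a YouTube URL, per the trick above."""
    # Only the first occurrence of the domain is rewritten.
    return url.replace("youtube.com", "youtubepp.com", 1)

print(to_download_url("https://www.youtube.com/watch?v=VIDEO_ID"))
# https://www.youtubepp.com/watch?v=VIDEO_ID
```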
Here are the exact settings I used:
Extraction size: 256 (I'm still using the older faceswap; the newer faceswap's default of 512 shouldn't affect anything)
Learning rate: 5.1e-5
Loss function: MAE
Mask loss function: MAE
Penalized mask loss: on
Eye and mouth multipliers: initially the defaults of 3 and 2, but I started experimenting with them after 50 hours, never going over a multiplier of 20
Mask type: extended
Total training: almost 160 million examples given
Total training time: about 115 hours
Gaussian sharpening: amount 240, radius 0.3, threshold 5.0
Writer: CRF of 14 with the "veryfast" preset
Average EG/s: around 400
Unless your computer is the same as mine, it won't train at the same speed, so training for the same number of hours as me doesn't mean you've trained the same amount. You also may or may not be able to reach my batch size. My results shown at the beginning of this post are at 160 million examples given. To count how many examples given you've trained, go to the Analysis tab, take the total time the model has been training, convert the hours and minutes into seconds, then multiply those seconds by your average examples given per second. Your results should look like mine when you've hit 160 million examples given of training.
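The arithmetic above can be sketched in a few lines (the function name is mine; plug in your own hours and average EG/s from the Analysis tab):

```python
def examples_given(hours: int, minutes: int, avg_eg_per_sec: float) -> float:
    """Estimate total examples given from total training time and average EG/s."""
    total_seconds = hours * 3600 + minutes * 60
    return total_seconds * avg_eg_per_sec

# My run: roughly 115 hours at an average of ~400 EG/s
total = examples_given(115, 0, 400)
print(f"{total:,.0f} examples given")  # 165,600,000 examples given
```

Note that 115 hours at 400 EG/s comes out a bit over 160 million, which is why "around 400" and "almost 160 million" are only consistent as rough figures; your own EG/s is what matters.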
-I know I didn't use the highest quality data possible. But it was decently good, I had patience, and I still got amazing results. I feel like this only proves the point further: if you want high quality results, your best options are high quality data and patience. Look what I got with merely decent data and patience.
-I didn't train to max training, but I felt it was close enough to make the point that Original can still produce great results without giving it another 200 hours of training.
-The video was intentionally cut to exclude all the times the camera zoomed in on Melissa and Margot; the swap naturally gets worse the bigger the face is. Nothing you can do about that except control what type of video you choose to convert.
-This project has led me to believe that while you can technically faceswap anyone, unless you are personally recording the video of the person in HD, with amazing lighting, reacting at many different angles, your data and your product will probably be low quality. Realistically there are about three scenarios for high quality data and products: volunteers you record in a professional manner like I just mentioned, movie and TV stars, and youtubers.
- In the very beginning I stated that "sometimes people who are unhappy with their results will take on ambitious projects using large, heavy models with unreasonably large data sets" that they have no hope of ever completing. My source was me. Like... as of this very second. I'm attempting a villain model, and I've already invested 150 hours into it, so I can't quit now. Some people will never learn.
If you found this write-up or that potential practice session useful, leave a like. It makes me smile!