What things contribute to the best result?

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

Xicor28
Posts: 4
Joined: Fri Sep 04, 2020 5:17 am

What things contribute to the best result?

Post by Xicor28 »

Suppose I want to create a Star Wars deepfake. Will a lower-resolution video work better, or does it not matter? How does it affect performance? Do lower-resolution videos require more iterations to get a satisfactory result, or again, does it not matter? I have a 1050 Ti, so trial and error is going to take a lot of time, which is why I'm asking here.

Thanks.

abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: What things contribute to the best result?

Post by abigflea »

You can do it with lower-quality footage.
With a 1050 Ti, I understand you'll need to be smart about it.

Broadcast TV was around 450x480, and old movies can still be 1080p or higher but lack that detail and sharpness due to the technology of the time.

I have an example of Archie Bunker to Trump, and an old Dean Martin one.
When I did this, I noticed it had trouble with facial alignments on the A side (Archie Bunker).
I spent more time making sure the alignments and mask were correct for training and converting. The manual alignments tool really helped with this.

Training seemed to be "good enough" in about the same time, maybe even sooner. The blurriness in my examples actually seemed to help a bit; the swap didn't need to be so sharp to match.

A 1050 Ti should have 4GB of VRAM, so it should work. I'm not sure which models will fit in 4GB reliably, but you have room to work with.
I would try Dlight and Lightweight. Maybe others?

With 2GB, I believe you'd need to use the 1.0 version of Faceswap, which is still great, and likely only the Lightweight or maybe the Original model architecture.

Good luck!



Re: What things contribute to the best result?

Post by Xicor28 »

I actually tried doing 250k iterations on a 12-minute video (in DeepFaceLab). Everything looked OK except the mouth wasn't moving how it was supposed to.
The mouth was just shut.
Any idea why this is happening? Does the same happen in Faceswap? Can I create a deepfake using only a few different images?

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times
Contact:

Re: What things contribute to the best result?

Post by bryanlyon »

See the FAQ: app.php/faqpage#f3r0

What matters most is your data. Under no circumstances will you get good results from bad data.

cosmico
Posts: 95
Joined: Sat Jan 18, 2020 6:32 pm
Has thanked: 13 times
Been thanked: 35 times

Re: What things contribute to the best result?

Post by cosmico »

-To reiterate what bryanlyon said, but in more detail: your data is the most important thing. If you are swapping Trump's face onto Nicolas Cage, your Trump data and your Nicolas Cage data should come from well-lit, clear 1080p videos (or 4K if you can), not from low-quality videos like the ones you might find on their Instagram or Facebook.

-In addition, remember that we use video dimensions as a shorthand for data quality, but what we really mean is "face size". For example, a 720p video with a super-zoomed close-up of Trump's or Cage's face, where you can see each hair follicle, is better-quality data than a 1080p video where Trump or Cage is far away and their face is tiny. It's just implied that larger dimensions allow larger face sizes. Basically, it comes down to how many pixels are in the face.
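As a rough sketch of that last point (the bounding-box sizes below are made-up example numbers for illustration, not Faceswap output):

```python
# "Data quality" as pixels-on-face, not frame resolution.

def face_pixels(box_w, box_h):
    """Approximate number of pixels covered by a detected face box."""
    return box_w * box_h

# A tight close-up in a 720p frame...
closeup_720p = face_pixels(600, 600)    # 360,000 face pixels
# ...versus a distant face in a 1080p frame.
distant_1080p = face_pixels(80, 80)     # 6,400 face pixels

# Despite the lower frame resolution, the close-up carries far more detail.
print(closeup_720p / distant_1080p)     # ~56x more face pixels
```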

-When it comes to the animation aspect: if you want your model to be able to do something, it needs to learn how to do it. I once trained a model on a smaller data set where the original person was smiling in 90% of the files and emotionless in the other 10%. When I tried converting it onto a video of someone talking, the head moved around nicely, but the mouth was essentially sealed until they showed a tiny bit of teeth, at which point the model put a massive grin on it. The problem was that my data didn't show the model how to blink, how to move the eyes and eyebrows around, or how to open the mouth to talk without smiling (or with only a tiny smile). Because my data couldn't teach the model those things, it didn't know how to deal with them when converted onto a video that did all of them. The opposite end of the spectrum is possible too: if all your source material is of someone with their eyes closed and mouth open, the model won't know how to open its eyes or close its mouth. This is likely the problem you mentioned in your second post.

-Another thing I found to be super important is the size of the face you will be converting onto versus the resolution of the model you are using. The more you have to stretch the "modeled face" to cover the actual face, the worse it looks. Original, Lightweight, and IAE are all 64x64 models, and you'll get the best results converting onto a final product where the face is roughly 64x64 in size. That doesn't mean you can't convert a 1080p video, just that results are best when the face in that video is really small, as if the person is far away, rather than a close-up. So when building a model, use the biggest high-quality face sizes you can find; then when you convert, convert onto a face that's closest to the model's resolution.
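One way to put a number on that stretch, as a quick sketch (the helper name and face sizes here are just illustrative):

```python
def upscale_factor(model_res, face_size):
    """How many times the model's output must be stretched (per side)
    to cover a face of the given size in the target video."""
    return face_size / model_res

# A 64x64 model converted onto a small, distant face vs. a large close-up.
print(upscale_factor(64, 150))  # ~2.3x stretch: still looks reasonably sharp
print(upscale_factor(64, 380))  # ~5.9x stretch: noticeably blurrier
```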

Here's a picture of the exact same small-resolution 64x64 model converting onto two differently sized faces. Both are bigger than 64x64 (I think the left face is about 150x150 while the right is 380x380), but because the 64x64 model has to stretch less for the left one, it ends up looking better.
[Image: side-by-side comparison of the same 64x64 model converted onto a ~150x150 face (left) and a ~380x380 face (right)]
Oh, and if your next thought is "I'll just train a higher-resolution model then", remember that doubling the resolution does not just double the training time. The number of pixels grows with the square of the resolution, so the cost rises much faster than that.
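To put numbers on that last point, here is the pixel count (and roughly the work per image) at a few model resolutions:

```python
# Doubling the model resolution quadruples the number of pixels to learn,
# which is one reason training cost rises much faster than the resolution.
base = 64 * 64
for res in (64, 128, 256):
    pixels = res * res
    print(f"{res}x{res}: {pixels} pixels ({pixels // base}x the 64x64 model)")
# 64x64:   4096 pixels  (1x)
# 128x128: 16384 pixels (4x)
# 256x256: 65536 pixels (16x)
```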


Re: What things contribute to the best result?

Post by Xicor28 »

Thank you very much for the response.
I also want to ask: is it necessary to train a new model every time I change the destination video? I want the same person (from src) in different videos, but if I retrain the src with a new dst video, it shows a lot of loss. So I want to know what you recommend.


Re: What things contribute to the best result?

Post by bryanlyon »

If it's the same people on both sides, you can just do a fit train instead of a full train. See viewtopic.php?f=6&t=74 for information on how to do that.
