Global encoders?

Global encoders?

Post by Ssggaa »

I'm learning about deep fakes and I went through the original guide (viewtopic.php?t=146).

This doc says "When we train our model, we are feeding it 2 sets of faces". If two faces share the same encoder, is it, in theory, possible to build a global encoder (given enough training data)?

That way, one only has to train a single decoder, using the globally trained encoder.
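
Just to check my mental model, here is a very rough Keras sketch of what I mean by "one shared encoder, one decoder per face". The layer sizes and shapes are made up by me for illustration and are not taken from Faceswap:

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_encoder():
    # Shared encoder: squeezes any aligned face down to a latent vector.
    face = layers.Input((64, 64, 3))
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(face)
    x = layers.Conv2D(256, 5, strides=2, padding="same", activation="relu")(x)
    latent = layers.Dense(512)(layers.Flatten()(x))
    return Model(face, latent, name="encoder")

def build_decoder(name):
    # One decoder per identity: rebuilds that person's face from the latent vector.
    latent = layers.Input((512,))
    x = layers.Dense(16 * 16 * 256, activation="relu")(latent)
    x = layers.Reshape((16, 16, 256))(x)
    x = layers.Conv2DTranspose(128, 5, strides=2, padding="same", activation="relu")(x)
    face = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return Model(latent, face, name=name)

encoder = build_encoder()
decoder_a = build_decoder("decoder_a")   # trained only on person A's faces
decoder_b = build_decoder("decoder_b")   # trained only on person B's faces

face_in = layers.Input((64, 64, 3))
autoencoder_a = Model(face_in, decoder_a(encoder(face_in)))
autoencoder_b = Model(face_in, decoder_b(encoder(face_in)))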

Re: Global encoders?

Post by bryanlyon »

In theory this would be possible; in practice, only sort of. The problem is that the encoder will eventually need to focus on the specific faces you're training to get the best results. That said, we've enabled the ability to copy encoders, and Phaze-A even has pre-trained encoders available that follow this idea: you can start with a pretrained model and either freeze it or allow it to train on the new faces. EfficientNet is particularly useful for shortcutting the early training and can be left frozen for a good while into the initial training, but it does need to be unfrozen at some point to learn the new faces properly.
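
Stripped down to plain Keras, the pattern looks roughly like this. This is just an illustration of the idea, not Faceswap's actual implementation, and the sizes are arbitrary:

import tensorflow as tf
from tensorflow.keras import layers, Model

# Pretrained backbone standing in for the encoder. Freezing it means only the
# decoders update during the early part of training.
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(64, 64, 3), pooling="avg")
backbone.trainable = False

face_in = layers.Input((64, 64, 3))
latent = layers.Dense(512)(backbone(face_in, training=False))
encoder = Model(face_in, latent, name="pretrained_encoder")
# Decoders for face A and face B attach to this encoder exactly as in a
# normal shared-encoder model.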

Re: Global encoders?

Post by Ssggaa »

Thank you, bryanlyon, for such a fast response!

This is more of a science question. The part I'm having trouble understanding is that the encoder's role would be to create some kind of lossy intermediate format from which decoders then construct an image. Is the in-practice issue that the mapping from the input to that lossy format is too complex to generalise to any face?

Let's say I had unlimited resources and could get datasets for 10k faces. I use a single encoder with 10k decoders (one per face), all influencing its weights. Could that work as a global encoder?
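
In very hand-wavy Keras terms, reusing the encoder and build_decoder from my earlier sketch, I'm picturing something like the below. It obviously wouldn't fit in memory as written, and num_steps / batch_iterator are just placeholders:

# One shared encoder, one decoder per identity; every identity's batches push
# gradients through the same encoder.
autoencoders = {}
for identity in range(10_000):
    decoder = build_decoder(f"decoder_{identity}")
    face_in = layers.Input((64, 64, 3))
    model = Model(face_in, decoder(encoder(face_in)))
    model.compile(optimizer="adam", loss="mae")
    autoencoders[identity] = model

# Round-robin training: a batch of faces for identity i updates decoder_i
# and the shared encoder, never the other decoders.
for step in range(num_steps):
    identity, faces = next(batch_iterator)   # placeholder data feed
    autoencoders[identity].train_on_batch(faces, faces)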

Re: Global encoders?

Post by bryanlyon »

There are datasets with over a million faces that have been used to train the newest model architectures to encode faces. The resulting encoders have been very good at starting training, but have never been able to beat a 1:1 trained encoder.

Is it theoretically possible for a generic encoder? Almost definitely. Has it been done? Not yet.

We have ALWAYS been able to get a good bump in quality by allowing the encoder to train on the individual faces we're looking at. It's just better able to encode the details of those two faces if we give it the room to ignore other possible faces.

The big advantage the pretrained encoders give is shortcutting the early training time. Once the decoders have trained enough to catch up with the encoder, though, we always recommend unfreezing the encoder so it can keep improving and the result can be as good as possible. Is it possible that you might decide "this is good enough" and just keep the encoder frozen? Definitely. That's why we leave the freeze-encoder setting up to the user. They're free to decide whether the results meet their standards and can stop training or alter settings at any time.
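
In plain Keras terms (again, an illustration of the idea rather than how Faceswap implements it, using the names from the sketches above), the unfreeze step is nothing more than flipping the flag and recompiling:

# Phase 2: the decoders have caught up, so let the encoder specialise on the
# two faces. Recompiling is required for the change in trainable weights to
# take effect; a lower learning rate helps protect the pretrained features.
backbone.trainable = True
autoencoder_a.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="mae")
autoencoder_b.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="mae")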
