
Global encoders?

Posted: Fri May 19, 2023 5:56 pm
by Ssggaa

I'm learning about deep fakes and I went through the original guide (viewtopic.php?t=146).

This doc says "When we train our model, we are feeding it 2 sets of faces". If two faces share the same encoder, is it, in theory, possible to build a global encoder (given enough training data)?

That way, one only has to train a single decoder, using the globally trained encoder.
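To show what I mean by a shared encoder, here's a toy numpy sketch of the "2 sets of faces" setup from the guide (all the dimensions and linear maps here are made up for illustration; the real models are deep convolutional networks, not single matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 64-d "face" vector squeezed to a 16-d latent code.
FACE_DIM, LATENT_DIM = 64, 16

# One shared encoder...
encoder = rng.standard_normal((FACE_DIM, LATENT_DIM))

# ...and one decoder per identity (A and B).
decoder_a = rng.standard_normal((LATENT_DIM, FACE_DIM))
decoder_b = rng.standard_normal((LATENT_DIM, FACE_DIM))

def encode(face):
    # The same mapping is applied to every identity.
    return face @ encoder

def swap_to_a(face_b):
    # Encode a B face, decode with A's decoder: the basic deepfake trick.
    return encode(face_b) @ decoder_a

def swap_to_b(face_a):
    return encode(face_a) @ decoder_b

face_b = rng.standard_normal(FACE_DIM)
print(swap_to_a(face_b).shape)  # (64,)
```

A "global" encoder would just mean that `encoder` never needs retraining, and only a new decoder is trained per face.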


Re: Global encoders?

Posted: Fri May 19, 2023 5:59 pm
by bryanlyon

In theory this would be possible; in practice, only sort of. The problem is that the encoder will eventually need to focus on the specific faces you're training to get the best results. That said, we've enabled the ability to copy encoders, and Phaze-A even has pre-trained encoders available that follow this idea: you can start from a pretrained model and either freeze it or let it train on the new faces. EfficientNet is particularly useful for shortcutting the early training and can be left frozen for a good while into the initial run, but it does need to be unfrozen at some point to learn the new faces properly.
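The freeze-then-unfreeze workflow can be illustrated with a toy linear "autoencoder" trained by plain gradient descent (pure numpy, everything here invented for illustration; faceswap itself handles freezing through its model settings, not code like this):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 8))            # toy stand-in for training faces

W_enc = rng.standard_normal((8, 4)) * 0.1   # "pretrained" encoder weights
W_dec = rng.standard_normal((4, 8)) * 0.1   # fresh decoder weights

def loss():
    # Mean squared reconstruction error.
    return float(((X @ W_enc @ W_dec - X) ** 2).mean())

loss_before = loss()
lr, freeze_encoder = 0.05, True
for step in range(400):
    if step == 200:
        freeze_encoder = False              # unfreeze once the decoder has caught up
    Z = X @ W_enc
    R = Z @ W_dec - X                       # reconstruction error
    W_dec -= lr * Z.T @ R * (2 / R.size)    # the decoder always trains
    if not freeze_encoder:                  # the encoder only trains after unfreezing
        W_enc -= lr * X.T @ (R @ W_dec.T) * (2 / R.size)
loss_after = loss()
```

The point of the toy: while frozen, only the decoder adapts to the encoder's fixed latent space; once unfrozen, the encoder itself starts adapting to the specific faces.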


Re: Global encoders?

Posted: Fri May 19, 2023 6:18 pm
by Ssggaa

Thank you, bryanlyon, for such a fast response!

This is more of a science question. The part I'm having trouble understanding is this: the encoder's role is to create some kind of lossy intermediate format that decoders then construct an image from. Is the in-practice issue that the mapping from the input to that lossy format is too face-specific to generalize to any face?

Let's say I had unlimited resources and could get 10k face datasets. I use a single encoder with 10k decoders (one per face) to influence its weights. Could that work for a global encoder?
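To make sure I'm describing the setup right, here's a toy numpy sketch of what I mean: every decoder trains only on its own face set, but all of them push gradients back into the one shared encoder (100 identities instead of 10k, and all the numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N_IDENTITIES = 100                           # stand-in for the 10k face datasets
datasets = [rng.standard_normal((16, 8)) for _ in range(N_IDENTITIES)]

W_enc = rng.standard_normal((8, 4)) * 0.1    # the single shared encoder
decoders = [rng.standard_normal((4, 8)) * 0.1 for _ in range(N_IDENTITIES)]

def avg_loss():
    # Mean reconstruction error across all identities.
    return float(np.mean([((X @ W_enc @ W_d - X) ** 2).mean()
                          for X, W_d in zip(datasets, decoders)]))

loss_before = avg_loss()
lr = 0.05
for step in range(100):
    g_enc = np.zeros_like(W_enc)
    for X, W_dec in zip(datasets, decoders):
        Z = X @ W_enc
        R = Z @ W_dec - X
        W_dec -= lr * Z.T @ R * (2 / R.size)          # each decoder: its own faces only
        g_enc += X.T @ (R @ W_dec.T) * (2 / R.size)   # everyone pushes into the encoder
    W_enc -= lr * g_enc / N_IDENTITIES
loss_after = avg_loss()
```

The hope would be that averaging gradients over enough identities forces `W_enc` toward a face-generic representation.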


Re: Global encoders?

Posted: Fri May 19, 2023 6:25 pm
by bryanlyon

There are datasets with over a million faces that have been used to train the newest model architectures to encode faces. They've been very good at starting training, but have never been able to beat a 1:1 trained encoder.

Is it theoretically possible for a generic encoder? Almost definitely. Has it been done? Not yet.

We have ALWAYS been able to get a good bump in quality by allowing the encoder to train on the individual faces we're looking at. It's just better able to encode the details of those two faces if we give it the room to ignore other possible faces.

The big advantage the pretrained encoders give is shortcutting the early training time. Once the decoder has been trained to catch up with the encoder, though, we always recommend unfreezing the encoder so the result can get as good as it can. Is it possible that you might decide "this is good enough" and just keep the encoder frozen? Definitely. That's why we leave the freeze-encoder setting up to the user: they're free to decide whether the results meet their standards, and can stop training or alter settings at any time.