unbalanced model best practice

Posted: Wed Sep 23, 2020 9:15 pm
by afterparty

Hi,

I'm looking for any experienced advice!

I'm setting up for training with the unbalanced model in the hope of getting the highest possible resolution for my face swap. The swap is being used for a visual effects sequence in a movie where an actor works alongside herself, and I am putting her face onto her double. I've tested with great results using the Realface model, but now I'm hoping to improve the resolution.

If I train on a photoset that is 512 x 512, what settings should I use for the input size (training) and for the encoder and decoder "complexity"? What specifically do the encoder / decoder values relate to?

I'm training on 4 Tesla T4 GPUs on an AWS g4dn.12xlarge instance.

Thanks for any insights!

David


Re: unbalanced model best practice

Posted: Thu Sep 24, 2020 10:42 pm
by bryanlyon

It's best to think of it in terms of compression/decompression. The Encoder "compresses" the face into an intermediate form, and the Decoder re-creates the original from it. That is the basic idea of an autoencoder. In faceswap we use two decoders, and by switching decoders we switch the output face.
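To make the shared-encoder / twin-decoder idea concrete, here's a minimal Keras sketch. It is not faceswap's actual architecture; the input size, layer widths, and names are placeholders for illustration.

# Minimal sketch of a shared-encoder / twin-decoder autoencoder.
# NOT faceswap's real architecture: the 64x64 input and all layer
# sizes here are invented for illustration.
from tensorflow.keras import layers, Model

def build_encoder(input_shape=(64, 64, 3), latent_dim=512):
    # "Compresses" a face into a small intermediate form.
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(256, 5, strides=2, padding="same", activation="relu")(x)
    latent = layers.Dense(latent_dim)(layers.Flatten()(x))
    return Model(inp, latent, name="encoder")

def build_decoder(latent_dim=512, name="decoder"):
    # Re-creates a face from the encoder's intermediate form.
    inp = layers.Input(shape=(latent_dim,))
    x = layers.Dense(16 * 16 * 256, activation="relu")(inp)
    x = layers.Reshape((16, 16, 256))(x)
    x = layers.Conv2DTranspose(128, 5, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return Model(inp, out, name=name)

encoder = build_encoder()
decoder_a = build_decoder(name="decoder_a")  # trained only on face A
decoder_b = build_decoder(name="decoder_b")  # trained only on face B

inp = layers.Input(shape=(64, 64, 3))
autoencoder_a = Model(inp, decoder_a(encoder(inp)))  # trains encoder + decoder A
autoencoder_b = Model(inp, decoder_b(encoder(inp)))  # trains encoder + decoder B

# The swap: encode face A, but decode with B's decoder.
# swapped_faces = decoder_b(encoder(face_a_batch))

Because both autoencoders share the encoder, the intermediate form ends up face-agnostic; only the decoders are face-specific, which is exactly why switching decoders switches the output face.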

Increasing the encoder dims allows more data to be stored in that intermediate form, and stored more intelligently.

Increasing the decoder dims enables a better re-creation of the face from the encoder's output.
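As a rough picture of what a dims/"complexity" value typically controls, here's a hypothetical sketch where the setting fixes the base filter count of the first convolutional block and each downsampling step doubles it. The function name and defaults are invented, not faceswap's real config keys.

# Hypothetical illustration only: assumes the "dims" setting is the
# base filter count, doubled at every halving of spatial resolution.
def encoder_filter_plan(input_size=128, encoder_dims=42, min_size=8):
    plan, size, filters = [], input_size, encoder_dims
    while size > min_size:
        plan.append((size, filters))
        size //= 2      # each strided conv halves the resolution...
        filters *= 2    # ...and doubles the channel count
    return plan

print(encoder_filter_plan())
# [(128, 42), (64, 84), (32, 168), (16, 336)]

Bumping the dims up raises every filter count in that plan, so both model capacity and VRAM use grow with it.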

You actually need both to get good results. I'd suggest tweaking the decoder dims up a bit while leaving the encoder at its default. Remember that increasing the resolution brings a non-linear increase in training time (double the resolution is roughly 4x the time), but having 4 T4 GPUs does help with that.
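A quick back-of-the-envelope check of that quadratic scaling:

# Per-iteration work scales roughly with the pixel count, i.e. the
# square of the input size.
for size in (128, 256, 512):
    pixels = size * size
    print(f"{size}x{size}: {pixels:>7} px (~{pixels // (128 * 128)}x the 128px cost)")
# 128x128:   16384 px (~1x the 128px cost)
# 256x256:   65536 px (~4x the 128px cost)
# 512x512:  262144 px (~16x the 128px cost)

So going from a 128px to a 512px input is on the order of 16x the per-iteration work, before any other factors.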