Model Types?

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
smonaghan
Posts: 4
Joined: Wed Sep 18, 2019 3:10 pm

Model Types?

Post by smonaghan »

I'm new to this. Is there anyone that can explain the relative advantages and disadvantages of the model types? What are the preferences on this forum?

I tried original first and that gave pretty nice results, but the faces looked a bit waxy. So now I'm trying villain, and it's taking WAY longer. Can I expect better results?

I don't have more than a laptop at home, so I'm using Google Colab, which has one GPU and 12 GB of VRAM.

Thanks in advance!

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 623 times

Re: Model Types?

Post by torzdf » Sun Sep 22, 2019 12:42 pm

Bigger models do take longer to train. Villain takes about a week on a GTX 1080.

This is a repost from our Discord server from @kvrooman. It is from April 2019, so it misses some of the newer models, but has some useful information:

kvrooman wrote:
  1. (Dfaker == Villain) > Unbalanced > (DFL-H128 == Original) > Lightweight in my personal opinion... IAE is its own special animal. Of course, there are some tweaks that make each of these models switch places, and they have their own pros/cons.

    • Villain is likely the most detailed model but VRAM intensive and can give sub-par color matching

    • Dfaker is a great fire and forget model to get high quality results

    • Unbalanced is powerful and has a lot of ways to customize and improve the model but requires a bit more expertise and know-how to get results better than Villain / Dfaker

    • DFL-H128 actually uses the exact same encoder and decoder as Original, but takes a 128x128 input instead of 64x64 and then tries to compress the image into a representation of the face half as large as Original's. The smaller 'latent space' has some downsides in quality vs. Original that negate the larger input size

    • Original. The model that started it all. Still can provide great results and useful to understand how your dataset quality is really one of the biggest drivers of swap quality

    • Lightweight. A shrunken model based on Original, designed to run on very small GPUs. It'll run and give one an idea of the process, but results will be sub-par in comparison to other models

  2. You have one thousand sixty GPUs, right? :stuck_out_tongue_winking_eye: No, just kidding. You have a single GPU, but we allow specialist code for people with multiple graphics cards installed in their computer

  3. Timelapse is just a nifty way of seeing the progress on a specific set of images. You specify the folders for A and B, and the program will look in those folders and render a preview for those image sets every save iteration

    • Masking refers to the face area in an image. We can identify that region of the picture and sub-select it, to focus learning on it only, or later, in convert, to do color adjustments only on the masked area. There are some pre-selected mask algorithms, or you can have the NN learn its own (I'm not a huge fan of the learned masks, personally)

    • Warp to landmarks is an alternative way of training the model. The trainer introduces distortions into the image in order to make the NN work hard to put the image back together. One method randomly stretches and compresses the image at selected grid points, and the other, landmark-based method stretches and compresses the image to align closer with the facial landmark keypoints of a similar face

  4. If you stop training for any reason (Ctrl-C, hitting Enter on the preview window, or yanking the power cord out of the computer), there will be a saved model file in the location you specified. If you restart training and point to that same model folder (and use the same model selection), training will automatically pick up where it left off seamlessly
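To put a rough number on the DFL-H128 point in item 1: quadrupling the input pixels while halving the latent size means the bottleneck compresses about eight times harder than Original's, whatever the actual latent size is. A quick back-of-the-envelope check (the latent size N here is an arbitrary illustrative number, not Faceswap's real value):

```python
# Pixel counts are real (64x64 vs 128x128, 3 channels);
# the latent size N is purely illustrative.
original_input = 64 * 64 * 3       # 12288 values in
h128_input = 128 * 128 * 3         # 49152 values in (4x as many)

N = 1024                           # hypothetical latent size for Original
original_ratio = original_input / N       # 12x compression
h128_ratio = h128_input / (N / 2)         # 96x compression

print(h128_ratio / original_ratio)        # -> 8.0, regardless of N
```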
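The timelapse idea in item 3 boils down to two things: a fixed, sorted set of images per side, and a snapshot taken at every save iteration. A minimal sketch of that logic (hypothetical helper names, not Faceswap's actual code):

```python
import os
import tempfile

def timelapse_set(folder):
    """Return the fixed, sorted set of images a timelapse would
    re-render at every save interval."""
    exts = (".png", ".jpg", ".jpeg")
    return sorted(f for f in os.listdir(folder) if f.lower().endswith(exts))

def should_snapshot(iteration, save_interval):
    """A timelapse frame is written on every save iteration."""
    return iteration % save_interval == 0

# Example: a throwaway folder standing in for the A (or B) image set
with tempfile.TemporaryDirectory() as folder_a:
    for name in ("002.png", "001.png", "notes.txt"):
        open(os.path.join(folder_a, name), "w").close()
    print(timelapse_set(folder_a))      # -> ['001.png', '002.png']
    print(should_snapshot(5000, 250))   # -> True
```

Because the image set is fixed and sorted, every snapshot shows the same faces, so progress between saves is directly comparable.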
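"Focus learning on the masked area only" from item 3 can be pictured as zeroing out the loss wherever the mask is zero. A toy NumPy sketch of that idea (not Faceswap's actual loss function):

```python
import numpy as np

def masked_l2(pred, target, mask):
    """Mean squared error computed only inside the face mask."""
    diff = (pred - target) ** 2 * mask        # zero out background error
    return diff.sum() / np.maximum(mask.sum(), 1)

# Toy 4x4 "images": the prediction is wrong everywhere, but only
# the masked 2x2 centre contributes to the loss.
target = np.zeros((4, 4))
pred = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

print(masked_l2(pred, target, mask))   # -> 1.0 (background errors ignored)
```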
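The random grid warp in item 3 can be sketched crudely in NumPy: jitter a coarse grid of control points, then shift each image cell by its point's offset. Real trainers interpolate the displacement field smoothly; this nearest-cell, integer-shift version is only meant to show the principle:

```python
import numpy as np

def random_grid_warp(img, grid=4, max_shift=2, rng=None):
    """Crude random-warp sketch: one (dy, dx) offset per grid cell,
    applied as a blocky displacement of the source pixels."""
    rng = np.random.default_rng(rng)
    h, w = img.shape[:2]
    offsets = rng.integers(-max_shift, max_shift + 1, size=(grid, grid, 2))
    ys, xs = np.mgrid[0:h, 0:w]
    cell_y = ys * grid // h                   # which grid cell each pixel is in
    cell_x = xs * grid // w
    src_y = np.clip(ys + offsets[cell_y, cell_x, 0], 0, h - 1)
    src_x = np.clip(xs + offsets[cell_y, cell_x, 1], 0, w - 1)
    return img[src_y, src_x]                  # sample from displaced positions

img = np.arange(64, dtype=float).reshape(8, 8)
warped = random_grid_warp(img, rng=0)
print(warped.shape)   # -> (8, 8): same size, locally displaced content
```

The NN then has to reconstruct the undistorted face from the warped input, which is what forces it to learn facial structure rather than memorise pixels.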
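The seamless resume in item 4 works because everything needed to continue lives in the model folder. A minimal sketch of the save/resume pattern (hypothetical file name and helpers; the real trainer saves network weights plus a state file):

```python
import json
import os
import tempfile

def save_state(model_dir, iteration, weights):
    """Persist training state to the model folder."""
    with open(os.path.join(model_dir, "state.json"), "w") as fh:
        json.dump({"iteration": iteration, "weights": weights}, fh)

def load_state(model_dir):
    """Pick up where training left off, or start fresh if nothing is saved."""
    path = os.path.join(model_dir, "state.json")
    if not os.path.exists(path):
        return {"iteration": 0, "weights": None}
    with open(path) as fh:
        return json.load(fh)

with tempfile.TemporaryDirectory() as model_dir:
    save_state(model_dir, 12500, [0.1, 0.2])   # training interrupted here
    state = load_state(model_dir)
    print(state["iteration"])                  # -> 12500, training resumes
```

This is also why pointing a restart at the wrong model folder (or a different model selection) starts from scratch instead of resuming.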

My word is final

smonaghan
Posts: 4
Joined: Wed Sep 18, 2019 3:10 pm

Re: Model Types?

Post by smonaghan »

Thanks, this is super helpful!
