[Discussion] Notes on Loss functions

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

Icarus
Posts: 8
Joined: Mon Aug 15, 2022 9:18 pm
Has thanked: 10 times
Been thanked: 8 times

[Discussion] Notes on Loss functions

Post by Icarus »

Loss functions:

As it says in the Training Guide, the choice you make here will have an outsized impact on your entire model. I've tried them all, and a combination of MS_SSIM and MAE (L1), both at 100%, has produced the best results. The weird quirk with MS_SSIM is that whenever I've tried to start a model with it, the model crashes (which I honestly can't explain), so I usually start with SSIM and swap it out for MS_SSIM after 1k iterations. I also add a third loss function, FFL, at either 25% or 50%, and I think it has made a positive impact. I've tried the LPIPS variants as tertiary losses and they completely ruined everything with the moiré pattern described in the settings. I get that, in theory, using one of those as a supplementary loss is supposed to help, but I have no idea how much weight to give it.
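For illustration, here is a minimal sketch (in TensorFlow, which is what Faceswap trains on) of what a weighted mix like this boils down to. The focal_frequency_loss below is a simplified stand-in I wrote for this post, not Faceswap's actual FFL implementation:

    import tensorflow as tf

    def focal_frequency_loss(y_true, y_pred):
        # Simplified stand-in: mean magnitude difference of the 2D spectra.
        # Channels are moved first so fft2d runs over the two spatial dims.
        f_true = tf.signal.fft2d(tf.cast(tf.transpose(y_true, [0, 3, 1, 2]), tf.complex64))
        f_pred = tf.signal.fft2d(tf.cast(tf.transpose(y_pred, [0, 3, 1, 2]), tf.complex64))
        return tf.reduce_mean(tf.abs(f_true - f_pred))

    def combined_loss(y_true, y_pred):
        # MS_SSIM at 100% + MAE (L1) at 100% + FFL at 25%
        # (inputs must be large enough for MS-SSIM's five default scales)
        ms_ssim = 1.0 - tf.reduce_mean(tf.image.ssim_multiscale(y_true, y_pred, max_val=1.0))
        mae = tf.reduce_mean(tf.abs(y_true - y_pred))
        return 1.0 * ms_ssim + 1.0 * mae + 0.25 * focal_frequency_loss(y_true, y_pred)

Each weight is just a multiplier on its term, which is why the relative scales of the losses matter so much.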

Copied from my previous (larger) Phaze A post.


torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Notes on Loss functions

Post by torzdf »

Thanks for this. I think a discussion on Loss Functions is well worthwhile.

I have had some success with the following combination:

  • SSIM - main function
  • MAE 25% - secondary L1 regularisation term
  • LPIPS-Alex 5% - This loss function outputs strong numbers, so it needs to be very low. How low will depend on what you are mixing it with. This function sharpens up the swap more than any other function I've seen. On its own, it is a total disaster zone though!
  • FFL 100% - How much this helps/does not help I could not say at this stage, but I feel it helps.

These are just numbers I've had success with. Testing loss functions (along with all the other variables involved) is not a realistic endeavour for one person to undertake, so take these numbers with a grain of salt rather than as the be-all and end-all.

Would be very interested to know other people's findings too.

Some screen grabs from very early in training on a custom 384px Phaze-A model trained at BS 6 with this mix show the model learning in a very different way from more traditional functions, looking thoroughly cursed:

[Attachment: early train.jpg]

By 30k, it looked less cursed, but still weird:

[Attachment: 30k.jpg]

By 50k, it shows promise. Something interesting I found is that glasses are totally ignored with this mix (bisenet-fp obstructed weights), whilst I would get shadows with more traditional loss functions. It resembles more of an oil painting at this point:

[Attachment: 50k.jpg]

By about 150k or so I was beginning to get insane eye/mouth detail:

[Attachment: eyes.jpg]
[Attachment: mouth.jpg]

By about 260k I was getting eyelashes:

[Attachment: 2601.jpg]
[Attachment: 2602.jpg]

At this point I had to stop my experiment though.

My word is final

Icarus
Posts: 8
Joined: Mon Aug 15, 2022 9:18 pm
Has thanked: 10 times
Been thanked: 8 times

Re: Notes on Loss functions

Post by Icarus »

torzdf wrote: Thu Aug 18, 2022 11:00 pm

LPIPS-Alex 5% - This loss function outputs strong numbers, so it needs to be very low. How low will depend on what you are mixing it with. This function sharpens up the swap more than any other function I've seen. On its own, it is a total disaster zone though!
FFL 100% - How much this helps/does not help I could not say at this stage, but I feel it helps.

That's awesome, I'm going to try LPIPS at 5-10%. I had it at 25%, which was also a total disaster. I love how FFL mysteriously just feels right for some reason, but no one can quite put their finger on why. :D

Curious what your thoughts are on why you chose to lower your L1 to 25% and give more weight to FFL. Also, do you think LPIPS-Alex gives better results than LPIPS-VGG16?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: [Discussion] Notes on Loss functions

Post by torzdf »

These were values I "lucked out" on. I was doing a lot of testing of loss functions before pushing the loss update (I wanted to be able to give some guidance prior to adding more options). A lot of this was training loss functions on their own to see what impact they had.

It was proving time consuming and slow, so I just plugged in some numbers I guesstimated based on my observations, and these are what I came up with. I was as surprised as anyone to see them start to get results.

MAE was set at 25% because I know it has a tendency towards the average, and therefore has a tendency towards blurrier results. Whether lowering it to 25% was necessary, or achieves anything, I could not tell you. I just didn't want it to flood the other loss functions I selected.

FFL posts weak numbers, so I boosted it to 100% (you can get an idea of the kinds of impact it will have by just training it on its own for a few thousand iters and looking at the loss values against, say, MSE/MAE on its own).
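If you want to eyeball the relative magnitudes yourself, a quick throwaway comparison like this (my own snippet, not part of Faceswap) gives a feel for it on a random batch:

    import tensorflow as tf

    y_true = tf.random.uniform((4, 128, 128, 3))
    y_pred = tf.random.uniform((4, 128, 128, 3))

    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    print(f"MAE: {mae.numpy():.4f}  MSE: {mse.numpy():.4f}")
    # A candidate loss that prints numbers an order of magnitude smaller
    # than these on the same batch will need its weight boosted to avoid
    # being drowned out by the stronger terms.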

lpips-alex was chosen purely for VRAM reasons: I could not train the model with VGG-16. All things being equal, I would expect VGG-16 to be better. However, you can glean more from the paper:
https://arxiv.org/abs/1801.03924

My word is final

tochan
Posts: 21
Joined: Sun Sep 22, 2019 8:17 am
Been thanked: 5 times

Re: Notes on Loss functions

Post by tochan »

torzdf wrote: Thu Aug 18, 2022 11:00 pm

Thanks for this. I think a discussion on Loss Functions is well worthwhile.

I have had some success with the following combination:

Many thanks for this information.
I trained my Disney model (512) for 964,000 iterations on the default settings and was missing the details (eyelashes etc.). With your example, those details now come very fast ;)

martinf
Posts: 27
Joined: Thu Sep 29, 2022 7:58 pm
Been thanked: 3 times

Re: [Discussion] Notes on Loss functions

Post by martinf »

Has anyone started to see 'screen door' artifacts outside of the mask area, or are you guys training without a mask?

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: [Discussion] Notes on Loss functions

Post by bryanlyon »

Things outside the mask are just noise. Don't worry about them at all.

martinf
Posts: 27
Joined: Thu Sep 29, 2022 7:58 pm
Been thanked: 3 times

Re: [Discussion] Notes on Loss functions

Post by martinf »

I would agree with you, but any time the mask expands a tad, it is very noticeable. I'll mess with it a bit. To be clear, this is in a trained model where lpips_alex was added into the training midway.

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

Re: [Discussion] Notes on Loss functions

Post by MaxHunter »

You know, I've heard a lot about the loss functions posted here, but not a peep about any of the others. Does anyone have experience with the other loss functions?

I've been having a heck of a time with NaN warnings when using MS-SSIM based on this discussion, but I recently switched to SSIM after 600+ iterations; my loss dropped and my NaNs seem to have as well, but now I'm concerned about losing detail.

Any thoughts on adding detail with the functions not listed here? What about using SSIM with a weaker secondary or tertiary loss of MS-SSIM, or is that redundant?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: [Discussion] Notes on Loss functions

Post by torzdf »

I would say that ms-ssim and ssim used together would probably be redundant.

I am (in a future update) going to lower the strength of the lpips output to allow more fine-grained control (as it is not possible to go below 1 for the weight), but I need to think about how best to do this.

My word is final

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

Re: [Discussion] Notes on Loss functions

Post by MaxHunter »

Does it matter where the loss function is placed? For instance, does having VGG-16 in the second loss-function slot give it slightly more weight than if it were in the third or fourth position?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: [Discussion] Notes on Loss functions

Post by torzdf »

No. The values are just summed for the final loss figure.
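In other words (the weights and loss values here are purely illustrative):

    # The final loss is a weighted sum, so slot order is irrelevant:
    losses  = {"ssim": 0.42, "mae": 0.11, "lpips_alex": 0.05, "ffl": 0.03}
    weights = {"ssim": 1.00, "mae": 0.25, "lpips_alex": 0.05, "ffl": 1.00}
    total = sum(weights[name] * value for name, value in losses.items())
    # Addition is commutative: any ordering of the terms gives the same total.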

My word is final

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

lpips Update

Post by MaxHunter »

I noticed the lpips strength has been lowered by a factor of ten. I'm already training my model with VGG-16 at 5% (your suggested weight). Should I raise it to 15% or 50% to maintain the strength?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: lpips Update

Post by torzdf »

5% would become 50%.

It has been updated to give more fine-grained control, as some users were finding that the old "1%" was not always low enough.
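As a worked example (the numbers are just the conversion, nothing Faceswap-specific):

    old_weight = 5     # % configured before the update
    rescale = 10       # lpips output was lowered by a factor of ten
    new_weight = old_weight * rescale
    print(new_weight)  # 50 -> set the lpips weight to 50%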

I'll update my post(s) with the latest figures.

My word is final
