[Discussion] Notes on Loss functions

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

Icarus
Posts: 8
Joined: Mon Aug 15, 2022 9:18 pm
Has thanked: 10 times
Been thanked: 8 times

[Discussion] Notes on Loss functions

Post by Icarus »

Loss functions:

As it says in the Training Guide, the choice you make here will have an outsized impact on your entire model. I've tried them all, and a combination of MS_SSIM and MAE (L1), both at 100%, has produced the best results. The weird quirk with MS_SSIM is that whenever I've tried to start a model with it, the model crashes (which I honestly can't explain), so I usually start with SSIM and swap it out for MS_SSIM after 1k iterations. I also add a third loss function, FFL, at either 25% or 50%, and I think it has made a positive impact. I've tried the LPIPS variants as tertiary losses and they completely ruined everything with the moiré pattern described in the settings. I get that, in theory, using one of those as a supplementary loss is supposed to help, but I have no idea how much weight to give it.
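For illustration, here is a minimal sketch (in TensorFlow, which is what Faceswap trains on) of what a weighted mix like this boils down to. The focal_frequency_loss below is a simplified stand-in I wrote for this post, not Faceswap's actual FFL implementation:

    import tensorflow as tf

    def focal_frequency_loss(y_true, y_pred):
        # Simplified stand-in: mean magnitude difference of the 2D spectra.
        # Channels are moved first so fft2d runs over the two spatial dims.
        f_true = tf.signal.fft2d(tf.cast(tf.transpose(y_true, [0, 3, 1, 2]), tf.complex64))
        f_pred = tf.signal.fft2d(tf.cast(tf.transpose(y_pred, [0, 3, 1, 2]), tf.complex64))
        return tf.reduce_mean(tf.abs(f_true - f_pred))

    def combined_loss(y_true, y_pred):
        # MS_SSIM at 100% + MAE (L1) at 100% + FFL at 25%
        # (inputs must be large enough for MS-SSIM's five default scales)
        ms_ssim = 1.0 - tf.reduce_mean(tf.image.ssim_multiscale(y_true, y_pred, max_val=1.0))
        mae = tf.reduce_mean(tf.abs(y_true - y_pred))
        return 1.0 * ms_ssim + 1.0 * mae + 0.25 * focal_frequency_loss(y_true, y_pred)

Each weight is just a multiplier on its term, which is why the relative scales of the losses matter so much.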

Copied from my previous (larger) Phaze A post.


torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Notes on Loss functions

Post by torzdf »

Thanks for this. I think a discussion on Loss Functions is well worthwhile.

I have had some success with the following combination:

  • SSIM - main function
  • MAE 25% - secondary L1 regularisation term
  • LPIPS-Alex 5% - This loss function outputs strong numbers, so it needs to be very low. How low will depend on what you are mixing it with. This function sharpens up the swap more than any other function I've seen. On its own, it is a total disaster zone though!
  • FFL 100% - How much this helps/does not help I could not say at this stage, but I feel it helps.

These are just numbers I've had success with. Testing loss functions (along with all the other variables involved) is not a realistic endeavour for one person to undertake, so take these numbers with a grain of salt rather than as the be-all and end-all.

Would be very interested to know other people's findings too.

Some screen grabs from very early in training on a custom 384px Phaze-A model trained at BS 6 with this mix show the model learning in a very different way from more traditional functions, looking thoroughly cursed:

[Attachment: early train.jpg]

By 30k, it looked less cursed, but still weird:

[Attachment: 30k.jpg]

By 50k, it shows promise. Something interesting I found is that glasses are totally ignored with this mix (bisenet-fp obstructed weights), whilst I would get shadows with more traditional loss functions. It resembles more of an oil painting at this point:

[Attachment: 50k.jpg]

By about 150k or so I was beginning to get insane eye/mouth detail:

[Attachment: eyes.jpg]
[Attachment: mouth.jpg]

By about 260k I was getting eyelashes:

[Attachment: 2601.jpg]
[Attachment: 2602.jpg]

At this point I had to stop my experiment though.

My word is final

Icarus
Posts: 8
Joined: Mon Aug 15, 2022 9:18 pm
Has thanked: 10 times
Been thanked: 8 times

Re: Notes on Loss functions

Post by Icarus »

torzdf wrote: Thu Aug 18, 2022 11:00 pm

LPIPS-Alex 5% - This loss function outputs strong numbers, so it needs to be very low. How low will depend on what you are mixing it with. This function sharpens up the swap more than any other function I've seen. On its own, it is a total disaster zone though!
FFL 100% - How much this helps/does not help I could not say at this stage, but I feel it helps.

That's awesome, I'm going to try LPIPS at 5-10%. I had it at 25%, which was also a total disaster. I love how FFL mysteriously just feels right for some reason, but no one can quite put their finger on why. :D

Curious what your thoughts are on why you chose to lower your L1 to 25% and give more weight to FFL. Also, do you think LPIPS-Alex gives better results than LPIPS-VGG16?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: [Discussion] Notes on Loss functions

Post by torzdf »

These were values I "lucked out" on. I was doing a lot of testing of loss functions before pushing the loss update (I wanted to be able to give some guidance prior to adding more options). A lot of this was training loss functions on their own to see what impact they had.

It was proving time consuming and slow, so I just plugged in some numbers I guesstimated based on my observations, and these are what I came up with. I was as surprised as anyone to see them start to get results.

MAE was set at 25% because I know it has a tendency towards the average, and therefore has a tendency towards blurrier results. Whether lowering it to 25% was necessary, or achieves anything, I could not tell you. I just didn't want it to flood the other loss functions I selected.

FFL posts weak numbers, so I boosted it to 100% (you can get an idea of the kinds of impact it will have by just training it on its own for a few thousand iters and looking at the loss values against, say, MSE/MAE on its own).
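If you want to eyeball the relative magnitudes yourself, a quick throwaway comparison like this (my own snippet, not part of Faceswap) gives a feel for it on a random batch:

    import tensorflow as tf

    y_true = tf.random.uniform((4, 128, 128, 3))
    y_pred = tf.random.uniform((4, 128, 128, 3))

    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    print(f"MAE: {mae.numpy():.4f}  MSE: {mse.numpy():.4f}")
    # A candidate loss that prints numbers an order of magnitude smaller
    # than these on the same batch will need its weight boosted to avoid
    # being drowned out by the stronger terms.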

lpips-alex was chosen purely for VRAM reasons: I could not train the model with VGG-16. All things being equal, I would expect VGG-16 to be better. However, you can glean more from the paper:
https://arxiv.org/abs/1801.03924

My word is final

tochan
Posts: 21
Joined: Sun Sep 22, 2019 8:17 am
Been thanked: 5 times

Re: Notes on Loss functions

Post by tochan »

torzdf wrote: Thu Aug 18, 2022 11:00 pm

Thanks for this. I think a discussion on Loss Functions is well worthwhile.

I have had some success with the following combination:

Many thanks for this information.
I trained my Disney model (512) for 964,000 iterations on the default settings and was missing the details (eyelashes etc.). With your example, those details now come very fast ;)

martinf
Posts: 27
Joined: Thu Sep 29, 2022 7:58 pm
Been thanked: 3 times

Re: [Discussion] Notes on Loss functions

Post by martinf »

Has anyone started to see 'screen door' artifacts outside of the mask area, or are you guys training without a mask?

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: [Discussion] Notes on Loss functions

Post by bryanlyon »

Things outside the mask are just noise. Don't worry about them at all.

martinf
Posts: 27
Joined: Thu Sep 29, 2022 7:58 pm
Been thanked: 3 times

Re: [Discussion] Notes on Loss functions

Post by martinf »

I would agree with you, but any time the mask expands a tad, it is very noticeable. I'll mess with it a bit. To be clear, this is in a trained model where lpips_alex was added into the training midway.

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

Re: [Discussion] Notes on Loss functions

Post by MaxHunter »

You know, I've heard a lot about the loss functions posted here, but not a peep about any of the others. Does anyone have experience with the other loss functions?

I've been having a heck of a time with NaN warnings when using MS-SSIM based on this discussion, but I recently switched to SSIM after 600+ iterations; my loss dropped and my NaNs seem to have as well, but now I'm concerned about losing detail.

Any thoughts on adding detail with the functions not listed here? What about using SSIM with a weaker secondary or tertiary loss of MS-SSIM, or is that redundant?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: [Discussion] Notes on Loss functions

Post by torzdf »

I would say that ms-ssim and ssim used together would probably be redundant.

I am (in a future update) going to lower the strength of the lpips output to allow more fine-grained control (as it is not possible to go below 1 for the weight), but I need to think about how best to do this.

My word is final

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

Re: [Discussion] Notes on Loss functions

Post by MaxHunter »

Does it matter where the loss function is placed? For instance, does having VGG-16 in the second loss-function slot give it slightly more weight than if it were in the third or fourth position?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: [Discussion] Notes on Loss functions

Post by torzdf »

No. The values are just summed for the final loss figure.
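In other words (the weights and loss values here are purely illustrative):

    # The final loss is a weighted sum, so slot order is irrelevant:
    losses  = {"ssim": 0.42, "mae": 0.11, "lpips_alex": 0.05, "ffl": 0.03}
    weights = {"ssim": 1.00, "mae": 0.25, "lpips_alex": 0.05, "ffl": 1.00}
    total = sum(weights[name] * value for name, value in losses.items())
    # Addition is commutative: any ordering of the terms gives the same total.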

My word is final

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

lpips Update

Post by MaxHunter »

I noticed the lpips strength has been lowered by a factor of ten. I'm already training my model with VGG-16 at 5% (your suggested weight). Should I raise it to 15% or 50% to maintain the strength?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: lpips Update

Post by torzdf »

5% would become 50%.

It has been updated to give more fine-grained control, as some users were finding that the old "1%" was not always low enough.
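As a worked example (the numbers are just the conversion, nothing Faceswap-specific):

    old_weight = 5     # % configured before the update
    rescale = 10       # lpips output was lowered by a factor of ten
    new_weight = old_weight * rescale
    print(new_weight)  # 50 -> set the lpips weight to 50%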

I'll update my post(s) with the latest figures.

My word is final
