AutoClip: Any Feedback From Users?

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.


AutoClip: Any Feedback From Users?

Post by ianstephens »

Just playing about with the recently introduced AutoClip feature.

How much effect would this have on NaNs? A strong one? Or is it still worth tapering down the EE with mixed-precision training?

Does anyone have any feedback on this feature so far with very complex models while using mixed precision?

I might also add - does anybody notice it affecting training accuracy?

Thank you for any feedback!


Re: AutoClip: Any Feedback From Users?

Post by torzdf »

My experience has been that it has not been the NaN saviour that I had hoped it would be, so my battle to combat NaNs continues :(

I have seen no negative impact on accuracy, though my sample set is small.

This is a link to the original paper, if anyone wishes to investigate the claims themselves:

https://arxiv.org/abs/2007.14469

My word is final


Re: AutoClip: Any Feedback From Users?

Post by torzdf »

FWIW I have been doing a lot of debugging around NaNs. They do happen in ML models, but I wanted to get a better understanding of why they occur.

This has been a time-consuming and laborious process. However, my investigations have led me to conclude that the main issue is within the Keras/TF implementation of Batch Normalization. This appears to be a known issue in shared layers (which is where our BatchNorm exists... in the shared encoder), but the issue appears to have been closed with no action taken:

https://github.com/keras-team/keras/issues/11927
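For anyone unfamiliar with what "shared layers" means here, this is a rough sketch (hypothetical shapes, not Faceswap's actual code) of the kind of setup being described: a single encoder containing BatchNormalization whose weights and moving statistics are reused by more than one decoder.

Code: Select all

import tensorflow as tf
from tensorflow.keras import layers

def build_encoder():
    # Shared encoder: its BatchNormalization weights and moving statistics
    # are used by both the "A" and "B" paths below.
    inputs = tf.keras.Input(shape=(64, 64, 3))
    x = layers.Conv2D(64, 5, strides=2, padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.1)(x)
    return tf.keras.Model(inputs, x, name="shared_encoder")

def build_decoder(name):
    inputs = tf.keras.Input(shape=(32, 32, 64))
    x = layers.Conv2DTranspose(3, 5, strides=2, padding="same",
                               activation="sigmoid")(inputs)
    return tf.keras.Model(inputs, x, name=name)

encoder = build_encoder()
decoder_a = build_decoder("decoder_a")
decoder_b = build_decoder("decoder_b")

face = tf.keras.Input(shape=(64, 64, 3))
# Both outputs route through the same encoder instance (shared layers)
model = tf.keras.Model(face, [decoder_a(encoder(face)), decoder_b(encoder(face))])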

I am now running more tests to confirm this is the issue.

The main challenge I now face is how to mitigate this. As BatchNorm is an embedded layer in many of the Keras Encoders, this will be non-trivial, if it is even possible at all.

I will update with any further findings.

My word is final


Re: AutoClip: Any Feedback From Users?

Post by torzdf »

Just to follow up: I no longer think this is an issue we are facing. I ran some tests and could see no noticeable difference between shared and unique layers.

I have, however, read this in Nvidia's guide on Mixed Precision:

While many networks match FP32 training results when all tensors are stored in FP16, some require updating an FP32 copy of weights. Furthermore, values computed by large reductions should be left in FP32. Examples of this include statistics (mean and variance) computed by batch-normalization, SoftMax.

So I am currently investigating forcing BN layers to fp32. However, I do not think that this will solve the issue, as in my recent tests on a NaN model the NaNs were getting introduced in the Decoder. This was occurring during the forward pass, which AutoClip would not resolve.
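As a rough illustration of that idea (a minimal sketch, not a change that has landed in Faceswap), Keras lets an individual layer override the global mixed-precision policy by passing dtype="float32", so a BatchNormalization layer can keep its computation and statistics in full precision while the surrounding convolutions run in float16:

Code: Select all

import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Everything defaults to float16 compute under the mixed precision policy...
mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, padding="same")(inputs)       # runs in float16
x = layers.BatchNormalization(dtype="float32")(x)      # ...but BN is pinned to float32
x = layers.Activation("relu")(x)
model = tf.keras.Model(inputs, x)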

It may just be that I need to accept that bigger models need lower learning rates, especially when Mixed Precision is used.

My word is final


Re: AutoClip: Any Feedback From Users?

Post by ianstephens »

Using mixed precision, I am finding in some tests that even with the learning rate set super low (3e-05 or less) on large models, the only way to mitigate NaNs is to set the EE to -4 or lower. Even a super-low learning rate does not seem to fix NaNs, but setting the EE to a low number does seem to help and mitigate the issue - albeit with a less-than-perfect model that is a lot slower to train.


Re: AutoClip: Any Feedback From Users?

Post by MaxHunter »

I've tried looking this up and scanned the "paper," but in layman's terms, what does AutoClip do, and how is it applicable?


Re: AutoClip: Any Feedback From Users?

Post by torzdf »

Gradient clipping is a mechanism to help prevent exploding/vanishing gradients (that is, numbers that go to +/- infinity or to 0). Both of these will cause a model to NaN (Mixed Precision is more prone to this, as infinity in limited precision space is a smaller number than infinity in full precision space... This doesn't sound like it makes sense, but think of infinity as any number that cannot be represented by a certain numerical precision).
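To put a number on that: float16 overflows to infinity at around 65,504, whereas float32 can represent values up to roughly 3.4e38, so a value that is perfectly representable in full precision can become inf under mixed precision. A quick illustration:

Code: Select all

import numpy as np

print(np.finfo(np.float16).max)   # 65504.0 - the largest finite half-precision value
print(np.float16(70000.0))        # inf     - overflows in float16
print(np.float32(70000.0))        # 70000.0 - fine in float32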

There are several methods to clip gradients. You can clip to a maximum value (i.e. clip all numbers at 1.0), or you can clip gradients to an adjusted norm. Most ML libraries expect you to provide the number to clip the norm at, but the right value really is data dependent. AutoClip is a mechanism that monitors the distribution of gradient norms during training and auto-adjusts the clipping value based on what it sees in the data.

This probably still doesn't make a whole lot of sense, but it's the best that I can explain it for now. It's basically adaptive, rather than expecting me/the user to come up with an arbitrary number ahead of time.
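For the curious, the mechanism from the paper is fairly simple: keep a running history of gradient norms, and at every step clip to a chosen percentile of that history. A minimal TensorFlow-flavoured sketch of the idea (illustrative only, not Faceswap's implementation):

Code: Select all

import numpy as np
import tensorflow as tf

class AutoClipper:
    """Clip gradients to the p-th percentile of all gradient norms seen so far."""

    def __init__(self, percentile=10.0):
        self.percentile = percentile
        self.history = []   # global gradient norm at each training step

    def clip(self, gradients):
        self.history.append(float(tf.linalg.global_norm(gradients)))
        clip_value = np.percentile(self.history, self.percentile)
        clipped, _ = tf.clip_by_global_norm(gradients, clip_value)
        return clipped

# Hypothetical usage inside a custom training step:
#   grads = tape.gradient(loss, model.trainable_variables)
#   grads = clipper.clip(grads)
#   optimizer.apply_gradients(zip(grads, model.trainable_variables))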

I may add the other clipping mechanisms into Faceswap, just because it's an easy add, but I would expect autoclip to work better.
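For reference, the fixed-threshold versions mentioned above are already exposed as standard options on Keras optimizers; the number just has to be chosen up front rather than adapted to the data:

Code: Select all

import tensorflow as tf

# Clip each gradient element to the range [-1.0, 1.0]
opt_value = tf.keras.optimizers.Adam(learning_rate=5e-5, clipvalue=1.0)

# Rescale each gradient tensor so that its L2 norm is at most 1.0
opt_norm = tf.keras.optimizers.Adam(learning_rate=5e-5, clipnorm=1.0)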

My word is final


Re: AutoClip: Any Feedback From Users?

Post by MaxHunter »

Thanks, man. This is how we learn! 😁
