Gradient clipping is a mechanism to help prevent exploding gradients (that is, numbers that go to +/- infinity). Vanishing gradients (numbers that go to 0) are the opposite problem, and clipping does not help with those. Exploding gradients will cause a model to NaN. Mixed Precision is more prone to this, as "infinity" in a limited precision space is a smaller number than "infinity" in a full precision space. That doesn't sound like it makes sense, but think of infinity as any number too large to be represented at a given numerical precision: float16 tops out at 65,504, while float32 goes up to roughly 3.4e38.

There are several methods to clip gradients. You can clip-max (i.e. clamp all numbers to +/- 1.0) or you can clip gradients to a maximum norm. Most ML libraries expect you to give them a number to clip the norm at, but the right number really is data dependent. Auto-clip is a mechanism that tracks the distribution of gradient norms seen during training and automatically adjusts the clipping value based on what it sees in the data.

This probably still doesn't make a whole lot of sense, but it's the best that I can explain it for now. It's basically adaptive, rather than expecting me/the user to come up with an arbitrary number ahead of time.
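
As a rough sketch of the difference between the three approaches (this is not Faceswap's actual code, just plain Python for illustration), something like the following may help. All the names here (`clip_by_value`, `clip_by_norm`, `AutoClipper`) are made up for the example, and the 10th percentile default is an assumption rather than a recommended setting. The adaptive part simply remembers every gradient norm it has seen and clips to a percentile of that history.

```python
import math


def clip_by_value(grads, limit=1.0):
    """Clip-max: clamp every gradient element to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def global_norm(grads):
    """L2 norm of the whole gradient vector."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_norm(grads, max_norm):
    """Rescale the gradients so their L2 norm is at most max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


class AutoClipper:
    """Hypothetical helper: picks the clip threshold from past gradient norms."""

    def __init__(self, percentile=10.0):
        self.percentile = percentile  # assumed default, tune for your data
        self.history = []             # every gradient norm seen so far

    def threshold(self):
        """Return the chosen percentile of the recorded norms (nearest rank)."""
        ordered = sorted(self.history)
        rank = math.ceil(self.percentile / 100.0 * len(ordered))
        return ordered[max(rank, 1) - 1]

    def clip(self, grads):
        """Record this step's norm, then clip to the adaptive threshold."""
        self.history.append(global_norm(grads))
        return clip_by_norm(grads, self.threshold())
```

With the first two functions you have to pick `limit` or `max_norm` yourself. With the adaptive version you only pick a percentile, and the actual threshold follows whatever norms the training run produces.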

I may add the other clipping mechanisms into Faceswap, just because it's an easy add, but I would expect autoclip to work better.