
Is there a soft rule for learning rate

Posted: Sun Oct 09, 2022 5:12 pm
by MaxHunter

I know there are very few "hard" rules in face swapping, but are there any soft rules for learning rates and epsilons?

Generally I've been training with Adam at 3e-5 and an epsilon exponent of -4 (as suggested by Icarus in his Phase A post), using the DNY512 model with edits to fc depth (1), fc dimensions (4), dec filter min (128) and output kernel (3), at a stable batch size of 3 (4 seems to OOM after 6-8 hrs). I'm noticing some weird blocky discoloration anomalies at about 200k iterations that I'm hoping will go away. I'm wondering if raising the learning rate will help with these (as I understand it, the learning rate is related to detail), but I'm not sure I get the math or how to go about changing it.
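
For what it's worth, my understanding is that those two numbers just end up as the learning rate and epsilon handed to Keras's Adam optimizer, with the epsilon exponent meaning epsilon = 10 ** exponent. A rough sketch of that reading (not Faceswap's actual code):

    from tensorflow.keras.optimizers import Adam

    learning_rate = 3e-5        # the "Learning rate" setting, used as-is
    epsilon_exponent = -4       # the "Epsilon exponent" setting
    optimizer = Adam(learning_rate=learning_rate,
                     epsilon=10 ** epsilon_exponent)  # i.e. epsilon = 1e-4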

What are your soft rules for the learning rates? Any advice?


Re: Is there a soft rule for learning rate

Posted: Mon Oct 10, 2022 12:10 pm
by torzdf

Generally, a lower batch size needs a lower learning rate. However, whilst the learning rate is probably the most important hyper-parameter to set, there isn't really any mechanism for optimizing it beyond trial and error.
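
If it helps as a starting point, a common rule of thumb (nothing Faceswap-specific, and only a heuristic) is to scale the learning rate linearly with the batch size, then fine-tune from there by trial and error:

    def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
        """Linear-scaling heuristic: keep lr / batch_size roughly constant."""
        return base_lr * (new_batch_size / base_batch_size)

    # e.g. if 5e-5 behaved well at batch size 8, batch size 3 suggests ~1.9e-5
    print(scaled_learning_rate(5e-5, 8, 3))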


Re: Is there a soft rule for learning rate

Posted: Mon Oct 10, 2022 8:48 pm
by MaxHunter

With that being said...
If I wanted to experiment with raising the learning rates, how would you suggest going about it? Raising all three numbers at the same time? One number at a time, and if so, which one? Then test it for 100k iterations and raise another? I'm asking because I'm not sure exactly how to go about it efficiently. 😉😁


Re: Is there a soft rule for learning rate

Posted: Tue Oct 11, 2022 11:58 am
by torzdf

Learning rate (as with many things in ML) is too deep a subject to really go into in a forum post; googling things like "tuning learning rate" will probably unearth more useful information than I can impart.
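
As a self-contained toy example of why it comes down to trial and error (plain Python, nothing to do with Faceswap): the same number of steps on a simple quadratic loss can crawl, converge or blow up depending purely on the learning rate.

    def run(lr, steps=200):
        w = 5.0                  # start away from the minimum at w = 0
        for _ in range(steps):
            grad = 2 * w         # gradient of loss = w ** 2
            w -= lr * grad       # plain gradient-descent update
        return w ** 2            # final loss

    for lr in (0.001, 0.01, 0.1, 1.1):
        print(f"lr={lr:<6} final loss={run(lr):.6g}")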

As for the specific values: the epsilon exponent should probably be left where it is, unless the optimizer you use requires you to adjust it (as with AdaBelief), or you are using mixed precision, in which case you will probably want to raise it so that epsilon stays within the numerical range of fp16.
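
Purely as an illustration of that last point (plain NumPy, not Faceswap's internals), you can see what happens to candidate epsilon values once they are cast to float16:

    import numpy as np

    # 1e-4 (exponent -4) sits comfortably in float16's normal range,
    # 1e-7 lands in the subnormal range with almost no precision left,
    # and 1e-8 simply rounds to zero.
    for eps in (1e-4, 1e-7, 1e-8):
        print(f"{eps:g} -> float16 {float(np.float16(eps)):g}")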