Is there a soft rule for learning rate

Want to understand the training process better? Got tips for which model to use and when? This is the place for you


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

MaxHunter
Posts: 194
Joined: Thu May 26, 2022 6:02 am
Has thanked: 177 times
Been thanked: 13 times

Is there a soft rule for learning rate

Post by MaxHunter »

I know there are very little "hard" rules in face swapping, but are there any soft rules for learning rates and epsilons?

Generally I've been training with Adam at 3e-5 and an epsilon exponent of -4 (as suggested by Icarus in his Phase A post), with the DNY512 model, with edits to fc depth (1), fc dimensions (4), dec filter min (128), and output kernel (3), at a stable batch size of 3 (4 seems to OOM after 6-8 hrs). I'm noticing some weird blocky discoloration anomalies at about 200k iterations that I'm hoping will go away. I'm wondering if raising the learning rate will help with these (because, as I understand it, the learning rate is related to detail), but I'm not sure I understand the math or how to go about implementing a change.
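For anyone unfamiliar with the shorthand: my understanding is that those two settings map roughly onto the optimizer like this. This is just a rough Keras sketch for illustration, not Faceswap's actual internals:

Code: Select all

    # Rough sketch only: what "Adam 3e-5, epsilon exponent -4" translates to
    # in plain Keras terms (not how Faceswap wires it up internally).
    from tensorflow.keras.optimizers import Adam

    learning_rate = 3e-5       # "Adam 3e-5"
    epsilon_exponent = -4      # "epsilon exponent -4"

    optimizer = Adam(
        learning_rate=learning_rate,
        epsilon=10.0 ** epsilon_exponent,  # i.e. epsilon = 1e-4
    )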

What are your soft rules for the learning rates? Any advice?

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Is there a soft rule for learning rate

Post by torzdf »

Generally, a lower batch size needs a lower learning rate. However, whilst learning rate is probably the most important hyper-parameter to set, there isn't really any mechanism for optimizing it beyond trial and error.
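As a rough illustration of that heuristic (the linear scaling rule of thumb; the reference values here are made up for the example, not anything Faceswap enforces):

Code: Select all

    # Illustrative sketch of the "scale learning rate with batch size"
    # heuristic (linear scaling). Reference values are placeholders only.
    def scaled_learning_rate(batch_size, base_lr=5e-5, base_batch_size=16):
        """Scale a reference learning rate linearly with batch size."""
        return base_lr * batch_size / base_batch_size

    # e.g. dropping from batch 16 to batch 3 would suggest roughly:
    print(scaled_learning_rate(3))   # ~9.4e-06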

My word is final

MaxHunter
Posts: 194
Joined: Thu May 26, 2022 6:02 am
Has thanked: 177 times
Been thanked: 13 times

Re: Is there a soft rule for learning rate

Post by MaxHunter »

With that being said...
If I wanted to experiment with raising the learning rate, how would you suggest going about it? Raising all three numbers at the same time? One number at a time, and if so, which one? Then test it for 100k iterations and raise another number? I'm asking because I'm not sure how to go about it efficiently. 😉😁
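To make the question concrete, this is the sort of one-at-a-time plan I have in mind (the specific values are just placeholders, not a recommendation):

Code: Select all

    # Purely illustrative: a one-variable-at-a-time experiment plan,
    # changing a single value, training ~100k iterations, then moving on.
    baseline = {"learning_rate": 3e-5, "epsilon_exponent": -4}

    experiments = [
        {**baseline, "learning_rate": 4e-5},    # step 1: nudge LR only
        {**baseline, "learning_rate": 5e-5},    # step 2: nudge LR again
        {**baseline, "epsilon_exponent": -5},   # step 3: nudge epsilon only
    ]

    for i, config in enumerate(experiments, start=1):
        print(f"Run {i}: train ~100k iterations with {config}")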

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Is there a soft rule for learning rate

Post by torzdf »

Learning rate (as with many things in ML) is too deep a subject to really go into in a forum post; googling things like "tuning learning rate" will probably unearth more useful information than I can impart.

As for the specific values: the epsilon exponent should probably be left where it is, unless the optimizer you use requires you to adjust it (as with AdaBelief) or you are using mixed precision, in which case you will probably want to raise it (make the exponent less negative) so that epsilon stays within the numerical range of fp16.
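To illustrate the fp16 point, here is a quick check of what happens to small epsilon values when they are forced into half precision (illustrative only, not Faceswap code):

Code: Select all

    # Illustrative only: how 10**exponent behaves when cast to float16.
    # Values below float16's smallest normal number (~6.1e-5) become
    # subnormal, and sufficiently small ones round to zero.
    import numpy as np

    for exponent in (-8, -7, -5, -4):
        eps = 10.0 ** exponent
        print(f"1e{exponent}: float16 -> {np.float16(eps)!r}")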

My word is final
