Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?

swapration
Posts: 23
Joined: Thu Sep 10, 2020 1:21 am

Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?

Post by swapration »

Tried twice, training 384 px models. Both times I got past 200k iterations before non-stop NaNs kicked in. Raising the epsilon exponent to -5 or -4 at that point doesn't help, so I guess it was a mistake to start with an exponent of -7 (epsilon = 1e-7). Also I noticed the losses were way too low for 200k iterations.

Trying again with a 384 px Phaze-A model in mixed precision, but this time starting with an epsilon exponent of -5 and the learning rate at the default 5e-5. Should I go with an epsilon exponent of -4 instead? And roughly how much of an adjustment should I be making to the learning rate?
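To put rough numbers on why a tiny epsilon is risky here (a sketch, not Faceswap's code): Adam divides by sqrt(v_hat) + epsilon, and under mixed precision small gradients can underflow so sqrt(v_hat) lands near zero, leaving the step bounded only by lr / epsilon.

```python
# Rough illustration (not Faceswap code): when the second-moment estimate
# underflows to ~0 under float16, Adam's parameter step is bounded by
# lr / epsilon, so a small epsilon permits huge jumps and invites NaNs.
lr = 5e-5  # Faceswap's default learning rate
for exponent in (-7, -5, -4):
    eps = 10.0 ** exponent
    print(f"epsilon exponent {exponent}: worst-case step factor = {lr / eps:g}")
```

With exponent -7 the worst-case step factor is 500, versus 5 at -5 and 0.5 at -4, which matches the experience above of -7 being NaN-prone.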

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am

Re: Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?

Post by torzdf »

-5 should be ok for mixed precision. As for learning rate, whatever works. There are no hard and fast figures, I'm afraid. If 5e-5 doesn't work, try lowering to 4.5e-5, if that doesn't work, drop again. Etc. etc.
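That "drop and retry" procedure can be sketched as a simple loop (purely illustrative; train_until_nan is a hypothetical stand-in for a training stint, not a Faceswap API):

```python
import math

def find_stable_lr(train_until_nan, lr=5e-5, step_down=0.5e-5, floor=1e-5):
    """Lower the learning rate in small steps until training stops NaN-ing.

    train_until_nan(lr) is assumed to run a training stint at that rate and
    return the final loss (NaN if the model blew up).
    """
    while lr >= floor:
        if not math.isnan(train_until_nan(lr)):
            return lr      # this rate survived the stint
        lr -= step_down    # NaN hit: drop the rate and retry
    return None            # nothing stable found above the floor
```

Starting from the default this tries 5e-5, then 4.5e-5, then 4e-5, and so on, matching the advice above.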

Also, make sure your faceswap is updated. I pushed an update at the beginning of August that should improve the NaN situation.

My word is final

swapration
Posts: 23
Joined: Thu Sep 10, 2020 1:21 am

Re: Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?

Post by swapration »

Do you happen to remember what average loss you were getting at 100k and 200k iterations?

I started a new model that's at 200k iterations now, at an epsilon of -4. But while it seems much more stable - the loss troughs and peaks are no longer crazy far from the average - it also seems to have slowed down a fair bit.

To compare: the -4 epsilon model was averaging 0.03 loss at 100k and 0.024 at 200k.
I think the -7 epsilon model was averaging 0.02 at 100k and something like 0.019 at 200k, with insanely low troughs around 0.012, which would explain why it was so NaN-happy.

Do you think the model will go NaN crazy if I move the epsilon exponent from -4 to -5 at this point? It should be at 200k+ iterations by then, maybe 300k, depending on when you see this.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am

Re: Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?

Post by torzdf »

swapration wrote: Sun Aug 22, 2021 7:26 pm

Do you happen to remember what average loss you were getting at 100k and 200k iterations?

I'm afraid I don't, no.

I started a new model that's at 200k iterations now, at an epsilon of -4. But while it seems much more stable - the loss troughs and peaks are no longer crazy far from the average - it also seems to have slowed down a fair bit.

Raising epsilon will slow down training. Sadly it's the price you pay for stability.
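The slowdown follows directly from Adam's update, step = lr * m_hat / (sqrt(v_hat) + epsilon): once epsilon is comparable to sqrt(v_hat), it dominates the denominator and shrinks every step. A back-of-envelope sketch (the 1e-4 gradient scale is an assumed illustrative value, not measured from these models):

```python
# Back-of-envelope Adam step sizes at different epsilons.
# rms stands in for sqrt(v_hat); 1e-4 is an assumed small-gradient scale.
lr, m_hat, rms = 5e-5, 1e-4, 1e-4
for eps in (1e-7, 1e-5, 1e-4):
    step = lr * m_hat / (rms + eps)
    print(f"eps={eps:g}: step ~ {step:.3g}")
```

At this gradient scale, moving from an epsilon of 1e-5 to 1e-4 roughly halves the step size: the stability/speed trade described above.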

Do you think the model will go NaN crazy if I move the epsilon from -4 to -5 at this point, should be 200k+ iterations, maybe 300k (depending on when you see this)?

Honestly don't know. You can but try.

