Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?
Posted: Wed Aug 18, 2021 12:56 am
by swapration
Tried twice, training 384 px models. Got over 200k iterations both times before non-stop NaNs kicked in. Raising the epsilon to -5 or -4 at that point didn't help, so I guess it was a mistake to start with an epsilon of -7. I also noticed the loss rates were way too low for 200k.
Trying again with a 384 px Phaze-A using mixed precision, but starting with an epsilon of -5 and the learning rate at the default 5e-5. Should I go with an epsilon of -4 instead? And roughly how much of an adjustment should I be making to the learning rate?
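For reference, the epsilon values in this thread are exponents: as I understand it, faceswap passes Adam an epsilon of 10 to that power, so -7 means 1e-7. A rough, stdlib-only sketch of why such a tiny epsilon is fragile under mixed precision (the gradient magnitude and the crude underflow model are illustrative assumptions, not measurements from this model):

```python
import math

lr = 5e-5             # faceswap's default learning rate
grad = 1e-4           # illustrative small late-training gradient
FP16_TINY = 2 ** -24  # smallest positive float16 subnormal, ~5.96e-8

# Adam's update divides by sqrt(v) + eps, where v tracks squared
# gradients. In half precision the squared gradient can underflow:
v = grad * grad                        # 1e-8, below FP16_TINY
v_fp16 = v if v >= FP16_TINY else 0.0  # crude model of fp16 underflow

def step(eps):
    """Approximate per-parameter update magnitude for a given epsilon."""
    return lr * grad / (math.sqrt(v_fp16) + eps)

print(step(1e-7))  # denominator collapses to eps: step explodes to ~5e-2
print(step(1e-5))  # a larger epsilon caps the blow-up at ~5e-4
```

Once the denominator has underflowed, epsilon is the only thing standing between the update and a 1000x-oversized step, which is one plausible route from epsilon -7 to NaNs.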
Re: Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?
Posted: Wed Aug 18, 2021 10:08 am
by torzdf
-5 should be OK for mixed precision. As for learning rate: whatever works. There are no hard and fast figures, I'm afraid. If 5e-5 doesn't work, try lowering to 4.5e-5; if that doesn't work, drop again, and so on.
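That trial-and-error schedule can be sketched as a simple ladder (the 0.5e-5 step size and the stopping floor are my assumptions for illustration, not faceswap settings):

```python
def lr_candidates(start=5e-5, step=5e-6, floor=1e-5):
    """Yield successively lower learning rates to try after each NaN-out."""
    lr = start
    while lr >= floor:
        yield lr
        lr -= step

for lr in lr_candidates():
    print(f"{lr:.1e}")  # 5.0e-05, 4.5e-05, 4.0e-05, ...
```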
Also, make sure your faceswap install is updated. I pushed an update at the beginning of August that should improve the NaN situation.
Re: Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?
Posted: Sun Aug 22, 2021 7:26 pm
by swapration
Do you happen to remember what average loss you were getting at 100k and 200k iterations?
I started a new model that's at 200k iterations now, with an epsilon of -4. While it seems much more stable (the loss troughs and peaks are no longer wildly far from the average), it also seems to have slowed down a fair bit.
For comparison, the -4 epsilon model was averaging 0.03 at 100k and 0.024 at 200k.
I think the -7 epsilon model was averaging 0.02 at 100k and something like 0.019 at 200k, with insanely low troughs around 0.012, which would explain why it was so NaN-happy.
Do you think the model will go NaN-crazy if I move the epsilon from -4 to -5 at this point? It should be at 200k+ iterations by then, maybe 300k, depending on when you see this.
Re: Around what learning rate and epsilon should it be set for 384 px Phaze-A models using mixed precision?
Posted: Mon Aug 23, 2021 8:42 am
by torzdf
swapration wrote: ↑Sun Aug 22, 2021 7:26 pm
Do you happen to remember what average loss you were getting at 100k and 200k iterations?
I'm afraid I don't, no.
I started a new model that's at 200k iterations now, with an epsilon of -4. While it seems much more stable (the loss troughs and peaks are no longer wildly far from the average), it also seems to have slowed down a fair bit.
Raising epsilon will slow down training. Sadly it's the price you pay for stability.
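To put a rough number on that slowdown: under the simplifying assumption that sqrt(v_hat) tracks the gradient magnitude, the effective Adam step shrinks as epsilon grows (the 1e-4 gradient scale here is an illustrative assumption):

```python
lr = 5e-5    # faceswap's default learning rate
grad = 1e-4  # illustrative gradient magnitude

def effective_step(eps_exponent):
    """Approximate per-parameter Adam step, assuming sqrt(v_hat) ~= |grad|."""
    eps = 10 ** eps_exponent
    return lr * grad / (grad + eps)

for exp in (-7, -5, -4):
    print(exp, effective_step(exp))
# At -7 the step is essentially lr; at -4 it is roughly halved.
```

So the stability bought by a larger epsilon comes directly out of the step size, which matches the slower loss curve seen at epsilon -4.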
Do you think the model will go NaN-crazy if I move the epsilon from -4 to -5 at this point? It should be at 200k+ iterations by then, maybe 300k, depending on when you see this.
Honestly, I don't know. You can but try.