What learning rate and epsilon exponent should be set for 384 px Phaze-A models using mixed precision?
Tried twice, training 384 px models. Got past 200k iterations both times until non-stop NaNs kicked in. Raising the epsilon exponent to -5 or -4 at that point didn't help, so I guess it was a mistake to start with an epsilon exponent of -7. I also noticed the loss values were way too low for 200k iterations.
Trying again with a 384 px Phaze-A and mixed precision, but this time starting with an epsilon exponent of -5 and the learning rate at the default 5e-5. Should I go with an epsilon exponent of -4 instead? And roughly how much should I adjust the learning rate to compensate?
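For context on why a -7 epsilon exponent tends to misbehave under mixed precision, here's a minimal NumPy sketch (using `np.float16` as a stand-in for the half-precision math; the exact values are illustrative, not Faceswap internals). Float16 can't represent normal numbers below about 6e-5, so a tiny second-moment estimate can underflow to zero, and an epsilon of 1e-7 is then too small to keep Adam's denominator away from overflow:

```python
import numpy as np

# A second-moment estimate (v) that underflowed to zero in half precision.
v = np.float16(0.0)

eps_small = np.float16(1e-7)  # epsilon exponent -7
eps_big = np.float16(1e-4)    # epsilon exponent -4

# Adam-style update denominator: sqrt(v) + eps.
# With eps = 1e-7 the reciprocal exceeds float16's max (~65504) -> inf -> NaN updates.
print(np.float16(1.0) / (np.sqrt(v) + eps_small))  # inf

# With eps = 1e-4 the result stays finite and the update remains bounded.
print(np.float16(1.0) / (np.sqrt(v) + eps_big))
```

This is the usual reasoning behind bumping the epsilon exponent to -4 when NaNs appear with mixed precision; the learning rate question is separate, since a larger epsilon effectively shrinks the step size.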