What learning rate and epsilon exponent should be set for 384 px Phaze-A models using mixed precision?
Tried twice, training 384 px models. Got past 200k iterations both times until non-stop NaNs kicked in. Raising the epsilon exponent to -5 or -4 at that point didn't help, so I guess it was a mistake to start with an epsilon exponent of -7. I also noticed the loss values were way too low for 200k iterations.
Trying again with a 384 px Phaze-A and mixed precision, but this time starting with an epsilon exponent of -5 and the learning rate at the default 5e-5. Should I go with an epsilon exponent of -4 instead? And roughly how much should I adjust the learning rate to compensate?
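For context on why a -7 epsilon exponent tends to misbehave under mixed precision, here's a minimal NumPy sketch (using `np.float16` as a stand-in for the half-precision math; the exact values are illustrative, not Faceswap internals). Float16 can't represent normal numbers below about 6e-5, so a tiny second-moment estimate can underflow to zero, and an epsilon of 1e-7 is then too small to keep Adam's denominator away from overflow:

```python
import numpy as np

# A second-moment estimate (v) that underflowed to zero in half precision.
v = np.float16(0.0)

eps_small = np.float16(1e-7)  # epsilon exponent -7
eps_big = np.float16(1e-4)    # epsilon exponent -4

# Adam-style update denominator: sqrt(v) + eps.
# With eps = 1e-7 the reciprocal exceeds float16's max (~65504) -> inf -> NaN updates.
print(np.float16(1.0) / (np.sqrt(v) + eps_small))  # inf

# With eps = 1e-4 the result stays finite and the update remains bounded.
print(np.float16(1.0) / (np.sqrt(v) + eps_big))
```

This is the usual reasoning behind bumping the epsilon exponent to -4 when NaNs appear with mixed precision; the learning rate question is separate, since a larger epsilon effectively shrinks the step size.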