ianstephens wrote: ↑Sat Aug 28, 2021 6:10 pm
Interested to know if the AdaBelief optimizer can be used with this model (Phase-A StoJo 256).
We've set the epsilon exponent (EE) to -16 but get complete model collapse after 10k iterations.
We've also disabled mixed precision to see if we can fix the issue further into training as described above. Of course, we can't use batch sizes as high as we have been but are willing to give it a try if it enables us to achieve a stable model.
Honestly, I have not tested AdaBelief at all. It is a straight port from an official implementation.
ianstephens wrote: ↑Sat Aug 28, 2021 6:13 pm
One last thought to @torzdf: with your StoJo training, did you have ICNR Init and Conv Aware Init enabled? On the corrupting model, we have both enabled. Do you think it would make any difference either way?
Yes, I always have these enabled. No, they won't be causing your issues.
ianstephens wrote: ↑Sat Aug 28, 2021 9:39 pm
A final question to @torzdf: do you still recommend setting the EE to -5 for StoJo even with the latest updates to the preset? This could be where we are going wrong (-7).
You may be able to get away with -7 now. It's not something I have tested. All I know is that, since the latest fix, I have trained a model to 2m iterations with zero issues with epsilon at -5. When I next do a train (not for a while, I would imagine), I will try -7.
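To illustrate why the epsilon exponent can matter for stability, here is a minimal sketch of a single textbook Adam update on one scalar weight. It is not Faceswap's code: it assumes the setting maps to epsilon as 10 ** exponent, and the function name, learning rate and gradient value are made up for illustration.

```python
import math


def first_adam_update(grad, epsilon_exponent, lr=5e-5, beta1=0.9, beta2=0.999):
    """Return the weight change from the first Adam step for one scalar weight.

    Assumption: epsilon = 10 ** epsilon_exponent (hypothetical mapping).
    """
    eps = 10.0 ** epsilon_exponent

    # Moment estimates start at zero, so the first step only sees `grad`.
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad * grad

    # Bias correction for step t = 1.
    m_hat = m / (1 - beta1)
    v_hat = v / (1 - beta2)

    # Standard Adam rule: epsilon sits in the denominator next to sqrt(v_hat).
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


if __name__ == "__main__":
    tiny_grad = 1e-6  # illustrative value for a weight with very small gradients
    for exponent in (-5, -7, -16):
        delta = first_adam_update(tiny_grad, exponent)
        # Report the update as a fraction of the learning rate.
        print(f"EE={exponent:>3}: |update| / lr = {abs(delta) / 5e-5:.4f}")
```

With a larger epsilon (-5), updates for weights with very small gradients are damped well below the learning rate. With a smaller epsilon (-7, -16), the same weights move by close to the full learning rate, which leaves less of a safety margin when gradients get noisy.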
As for why this is happening for you, I am at a total loss at this point. It's not something I've seen before or seen reported, which pushes me towards data (maybe HDR footage?) or hardware. Unfortunately, these are outside my wheelhouse. Pinging @bryanlyon in case he has any ideas.