Cannot get SYM384 to behave. Using the stock preset; everything else standard. 3080 Ti on a strong machine with tons of RAM. I either end up with bright solid yellow and magenta blocks or solid white blocks. No clue where to start adjusting this to make it work.
SYM384 Model Preset yielding solid color blocks after a few thousand iterations
Re: SYM384 Model Preset yielding solid color blocks after a few thousand iterations
Honestly, I'm not sure what to suggest for this. I have heard of one other person having this issue, but it seems spotty at best.
Maybe try enabling/disabling conv-aware/ICNR initialization and restarting the model? I'd be interested to hear if this makes any difference.
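For context on what that setting does: ICNR-style initialization replicates kernels so that a sub-pixel (pixel-shuffle) upscale starts out as a smooth nearest-neighbour upsample instead of checkerboard noise. A minimal NumPy sketch of the idea; this is not faceswap's actual implementation, and the function name and kernel layout here are illustrative assumptions:

```python
import numpy as np

def icnr_init(shape, scale=2, rng=None):
    """Sketch of ICNR: draw a kernel for out_channels // scale**2
    filters, then repeat each filter scale**2 times so the sub-pixel
    outputs start out identical (nearest-neighbour-like upsampling).

    shape: (kh, kw, in_channels, out_channels), as in a Keras Conv2D.
    """
    rng = rng or np.random.default_rng(0)
    kh, kw, c_in, c_out = shape
    assert c_out % scale ** 2 == 0
    base = rng.standard_normal((kh, kw, c_in, c_out // scale ** 2))
    # Repeat each base filter scale**2 times along the output axis.
    return np.repeat(base, scale ** 2, axis=-1)

w = icnr_init((3, 3, 16, 64), scale=2)
# Each group of scale**2 = 4 consecutive output filters is identical.
assert np.allclose(w[..., 0], w[..., 3])
```

Whether the replicated filters should be repeated or tiled depends on the pixel-shuffle channel ordering of the framework in use; the repeat above is one plausible convention.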
My word is final
Re: SYM384 Model Preset yielding solid color blocks after a few thousand iterations
I encountered the same problem as martinf: nine times out of ten I got yellow and white squares when starting a new Sym384 model...
Enabling conv-aware initialization solved the problem!
Sym384 now starts without any issue. Good to know!
May I ask a question, though: Sym384 produces very nice results very quickly (a few hours where other models take several days), but NaN errors also appear much faster!
Would you have any suggestions? Lowering the learning rate or changing mixed precision crashes the model immediately...
Thank you torzdf for all you do to develop faceswap and respond to the users of this great application.
Re: SYM384 Model Preset yielding solid color blocks after a few thousand iterations
Sadly not, no. Mixed-precision-related NaNs are a constant bugbear of mine; I am trying to find solutions, but have not got any yet.
Re: SYM384 Model Preset yielding solid color blocks after a few thousand iterations
I also ran into this issue on SYM384 and other models with a similarly high parameter-to-resolution ratio (256-384 px w/ 200k+ params).
After messing around a bunch, here are my observations on starting these models and avoiding the Solid Color of Death (SCOD):
General params:
enabling/disabling icnr init has no detectable effect
enabling/disabling conv aware init can go either way - some model/config combinations start better with it on and others with it off. haven't seen a specific pattern
mixed precision is much more likely than full precision to fail (no NaNs, just SCOD), regardless of epsilon exponent. some models couldn't start on MP regardless of what I tried
learning rate is important but loss config has more impact: 1e-5 w/ SSIM can SCOD where 3e-5 w/ MAE + 0.5SSIM descends
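To make the loss notation above concrete, "MAE + 0.5SSIM" means a weighted sum of two loss terms. A hedged NumPy sketch, using a simplified single-window SSIM rather than faceswap's windowed implementation (the function names and constants here are illustrative, not faceswap's API):

```python
import numpy as np

def mae(a, b):
    return np.mean(np.abs(a - b))

def ssim_loss(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM computed over the whole image (real
    # implementations use local Gaussian windows). Returns
    # 1 - SSIM so that 0 means a perfect match.
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))
    return 1.0 - ssim

def combined_loss(pred, target, weights=((mae, 1.0), (ssim_loss, 0.5))):
    # "MAE + 0.5SSIM": sum of weighted loss terms.
    return sum(w * fn(pred, target) for fn, w in weights)

rng = np.random.default_rng(0)
x = rng.random((64, 64))
assert combined_loss(x, x) < 1e-9  # identical images give ~0 loss
```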
Loss config:
(MS_)SSIM is great for descent but too unstable at initialization; it SCODs often when starting out a model
MAE generally adds stability: MAE + 0.5SSIM loss descends where SSIM + 0.5MAE SCODs
smooth_loss is even more stable than MAE but a bit too slow to start with; prefer MAE unless it's too unstable in the main loss slot (i.e. at 100% weight)
with more relative weight these functions stabilize training but also slow descent,
so gradually decrease the proportion of the stabilizing function as the output improves, e.g.
MAE -> MAE + 0.25SSIM -> MAE + 0.5SSIM -> SSIM + MAE -> SSIM + 0.5MAE, etc.
don't change too quickly or the model might explode.
if you see it start to SCOD, it's likely too late even if you stop and restart with the last loss config (it will train fine for a bit, then NaN); better to restore from a previous snapshot and train longer at that weight
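The gradual hand-off described above can be written down as a simple stage table. A sketch in plain Python; the iteration thresholds are made-up placeholders, and only the ordering of the weight combinations comes from the post:

```python
# Hedged sketch of the stabilizer -> SSIM hand-off schedule.
# Iteration thresholds are illustrative; only the order of the
# (MAE weight, SSIM weight) stages follows the post above.
SCHEDULE = [
    (0,       {"mae": 1.0, "ssim": 0.0}),   # MAE only
    (20_000,  {"mae": 1.0, "ssim": 0.25}),  # MAE + 0.25SSIM
    (50_000,  {"mae": 1.0, "ssim": 0.5}),   # MAE + 0.5SSIM
    (100_000, {"mae": 1.0, "ssim": 1.0}),   # SSIM + MAE
    (200_000, {"mae": 0.5, "ssim": 1.0}),   # SSIM + 0.5MAE
]

def loss_weights(iteration):
    """Return the loss weights active at a given iteration."""
    current = SCHEDULE[0][1]
    for start, weights in SCHEDULE:
        if iteration >= start:
            current = weights
    return current

assert loss_weights(60_000) == {"mae": 1.0, "ssim": 0.5}
```

In practice the post suggests advancing stages by eye (when the output improves) rather than by a fixed iteration count, and rolling back a stage from a snapshot if the model starts to SCOD.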
Other factors:
diverse datasets (esp. with 'difficult' data) seem more likely to fail at the start than more homogeneous ones
e.g. model fails to start on 2 datasets of photos + video frames but descends when video faces (blur/obstruction/crop/pose outliers) are removed from both datasets
so if the dataset has many different kinds of outliers, take them out at first and let the model find its footing on 'easier' examples
increasing fc dropout on a model with a large fc relative to other parts seems helpful to start
Hope this helps!