SYM384 Model Preset yielding solid color blocks after a few thousand iterations

Post by martinf »

I cannot get SYM384 to behave. I'm using the stock preset with everything else at defaults, on a 3080 Ti in a strong machine with plenty of RAM. I end up with either bright solid yellow and magenta blocks or solid white blocks. I have no clue where to start adjusting this to make it work.

Post by torzdf »

Honestly, I'm not sure what to suggest for this. I have heard of one other person having this issue, but it seems spotty at best.

Maybe try enabling/disabling conv-aware/icnr initialization and restarting the model? I would be interested to hear if this makes any difference.

Post by Barnuble »

I encountered the same problem as martinf: 9 times out of 10 I got yellow and white squares when starting a new Sym384 model...
Enabling conv-aware solved the problem!
Sym384 now starts without any problem. It's good to know!

May I ask a question, though: Sym384 generates very nice results very quickly (a few hours, where other models take several days), but NaN errors also appear much faster!
Would you have any suggestions? Lowering the learning rate or changing mixed precision crashes the model immediately...

Thank you torzdf for all you do to develop faceswap and respond to the users of this great application.

Post by torzdf »

Sadly not, no. Mixed precision related NaNs are a constant bugbear of mine and I am trying to find solutions, but have not got any yet.

Post by couleurs »

I also ran into this issue on SYM384 and on other models with a similarly high parameter-to-resolution ratio (256-384px w/ 200k+ params).
After messing around a bunch, here are my observations on starting these models and avoiding the Solid Color of Death (SCOD) ;)

General params:

  • enabling/disabling icnr init has no detectable effect

  • enabling/disabling conv aware init can go either way - some model/config combinations start better with it on and others with it off. haven't seen a specific pattern

  • mixed precision is much more likely than full precision (FP) to fail (no NaNs, just SCOD), regardless of epsilon exponent. some models couldn't start on MP regardless of what I tried

  • learning rate is important but loss config has more impact: 1e-5 w/ SSIM can SCOD where 3e-5 w/ MAE + 0.5SSIM descends
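
Since mixed-precision failures here show up as SCOD rather than NaNs, a NaN check alone will not catch them. The following is a minimal sketch of a check that could be run on a preview image to tell the two failure modes apart. It is not part of faceswap: `diagnose_preview`, the flat list of `(r, g, b)` pixels in the 0-1 range, and the `tol` default are all assumptions made for illustration.

```python
import math
from statistics import pstdev


def diagnose_preview(pixels, tol=0.01):
    """Classify a preview as "nan", "solid" or "ok".

    pixels: iterable of (r, g, b) tuples with values in [0, 1].
    tol: hypothetical threshold; a channel whose standard deviation is
         at or below it is treated as a single flat color.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("preview has no pixels")
    # NaNs are a different failure mode from SCOD, so report them first.
    if any(math.isnan(value) for pixel in pixels for value in pixel):
        return "nan"
    # zip(*pixels) regroups the data per channel: all reds, all greens, all blues.
    if all(pstdev(channel) <= tol for channel in zip(*pixels)):
        return "solid"
    return "ok"


print(diagnose_preview([(1.0, 1.0, 0.0)] * 16))  # a solid yellow block
```

A block of pure yellow, magenta or white has zero variance in every channel, so all three cases described in this thread would be reported as "solid".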

Loss config:

  • (MS_)SSIM is great for descent but too unstable at initialization; it SCODs often when starting out a model

  • MAE generally adds stability: MAE + 0.5SSIM loss descends where SSIM + 0.5MAE SCODs

  • smooth_loss is even more stable than MAE but a bit too slow to start with, prefer MAE unless it's too unstable in the main loss slot (i.e. at 100%)

  • giving these stabilizing functions more relative weight steadies training but also slows descent,
    so gradually decrease the proportion of the stabilizing function as the output improves, e.g.
    MAE -> MAE + 0.25SSIM -> MAE + 0.5SSIM -> SSIM + MAE -> SSIM + 0.5MAE ... etc.
    don't change too quickly or the model might explode.
    if you see it start to SCOD, it's likely too late even if you stop and restart with the last loss config (it will train fine for a bit, then NaN); better to restore from a previous snapshot and train longer at that weight
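
The staged progression above can be sketched roughly as follows. This is an illustration only, not faceswap code: `STAGES`, `combined_loss` and `plan_next_step` are hypothetical names, `ssim_loss` is assumed to be a dissimilarity value where lower is better, and the "advance only while the loss is steadily falling" and "spike" rules are assumptions standing in for judging the previews by eye.

```python
import math

# (MAE weight, SSIM weight) for each step of the progression above.
STAGES = [
    (1.0, 0.0),   # MAE
    (1.0, 0.25),  # MAE + 0.25SSIM
    (1.0, 0.5),   # MAE + 0.5SSIM
    (1.0, 1.0),   # SSIM + MAE
    (0.5, 1.0),   # SSIM + 0.5MAE
]


def combined_loss(mae, ssim_loss, stage):
    """Weighted sum of the two loss terms for the given stage."""
    w_mae, w_ssim = STAGES[stage]
    return w_mae * mae + w_ssim * ssim_loss


def plan_next_step(stage, losses, min_history=5, spike_factor=2.0):
    """Return (action, stage) from recent loss values at the current stage.

    Assumed rules: any NaN/inf or a sudden spike means restore a snapshot
    and go back one stage (assuming the snapshot predates the last change);
    a steadily falling loss over min_history values means advance.
    """
    if any(not math.isfinite(value) for value in losses):
        return "restore_snapshot", max(stage - 1, 0)
    if len(losses) < min_history:
        return "keep", stage  # too early to judge, so don't change anything
    recent = losses[-min_history:]
    if recent[-1] > spike_factor * min(recent):
        return "restore_snapshot", max(stage - 1, 0)
    improving = all(b <= a for a, b in zip(recent, recent[1:]))
    if improving and stage < len(STAGES) - 1:
        return "advance", stage + 1
    return "keep", stage


print(plan_next_step(1, [0.5, 0.45, 0.4, 0.38, 0.35]))
```

The point of the `min_history` guard is the "don't change too quickly" advice: a stage is never advanced on the strength of one or two good readings.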

Other factors:

  • diverse datasets (esp. with 'difficult' data) seem more likely to fail at the start than more homogeneous ones
    e.g. model fails to start on 2 datasets of photos + video frames but descends when video faces (blur/obstruction/crop/pose outliers) are removed from both datasets
    so if dataset has many different kinds of outliers, take them out at first and let the model find its footing on 'easier' examples

  • increasing fc dropout on a model with large fc relative to other parts seems helpful to start
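
As a rough illustration of setting the 'difficult' faces aside at first, the sketch below splits a set of faces by a simple sharpness score (variance of a 4-neighbour Laplacian). It only addresses blur, not obstruction, crop or pose outliers, and it is not a faceswap tool: the function names, the grayscale nested-list input and the threshold are assumptions that would need tuning on real data.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D grayscale image.

    gray: list of rows of numbers, at least 3x3. Low values suggest blur.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def split_by_sharpness(faces, threshold):
    """Split {name: gray_image} into (easy, hard) name lists.

    threshold is a hypothetical cut-off: faces scoring below it are
    held back until the model has found its footing.
    """
    easy, hard = [], []
    for name, gray in faces.items():
        (easy if laplacian_variance(gray) >= threshold else hard).append(name)
    return easy, hard


flat = [[0.5] * 4 for _ in range(4)]                               # no detail at all
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]      # maximal detail
print(split_by_sharpness({"sharp": checker, "blurry": flat}, threshold=1.0))
```

The held-back faces can be reintroduced once the model is descending reliably.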

Hope this helps!

Last edited by couleurs on Fri Jan 13, 2023 6:10 am, edited 1 time in total.
Post by torzdf »

@couleurs This is a super useful post. Many thanks!
