Strange Red Area/Artifact Creeping into Model

ianstephens
Posts: 117
Joined: Sun Feb 14, 2021 7:20 pm
Has thanked: 12 times
Been thanked: 15 times

Re: Strange Red Area/Artifact Creeping into Model

Post by ianstephens »

I'm interested to know whether the AdaBelief optimizer can be used with this model (Phaze-A StoJo 256).

We've set the epsilon exponent (EE) to -16 but get a complete model collapse after 10k iterations.

We've also disabled mixed precision to see whether that fixes the issue further into training, as described above. Of course, we can't use batch sizes as high as we have been, but we are willing to give it a try if it enables us to achieve a stable model.

Post by ianstephens »

One last thought for @torzdf: with your StoJo training, did you have ICNR Init and Conv Aware Init enabled? On the corrupting model, we have both enabled. Do you think it would make any difference either way?

Post by ianstephens »

One final thought (as I've exhausted everything else)...

We've raised the power limit on the 3090 from 350 W to 400 W with:

nvidia-smi -i 0 -pl 400

We thought the extra power may give the GPU a little more oxygen.

We've not touched clock/memory speeds or anything like that.
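
To confirm that a new limit has actually taken effect, the current figures can be read back from nvidia-smi. The following is a minimal sketch rather than anything from Faceswap: the function names are made up for illustration, the sample line is invented data, and it assumes the driver's nvidia-smi supports the `power.draw`, `power.limit` and `power.max_limit` query fields.

```python
import csv
import io
import subprocess

# Query fields requested from nvidia-smi, in output column order.
QUERY_FIELDS = ["index", "name", "power.draw", "power.limit", "power.max_limit"]


def parse_power_report(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(QUERY_FIELDS, row)) for row in rows if row]


def read_power_report():
    """Run nvidia-smi and return the parsed report (needs an NVIDIA driver)."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_power_report(result.stdout)


# Invented sample line, only to show the parsed shape without a GPU present.
sample = "0, NVIDIA GeForce RTX 3090, 287.45, 400.00, 400.00\n"
for gpu in parse_power_report(sample):
    print(gpu["index"], gpu["name"], gpu["power.draw"], "/", gpu["power.limit"], "W")
```

On a machine with the card installed, calling `read_power_report()` while training is running shows whether the draw is actually approaching the limit that was set.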

Post by ianstephens »

A final question for @torzdf: do you still recommend setting the EE to -5 for StoJo, even with the latest updates to the preset? This could be where we are going wrong (we are using -7).

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Post by torzdf »

ianstephens wrote: Sat Aug 28, 2021 6:10 pm

I'm interested to know whether the AdaBelief optimizer can be used with this model (Phaze-A StoJo 256).

We've set the epsilon exponent (EE) to -16 but get a complete model collapse after 10k iterations.

We've also disabled mixed precision to see whether that fixes the issue further into training, as described above. Of course, we can't use batch sizes as high as we have been, but we are willing to give it a try if it enables us to achieve a stable model.

Honestly, I have not tested AdaBelief at all. It is a straight port from an official implementation.

ianstephens wrote: Sat Aug 28, 2021 6:13 pm

One last thought for @torzdf: with your StoJo training, did you have ICNR Init and Conv Aware Init enabled? On the corrupting model, we have both enabled. Do you think it would make any difference either way?

Yes, I always have these enabled. No, they won't be causing your issues.

ianstephens wrote: Sat Aug 28, 2021 9:39 pm

A final question for @torzdf: do you still recommend setting the EE to -5 for StoJo, even with the latest updates to the preset? This could be where we are going wrong (we are using -7).

You may be able to get away with -7 now. It's not something I have tested. All I know is that, after the latest fix, I trained a model to 2m iterations with zero issues with epsilon at -5. When I next train a model (not for a while, I would imagine), I will try -7.
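
For readers unsure what the epsilon exponent changes: the optimizer's epsilon is 10 raised to that exponent, and it sits in the denominator of the update. The sketch below uses the steady-state form of a plain Adam update, which is an assumption made for illustration. It is not Faceswap's code and not the AdaBelief rule, and the function name and learning rate are made up.

```python
import math


def adam_step_size(grad, epsilon_exponent, learning_rate=5e-5):
    """Approximate size of one Adam-style update for a steady gradient.

    With a constant gradient g, the bias-corrected moments settle at
    m_hat = g and v_hat = g**2, so the step is lr * g / (|g| + epsilon).
    The learning rate default is an arbitrary illustrative value.
    """
    epsilon = 10.0 ** epsilon_exponent
    return learning_rate * grad / (math.sqrt(grad * grad) + epsilon)


# Compare how each exponent treats large, small and tiny gradients.
for exponent in (-5, -7, -16):
    for grad in (1e-3, 1e-6, 1e-9):
        step = adam_step_size(grad, exponent)
        print(f"EE={exponent:>3}  grad={grad:.0e}  step={step:.3e}")
```

In this simplified picture, a larger epsilon (EE of -5) shrinks the updates driven by very small gradients, while a very small epsilon (EE of -16) lets even tiny gradients produce a step close to the full learning rate. Whether that is what caused the collapse discussed here is not established in this thread.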

As for why this is happening for you, at this point I am at a total loss. It's not something I've seen before, nor seen reported, which now pushes me towards data (maybe HDR footage?) or hardware. Unfortunately, these are outside my wheelhouse. Pinging @bryanlyon in case he has any ideas.

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Post by bryanlyon »

Generally, red areas creeping into the image indicate a collapse of the model, which happens when the data it is getting is not predictable over the long term.

HDR is the main culprit, as torzdf stated. We simply can't use HDR data reliably.

Hardware can also be a cause, where for some reason the model starts collapsing.

In your case I can't say for sure either way which is at fault. It'd be best to post more information (and maybe some screenshots) for us to be able to evaluate.

Post by ianstephens »

Thank you for the response, @bryanlyon.

I can confirm that we are most definitely not using HDR footage on either side (A or B).

Update: We have successfully run StoJo training for over 400k iterations and counting on a new dataset (both A and B sides) with no issues. Apart from the new dataset, the only change we made was setting the EE to -5.

It's possible the issue is with the original dataset. We're not sure why, but it does have a lot of black/dark backgrounds behind the faces on the A side.

We've discarded that entire model and training run, so we can't grab any screenshots from it, but the red examples from when we were training with it are at the start of this thread.

Let me know if there is any other data I can provide to help diagnose this, including any information you need about our hardware.

Thank you :)
