okay this totally threw me...


MaxHunter

okay this totally threw me...

Post by MaxHunter

So the other day I got a NaN at around 900k iterations. I've done everything I can think of to work around it. I tried different optimizers. (My theory was that all optimizers are trying to take you to the same point through different paths/algorithms, and with the bridge resonance example in my head, I figured that if the bridge is out, I should look for another path around the obstacle. It didn't work.) I also tried turning off mixed precision and lowering every setting I could. Still nothing worked. I even started a new model, but got a NaN within 1,000 iterations!! (How does a new model get a NaN in the first 1,000??!!)

I moved on to a different model, left it running overnight, and woke up to find it had gone backwards and all the faces were gone!! No NaN was detected, but I think it collapsed. What's going on?! I haven't had problems with any of my other models (I have about ten that I move back and forth between, because I get bored easily 🤪). Has anyone else been having this problem? The only thing I've changed is adding more faces (I went from around 2,000 examples to shooting for 5,000). Could that be causing problems?
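The post describes two different failure modes: a NaN that gets detected, and a collapse where the model goes backwards without any NaN ever appearing. A minimal sketch of how a check could tell the two apart, assuming you have the per-iteration loss values as a plain list of floats. The function `check_loss_history`, the 100-iteration window and the 2x "went backwards" threshold are all hypothetical choices for illustration; this is not Faceswap's own NaN protection or any Faceswap setting.

```python
import math
from typing import Iterable, Optional


def check_loss_history(losses: Iterable[float],
                       collapse_ratio: float = 2.0,
                       window: int = 100) -> Optional[str]:
    """Return a short diagnosis for a sequence of per-iteration losses.

    Hypothetical helper for illustration only (not part of Faceswap).
    - "nan at iteration N" / "inf at iteration N": a non-finite loss
      appeared, which is what a NaN check catches.
    - "collapse": the mean of the most recent `window` losses is more than
      `collapse_ratio` times the best `window`-sized mean seen earlier,
      i.e. the model went backwards without ever producing a NaN.
    Returns None when neither condition is found.
    """
    values = list(losses)

    # A NaN/Inf check only looks for non-finite values.
    for i, v in enumerate(values):
        if math.isnan(v):
            return f"nan at iteration {i}"
        if math.isinf(v):
            return f"inf at iteration {i}"

    # Need at least one earlier window plus the recent window to compare.
    if len(values) < 2 * window:
        return None

    recent = sum(values[-window:]) / window
    earlier = values[:-window]
    # Best (lowest) mean over non-overlapping earlier windows.
    best = min(sum(earlier[s:s + window]) / window
               for s in range(0, len(earlier) - window + 1, window))

    # A collapse is invisible to the NaN check: every value is finite,
    # but the recent loss is much worse than the best seen before.
    if recent > collapse_ratio * best:
        return "collapse"
    return None
```

For example, a steadily falling loss returns `None`, the same history with a trailing `float("nan")` is reported as a NaN, and the same history followed by a long run of high but finite values is reported as `"collapse"`, which a NaN-only check would never flag.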

Last edited by MaxHunter on Mon Oct 17, 2022 11:42 pm, edited 2 times in total.
bryanlyon
Site Admin

Re: okay this totally threw me...

Post by bryanlyon

Check whether your GPU is overclocked (factory overclocks count too) and, if it is, remove the overclock. Overclocking often causes slight instability that you won't notice in a video game but that can be catastrophic for machine learning.

If that still doesn't fix it, try power cycling your machine: turn it off, unplug it, and leave it unplugged for a few minutes. Your GPU might be stuck in a bad state and need a kick in the pants to get it working again.

MaxHunter

Re: okay this totally threw me...

Post by MaxHunter

Thanks Bryan. I've left the GPU (a 3080 Ti) at stock clocks since I started working on machine learning, though I don't know whether my EVGA Hybrid is factory overclocked. Being a hybrid, it probably is. That's something I'd never thought of. 🤔 However, I haven't powered down in a few days. I'll try that, and look into underclocking. 😉
