Loss Spikes in Middle of Training

Hollywood
Posts: 18
Joined: Thu Jun 11, 2020 2:52 pm
Has thanked: 2 times

Loss Spikes in Middle of Training

Post by Hollywood »

I trained my model overnight and the previews were getting better, but when I woke up the previews were black and red. When I looked at the graph, the loss had spiked somewhere in the middle. Is there a way to revert to a point before the spike so I don't have to do it all again? Thanks!

by torzdf » Fri Jun 12, 2020 8:20 am

@pfakanator's thorough answer unfortunately skipped the first thing you should try.

Use the "restore" tool to restore your model from backup.

Backups are taken every time the loss drops in a model, so most models are recoverable to some extent.

pfakanator
Posts: 30
Joined: Thu Jul 18, 2019 5:02 pm
Answers: 1
Has thanked: 3 times
Been thanked: 12 times

Re: Loss Spikes in Middle of Training

Post by pfakanator »

Your model has collapsed. You have three options; try them in this order:

  1. Lower your learning rate and continue training. It's possible to recover your model this way. You'll find the setting in the GUI under Settings -> Configure Train Plugins -> Global tab. The default value is 5e-5. Lower it to 4e-5 and resume training. Let it run for several thousand iterations, keep an eye on the previews, and cross your fingers.

  2. Restore your model from a snapshot. By default, snapshots of your model are taken every 25,000 iterations and are located in the same directory as your model. With your current collapsed model loaded, look at the loss graph in the GUI and identify the iteration number where the spikes started; you'll want a snapshot from before that point. Rename your current model folder to model.bak, then rename the most recent snapshot folder from before the spike to model. You can now resume training. You may want to apply option 1 beforehand to avoid a future collapse.

  3. Train a new model ;( .
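
For anyone who would rather script option 2 than rename folders by hand, here is a rough sketch. It is not part of Faceswap: first_spike_iteration and swap_in_snapshot are made-up helper names, the spike rule (a loss value more than 3x the median of the previous 50 values) is only an assumed heuristic, and you have to supply the loss values and the folder paths yourself.

```python
from pathlib import Path
from statistics import median


def first_spike_iteration(losses, window=50, factor=3.0):
    """Return the index of the first loss value that exceeds `factor` times
    the median of the preceding `window` values, or None if there is none.

    Hypothetical heuristic: tune `window` and `factor` for your own graph.
    """
    for i in range(window, len(losses)):
        baseline = median(losses[i - window:i])
        if losses[i] > factor * baseline:
            return i
    return None


def swap_in_snapshot(model_dir, snapshot_dir):
    """Rename model_dir to '<name>.bak', then rename snapshot_dir to take
    its place. Returns the path of the backup folder.

    Both paths are supplied by the caller; nothing here guesses how
    snapshot folders are named.
    """
    model_dir = Path(model_dir)
    snapshot_dir = Path(snapshot_dir)
    backup_dir = model_dir.with_name(model_dir.name + ".bak")

    # Check everything before touching the disk so a failure leaves
    # both folders exactly as they were.
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model folder not found: {model_dir}")
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"snapshot folder not found: {snapshot_dir}")
    if backup_dir.exists():
        raise FileExistsError(f"{backup_dir} already exists; move it first")

    model_dir.rename(backup_dir)    # keep the collapsed model as a backup
    snapshot_dir.rename(model_dir)  # the snapshot becomes the live model
    return backup_dir
```

Call swap_in_snapshot with your model folder and the snapshot folder you picked from before the spike, then resume training as usual. The collapsed model stays in the .bak folder in case you want to go back to it.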

Good luck. Let us know how it goes.

Hollywood
Posts: 18
Joined: Thu Jun 11, 2020 2:52 pm
Has thanked: 2 times

Re: Loss Spikes in Middle of Training

Post by Hollywood »

Thanks for the help. Unfortunately, it hadn't run for 25,000 iterations yet when it crashed. I'll try decreasing the learning rate and see if that works.

torzdf
Posts: 2672
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 131 times
Been thanked: 625 times

Re: Loss Spikes in Middle of Training

Post by torzdf »

@pfakanator's thorough answer unfortunately skipped the first thing you should try.

Use the "restore" tool to restore your model from backup.

Backups are taken every time the loss drops in a model, so most models are recoverable to some extent.

