
Training stops after ~2000 iterations without crash or error

Posted: Wed Oct 13, 2021 12:49 pm
by kuklangaren

Hello,

I can't seem to find any similar topic, but apologies in advance if there is one.

Recently I've been unable to do any training since it just stops after a while. No error message shows up and the program works just fine until I press stop training, which makes it stop responding.

When I first started out, training would keep running until I pressed stop, as it should, with batch size 16. I've tried reinstalling several times and using smaller batch sizes, but it doesn't seem to make a difference. I had only tried the original trainer back when it worked, but now no trainer works.

My graphics card is a GTX 1080 and I have 32 GB of RAM, if that helps.

Thanks!


Re: Training stops after ~2000 iterations without crash or error

Posted: Wed Oct 13, 2021 2:20 pm
by bryanlyon

I've never seen this before. Sometimes power problems (a noisy, aging, or insufficient power supply) can cause similar issues, though. I'd see if restarting your Nvidia drivers (Win+Ctrl+Shift+B) does anything about the freeze.

Unfortunately if there are no messages it's very hard to troubleshoot something like this.
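
Since there is no error output to go on, one thing that can at least be recorded is which driver version is installed before and after a change. Here is a minimal sketch in Python, assuming `nvidia-smi` is on the PATH (it normally ships with the Nvidia driver). The function names are only for illustration and are not part of any training tool:

```python
import shutil
import subprocess


def parse_driver_info(csv_text):
    """Parse 'driver_version, name' lines into (driver_version, gpu_name) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the first comma only, in case the GPU name contains one.
        version, _, name = line.partition(",")
        rows.append((version.strip(), name.strip()))
    return rows


def query_driver_info():
    """Return [(driver_version, gpu_name), ...], or None if nvidia-smi cannot be used."""
    # Assumption: nvidia-smi is installed and reachable on the PATH.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_driver_info(result.stdout)


if __name__ == "__main__":
    info = query_driver_info()
    if info is None:
        print("nvidia-smi not available; could not read driver version")
    else:
        for version, name in info:
            print(f"{name}: driver {version}")
```

Running this before and after a driver update gives something concrete to compare if the freeze comes back.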


Re: Training stops after ~2000 iterations without crash or error

Posted: Wed Oct 13, 2021 5:05 pm
by kuklangaren

I updated my drivers just after posting this and now it works again! It was most likely a problem with the drivers as you suggested.

Many thanks!