Training / Extraction randomly freezes up my PC

Talk about Hardware used for Deep Learning


Locked
realracer334
Posts: 5
Joined: Mon Jun 08, 2020 8:54 am
Has thanked: 2 times

Training / Extraction randomly freezes up my PC

Post by realracer334 »

Hello,

So I have an RTX 2080 Ti with 11 GB of VRAM. I turn everything else off while training in order to maximize my available VRAM, and I have 1 TB of free disk space. However, when I train with any of the available models on minimal parameters (sometimes with optimizer savings or memory saving gradients enabled), it runs smoothly for a few minutes, and then the preview window freezes and goes to "Not Responding". I have to hard reboot my PC: no other choice, no error messages, nothing. I can't even force quit it. I tried multiple solutions, but nothing seems to work. I even tried older versions of the faceswap GUI and tried using the command prompt, and both have similar results: they run smoothly for a while, then completely freeze up and crash or blue screen my PC.

I used default facesets and default parameters, only changing the batch size (I reduced it on every try, from 8 down to 1). I even tried putting a VRAM limit in main.py, and disabled QuickEdit Mode and Insert Mode in the command prompt. Nothing helped.

I can't seem to find anyone with my issue. Does anybody have any idea how to fix this, please? The thing is, both used to work on this PC about a year ago, and other deep learning algorithms have run smoothly on my PC recently. So how come they don't work now, and neither do the "older" versions?

Thanks to anyone who can help me. If I should post more information, just let me know! (I am on Windows 10, by the way.)

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 623 times

Re: Training / Extraction randomly freezes up my PC

Post by torzdf »

This, unfortunately, sounds like a hardware/power issue.

In the first instance, make sure all GPU overclocks are disabled.

Beyond that, it will be next to impossible to troubleshoot hardware issues remotely.

My word is final

realracer334

Re: Training / Extraction randomly freezes up my PC

Post by realracer334 »

torzdf wrote: Mon Jun 08, 2020 10:39 am

This, unfortunately, sounds like a hardware/power issue.

In the first instance, make sure all GPU overclocks are disabled.

Beyond that, it will be next to impossible to troubleshoot hardware issues remotely.

Hmm, it seems weird that it suddenly started doing this. But thank you for your answer; I will check whether anything is overclocked.

realracer334

Re: Training / Extraction randomly freezes up my PC

Post by realracer334 »

No overclocking was found. But I guess you are correct; it doesn't seem to be a faulty algorithm or anything like that. I wonder if there are any tips I can try to debug this. If not, thank you very much for your answer!

deephomage
Posts: 33
Joined: Fri Jul 12, 2019 6:09 pm
Answers: 1
Has thanked: 2 times
Been thanked: 8 times

Re: Training / Extraction randomly freezes up my PC

Post by deephomage »

Run a stress-test program on your PC (e.g. AIDA64 or Prime95) and see if you get freezing or bluescreens. Training a model stresses your entire system; your CPU and GPU need to be 100% stable. You may have a failing video card or faulty RAM. Another common cause of these training failures is an inadequate power supply. Use an online power supply calculator to make sure that your power supply can handle the heavy demands of training a model. If you bought a bargain-basement power supply, consider replacing it with a higher-wattage model from a reputable brand.
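The power supply check above boils down to simple arithmetic: add up the estimated peak draw of each component, add a safety margin, and compare the result with the PSU's rated wattage. Below is a minimal Python sketch of that idea. The function names, the 30% headroom default and the component figures in `example_build` are hypothetical placeholders, not measured values or vendor recommendations; substitute the numbers from your own parts' spec sheets or from an online calculator.

```python
from typing import Mapping


def recommended_psu_watts(component_watts: Mapping[str, float],
                          headroom: float = 0.3) -> float:
    """Return the summed peak draw plus a safety margin.

    headroom=0.3 means 30% above the summed estimate. This margin is an
    assumption for illustration, not a manufacturer rule.
    """
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    total = sum(component_watts.values())
    return total * (1 + headroom)


def psu_is_adequate(rated_watts: float,
                    component_watts: Mapping[str, float],
                    headroom: float = 0.3) -> bool:
    """True if the PSU's rated output covers the estimate plus the margin."""
    return rated_watts >= recommended_psu_watts(component_watts, headroom)


# Hypothetical peak draws in watts; replace with your own components' figures.
example_build = {"gpu": 300.0, "cpu": 150.0, "motherboard_ram_drives_fans": 50.0}
```

With these placeholder figures, `recommended_psu_watts(example_build)` comes to 650 W, so `psu_is_adequate(750, example_build)` is `True` and `psu_is_adequate(600, example_build)` is `False`.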

realracer334

Re: Training / Extraction randomly freezes up my PC

Post by realracer334 »

deephomage wrote: Mon Jun 08, 2020 4:37 pm

Run a stress-test program on your PC (e.g. AIDA64 or Prime95) and see if you get freezing or bluescreens. Training a model stresses your entire system; your CPU and GPU need to be 100% stable. You may have a failing video card or faulty RAM. Another common cause of these training failures is an inadequate power supply. Use an online power supply calculator to make sure that your power supply can handle the heavy demands of training a model. If you bought a bargain-basement power supply, consider replacing it with a higher-wattage model from a reputable brand.

Thanks for the advice. Yes, after the previous answer I did some digging and found out that the power supply can be the issue. I will look into it and try to replace it. Thank you very much!

realracer334

Re: Training / Extraction randomly freezes up my PC

Post by realracer334 »

Thank you for your answer, you were indeed right! I changed my PSU today and everything seems to be working just fine for now: no lag, stable MHz and temperature, and no crashes. I swapped my old PSU (Corsair RM750x) for a Corsair RM1000x (because I found one for the same price as an RM750x). I also underclocked by 150, kept the temperature limit at 84 °C, and closed every other program.

I can say that even heavy models like SAE run smoothly (for now) with a batch size of 8.
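Because the original symptom was a hard freeze with no error message, one way to see what the GPU was doing just before a crash is to log its temperature, power draw and clock speed to disk while training. The Python sketch below is a minimal example, assuming an NVIDIA driver with `nvidia-smi` available on the PATH. The function names, the chosen query fields, the one-second interval and the `gpu_log.csv` file name are illustrative choices, not part of faceswap. Each line is flushed and fsynced so that the last readings are more likely to survive a hard reboot.

```python
import os
import subprocess
import time
from typing import Dict, Optional

# GPU query fields understood by nvidia-smi (see `nvidia-smi --help-query-gpu`).
FIELDS = ["temperature.gpu", "power.draw", "clocks.gr"]


def parse_sample(line: str) -> Dict[str, Optional[float]]:
    """Parse one line of `--format=csv,noheader,nounits` output.

    Values the driver cannot report (e.g. "[N/A]") are returned as None.
    """
    sample: Dict[str, Optional[float]] = {}
    for name, raw in zip(FIELDS, line.split(",")):
        try:
            sample[name] = float(raw.strip())
        except ValueError:
            sample[name] = None
    return sample


def log_gpu(path: str = "gpu_log.csv", interval_s: int = 1) -> None:
    """Append timestamped GPU readings to `path` until interrupted (Ctrl+C)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),  # repeat the query every interval_s seconds
    ]
    with open(path, "a", encoding="utf-8") as out:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
            for line in proc.stdout:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                out.write(f"{stamp}, {line.strip()}\n")
                out.flush()             # push Python's buffer to the OS
                os.fsync(out.fileno())  # ask the OS to write it to disk now
```

Start `log_gpu()` from a separate console before training. After a freeze, the last lines of `gpu_log.csv` show the final readings taken. For example, `parse_sample("74, 245.12, 1845")` returns `{"temperature.gpu": 74.0, "power.draw": 245.12, "clocks.gr": 1845.0}`.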

torzdf

Re: Training / Extraction randomly freezes up my PC

Post by torzdf »

Locking as a happy resolution has been found.
