Code:
09/02/2021 23:06:57 INFO Log level set to: INFO
09/02/2021 23:06:58 INFO Model A Directory: 'H:\DOCS\PROJECTS\FACESWAP_PY\proj1\FaceA' (1230 images)
09/02/2021 23:06:58 INFO Model B Directory: 'H:\DOCS\PROJECTS\FACESWAP_PY\proj1\FaceB' (4744 images)
09/02/2021 23:06:58 INFO Training data directory: H:\DOCS\PROJECTS\FACESWAP_PY\proj1\ModelAB
09/02/2021 23:06:58 INFO ===================================================
09/02/2021 23:06:58 INFO Starting
09/02/2021 23:06:58 INFO Press 'Stop' to save and quit
09/02/2021 23:06:58 INFO ===================================================
09/02/2021 23:06:59 INFO Loading data, this may take a while...
09/02/2021 23:06:59 INFO Loading Model from Original plugin...
09/02/2021 23:06:59 INFO Using configuration saved in state file
09/02/2021 23:07:00 INFO Loaded model from disk: 'H:\DOCS\PROJECTS\FACESWAP_PY\proj1\ModelAB\original.h5'
09/02/2021 23:07:00 INFO Loading Trainer from Original plugin...
09/02/2021 23:07:12 INFO [Saved models] - Average loss since last save: face_a: 0.02728, face_b: 0.02731
09/02/2021 23:09:07 INFO [Saved models] - Average loss since last save: face_a: 0.03140, face_b: 0.02689
09/02/2021 23:11:01 INFO [Saved models] - Average loss since last save: face_a: 0.03149, face_b: 0.02689
09/02/2021 23:12:56 INFO [Saved models] - Average loss since last save: face_a: 0.03108, face_b: 0.02679
09/02/2021 23:14:02 INFO Saved snapshot (25000 iterations)
09/02/2021 23:14:51 INFO [Saved models] - Average loss since last save: face_a: 0.03093, face_b: 0.02667
09/02/2021 23:16:46 INFO [Saved models] - Average loss since last save: face_a: 0.03093, face_b: 0.02696
09/02/2021 23:17:38 INFO Saved project to: 'H:/DOCS/PROJECTS/FACESWAP_PY/proj1/facesw1.fsw'
09/02/2021 23:18:40 INFO [Saved models] - Average loss since last save: face_a: 0.03105, face_b: 0.02693
09/02/2021 23:20:34 INFO [Saved models] - Average loss since last save: face_a: 0.03096, face_b: 0.02650
09/02/2021 23:22:29 INFO [Saved models] - Average loss since last save: face_a: 0.03088, face_b: 0.02661
09/02/2021 23:24:23 INFO [Saved models] - Average loss since last save: face_a: 0.03057, face_b: 0.02655
09/02/2021 23:26:18 INFO [Saved models] - Average loss since last save: face_a: 0.03062, face_b: 0.02630
2021-09-02 23:27:46.354831: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2021-09-02 23:27:46.355028: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
Process exited.
Hi
I've had this error pop up anywhere from two minutes to an hour into a training session, which promptly ends it. It also happened once when an extraction run was almost finished. None of the fixes I've come across while researching this kind of issue has made any difference.
I am running this on Windows 10 installed on an SSD. My GPU is an MSI GeForce GTX 1050 Ti with 4 GB of VRAM. Yes, I bought it used, but aside from this it hasn't given me any problems. Here is everything I've tried so far; I've been troubleshooting for six hours, so I may be leaving a few things out:
Underclock GPU
Overclock GPU
Reduce GPU clock to remove the factory overclock present on some models of my GPU
Underclock CPU (my line of thinking was a PSU issue: my PSU is rated only 96 W above what my components require, and I've been overclocking my processor)
Use lowmem mode for the trainer I'm using, Original (this made the error happen sooner)
Install Faceswap.py on both the OS drive and a non-OS drive, and in different locations on the OS drive (i.e. the default install path, C:\, and the drive root)
Leave the PC completely untouched after starting training (this seems to help, but the error still appears after about an hour)
"OP, your problem sounds a lot like this guy's. Learn to use the search bar lol" — yes, I've seen that thread; nothing I tried from it helped, and as far as I can tell that person never solved their problem either. The only thing I haven't been able to try is upgrading my PSU, which I plan to do fairly soon anyway, but I strongly suspect I'll run into the same problem.
I am running out of ideas. A year ago, I ran this same tool on an Intel laptop CPU for up to 15 hours at a time and never once got any kind of error. The only thing I've yet to try is one of the lighter-weight trainers, but I've read that the method they use can produce significantly lower-quality face swaps, and at that point, why bother? Is 4 GB of VRAM really not enough for the default method? And if it comes to it, would it be a bad idea to change trainers in the middle of training? I'm already 26k iterations in.
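In the meantime, since the log shows the model being saved every couple of minutes anyway, I've been sketching a crude auto-restart wrapper so a crash resumes training from the last save instead of ending the session. This is just a stopgap, not a root-cause fix, and the faceswap command line is assumed (I normally launch from the GUI, so the exact train arguments may differ on your install):

```python
# Stopgap sketch: relaunch a command whenever it dies with a non-zero exit
# code, up to a retry limit. Since faceswap checkpoints the model roughly
# every two minutes, a CUDA crash only loses progress since the last save.
import subprocess
import sys
import time


def run_with_restarts(cmd, max_restarts=20, delay=10):
    """Run cmd; on non-zero exit, wait `delay` seconds and relaunch.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        returncode = subprocess.call(cmd)
        if returncode == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay)  # give the driver a moment to recover


# Quick self-check with a command that exits cleanly on the first try:
assert run_with_restarts([sys.executable, "-c", "raise SystemExit(0)"],
                         delay=0) == 0
```

In practice I'd call it as something like `run_with_restarts([sys.executable, "faceswap.py", "train", ...])` with my usual training arguments filled in; whether a restart loop is even safe to leave unattended after a CUDA_ERROR_ILLEGAL_ADDRESS is something I'd welcome input on.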