I set my input faces to begin training, and after an all-day session it just errored out. It seems to be something to do with my GPU, but I'm not quite sure how to troubleshoot it. Any help would be much appreciated.
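Is something like the snippet below a sensible way to check, from the faceswap conda env, that TensorFlow can actually see and use the GPU? It's only a rough sketch on my part (assuming the TF 1.14 install listed in the system info below), so the exact calls may be off:

Code: Select all
import tensorflow as tf

# Place a trivial op on the GPU so the device-placement log shows whether it lands there.
with tf.device("/gpu:0"):
    a = tf.constant([1.0, 2.0, 3.0])
    b = tf.constant([4.0, 5.0, 6.0])
    c = a + b

# allow_growth stops TensorFlow from grabbing all of the VRAM up front.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    print(sess.run(c))

Here's the full log from the crash: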
Code: Select all
09/01/2019 20:25:12 MainProcess training_0 multithreading __init__ DEBUG Initializing MultiThread: (target: 'save_encoder', thread_count: 1)
09/01/2019 20:25:12 MainProcess training_0 multithreading __init__ DEBUG Initialized MultiThread: 'save_encoder'
09/01/2019 20:25:12 MainProcess training_0 multithreading __init__ DEBUG Initializing MultiThread: (target: 'save_state', thread_count: 1)
09/01/2019 20:25:12 MainProcess training_0 multithreading __init__ DEBUG Initialized MultiThread: 'save_state'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread(s): 'save_decoder_a'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread 1 of 1: 'save_decoder_a_0'
09/01/2019 20:25:12 MainProcess save_decoder_a_0 _base save DEBUG Saving model: 'A:\Videos\Deepfakes\Training Model Dir\original_decoder_A.h5'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Started all threads 'save_decoder_a': 1
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread(s): 'save_decoder_b'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread 1 of 1: 'save_decoder_b_0'
09/01/2019 20:25:12 MainProcess save_decoder_b_0 _base save DEBUG Saving model: 'A:\Videos\Deepfakes\Training Model Dir\original_decoder_B.h5'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Started all threads 'save_decoder_b': 1
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread(s): 'save_encoder'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread 1 of 1: 'save_encoder_0'
09/01/2019 20:25:12 MainProcess save_encoder_0 _base save DEBUG Saving model: 'A:\Videos\Deepfakes\Training Model Dir\original_encoder.h5'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Started all threads 'save_encoder': 1
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread(s): 'save_state'
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Starting thread 1 of 1: 'save_state_0'
09/01/2019 20:25:12 MainProcess save_state_0 _base save DEBUG Saving State
09/01/2019 20:25:12 MainProcess training_0 multithreading start DEBUG Started all threads 'save_state': 1
09/01/2019 20:25:12 MainProcess training_0 multithreading join DEBUG Joining Threads: 'save_decoder_a'
09/01/2019 20:25:12 MainProcess training_0 multithreading join DEBUG Joining Thread: 'save_decoder_a_0'
09/01/2019 20:25:12 MainProcess save_state_0 _base save DEBUG Saved State
09/01/2019 20:25:13 MainProcess training_0 multithreading join DEBUG Joined all Threads: 'save_decoder_a'
09/01/2019 20:25:13 MainProcess training_0 multithreading join DEBUG Joining Threads: 'save_decoder_b'
09/01/2019 20:25:13 MainProcess training_0 multithreading join DEBUG Joining Thread: 'save_decoder_b_0'
09/01/2019 20:25:14 MainProcess training_0 multithreading join DEBUG Joined all Threads: 'save_decoder_b'
09/01/2019 20:25:14 MainProcess training_0 multithreading join DEBUG Joining Threads: 'save_encoder'
09/01/2019 20:25:14 MainProcess training_0 multithreading join DEBUG Joining Thread: 'save_encoder_0'
09/01/2019 20:25:22 MainProcess training_0 multithreading join DEBUG Joined all Threads: 'save_encoder'
09/01/2019 20:25:22 MainProcess training_0 multithreading join DEBUG Joining Threads: 'save_state'
09/01/2019 20:25:22 MainProcess training_0 multithreading join DEBUG Joining Thread: 'save_state_0'
09/01/2019 20:25:22 MainProcess training_0 multithreading join DEBUG Joined all Threads: 'save_state'
09/01/2019 20:25:22 MainProcess training_0 _base save_models INFO [Saved models] - Average since last save: face_loss_A: 0.02880, face_loss_B: 0.02933
09/01/2019 20:25:34 MainProcess training_0 training_data join_subprocess DEBUG Joining FixedProducerDispatcher
09/01/2019 20:25:34 SpawnProcess-2 MainThread training_data load_batches DEBUG Finished batching: (epoch: 2860992, side: 'a', is_display: False)
09/01/2019 20:25:34 SpawnProcess-2 MainThread multithreading _runner DEBUG FixedProducerDispatcher worker for <bound method TrainingDataGenerator.load_batches of <lib.training_data.TrainingDataGenerator object at 0x0000020E6705B128>> shutdown
09/01/2019 20:25:34 MainProcess training_0 training_data join_subprocess DEBUG Joined FixedProducerDispatcher
09/01/2019 20:25:34 MainProcess training_0 training_data join_subprocess DEBUG Joining FixedProducerDispatcher
09/01/2019 20:25:34 SpawnProcess-3 MainThread training_data load_batches DEBUG Finished batching: (epoch: 2860992, side: 'b', is_display: False)
09/01/2019 20:25:34 SpawnProcess-3 MainThread multithreading _runner DEBUG FixedProducerDispatcher worker for <bound method TrainingDataGenerator.load_batches of <lib.training_data.TrainingDataGenerator object at 0x0000013E46EAB128>> shutdown
09/01/2019 20:25:34 MainProcess training_0 training_data join_subprocess DEBUG Joined FixedProducerDispatcher
09/01/2019 20:25:34 MainProcess training_0 multithreading run DEBUG Error in thread (training_0): GPU sync failed
09/01/2019 20:25:35 MainProcess MainThread train monitor DEBUG Thread error detected
09/01/2019 20:25:35 MainProcess MainThread train monitor DEBUG Closed Monitor
09/01/2019 20:25:35 MainProcess MainThread train end_thread DEBUG Ending Training thread
09/01/2019 20:25:35 MainProcess MainThread train end_thread CRITICAL Error caught! Exiting...
09/01/2019 20:25:35 MainProcess MainThread multithreading join DEBUG Joining Threads: 'training'
09/01/2019 20:25:35 MainProcess MainThread multithreading join DEBUG Joining Thread: 'training_0'
09/01/2019 20:25:35 MainProcess MainThread multithreading join ERROR Caught exception in thread: 'training_0'
Traceback (most recent call last):
File "C:\Users\ADMIN\faceswap\lib\cli.py", line 125, in execute_script
process.process()
File "C:\Users\ADMIN\faceswap\scripts\train.py", line 98, in process
self.end_thread(thread, err)
File "C:\Users\ADMIN\faceswap\scripts\train.py", line 124, in end_thread
thread.join()
File "C:\Users\ADMIN\faceswap\lib\multithreading.py", line 461, in join
raise thread.err[1].with_traceback(thread.err[2])
File "C:\Users\ADMIN\faceswap\lib\multithreading.py", line 392, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\ADMIN\faceswap\scripts\train.py", line 149, in training
raise err
File "C:\Users\ADMIN\faceswap\scripts\train.py", line 139, in training
self.run_training_cycle(model, trainer)
File "C:\Users\ADMIN\faceswap\scripts\train.py", line 221, in run_training_cycle
trainer.train_one_step(viewer, timelapse)
File "C:\Users\ADMIN\faceswap\plugins\train\trainer\_base.py", line 213, in train_one_step
raise err
File "C:\Users\ADMIN\faceswap\plugins\train\trainer\_base.py", line 178, in train_one_step
loss[side] = batcher.train_one_batch(do_preview)
File "C:\Users\ADMIN\faceswap\plugins\train\trainer\_base.py", line 278, in train_one_batch
loss = self.model.predictors[self.side].train_on_batch(*batch)
File "C:\Users\ADMIN\MiniConda3\envs\faceswap\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "C:\Users\ADMIN\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "C:\Users\ADMIN\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "C:\Users\ADMIN\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
============ System Information ============
encoding: cp1252
git_branch: master
git_commits: feedd2a More robust Crash Report messaging. 5bf54d9 Add configs and state file to crash report. 7184ae3 Merge branch 'master' of https://github.com/deepfakes/faceswap. 1a18241 Revert "Delete align_eyes.py". e6f17cd Delete align_eyes.py
gpu_cuda: No global version found. Check Conda packages for Conda Cuda
gpu_cudnn: No global version found. Check Conda packages for Conda cuDNN
gpu_devices: GPU_0: GeForce RTX 2060
gpu_devices_active: GPU_0
gpu_driver: 436.15
gpu_vram: GPU_0: 6144MB
os_machine: AMD64
os_platform: Windows-10-10.0.18362-SP0
os_release: 10
py_command: C:\Users\ADMIN\faceswap\faceswap.py train -A A:/Videos/Deepfakes/JP/TRAINING -ala A:/Videos/Deepfakes/JP/TRAINING/alignments.json -B A:/Videos/Deepfakes/ME/TRAINING -alb A:/Videos/Deepfakes/ME/TRAINING/alignments.json -m A:/Videos/Deepfakes/Training Model Dir -t original -bs 64 -it 1000000 -g 1 -s 100 -ss 25000 -ps 50 -L INFO -gui
py_conda_version: conda 4.7.11
py_implementation: CPython
py_version: 3.6.9
py_virtual_env: True
sys_cores: 16
sys_processor: AMD64 Family 23 Model 8 Stepping 2, AuthenticAMD
sys_ram: Total: 32714MB, Available: 9731MB, Used: 22982MB, Free: 9731MB
=============== Pip Packages ===============
absl-py==0.7.1
astor==0.8.0
certifi==2019.6.16
cloudpickle==1.2.1
cycler==0.10.0
cytoolz==0.10.0
dask==2.3.0
decorator==4.4.0
fastcluster==1.1.25
ffmpy==0.2.2
gast==0.2.2
grpcio==1.16.1
h5py==2.9.0
imageio==2.5.0
imageio-ffmpeg==0.3.0
joblib==0.13.2
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==2.2.2
mkl-fft==1.0.14
mkl-random==1.0.2
mkl-service==2.0.2
networkx==2.3
numpy==1.16.2
nvidia-ml-py3==7.352.1
olefile==0.46
opencv-python==4.1.0.25
pathlib==1.0.1
Pillow==6.1.0
protobuf==3.8.0
psutil==5.6.3
pyparsing==2.4.2
pyreadline==2.1
python-dateutil==2.8.0
pytz==2019.2
PyWavelets==1.0.3
pywin32==223
PyYAML==5.1.2
scikit-image==0.15.0
scikit-learn==0.21.2
scipy==1.3.1
six==1.12.0
tensorboard==1.14.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0
termcolor==1.1.0
toolz==0.10.0
toposort==1.5
tornado==6.0.3
tqdm==4.32.1
Werkzeug==0.15.5
wincertstore==0.2
wrapt==1.11.2