Problem while training

If training is failing to start and you are not receiving an error message telling you what to do, tell us about it here.

simplo
Posts: 1
Joined: Sun Apr 05, 2020 5:30 pm

Post by simplo »

Hi all, can you please help me?
I've got a problem as soon as training starts.
I'm using the Windows interface with all the standard settings.
The generated command is this:

C:\Users\asusI7\anaconda3\envs\deepFake\python.exe C:\Users\asusI7\Documents\td\faceswap\faceswap.py train -A C:/Users/asusI7/Documents/td/faceswap/facceSimplo -B C:/Users/asusI7/Documents/td/faceswap/facceBritney -m C:/Users/asusI7/Documents/td/faceswap/originals -t original -bs 64 -it 1000000 -g 1 -s 100 -ss 25000 -ps 50 -L INFO

I would like to attach the log, but I don't know how. It is too long to copy here.
Please let me know how I can send it to you, if it helps.
Here's the message I received in the GUI:

Loading...
Setting Faceswap backend to NVIDIA
04/05/2020 19:35:03 INFO     Log level set to: INFO
Using TensorFlow backend.
04/05/2020 19:35:05 INFO     Model A Directory: C:\Users\asusI7\Documents\td\faceswap\facceSimplo
04/05/2020 19:35:05 INFO     Model B Directory: C:\Users\asusI7\Documents\td\faceswap\facceBritney
04/05/2020 19:35:05 INFO     Training data directory: C:\Users\asusI7\Documents\td\faceswap\originals
04/05/2020 19:35:05 INFO     ===================================================
04/05/2020 19:35:05 INFO       Starting
04/05/2020 19:35:05 INFO       Press 'Stop' to save and quit
04/05/2020 19:35:05 INFO     ===================================================
04/05/2020 19:35:06 INFO     Loading data, this may take a while...
04/05/2020 19:35:06 INFO     Loading Model from Original plugin...
04/05/2020 19:35:06 INFO     No existing state file found. Generating.
04/05/2020 19:35:08 INFO     Creating new 'original' model in folder: 'C:\Users\asusI7\Documents\td\faceswap\originals'
04/05/2020 19:35:08 INFO     Loading Trainer from Original plugin...
04/05/2020 19:35:10 INFO     Enabled TensorBoard Logging
2020-04-05 19:35:19.647184: E tensorflow/stream_executor/cuda/cuda_driver.cc:1006] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-05 19:35:19.647484: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] Internal: error destroying CUDA event in context 0x22f0970d220: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-05 19:35:19.647676: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] Internal: error destroying CUDA event in context 0x22f0970d220: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-05 19:35:19.647895: E tensorflow/stream_executor/cuda/cuda_driver.cc:704] failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-05 19:35:19.648098: E tensorflow/stream_executor/cuda/cuda_driver.cc:625] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-05 19:35:19.648286: E tensorflow/stream_executor/cuda/cuda_driver.cc:630] error log buffer (1024 bytes):
2020-04-05 19:35:19.648477: E tensorflow/stream_executor/cuda/cuda_driver.cc:625] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-05 19:35:19.648654: E tensorflow/stream_executor/cuda/cuda_driver.cc:630] error log buffer (1024 bytes):
04/05/2020 19:35:20 CRITICAL Error caught! Exiting...
04/05/2020 19:35:20 ERROR    Caught exception in thread: '_training_0'
04/05/2020 19:35:23 ERROR    Got Exception on main handler:
Traceback (most recent call last):
  File "C:\Users\asusI7\Documents\td\faceswap\lib\cli.py", line 128, in execute_script
    process.process()
  File "C:\Users\asusI7\Documents\td\faceswap\scripts\train.py", line 159, in process
    self._end_thread(thread, err)
  File "C:\Users\asusI7\Documents\td\faceswap\scripts\train.py", line 199, in _end_thread
    thread.join()
  File "C:\Users\asusI7\Documents\td\faceswap\lib\multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "C:\Users\asusI7\Documents\td\faceswap\lib\multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\asusI7\Documents\td\faceswap\scripts\train.py", line 224, in _training
    raise err
  File "C:\Users\asusI7\Documents\td\faceswap\scripts\train.py", line 214, in _training
    self._run_training_cycle(model, trainer)
  File "C:\Users\asusI7\Documents\td\faceswap\scripts\train.py", line 303, in _run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "C:\Users\asusI7\Documents\td\faceswap\plugins\train\trainer\_base.py", line 316, in train_one_step
    raise err
  File "C:\Users\asusI7\Documents\td\faceswap\plugins\train\trainer\_base.py", line 283, in train_one_step
    loss[side] = batcher.train_one_batch()
  File "C:\Users\asusI7\Documents\td\faceswap\plugins\train\trainer\_base.py", line 424, in train_one_batch
    loss = self._model.predictors[self._side].train_on_batch(model_inputs, model_targets)
  File "C:\Users\asusI7\anaconda3\envs\deepFake\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\asusI7\anaconda3\envs\deepFake\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "C:\Users\asusI7\anaconda3\envs\deepFake\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "C:\Users\asusI7\anaconda3\envs\deepFake\lib\site-packages\tensorflow_core\python\client\session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cuDNN launch failure : input shape([64,1024,4,4]) filter shape([3,3,1024,2048])
     [[{{node encoder/upscale_4_0_conv2d/convolution}}]]
     [[loss/mul/_295]]
  (1) Internal: cuDNN launch failure : input shape([64,1024,4,4]) filter shape([3,3,1024,2048])
     [[{{node encoder/upscale_4_0_conv2d/convolution}}]]
0 successful operations.
0 derived errors ignored.
04/05/2020 19:35:23 CRITICAL An unexpected crash has occurred. Crash report written to 'C:\Users\asusI7\Documents\td\faceswap\crash_report.2020.04.05.193523012010.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.
torzdf
Posts: 2665
Joined: Fri Jul 12, 2019 12:53 am

Re: Problem while training

Post by torzdf »

Try enabling the "Allow Growth" option. If that doesn't work, put your log on Pastebin and link it here.
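
For context, here is a minimal sketch of what an option like this usually controls. It assumes "Allow Growth" maps to TensorFlow's GPU memory-growth setting; the thread does not confirm how Faceswap implements it. By default TensorFlow reserves almost all GPU memory at startup, whereas with growth enabled it allocates memory on demand. `TF_FORCE_GPU_ALLOW_GROWTH` is TensorFlow's own environment variable for this; the helper name `enable_gpu_memory_growth` is hypothetical and only for illustration.

```python
import os


def enable_gpu_memory_growth(env=None):
    """Request on-demand GPU memory allocation from TensorFlow.

    Hypothetical helper: it only sets TensorFlow's
    TF_FORCE_GPU_ALLOW_GROWTH environment variable. The variable must be
    set before TensorFlow initialises the GPU, so call this before the
    first `import tensorflow`.
    """
    # Default to the real process environment; a plain dict can be
    # passed instead to inspect the result without side effects.
    env = os.environ if env is None else env
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


# Apply the setting to the current process.
enable_gpu_memory_growth()

# Only import TensorFlow after the variable is set, for example:
# import tensorflow as tf
```

If you are using the GUI, ticking the option should be enough. The snippet is only relevant if you drive TensorFlow from your own script.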
