crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here



crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

Post by dedude »

I followed the instructions (except that I entered the alignments directory manually), but then I got a crash report.

Here is the crash report I received.

Code: Select all

01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000005_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000411_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000085_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000399_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000305_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000128_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000026_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000173_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000032_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000215_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000015_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000404_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000088_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000137_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_0                         training_data   _expand_partials               DEBUG    Generating mask. side: 'b', filename: 'C:\Users\hanse\Desktop\df1\faceB\generated(3)(1)_000035_0.png'
01/24/2021 01:59:03 MainProcess     _run_0                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Mask already generated. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000189_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000023_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000246_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Generating mask. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000089_0.png'
01/24/2021 01:59:03 MainProcess     _run_1                         aligned_face    extract_face                   DEBUG    _extract_face called without a loaded image. Returning empty face.
01/24/2021 01:59:03 MainProcess     _run_1                         training_data   _expand_partials               DEBUG    Mask already generated. side: 'a', filename: 'C:\Users\hanse\Desktop\df1\faceA\jp tiktok_000071_0.png'
01/24/2021 01:59:03 MainProcess     _training_0                    ag_logging      warn                           DEBUG    AutoGraph could not transform <bound method Logger.isEnabledFor of <FaceswapLogger lib.model.losses_tf (DEBUG)>> and will run it as-is.\nPlease report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\nCause: module 'gast' has no attribute 'Index'\nTo silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
01/24/2021 01:59:03 MainProcess     _training_0                    ag_logging      warn                           DEBUG    AutoGraph could not transform <bound method Logger.findCaller of <FaceswapLogger lib.model.losses_tf (DEBUG)>> and will run it as-is.\nPlease report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\nCause: module 'gast' has no attribute 'Index'\nTo silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
01/24/2021 01:59:03 MainProcess     _training_0                    ag_logging      warn                           DEBUG    AutoGraph could not transform <bound method Logger.makeRecord of <FaceswapLogger lib.model.losses_tf (DEBUG)>> and will run it as-is.\nPlease report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\nCause: module 'gast' has no attribute 'Index'\nTo silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
01/24/2021 01:59:03 MainProcess     _training_0                    ag_logging      warn                           DEBUG    AutoGraph could not transform <bound method FaceswapFormatter.format of <lib.logger.FaceswapFormatter object at 0x000001856AC57F70>> and will run it as-is.\nPlease report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\nCause: module 'gast' has no attribute 'Index'\nTo silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
01/24/2021 01:59:03 MainProcess     _training_0                    api             converted_call                 DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x0000018512090D30>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:04 MainProcess     _training_0                    ag_logging      warn                           DEBUG    AutoGraph could not transform <bound method LossWrapper._apply_mask of <class 'lib.model.losses_tf.LossWrapper'>> and will run it as-is.\nPlease report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\nCause: module 'gast' has no attribute 'Index'\nTo silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:04 MainProcess     _training_0                    ag_logging      warn                           DEBUG    AutoGraph could not transform <bound method DSSIMObjective.call of <lib.model.losses_tf.DSSIMObjective object at 0x0000018575AAA910>> and will run it as-is.\nPlease report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\nCause: module 'gast' has no attribute 'Index'\nTo silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x000001851208F820>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185120A21C0>, weight: 3.0, mask_channel: 4)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185120A2AF0>, weight: 1.0, mask_channel: 1)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101C3070>, weight: 2.0, mask_channel: 5)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101C3AC0>, weight: 1.0, mask_channel: 2)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101CF070>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101CFAC0>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101B0040>, weight: 3.0, mask_channel: 4)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101B0B20>, weight: 1.0, mask_channel: 1)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101920A0>, weight: 2.0, mask_channel: 5)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
01/24/2021 01:59:04 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x0000018510192AF0>, weight: 1.0, mask_channel: 2)
01/24/2021 01:59:04 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x0000018512090D30>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x000001851208F820>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185120A21C0>, weight: 3.0, mask_channel: 4)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185120A2AF0>, weight: 1.0, mask_channel: 1)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101C3070>, weight: 2.0, mask_channel: 5)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101C3AC0>, weight: 1.0, mask_channel: 2)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101CF070>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101CFAC0>, weight: 1.0, mask_channel: 3)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101B0040>, weight: 3.0, mask_channel: 4)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101B0B20>, weight: 1.0, mask_channel: 1)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
01/24/2021 01:59:06 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x00000185101920A0>, weight: 2.0, mask_channel: 5)
01/24/2021 01:59:06 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
01/24/2021 01:59:07 MainProcess     _training_0                    tmp4mdlxfxe     if_body                        DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x0000018510192AF0>, weight: 1.0, mask_channel: 2)
01/24/2021 01:59:07 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
01/24/2021 01:59:10 MainProcess     _training_0                    multithreading  run                            DEBUG    Error in thread (_training_0): 2 root error(s) found.\n  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.\n	 [[node original/encoder/conv_128_0_conv2d/Conv2D (defined at Software\faceswap\plugins\train\trainer\_base.py:238) ]]\n	 [[Func/cond/then/_0/input/_32/_46]]\n  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.\n	 [[node original/encoder/conv_128_0_conv2d/Conv2D (defined at Software\faceswap\plugins\train\trainer\_base.py:238) ]]\n0 successful operations.\n0 derived errors ignored. [Op:__inference_train_function_8926]\n\nFunction call stack:\ntrain_function -> train_function\n
01/24/2021 01:59:11 MainProcess     MainThread                     train           _monitor                       DEBUG    Thread error detected
01/24/2021 01:59:11 MainProcess     MainThread                     train           _monitor                       DEBUG    Closed Monitor
01/24/2021 01:59:11 MainProcess     MainThread                     train           _end_thread                    DEBUG    Ending Training thread
01/24/2021 01:59:11 MainProcess     MainThread                     train           _end_thread                    CRITICAL Error caught! Exiting...
01/24/2021 01:59:11 MainProcess     MainThread                     multithreading  join                           DEBUG    Joining Threads: '_training'
01/24/2021 01:59:11 MainProcess     MainThread                     multithreading  join                           DEBUG    Joining Thread: '_training_0'
01/24/2021 01:59:11 MainProcess     MainThread                     multithreading  join                           ERROR    Caught exception in thread: '_training_0'
Traceback (most recent call last):
  File "C:\Software\faceswap\lib\cli\launcher.py", line 182, in execute_script
    process.process()
  File "C:\Software\faceswap\scripts\train.py", line 170, in process
    self._end_thread(thread, err)
  File "C:\Software\faceswap\scripts\train.py", line 210, in _end_thread
    thread.join()
  File "C:\Software\faceswap\lib\multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "C:\Software\faceswap\lib\multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Software\faceswap\scripts\train.py", line 232, in _training
    raise err
  File "C:\Software\faceswap\scripts\train.py", line 222, in _training
    self._run_training_cycle(model, trainer)
  File "C:\Software\faceswap\scripts\train.py", line 302, in _run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "C:\Software\faceswap\plugins\train\trainer\_base.py", line 238, in train_one_step
    loss = self._model.model.train_on_batch(model_inputs, y=model_targets)
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1695, in train_on_batch
    logs = train_function(iterator)
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
    outputs = execute.execute(
  File "C:\Users\hanse\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node original/encoder/conv_128_0_conv2d/Conv2D (defined at Software\faceswap\plugins\train\trainer\_base.py:238) ]]
	 [[Func/cond/then/_0/input/_32/_46]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node original/encoder/conv_128_0_conv2d/Conv2D (defined at Software\faceswap\plugins\train\trainer\_base.py:238) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_8926]

Function call stack:
train_function -> train_function


============ System Information ============
encoding:            cp1252
git_branch:          Not Found
git_commits:         Not Found
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: GeForce RTX 2070
gpu_devices_active:  GPU_0
gpu_driver:          461.09
gpu_vram:            GPU_0: 8192MB
os_machine:          AMD64
os_platform:         Windows-10-10.0.18362-SP0
os_release:          10
py_command:          C:\Software\faceswap\faceswap.py train -A C:/Users/hanse/Desktop/df1/faceA -ala C:/Users/hanse/Desktop/df1/jp tiktok_alignments.fsa -B C:/Users/hanse/Desktop/df1/faceB -alb C:/Users/hanse/Desktop/df1/generated(3)(1)_alignments.fsa -m C:/Users/hanse/Desktop/df1/faceAB -t original -bs 12 -it 1000000 -s 250 -ss 25000 -tia C:/Users/hanse/Desktop/df1/faceA -tib C:/Users/hanse/Desktop/df1/faceB -to C:/Users/hanse/Desktop/df1/tl -ps 50 -L INFO -gui
py_conda_version:    conda 4.9.2
py_implementation:   CPython
py_version:          3.8.5
py_virtual_env:      True
sys_cores:           6
sys_processor:       Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
sys_ram:             Total: 16319MB, Available: 7017MB, Used: 9301MB, Free: 7017MB

=============== Pip Packages ===============
absl-py @ file:///tmp/build/80754af9/absl-py_1607439979954/work
aiohttp @ file:///C:/ci/aiohttp_1607109697839/work
astunparse==1.6.3
async-timeout==3.0.1
attrs @ file:///tmp/build/80754af9/attrs_1604765588209/work
blinker==1.4
brotlipy==0.7.0
cachetools @ file:///tmp/build/80754af9/cachetools_1607706694405/work
certifi==2020.12.5
cffi @ file:///C:/ci/cffi_1606255208697/work
chardet @ file:///C:/ci/chardet_1605303225733/work
click @ file:///home/linux1/recipes/ci/click_1610990599742/work
cryptography==2.9.2
cycler==0.10.0
fastcluster==1.1.26
ffmpy==0.2.3
gast @ file:///tmp/build/80754af9/gast_1597433534803/work
google-auth @ file:///tmp/build/80754af9/google-auth_1607969906642/work
google-auth-oauthlib @ file:///tmp/build/80754af9/google-auth-oauthlib_1603929124518/work
google-pasta==0.2.0
grpcio @ file:///C:/ci/grpcio_1597406462198/work
h5py==2.10.0
idna @ file:///home/linux1/recipes/ci/idna_1610986105248/work
imageio @ file:///tmp/build/80754af9/imageio_1594161405741/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1609799311556/work
importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1602276842396/work
joblib @ file:///tmp/build/80754af9/joblib_1607970656719/work
Keras-Applications @ file:///tmp/build/80754af9/keras-applications_1594366238411/work
Keras-Preprocessing==1.1.0
kiwisolver @ file:///C:/ci/kiwisolver_1604014703538/work
Markdown @ file:///C:/ci/markdown_1605111189761/work
matplotlib @ file:///C:/ci/matplotlib-base_1592837548929/work
mkl-fft==1.2.0
mkl-random==1.1.1
mkl-service==2.3.0
multidict @ file:///C:/ci/multidict_1600456481656/work
numpy @ file:///C:/ci/numpy_and_numpy_base_1603466732592/work
nvidia-ml-py3 @ git+https://github.com/deepfakes/nvidia-ml-py3.git@6fc29ac84b32bad877f078cb4a777c1548a00bf6
oauthlib==3.1.0
olefile==0.46
opencv-python==4.5.1.48
opt-einsum==3.1.0
pathlib==1.0.1
Pillow @ file:///C:/ci/pillow_1609786840597/work
protobuf==3.13.0
psutil @ file:///C:/ci/psutil_1598370330503/work
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
PyJWT @ file:///C:/ci/pyjwt_1610893382614/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work
pyparsing @ file:///home/linux1/recipes/ci/pyparsing_1610983426697/work
pyreadline==2.1
PySocks @ file:///C:/ci/pysocks_1605287845585/work
python-dateutil==2.8.1
pywin32==227
requests @ file:///tmp/build/80754af9/requests_1608241421344/work
requests-oauthlib==1.3.0
rsa @ file:///tmp/build/80754af9/rsa_1610483308194/work
scikit-learn @ file:///C:/ci/scikit-learn_1598377018496/work
scipy @ file:///C:/ci/scipy_1604596260408/work
sip==4.19.13
six @ file:///C:/ci/six_1605187374963/work
tensorboard @ file:///home/builder/ktietz/conda/conda-bld/tensorboard_1604313476433/work/tmp_pip_dir
tensorboard-plugin-wit==1.6.0
tensorflow==2.3.0
tensorflow-estimator @ file:///tmp/build/80754af9/tensorflow-estimator_1599136169057/work/whl_temp/tensorflow_estimator-2.3.0-py2.py3-none-any.whl
termcolor==1.1.0
threadpoolctl @ file:///tmp/tmp9twdgx9k/threadpoolctl-2.1.0-py3-none-any.whl
tornado @ file:///C:/ci/tornado_1606942392901/work
tqdm @ file:///tmp/build/80754af9/tqdm_1609788246169/work
typing-extensions @ file:///tmp/build/80754af9/typing_extensions_1598376058250/work
urllib3 @ file:///tmp/build/80754af9/urllib3_1606938623459/work
Werkzeug==1.0.1
win-inet-pton @ file:///C:/ci/win_inet_pton_1605306167264/work
wincertstore==0.2
wrapt==1.12.1
yarl @ file:///C:/ci/yarl_1598045274898/work
zipp @ file:///tmp/build/80754af9/zipp_1604001098328/work

============== Conda Packages ==============
# packages in environment at C:\Users\hanse\MiniConda3\envs\faceswap:
#
# Name                    Version                   Build  Channel
_tflow_select             2.3.0                       gpu  
absl-py                   0.11.0             pyhd3eb1b0_1  
aiohttp                   3.7.3            py38h2bbff1b_1  
astunparse                1.6.3                      py_0  
async-timeout             3.0.1                    py38_0  
attrs                     20.3.0             pyhd3eb1b0_0  
blas                      1.0                         mkl  
blinker                   1.4                      py38_0  
brotlipy                  0.7.0           py38h2bbff1b_1003  
ca-certificates           2021.1.19            haa95532_0  
cachetools                4.2.0              pyhd3eb1b0_0  
certifi                   2020.12.5        py38haa95532_0  
cffi                      1.14.4           py38hcd4344a_0  
chardet                   3.0.4           py38haa95532_1003  
click                     7.1.2              pyhd3eb1b0_0  
cryptography              2.9.2            py38h7a1dbc1_0  
cudatoolkit               10.1.243             h74a9793_0  
cudnn                     7.6.5                cuda10.1_0  
cycler                    0.10.0                   py38_0  
fastcluster               1.1.26           py38h251f6bf_2    conda-forge
ffmpeg                    4.3.1                ha925a31_0    conda-forge
ffmpy                     0.2.3                    pypi_0    pypi
freetype                  2.10.4               hd328e21_0  
gast                      0.4.0                      py_0  
git                       2.23.0               h6bb4b03_0  
google-auth               1.24.0             pyhd3eb1b0_0  
google-auth-oauthlib      0.4.2              pyhd3eb1b0_2  
google-pasta              0.2.0                      py_0  
grpcio                    1.31.0           py38he7da953_0  
h5py                      2.10.0           py38h5e291fa_0  
hdf5                      1.10.4               h7ebc959_0  
icc_rt                    2019.0.0             h0cc432a_1  
icu                       58.2                 ha925a31_3  
idna                      2.10               pyhd3eb1b0_0  
imageio                   2.9.0                      py_0  
imageio-ffmpeg            0.4.3              pyhd8ed1ab_0    conda-forge
importlib-metadata        2.0.0                      py_1  
intel-openmp              2020.2                      254  
joblib                    1.0.0              pyhd3eb1b0_0  
jpeg                      9b                   hb83a4c4_2  
keras-applications        1.0.8                      py_1  
keras-preprocessing       1.1.0                      py_1  
kiwisolver                1.3.0            py38hd77b12b_0  
libpng                    1.6.37               h2a8f88b_0  
libprotobuf               3.13.0.1             h200bbdf_0  
libtiff                   4.1.0                h56a325e_1  
lz4-c                     1.9.3                h2bbff1b_0  
markdown                  3.3.3            py38haa95532_0  
matplotlib                3.2.2                         0  
matplotlib-base           3.2.2            py38h64f37c6_0  
mkl                       2020.2                      256  
mkl-service               2.3.0            py38h196d8e1_0  
mkl_fft                   1.2.0            py38h45dec08_0  
mkl_random                1.1.1            py38h47e9c7a_0  
multidict                 4.7.6            py38he774522_1  
numpy                     1.19.2           py38hadc3359_0  
numpy-base                1.19.2           py38ha3acd2a_0  
nvidia-ml-py3             7.352.1                  pypi_0    pypi
oauthlib                  3.1.0                      py_0  
olefile                   0.46                       py_0  
opencv-python             4.5.1.48                 pypi_0    pypi
openssl                   1.1.1i               h2bbff1b_0  
opt_einsum                3.1.0                      py_0  
pathlib                   1.0.1                      py_1  
pillow                    8.1.0            py38h4fa10fc_0  
pip                       20.3.3           py38haa95532_0  
protobuf                  3.13.0.1         py38ha925a31_1  
psutil                    5.7.2            py38he774522_0  
pyasn1                    0.4.8                      py_0  
pyasn1-modules            0.2.8                      py_0  
pycparser                 2.20                       py_2  
pyjwt                     2.0.1            py38haa95532_0  
pyopenssl                 20.0.1             pyhd3eb1b0_1  
pyparsing                 2.4.7              pyhd3eb1b0_0  
pyqt                      5.9.2            py38ha925a31_4  
pyreadline                2.1                      py38_1  
pysocks                   1.7.1            py38haa95532_0  
python                    3.8.5                h5fd99cc_1  
python-dateutil           2.8.1                      py_0  
python_abi                3.8                      1_cp38    conda-forge
pywin32                   227              py38he774522_1  
qt                        5.9.7            vc14h73c81de_0  
requests                  2.25.1             pyhd3eb1b0_0  
requests-oauthlib         1.3.0                      py_0  
rsa                       4.7                pyhd3eb1b0_1  
scikit-learn              0.23.2           py38h47e9c7a_0  
scipy                     1.5.2            py38h14eb087_0  
setuptools                51.3.3           py38haa95532_4  
sip                       4.19.13          py38ha925a31_0  
six                       1.15.0           py38haa95532_0  
sqlite                    3.33.0               h2a8f88b_0  
tensorboard               2.3.0              pyh4dce500_0  
tensorboard-plugin-wit    1.6.0                      py_0  
tensorflow                2.3.0           mkl_py38h1fcfbd6_0  
tensorflow-base           2.3.0           gpu_py38h7339f5a_0  
tensorflow-estimator      2.3.0              pyheb71bc4_0  
tensorflow-gpu            2.3.0                he13fc11_0  
termcolor                 1.1.0                    py38_1  
threadpoolctl             2.1.0              pyh5ca1d4c_0  
tk                        8.6.10               he774522_0  
tornado                   6.1              py38h2bbff1b_0  
tqdm                      4.55.1             pyhd3eb1b0_0  
typing-extensions         3.7.4.3                       0  
typing_extensions         3.7.4.3                    py_0  
urllib3                   1.26.2             pyhd3eb1b0_0  
vc                        14.2                 h21ff451_1  
vs2015_runtime            14.27.29016          h5e58377_2  
werkzeug                  1.0.1                      py_0  
wheel                     0.36.2             pyhd3eb1b0_0  
win_inet_pton             1.1.0            py38haa95532_0  
wincertstore              0.2                      py38_0  
wrapt                     1.12.1           py38he774522_1  
xz                        5.2.5                h62dcd97_0  
yarl                      1.5.1            py38he774522_0  
zipp                      3.4.0              pyhd3eb1b0_0  
zlib                      1.2.11               h62dcd97_4  
zstd                      1.4.5                h04227a9_0  

================= Configs ==================
--------- .faceswap ---------
backend:                  nvidia

--------- convert.ini ---------

[color.color_transfer]
clip:                     True
preserve_paper:           True

[color.manual_balance]
colorspace:               HSV
balance_1:                0.0
balance_2:                0.0
balance_3:                0.0
contrast:                 0.0
brightness:               0.0

[color.match_hist]
threshold:                99.0

[mask.box_blend]
type:                     gaussian
distance:                 11.0
radius:                   5.0
passes:                   1

[mask.mask_blend]
type:                     normalized
kernel_size:              3
passes:                   4
threshold:                4
erosion:                  0.0

[scaling.sharpen]
method:                   none
amount:                   150
radius:                   0.3
threshold:                5.0

[writer.ffmpeg]
container:                mp4
codec:                    libx264
crf:                      23
preset:                   medium
tune:                     none
profile:                  auto
level:                    auto
skip_mux:                 False

[writer.gif]
fps:                      25
loop:                     0
palettesize:              256
subrectangles:            False

[writer.opencv]
format:                   png
draw_transparent:         False
jpg_quality:              75
png_compress_level:       3

[writer.pillow]
format:                   png
draw_transparent:         False
optimize:                 False
gif_interlace:            True
jpg_quality:              75
png_compress_level:       3
tif_compression:          tiff_deflate

--------- extract.ini ---------

[global]
allow_growth:             True

[align.fan]
batch-size:               12

[detect.cv2_dnn]
confidence:               50

[detect.mtcnn]
minsize:                  20
threshold_1:              0.6
threshold_2:              0.7
threshold_3:              0.7
scalefactor:              0.709
batch-size:               8

[detect.s3fd]
confidence:               70
batch-size:               1

[mask.unet_dfl]
batch-size:               8

[mask.vgg_clear]
batch-size:               6

[mask.vgg_obstructed]
batch-size:               2

--------- gui.ini ---------

[global]
fullscreen:               False
tab:                      extract
options_panel_width:      30
console_panel_height:     20
icon_size:                14
font:                     default
font_size:                9
autosave_last_session:    prompt
timeout:                  120
auto_load_model_stats:    True

--------- train.ini ---------

[global]
centering:                face
coverage:                 68.75
icnr_init:                False
conv_aware_init:          False
optimizer:                adam
learning_rate:            5e-05
reflect_padding:          False
allow_growth:             False
mixed_precision:          False
convert_batchsize:        16

[global.loss]
loss_function:            ssim
mask_loss_function:       mse
l2_reg_term:              100
eye_multiplier:           3
mouth_multiplier:         2
penalized_mask_loss:      True
mask_type:                extended
mask_blur_kernel:         3
mask_threshold:           4
learn_mask:               False

[model.dfaker]
output_size:              128

[model.dfl_h128]
lowmem:                   False

[model.dfl_sae]
input_size:               128
clipnorm:                 True
architecture:             df
autoencoder_dims:         0
encoder_dims:             42
decoder_dims:             21
multiscale_decoder:       False

[model.dlight]
features:                 best
details:                  good
output_size:              256

[model.original]
lowmem:                   False

[model.realface]
input_size:               64
output_size:              128
dense_nodes:              1536
complexity_encoder:       128
complexity_decoder:       512

[model.unbalanced]
input_size:               128
lowmem:                   False
clipnorm:                 True
nodes:                    1024
complexity_encoder:       128
complexity_decoder_a:     384
complexity_decoder_b:     512

[model.villain]
lowmem:                   False

[trainer.original]
preview_images:           14
zoom_amount:              5
rotation_range:           10
shift_range:              5
flip_chance:              50
disable_warp:             False
color_lightness:          30
color_ab:                 8
color_clahe_chance:       50
color_clahe_max_size:     4

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

Post by torzdf »

Go to Tools > Settings > Train and enable "Allow Growth".
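
"Allow Growth" makes TensorFlow claim GPU memory incrementally instead of reserving nearly all VRAM up front, which is the usual trigger for this "Failed to get convolution algorithm" / CUDNN_STATUS_INTERNAL_ERROR crash. For reference only, here is a minimal standalone sketch of the TensorFlow 2.x call that the option roughly corresponds to (this is not Faceswap's own code, just an illustration under the assumption that the GUI toggle maps to TensorFlow's memory-growth setting):

Code: Select all

import tensorflow as tf

# Must run before anything initializes the GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    # Allocate VRAM on demand rather than grabbing it all at start-up;
    # this commonly avoids the cuDNN initialization failure seen above.
    tf.config.experimental.set_memory_growth(gpu, True)

The same setting appears in the crash report's train.ini dump above as "allow_growth: False" under [global]; flipping it to True there should have the same effect as the GUI option.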



Caught exception in thread: '_training_0'

Post by lamakaha »

Hi folks, I'm taking my first steps with faceswap and hitting a wall here.

When I start the training process, I get the following error:

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
07/25/2021 09:14:05 INFO     Log level set to: INFO
07/25/2021 09:14:07 INFO     Model A Directory: '/media/lamakaha/work/projects/deepfake/06_Renders/SanneFace' (156 images)
07/25/2021 09:14:07 INFO     Model B Directory: '/media/lamakaha/work/projects/deepfake/06_Renders/AlexeyFace' (758 images)
07/25/2021 09:14:07 WARNING  At least one of your input folders contains fewer than 250 images. Results are likely to be poor.
07/25/2021 09:14:07 WARNING  You need to provide a significant number of images to successfully train a Neural Network. Aim for between 500 - 5000 images per side.
07/25/2021 09:14:07 INFO     Training data directory: /media/lamakaha/work/projects/deepfake/03_Data/model_v001
07/25/2021 09:14:07 INFO     ===================================================
07/25/2021 09:14:07 INFO       Starting
07/25/2021 09:14:07 INFO       Press 'Stop' to save and quit
07/25/2021 09:14:07 INFO     ===================================================
07/25/2021 09:14:08 INFO     Loading data, this may take a while...
07/25/2021 09:14:08 INFO     Loading Model from Original plugin...
07/25/2021 09:14:08 INFO     No existing state file found. Generating.
07/25/2021 09:14:10 INFO     Loading Trainer from Original plugin...
2021-07-25 09:14:17.532879: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 09:14:17.534480: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 09:14:17.535888: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 09:14:17.537152: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
07/25/2021 09:14:18 CRITICAL Error caught! Exiting...
07/25/2021 09:14:18 ERROR    Caught exception in thread: '_training_0'
07/25/2021 09:14:20 ERROR    Got Exception on main handler:
Traceback (most recent call last):
  File "/home/lamakaha/faceswap/lib/cli/launcher.py", line 182, in execute_script
    process.process()
  File "/home/lamakaha/faceswap/scripts/train.py", line 190, in process
    self._end_thread(thread, err)
  File "/home/lamakaha/faceswap/scripts/train.py", line 230, in _end_thread
    thread.join()
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lamakaha/faceswap/scripts/train.py", line 252, in _training
    raise err
  File "/home/lamakaha/faceswap/scripts/train.py", line 242, in _training
    self._run_training_cycle(model, trainer)
  File "/home/lamakaha/faceswap/scripts/train.py", line 327, in _run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "/home/lamakaha/faceswap/plugins/train/trainer/_base.py", line 193, in train_one_step
    loss = self._model.model.train_on_batch(model_inputs, y=model_targets)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1348, in train_on_batch
    logs = train_function(iterator)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node original/encoder_1/conv_128_0_conv2d/Conv2D (defined at /faceswap/plugins/train/trainer/_base.py:193) ]] [Op:__inference_train_function_8730]

Function call stack:
train_function

07/25/2021 09:14:20 CRITICAL An unexpected crash has occurred. Crash report written to '/home/lamakaha/faceswap/crash_report.2021.07.25.091418406632.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.

The full log is here:

Code: Select all

07/25/2021 09:14:10 MainProcess     _training_0                    _base           _set_preview_feed              DEBUG    Setting preview feed: (side: 'b')
07/25/2021 09:14:10 MainProcess     _training_0                    _base           _load_generator                DEBUG    Loading generator
07/25/2021 09:14:10 MainProcess     _training_0                    _base           _load_generator                DEBUG    input_size: 64, output_shapes: [(64, 64, 3)]
07/25/2021 09:14:10 MainProcess     _training_0                    generator       __init__                       DEBUG    Initializing TrainingDataGenerator: (model_input_size: 64, model_output_shapes: [(64, 64, 3)], coverage_ratio: 0.6875, color_order: bgr, augment_color: True, no_flip: False, no_warp: False, warp_to_landmarks: False, config: {'centering': 'face', 'coverage': 68.75, 'icnr_init': False, 'conv_aware_init': False, 'optimizer': 'adam', 'learning_rate': 5e-05, 'epsilon_exponent': -7, 'reflect_padding': False, 'allow_growth': False, 'mixed_precision': False, 'nan_protection': True, 'convert_batchsize': 16, 'loss_function': 'ssim', 'mask_loss_function': 'mse', 'l2_reg_term': 100, 'eye_multiplier': 3, 'mouth_multiplier': 2, 'penalized_mask_loss': True, 'mask_type': 'extended', 'mask_blur_kernel': 3, 'mask_threshold': 4, 'learn_mask': False, 'preview_images': 14, 'zoom_amount': 5, 'rotation_range': 10, 'shift_range': 5, 'flip_chance': 50, 'color_lightness': 30, 'color_ab': 8, 'color_clahe_chance': 50, 'color_clahe_max_size': 4})
07/25/2021 09:14:10 MainProcess     _training_0                    generator       __init__                       DEBUG    Initialized TrainingDataGenerator
07/25/2021 09:14:10 MainProcess     _training_0                    generator       minibatch_ab                   DEBUG    Queue batches: (image_count: 758, batchsize: 14, side: 'b', do_shuffle: True, is_preview, True, is_timelapse: False)
07/25/2021 09:14:10 MainProcess     _training_0                    augmentation    __init__                       DEBUG    Initializing ImageAugmentation: (batchsize: 14, is_display: True, input_size: 64, output_shapes: [(64, 64, 3)], coverage_ratio: 0.6875, config: {'centering': 'face', 'coverage': 68.75, 'icnr_init': False, 'conv_aware_init': False, 'optimizer': 'adam', 'learning_rate': 5e-05, 'epsilon_exponent': -7, 'reflect_padding': False, 'allow_growth': False, 'mixed_precision': False, 'nan_protection': True, 'convert_batchsize': 16, 'loss_function': 'ssim', 'mask_loss_function': 'mse', 'l2_reg_term': 100, 'eye_multiplier': 3, 'mouth_multiplier': 2, 'penalized_mask_loss': True, 'mask_type': 'extended', 'mask_blur_kernel': 3, 'mask_threshold': 4, 'learn_mask': False, 'preview_images': 14, 'zoom_amount': 5, 'rotation_range': 10, 'shift_range': 5, 'flip_chance': 50, 'color_lightness': 30, 'color_ab': 8, 'color_clahe_chance': 50, 'color_clahe_max_size': 4})
07/25/2021 09:14:10 MainProcess     _training_0                    augmentation    __init__                       DEBUG    Output sizes: [64]
07/25/2021 09:14:10 MainProcess     _training_0                    augmentation    __init__                       DEBUG    Initialized ImageAugmentation
07/25/2021 09:14:10 MainProcess     _training_0                    multithreading  __init__                       DEBUG    Initializing BackgroundGenerator: (target: '_run', thread_count: 2)
07/25/2021 09:14:10 MainProcess     _training_0                    multithreading  __init__                       DEBUG    Initialized BackgroundGenerator: '_run'
07/25/2021 09:14:10 MainProcess     _training_0                    multithreading  start                          DEBUG    Starting thread(s): '_run'
07/25/2021 09:14:10 MainProcess     _training_0                    multithreading  start                          DEBUG    Starting thread 1 of 2: '_run_0'
07/25/2021 09:14:10 MainProcess     _run_0                         generator       _minibatch                     DEBUG    Loading minibatch generator: (image_count: 758, side: 'b', do_shuffle: True)
07/25/2021 09:14:10 MainProcess     _training_0                    multithreading  start                          DEBUG    Starting thread 2 of 2: '_run_1'
07/25/2021 09:14:10 MainProcess     _run_1                         generator       _minibatch                     DEBUG    Loading minibatch generator: (image_count: 758, side: 'b', do_shuffle: True)
07/25/2021 09:14:10 MainProcess     _training_0                    multithreading  start                          DEBUG    Started all threads '_run': 2
07/25/2021 09:14:10 MainProcess     _training_0                    _base           _set_preview_feed              DEBUG    Set preview feed. Batchsize: 14
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initialized _Feeder:
07/25/2021 09:14:10 MainProcess     _training_0                    _base           _set_tensorboard               DEBUG    Enabling TensorBoard Logging
07/25/2021 09:14:10 MainProcess     _training_0                    _base           _set_tensorboard               DEBUG    Setting up TensorBoard Logging
07/25/2021 09:14:10 MainProcess     _run_0                         generator       _validate_version              DEBUG    Setting initial extract version: 2.2
07/25/2021 09:14:10 MainProcess     _training_0                    _base           _set_tensorboard               VERBOSE  Enabled TensorBoard Logging
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initializing _Samples: model: '<plugins.train.model.original.Model object at 0x7f5157a24df0>', coverage_ratio: 0.6875)
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initialized _Samples
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initializing _Timelapse: model: <plugins.train.model.original.Model object at 0x7f5157a24df0>, coverage_ratio: 0.6875, image_count: 14, feeder: '<plugins.train.trainer._base._Feeder object at 0x7f51571a49d0>', image_paths: 2)
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initializing _Samples: model: '<plugins.train.model.original.Model object at 0x7f5157a24df0>', coverage_ratio: 0.6875)
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initialized _Samples
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initialized _Timelapse
07/25/2021 09:14:10 MainProcess     _training_0                    _base           __init__                       DEBUG    Initialized Trainer
07/25/2021 09:14:10 MainProcess     _training_0                    train           _load_trainer                  DEBUG    Loaded Trainer
07/25/2021 09:14:10 MainProcess     _training_0                    train           _run_training_cycle            DEBUG    Running Training Cycle
07/25/2021 09:14:10 MainProcess     _run_0                         augmentation    initialize                     DEBUG    Initializing constants. training_size: 384
07/25/2021 09:14:10 MainProcess     _run_0                         augmentation    initialize                     DEBUG    Initialized constants: {'clahe_base_contrast': 3, 'tgt_slices': slice(60, 324, None), 'warp_mapx': '[[[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]]', 'warp_mapy': '[[[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]]', 'warp_pad': 80, 'warp_slices': slice(8, -8, None), 'warp_lm_edge_anchors': '[[[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]]', 'warp_lm_grids': '[[[  0.   0.   0. ...   0.   0.   0.]\n  [  1.   1.   1. ...   1.   1.   1.]\n  [  2.   2.   2. ...   2.   2.   2.]\n  ...\n  [381. 381. 381. ... 381. 381. 381.]\n  [382. 382. 382. ... 382. 382. 382.]\n  [383. 383. 383. ... 383. 383. 383.]]\n\n [[  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  ...\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]]]'}
07/25/2021 09:14:10 MainProcess     _run_0                         generator       cache_metadata                 DEBUG    All metadata already cached for: ['Sanne_000104_0.png', 'Sanne_000152_0.png']
07/25/2021 09:14:10 MainProcess     _run_0                         generator       _validate_version              DEBUG    Setting initial extract version: 2.2
07/25/2021 09:14:10 MainProcess     _run_1                         augmentation    initialize                     DEBUG    Initializing constants. training_size: 384
07/25/2021 09:14:10 MainProcess     _run_1                         augmentation    initialize                     DEBUG    Initialized constants: {'clahe_base_contrast': 3, 'tgt_slices': slice(60, 324, None), 'warp_mapx': '[[[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]]', 'warp_mapy': '[[[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 
258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]]', 'warp_pad': 80, 'warp_slices': slice(8, -8, None), 'warp_lm_edge_anchors': '[[[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]]', 'warp_lm_grids': '[[[  0.   0.   0. ...   0.   0.   0.]\n  [  1.   1.   1. ...   1.   1.   1.]\n  [  2.   2.   2. ...   2.   2.   2.]\n  ...\n  [381. 381. 381. ... 381. 381. 381.]\n  [382. 382. 382. ... 382. 382. 382.]\n  [383. 383. 383. ... 383. 383. 383.]]\n\n [[  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  ...\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]]]'}
07/25/2021 09:14:10 MainProcess     _run_0                         augmentation    initialize                     DEBUG    Initializing constants. training_size: 384
07/25/2021 09:14:10 MainProcess     _run_0                         augmentation    initialize                     DEBUG    Initialized constants: {'clahe_base_contrast': 3, 'tgt_slices': slice(60, 324, None), 'warp_mapx': '[[[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]]', 'warp_mapy': '[[[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]]', 'warp_pad': 80, 'warp_slices': slice(8, -8, None), 'warp_lm_edge_anchors': '[[[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]]', 'warp_lm_grids': '[[[  0.   0.   0. ...   0.   0.   0.]\n  [  1.   1.   1. ...   1.   1.   1.]\n  [  2.   2.   2. ...   2.   2.   2.]\n  ...\n  [381. 381. 381. ... 381. 381. 381.]\n  [382. 382. 382. ... 382. 382. 382.]\n  [383. 383. 383. ... 383. 383. 383.]]\n\n [[  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  ...\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]]]'}
07/25/2021 09:14:10 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541a33a0>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:10 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:11 MainProcess     _run_1                         generator       cache_metadata                 DEBUG    All metadata already cached for: ['Sanne_000104_0.png', 'Sanne_000152_0.png']
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541a34c0>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f5154202760>, weight: 3.0, mask_channel: 4)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f5154202e80>, weight: 1.0, mask_channel: 1)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515421aa60>, weight: 2.0, mask_channel: 5)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515421a970>, weight: 1.0, mask_channel: 2)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
07/25/2021 09:14:11 MainProcess     _run_0                         augmentation    initialize                     DEBUG    Initializing constants. training_size: 384
07/25/2021 09:14:11 MainProcess     _run_0                         augmentation    initialize                     DEBUG    Initialized constants: {'clahe_base_contrast': 3, 'tgt_slices': slice(60, 324, None), 'warp_mapx': '[[[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]\n\n [[ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]\n  [ 60. 126. 192. 258. 324.]]]', 'warp_mapy': '[[[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 
258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]\n\n [[ 60.  60.  60.  60.  60.]\n  [126. 126. 126. 126. 126.]\n  [192. 192. 192. 192. 192.]\n  [258. 258. 258. 258. 258.]\n  [324. 324. 324. 324. 324.]]]', 'warp_pad': 80, 'warp_slices': slice(8, -8, None), 'warp_lm_edge_anchors': '[[[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]\n\n [[  0   0]\n  [  0 383]\n  [383 383]\n  [383   0]\n  [191   0]\n  [191 383]\n  [383 191]\n  [  0 191]]]', 'warp_lm_grids': '[[[  0.   0.   0. ...   0.   0.   0.]\n  [  1.   1.   1. ...   1.   1.   1.]\n  [  2.   2.   2. ...   2.   2.   2.]\n  ...\n  [381. 381. 381. ... 381. 381. 381.]\n  [382. 382. 382. ... 382. 382. 382.]\n  [383. 383. 383. ... 383. 383. 383.]]\n\n [[  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  ...\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]\n  [  0.   1.   2. ... 381. 382. 383.]]]'}
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541c1520>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541c12e0>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:11 MainProcess     _run_1                         generator       cache_metadata                 DEBUG    All metadata already cached for: ['Alexey_000731_0.png', 'Alexey_000708_0.png']
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515414e5e0>, weight: 3.0, mask_channel: 4)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515414ea90>, weight: 1.0, mask_channel: 1)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515415a0a0>, weight: 2.0, mask_channel: 5)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515415a5b0>, weight: 1.0, mask_channel: 2)
07/25/2021 09:14:11 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541a33a0>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541a34c0>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f5154202760>, weight: 3.0, mask_channel: 4)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f5154202e80>, weight: 1.0, mask_channel: 1)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515421aa60>, weight: 2.0, mask_channel: 5)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515421a970>, weight: 1.0, mask_channel: 2)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541c1520>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f51541c12e0>, weight: 1.0, mask_channel: 3)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 3
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515414e5e0>, weight: 3.0, mask_channel: 4)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 4
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515414ea90>, weight: 1.0, mask_channel: 1)
07/25/2021 09:14:13 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 1
07/25/2021 09:14:14 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515415a0a0>, weight: 2.0, mask_channel: 5)
07/25/2021 09:14:14 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 5
07/25/2021 09:14:14 MainProcess     _training_0                    losses_tf       call                           DEBUG    Processing loss function: (func: <tensorflow.python.keras.engine.compile_utils.LossesContainer object at 0x7f515415a5b0>, weight: 1.0, mask_channel: 2)
07/25/2021 09:14:14 MainProcess     _training_0                    losses_tf       _apply_mask                    DEBUG    Applying mask from channel 2
07/25/2021 09:14:17 MainProcess     _training_0                    multithreading  run                            DEBUG    Error in thread (_training_0):  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.\n	 [[node original/encoder_1/conv_128_0_conv2d/Conv2D (defined at /faceswap/plugins/train/trainer/_base.py:193) ]] [Op:__inference_train_function_8730]\n\nFunction call stack:\ntrain_function\n
07/25/2021 09:14:18 MainProcess     MainThread                     train           _monitor                       DEBUG    Thread error detected
07/25/2021 09:14:18 MainProcess     MainThread                     train           _monitor                       DEBUG    Closed Monitor
07/25/2021 09:14:18 MainProcess     MainThread                     train           _end_thread                    DEBUG    Ending Training thread
07/25/2021 09:14:18 MainProcess     MainThread                     train           _end_thread                    CRITICAL Error caught! Exiting...
07/25/2021 09:14:18 MainProcess     MainThread                     multithreading  join                           DEBUG    Joining Threads: '_training'
07/25/2021 09:14:18 MainProcess     MainThread                     multithreading  join                           DEBUG    Joining Thread: '_training_0'
07/25/2021 09:14:18 MainProcess     MainThread                     multithreading  join                           ERROR    Caught exception in thread: '_training_0'
Traceback (most recent call last):
  File "/home/lamakaha/faceswap/lib/cli/launcher.py", line 182, in execute_script
    process.process()
  File "/home/lamakaha/faceswap/scripts/train.py", line 190, in process
    self._end_thread(thread, err)
  File "/home/lamakaha/faceswap/scripts/train.py", line 230, in _end_thread
    thread.join()
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lamakaha/faceswap/scripts/train.py", line 252, in _training
    raise err
  File "/home/lamakaha/faceswap/scripts/train.py", line 242, in _training
    self._run_training_cycle(model, trainer)
  File "/home/lamakaha/faceswap/scripts/train.py", line 327, in _run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "/home/lamakaha/faceswap/plugins/train/trainer/_base.py", line 193, in train_one_step
    loss = self._model.model.train_on_batch(model_inputs, y=model_targets)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1348, in train_on_batch
    logs = train_function(iterator)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node original/encoder_1/conv_128_0_conv2d/Conv2D (defined at /faceswap/plugins/train/trainer/_base.py:193) ]] [Op:__inference_train_function_8730]

Function call stack:
train_function


============ System Information ============
encoding:            UTF-8
git_branch:          Not Found
git_commits:         Not Found
gpu_cuda:            9.1
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: GeForce RTX 2060
gpu_devices_active:  GPU_0
gpu_driver:          440.82
gpu_vram:            GPU_0: 5933MB
os_machine:          x86_64
os_platform:         Linux-5.3.0-42-generic-x86_64-with-glibc2.17
os_release:          5.3.0-42-generic
py_command:          /home/lamakaha/faceswap/faceswap.py train -A /media/lamakaha/work/projects/deepfake/06_Renders/SanneFace -B /media/lamakaha/work/projects/deepfake/06_Renders/AlexeyFace -m /media/lamakaha/work/projects/deepfake/03_Data/model_v001 -t original -bs 2 -it 1000000 -s 250 -ss 25000 -ps 100 -L INFO -gui
py_conda_version:    conda 4.10.3
py_implementation:   CPython
py_version:          3.8.10
py_virtual_env:      True
sys_cores:           12
sys_processor:       x86_64
sys_ram:             Total: 64263MB, Available: 58550MB, Used: 4752MB, Free: 54832MB

=============== Pip Packages ===============
absl-py @ file:///tmp/build/80754af9/absl-py_1623867230185/work
aiohttp @ file:///tmp/build/80754af9/aiohttp_1614360992924/work
astor==0.8.1
astunparse==1.6.3
async-timeout==3.0.1
attrs @ file:///tmp/build/80754af9/attrs_1620827162558/work
blinker==1.4
brotlipy==0.7.0
cachetools @ file:///tmp/build/80754af9/cachetools_1619597386817/work
certifi==2021.5.30
cffi @ file:///tmp/build/80754af9/cffi_1625807838443/work
chardet @ file:///tmp/build/80754af9/chardet_1605303185383/work
click @ file:///tmp/build/80754af9/click_1621604852318/work
coverage @ file:///tmp/build/80754af9/coverage_1614613670853/work
cryptography @ file:///tmp/build/80754af9/cryptography_1616769286105/work
cycler==0.10.0
Cython @ file:///tmp/build/80754af9/cython_1626256955500/work
fastcluster==1.1.26
ffmpy==0.2.3
gast==0.3.3
google-auth @ file:///tmp/build/80754af9/google-auth_1626320605116/work
google-auth-oauthlib @ file:///tmp/build/80754af9/google-auth-oauthlib_1617120569401/work
google-pasta==0.2.0
grpcio @ file:///tmp/build/80754af9/grpcio_1614884175859/work
h5py @ file:///tmp/build/80754af9/h5py_1593454122442/work
idna @ file:///home/linux1/recipes/ci/idna_1610986105248/work
imageio @ file:///tmp/build/80754af9/imageio_1617700267927/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1621542018480/work
importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1617874469820/work
joblib @ file:///tmp/build/80754af9/joblib_1613502643832/work
Keras-Preprocessing @ file:///tmp/build/80754af9/keras-preprocessing_1612283640596/work
kiwisolver @ file:///tmp/build/80754af9/kiwisolver_1612282420641/work
Markdown @ file:///tmp/build/80754af9/markdown_1614363528767/work
matplotlib @ file:///tmp/build/80754af9/matplotlib-base_1592846008246/work
mkl-fft==1.3.0
mkl-random==1.1.1
mkl-service==2.3.0
multidict @ file:///tmp/build/80754af9/multidict_1607367757617/work
numpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1603570489231/work
nvidia-ml-py3 @ git+https://github.com/deepfakes/nvidia-ml-py3.git@6fc29ac84b32bad877f078cb4a777c1548a00bf6
oauthlib @ file:///tmp/build/80754af9/oauthlib_1623060228408/work
olefile==0.46
opencv-python==4.5.3.56
opt-einsum @ file:///tmp/build/80754af9/opt_einsum_1621500238896/work
pathlib==1.0.1
Pillow @ file:///tmp/build/80754af9/pillow_1625655817137/work
protobuf==3.17.2
psutil @ file:///tmp/build/80754af9/psutil_1612298023621/work
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
PyJWT @ file:///tmp/build/80754af9/pyjwt_1619651636675/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work
pyparsing @ file:///home/linux1/recipes/ci/pyparsing_1610983426697/work
PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
requests @ file:///tmp/build/80754af9/requests_1608241421344/work
requests-oauthlib==1.3.0
rsa @ file:///tmp/build/80754af9/rsa_1614366226499/work
scikit-learn @ file:///tmp/build/80754af9/scikit-learn_1621370412049/work
scipy @ file:///tmp/build/80754af9/scipy_1616703172749/work
sip==4.19.13
six @ file:///tmp/build/80754af9/six_1623709665295/work
tensorboard @ file:///home/builder/ktietz/aggregate/tensorflow_recipes/ci_te/tensorboard_1614593728657/work/tmp_pip_dir
tensorboard-plugin-wit==1.6.0
tensorflow==2.2.0
tensorflow-estimator @ file:///home/builder/ktietz/aggregate/tensorflow_recipes/ci_baze37/tensorflow-estimator_1622026529081/work/tensorflow_estimator-2.5.0-py2.py3-none-any.whl
termcolor==1.1.0
threadpoolctl @ file:///tmp/build/80754af9/threadpoolctl_1626115094421/work
tornado @ file:///tmp/build/80754af9/tornado_1606942300299/work
tqdm @ file:///tmp/build/80754af9/tqdm_1625563689033/work
typing-extensions @ file:///tmp/build/80754af9/typing_extensions_1624965014186/work
urllib3 @ file:///tmp/build/80754af9/urllib3_1625084269274/work
Werkzeug @ file:///home/ktietz/src/ci/werkzeug_1611932622770/work
wrapt==1.12.1
yarl @ file:///tmp/build/80754af9/yarl_1606939922162/work
zipp @ file:///tmp/build/80754af9/zipp_1625570634446/work

============== Conda Packages ==============
# packages in environment at /home/lamakaha/miniconda3/envs/faceswap:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex 4.5 1_gnu
_tflow_select 2.1.0 gpu
absl-py 0.13.0 py38h06a4308_0
aiohttp 3.7.4 py38h27cfd23_1
astor 0.8.1 py38h06a4308_0
astunparse 1.6.3 py_0
async-timeout 3.0.1 py38h06a4308_0
attrs 21.2.0 pyhd3eb1b0_0
blas 1.0 mkl
blinker 1.4 py38h06a4308_0
brotlipy 0.7.0 py38h27cfd23_1003
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h27cfd23_0
ca-certificates 2021.7.5 h06a4308_1
cachetools 4.2.2 pyhd3eb1b0_0
certifi 2021.5.30 py38h06a4308_0
cffi 1.14.6 py38h400218f_0
chardet 3.0.4 py38h06a4308_1003
click 8.0.1 pyhd3eb1b0_0
coverage 5.5 py38h27cfd23_2
cryptography 3.4.7 py38hd23ed53_0
cudatoolkit 10.1.243 h6bb024c_0
cudnn 7.6.5 cuda10.1_0
cupti 10.1.168 0
cycler 0.10.0 py38_0
cython 0.29.24 py38h295c915_0
dbus 1.13.18 hb2f20db_0
expat 2.4.1 h2531618_2
fastcluster 1.1.26 py38hc5bc63f_2 conda-forge
ffmpeg 4.3.1 hca11adc_2 conda-forge
ffmpy 0.2.3 pypi_0 pypi
fontconfig 2.13.1 h6c09931_0
freetype 2.10.4 h5ab3b9f_0
gast 0.3.3 py_0
git 2.23.0 pl526hacde149_0
glib 2.69.0 h5202010_0
gmp 6.2.1 h58526e2_0 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
google-auth 1.33.0 pyhd3eb1b0_0
google-auth-oauthlib 0.4.4 pyhd3eb1b0_0
google-pasta 0.2.0 py_0
grpcio 1.36.1 py38h2157cd5_1
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
h5py 2.10.0 py38hd6299e0_1
hdf5 1.10.6 hb1b8bf9_0
icu 58.2 he6710b0_3
idna 2.10 pyhd3eb1b0_0
imageio 2.9.0 pyhd3eb1b0_0
imageio-ffmpeg 0.4.4 pyhd8ed1ab_0 conda-forge
importlib-metadata 3.10.0 py38h06a4308_0
intel-openmp 2021.3.0 h06a4308_3350
joblib 1.0.1 pyhd3eb1b0_0
jpeg 9b h024ee3a_2
keras-preprocessing 1.1.2 pyhd3eb1b0_0
kiwisolver 1.3.1 py38h2531618_0
krb5 1.19.1 h3535a68_0
lame 3.100 h7f98852_1001 conda-forge
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libcurl 7.71.1 h303737a_2
libedit 3.1.20210216 h27cfd23_1
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
libpng 1.6.37 hbc83047_0
libprotobuf 3.17.2 h4ff587b_1
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.2.0 h85742a9_0
libuuid 1.0.3 h1bed415_2
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.12 h03d6c58_0
lz4-c 1.9.3 h2531618_0
markdown 3.3.4 py38h06a4308_0
matplotlib 3.2.2 0
matplotlib-base 3.2.2 py38hef1b27d_0
mkl 2020.2 256
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.3.0 py38h54f3939_0
mkl_random 1.1.1 py38h0573a6f_0
multidict 5.1.0 py38h27cfd23_2
ncurses 6.2 he6710b0_1
nettle 3.6 he412f7d_0 conda-forge
numpy 1.19.2 py38h54aff64_0
numpy-base 1.19.2 py38hfa32c7d_0
nvidia-ml-py3 7.352.1 pypi_0 pypi
oauthlib 3.1.1 pyhd3eb1b0_0
olefile 0.46 py_0
opencv-python 4.5.3.56 pypi_0 pypi
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.3.0 h05c96fa_1
openssl 1.1.1k h27cfd23_0
opt_einsum 3.3.0 pyhd3eb1b0_1
pathlib 1.0.1 py_1
pcre 8.45 h295c915_0
perl 5.26.2 h14c3975_0
pillow 8.3.1 py38h2c7a002_0
pip 21.1.3 py38h06a4308_0
protobuf 3.17.2 py38h295c915_0
psutil 5.8.0 py38h27cfd23_1
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.8 py_0
pycparser 2.20 py_2
pyjwt 2.1.0 py38h06a4308_0
pyopenssl 20.0.1 pyhd3eb1b0_1
pyparsing 2.4.7 pyhd3eb1b0_0
pyqt 5.9.2 py38h05f1152_4
pysocks 1.7.1 py38h06a4308_0
python 3.8.10 h12debd9_8
python-dateutil 2.8.2 pyhd3eb1b0_0
python_abi 3.8 2_cp38 conda-forge
qt 5.9.7 h5867ecd_1
readline 8.1 h27cfd23_0
requests 2.25.1 pyhd3eb1b0_0
requests-oauthlib 1.3.0 py_0
rsa 4.7.2 pyhd3eb1b0_1
scikit-learn 0.24.2 py38ha9443f7_0
scipy 1.6.2 py38h91f5cce_0
setuptools 52.0.0 py38h06a4308_0
sip 4.19.13 py38he6710b0_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
tensorboard 2.4.0 pyhc547734_0
tensorboard-plugin-wit 1.6.0 py_0
tensorflow 2.2.0 gpu_py38hb782248_0
tensorflow-base 2.2.0 gpu_py38h83e3d50_0
tensorflow-estimator 2.5.0 pyh7b7c402_0
tensorflow-gpu 2.2.0 h0d30ee6_0
termcolor 1.1.0 py38h06a4308_1
threadpoolctl 2.2.0 pyhb85f177_0
tk 8.6.10 hbc83047_0
tornado 6.1 py38h27cfd23_0
tqdm 4.61.2 pyhd3eb1b0_1
typing-extensions 3.10.0.0 hd3eb1b0_0
typing_extensions 3.10.0.0 pyh06a4308_0
urllib3 1.26.6 pyhd3eb1b0_1
werkzeug 1.0.1 pyhd3eb1b0_0
wheel 0.36.2 pyhd3eb1b0_0
wrapt 1.12.1 py38h7b6447c_1
x264 1!161.3030 h7f98852_1 conda-forge
xz 5.2.5 h7b6447c_0
yarl 1.6.3 py38h27cfd23_0
zipp 3.5.0 pyhd3eb1b0_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0

================= Configs ==================

--------- convert.ini ---------

[scaling.sharpen] method: none amount: 150 radius: 0.3 threshold: 5.0
[mask.mask_blend] type: normalized kernel_size: 3 passes: 4 threshold: 4 erosion: 0.0
[mask.box_blend] type: gaussian distance: 11.0 radius: 5.0 passes: 1
[writer.ffmpeg] container: mp4 codec: libx264 crf: 23 preset: medium tune: none profile: auto level: auto skip_mux: False
[writer.pillow] format: png draw_transparent: False optimize: False gif_interlace: True jpg_quality: 75 png_compress_level: 3 tif_compression: tiff_deflate
[writer.opencv] format: png draw_transparent: False jpg_quality: 75 png_compress_level: 3
[writer.gif] fps: 25 loop: 0 palettesize: 256 subrectangles: False
[color.manual_balance] colorspace: HSV balance_1: 0.0 balance_2: 0.0 balance_3: 0.0 contrast: 0.0 brightness: 0.0
[color.color_transfer] clip: True preserve_paper: True
[color.match_hist] threshold: 99.0

--------- gui.ini ---------

[global] fullscreen: False tab: extract options_panel_width: 30 console_panel_height: 20 icon_size: 14 font: default font_size: 9 autosave_last_session: prompt timeout: 120 auto_load_model_stats: True

--------- train.ini ---------

[global] centering: face coverage: 68.75 icnr_init: False conv_aware_init: False optimizer: adam learning_rate: 5e-05 epsilon_exponent: -7 reflect_padding: False allow_growth: False mixed_precision: False nan_protection: True convert_batchsize: 16
[global.loss] loss_function: ssim mask_loss_function: mse l2_reg_term: 100 eye_multiplier: 3 mouth_multiplier: 2 penalized_mask_loss: True mask_type: extended mask_blur_kernel: 3 mask_threshold: 4 learn_mask: False
[model.realface] input_size: 64 output_size: 128 dense_nodes: 1536 complexity_encoder: 128 complexity_decoder: 512
[model.phaze_a] output_size: 128 shared_fc: none enable_gblock: True split_fc: True split_gblock: False split_decoders: False enc_architecture: fs_original enc_scaling: 40 enc_load_weights: True bottleneck_type: dense bottleneck_norm: none bottleneck_size: 1024 bottleneck_in_encoder: True fc_depth: 1 fc_min_filters: 1024 fc_max_filters: 1024 fc_dimensions: 4 fc_filter_slope: -0.5 fc_dropout: 0.0 fc_upsampler: upsample2d fc_upsamples: 1 fc_upsample_filters: 512 fc_gblock_depth: 3 fc_gblock_min_nodes: 512 fc_gblock_max_nodes: 512 fc_gblock_filter_slope: -0.5 fc_gblock_dropout: 0.0 dec_upscale_method: subpixel dec_norm: none dec_min_filters: 64 dec_max_filters: 512 dec_filter_slope: -0.45 dec_res_blocks: 1 dec_output_kernel: 5 dec_gaussian: True dec_skip_last_residual: True freeze_layers: keras_encoder load_layers: encoder fs_original_depth: 4 fs_original_min_filters: 128 fs_original_max_filters: 1024 mobilenet_width: 1.0 mobilenet_depth: 1 mobilenet_dropout: 0.001
[model.dlight] features: best details: good output_size: 256
[model.dfaker] output_size: 128
[model.dfl_sae] input_size: 128 clipnorm: True architecture: df autoencoder_dims: 0 encoder_dims: 42 decoder_dims: 21 multiscale_decoder: False
[model.villain] lowmem: False
[model.original] lowmem: False
[model.dfl_h128] lowmem: False
[model.unbalanced] input_size: 128 lowmem: False clipnorm: True nodes: 1024 complexity_encoder: 128 complexity_decoder_a: 384 complexity_decoder_b: 512
[trainer.original] preview_images: 14 zoom_amount: 5 rotation_range: 10 shift_range: 5 flip_chance: 50 color_lightness: 30 color_ab: 8 color_clahe_chance: 50 color_clahe_max_size: 4

--------- .faceswap ---------

backend: nvidia

--------- extract.ini ---------

[global] allow_growth: False
[detect.mtcnn] minsize: 20 scalefactor: 0.709 batch-size: 8 threshold_1: 0.6 threshold_2: 0.7 threshold_3: 0.7
[detect.cv2_dnn] confidence: 50
[detect.s3fd] confidence: 70 batch-size: 4
[align.fan] batch-size: 12
[mask.vgg_clear] batch-size: 6
[mask.unet_dfl] batch-size: 8
[mask.vgg_obstructed] batch-size: 2
[mask.bisenet_fp] batch-size: 8 include_ears: False include_hair: False include_glasses: True

I am out of my depth here, please advise.
Thank you.
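
For anyone hitting the same error, a quick way to narrow it down is to check whether TensorFlow can run a convolution on the GPU at all, outside of faceswap. The sketch below is illustrative only and assumes it is run from a shell with the faceswap conda environment active; it lists the visible GPUs, switches on on-demand VRAM allocation (the same behaviour as faceswap's Allow Growth option) and executes a tiny Conv2D. If this minimal convolution also fails with the cuDNN error, the problem lies with the CUDA/cuDNN stack or free VRAM rather than with the model.

Code: Select all

# Hypothetical sanity check -- run with the faceswap conda environment active.
python - <<'EOF'
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)
for gpu in gpus:
    # Same effect as the "Allow Growth" option: allocate VRAM on demand
    # instead of reserving nearly all of it up front.
    tf.config.experimental.set_memory_growth(gpu, True)

# A tiny convolution; if this raises "Failed to get convolution algorithm",
# the issue is the CUDA/cuDNN setup or available VRAM, not the faceswap model.
x = tf.random.normal((1, 64, 64, 3))
kernel = tf.random.normal((3, 3, 3, 8))
print(tf.nn.conv2d(x, kernel, strides=1, padding="SAME").shape)
EOF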

User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initializ

Post by torzdf »

See above

My word is final

User avatar
lamakaha
Posts: 4
Joined: Sun Jul 25, 2021 7:11 am

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initializ

Post by lamakaha »

Hi, thank you for the prompt reply.
I did enable Allow Growth, but it did not help; I still get the same issue and the same error.

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
07/25/2021 13:07:26 INFO     Log level set to: INFO
07/25/2021 13:07:27 INFO     Model A Directory: '/media/lamakaha/work/projects/deepfake/06_Renders/SanneFace' (156 images)
07/25/2021 13:07:27 INFO     Model B Directory: '/media/lamakaha/work/projects/deepfake/06_Renders/AlexeyFace' (758 images)
07/25/2021 13:07:27 WARNING  At least one of your input folders contains fewer than 250 images. Results are likely to be poor.
07/25/2021 13:07:27 WARNING  You need to provide a significant number of images to successfully train a Neural Network. Aim for between 500 - 5000 images per side.
07/25/2021 13:07:27 INFO     Training data directory: /media/lamakaha/work/projects/deepfake/03_Data/model_v001
07/25/2021 13:07:27 INFO     ===================================================
07/25/2021 13:07:27 INFO       Starting
07/25/2021 13:07:27 INFO       Press 'Stop' to save and quit
07/25/2021 13:07:27 INFO     ===================================================
07/25/2021 13:07:28 INFO     Loading data, this may take a while...
07/25/2021 13:07:28 INFO     Loading Model from Original plugin...
07/25/2021 13:07:28 INFO     No existing state file found. Generating.
07/25/2021 13:07:30 INFO     Loading Trainer from Original plugin...
2021-07-25 13:07:38.632840: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 13:07:38.634598: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 13:07:38.636056: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 13:07:38.637628: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
07/25/2021 13:07:39 CRITICAL Error caught! Exiting...
07/25/2021 13:07:39 ERROR    Caught exception in thread: '_training_0'
07/25/2021 13:07:42 ERROR    Got Exception on main handler:
Traceback (most recent call last):
  File "/home/lamakaha/faceswap/lib/cli/launcher.py", line 182, in execute_script
    process.process()
  File "/home/lamakaha/faceswap/scripts/train.py", line 190, in process
    self._end_thread(thread, err)
  File "/home/lamakaha/faceswap/scripts/train.py", line 230, in _end_thread
    thread.join()
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lamakaha/faceswap/scripts/train.py", line 252, in _training
    raise err
  File "/home/lamakaha/faceswap/scripts/train.py", line 242, in _training
    self._run_training_cycle(model, trainer)
  File "/home/lamakaha/faceswap/scripts/train.py", line 327, in _run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "/home/lamakaha/faceswap/plugins/train/trainer/_base.py", line 193, in train_one_step
    loss = self._model.model.train_on_batch(model_inputs, y=model_targets)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1348, in train_on_batch
    logs = train_function(iterator)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node original/encoder_1/conv_128_0_conv2d/Conv2D (defined at /faceswap/plugins/train/trainer/_base.py:193) ]] [Op:__inference_train_function_8730]
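
In case it is unclear whether the GUI's Allow Growth checkbox actually reached TensorFlow, the same behaviour can also be forced from the shell before launching. This is only an illustrative alternative; TF_FORCE_GPU_ALLOW_GROWTH is an environment variable read by TensorFlow's GPU allocator, and the command below reuses the arguments from the crash report:

Code: Select all

# Hypothetical alternative to the GUI checkbox: force on-demand VRAM allocation
# for this shell session, then launch training as before.
export TF_FORCE_GPU_ALLOW_GROWTH=true
python /home/lamakaha/faceswap/faceswap.py train -A /media/lamakaha/work/projects/deepfake/06_Renders/SanneFace -B /media/lamakaha/work/projects/deepfake/06_Renders/AlexeyFace -m /media/lamakaha/work/projects/deepfake/03_Data/model_v001 -t original -bs 2
# (remaining options from the original command can be appended unchanged)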

Attachments
crash_report.2021.07.25.133116708543.log
(66.16 KiB) Downloaded 261 times
Last edited by lamakaha on Sun Jul 25, 2021 11:32 am, edited 1 time in total.
User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initializ

Post by torzdf »

The next thing to do is to uninstall the global CUDA 9.1 you have installed. It is incompatible.

Reboot and try again.
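
For reference, a few illustrative shell checks (environment and package names assumed from the crash report) to confirm what is left after the uninstall and which CUDA/cuDNN the faceswap environment actually provides:

Code: Select all

# What, if anything, the system-wide toolkit still reports:
nvcc --version 2>/dev/null || echo "no global CUDA toolkit on PATH"
# What the conda environment bundles (this is what TensorFlow 2.2 links against):
conda list -n faceswap | grep -E 'cudatoolkit|cudnn|tensorflow'
# Which libcudnn the dynamic loader can see system-wide:
ldconfig -p | grep libcudnn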

My word is final

User avatar
lamakaha
Posts: 4
Joined: Sun Jul 25, 2021 7:11 am

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initializ

Post by lamakaha »

Thanks, I did the following:

Uninstalled CUDA using:

Code: Select all

sudo apt-get remove nvidia-cuda-toolkit

Checked whether I have any leftovers:

Code: Select all

lamakaha@lamakaha-HOME:~$ locate cuda | grep /cuda$
/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/lamakaha/miniconda3/pkgs/tensorflow-base-2.2.0-gpu_py38h83e3d50_0/lib/python3.8/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/lamakaha/miniconda3/pkgs/tensorflow-base-2.2.0-gpu_py38h83e3d50_0/lib/python3.8/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/lamakaha/miniconda3/pkgs/tensorflow-base-2.2.0-gpu_py38h83e3d50_0/lib/python3.8/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/usr/share/blender/scripts/addons/cycles/source/kernel/kernels/cuda

If I am not mistaken, I have CUDA 10.2:

Code: Select all

lamakaha@lamakaha-HOME:~$ nvidia-smi
Sun Jul 25 13:43:40 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
| 32%   54C    P0    41W / 160W |   5910MiB /  5933MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1266      G   /usr/lib/xorg/Xorg                           331MiB |
|    0      2123      G   cinnamon                                      86MiB |
|    0     29814      G   ...AAgAAAAAAAAACAAAAAAAAAA= --shared-files   114MiB |
|    0     30310      C   -                                           5363MiB |
+-----------------------------------------------------------------------------+
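
If it helps, the memory figures in that table can also be watched live while training starts up; a minimal illustrative one-liner using nvidia-smi's query flags:

Code: Select all

# Illustrative: refresh used/total VRAM once a second while faceswap launches.
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total --format=csv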

Console output:

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
07/25/2021 13:37:51 INFO     Log level set to: VERBOSE
07/25/2021 13:37:52 INFO     Model A Directory: '/media/lamakaha/work/projects/deepfake/06_Renders/SanneFace' (156 images)
07/25/2021 13:37:52 INFO     Model B Directory: '/media/lamakaha/work/projects/deepfake/06_Renders/AlexeyFace' (758 images)
07/25/2021 13:37:52 WARNING  At least one of your input folders contains fewer than 250 images. Results are likely to be poor.
07/25/2021 13:37:52 WARNING  You need to provide a significant number of images to successfully train a Neural Network. Aim for between 500 - 5000 images per side.
07/25/2021 13:37:52 INFO     Training data directory: /media/lamakaha/work/projects/deepfake/03_Data/model_v001
07/25/2021 13:37:52 INFO     ===================================================
07/25/2021 13:37:52 INFO       Starting
07/25/2021 13:37:52 INFO       Press 'Stop' to save and quit
07/25/2021 13:37:52 INFO     ===================================================
07/25/2021 13:37:53 INFO     Loading data, this may take a while...
07/25/2021 13:37:53 INFO     Loading Model from Original plugin...
07/25/2021 13:37:53 VERBOSE  Loading config: '/home/lamakaha/faceswap/config/train.ini'
07/25/2021 13:37:53 VERBOSE  Loading config: '/home/lamakaha/faceswap/config/train.ini'
07/25/2021 13:37:53 INFO     No existing state file found. Generating.
2021-07-25 13:37:53.997644: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-07-25 13:37:54.024059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 13:37:54.024653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-07-25 13:37:54.024882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-07-25 13:37:54.026989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-07-25 13:37:54.028399: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-07-25 13:37:54.028666: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-07-25 13:37:54.030886: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-07-25 13:37:54.032780: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-07-25 13:37:54.038215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-07-25 13:37:54.038373: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 13:37:54.038783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-07-25 13:37:54.039071: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2021-07-25 13:37:54.069915: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3201885000 Hz
2021-07-25 13:37:54.070756: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fef0476e0e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-25 13:37:54.070807: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-25 13:37:54.071093: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 13:37:54.071764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-07-25 13:37:54.071838: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-07-25 13:37:54.071874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-07-25 13:37:54.071908: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-07-25 13:37:54.071942: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-07-25 13:37:54.071975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-07-25 13:37:54.072010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-07-25 13:37:54.072044: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-07-25 13:37:54.072136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 13:37:54.072638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-07-25 13:37:54.072694: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-07-25 13:37:54.187843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-25 13:37:54.187876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2021-07-25 13:37:54.187885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2021-07-25 13:37:54.188065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 13:37:54.188515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 13:37:54.188978: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-25 13:37:54.189960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5023 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-07-25 13:37:54.192277: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fef04f952c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-25 13:37:54.192312: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
07/25/2021 13:37:55 VERBOSE  Using Adam optimizer
07/25/2021 13:37:55 VERBOSE  Model: "original"
07/25/2021 13:37:55 VERBOSE  __________________________________________________________________________________________________
07/25/2021 13:37:55 VERBOSE  Layer (type)                    Output Shape         Param #     Connected to
07/25/2021 13:37:55 VERBOSE  ==================================================================================================
07/25/2021 13:37:55 VERBOSE  face_in_a (InputLayer)          [(None, 64, 64, 3)]  0
07/25/2021 13:37:55 VERBOSE  __________________________________________________________________________________________________
07/25/2021 13:37:55 VERBOSE  face_in_b (InputLayer)          [(None, 64, 64, 3)]  0
07/25/2021 13:37:55 VERBOSE  __________________________________________________________________________________________________
07/25/2021 13:37:55 VERBOSE  encoder (Model)                 (None, 8, 8, 512)    69662976    face_in_a[0][0]
07/25/2021 13:37:55 VERBOSE                                                                   face_in_b[0][0]
07/25/2021 13:37:55 VERBOSE  __________________________________________________________________________________________________
07/25/2021 13:37:55 VERBOSE  decoder_a (Model)               (None, 64, 64, 3)    6199747     encoder[1][0]
07/25/2021 13:37:55 VERBOSE  __________________________________________________________________________________________________
07/25/2021 13:37:55 VERBOSE  decoder_b (Model)               (None, 64, 64, 3)    6199747     encoder[2][0]
07/25/2021 13:37:55 VERBOSE  ==================================================================================================
07/25/2021 13:37:55 VERBOSE  Total params: 82,062,470
07/25/2021 13:37:55 VERBOSE  Trainable params: 82,062,470
07/25/2021 13:37:55 VERBOSE  Non-trainable params: 0
07/25/2021 13:37:55 VERBOSE  __________________________________________________________________________________________________
07/25/2021 13:37:55 VERBOSE  Model: "encoder"
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  Layer (type)                 Output Shape              Param #
07/25/2021 13:37:55 VERBOSE  =================================================================
07/25/2021 13:37:55 VERBOSE  input_1 (InputLayer)         [(None, 64, 64, 3)]       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_128_0_conv2d (Conv2D)   (None, 32, 32, 128)       9728
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_128_0_leakyrelu (LeakyR (None, 32, 32, 128)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_256_0_conv2d (Conv2D)   (None, 16, 16, 256)       819456
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_256_0_leakyrelu (LeakyR (None, 16, 16, 256)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_512_0_conv2d (Conv2D)   (None, 8, 8, 512)         3277312
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_512_0_leakyrelu (LeakyR (None, 8, 8, 512)         0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_1024_0_conv2d (Conv2D)  (None, 4, 4, 1024)        13108224
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  conv_1024_0_leakyrelu (Leaky (None, 4, 4, 1024)        0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  flatten (Flatten)            (None, 16384)             0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  dense (Dense)                (None, 1024)              16778240
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  dense_1 (Dense)              (None, 16384)             16793600
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  reshape (Reshape)            (None, 4, 4, 1024)        0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_512_0_conv2d_conv2d  (None, 4, 4, 2048)        18876416
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_512_0_conv2d_leakyre (None, 4, 4, 2048)        0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_512_0_pixelshuffler  (None, 8, 8, 512)         0
07/25/2021 13:37:55 VERBOSE  =================================================================
07/25/2021 13:37:55 VERBOSE  Total params: 69,662,976
07/25/2021 13:37:55 VERBOSE  Trainable params: 69,662,976
07/25/2021 13:37:55 VERBOSE  Non-trainable params: 0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  Model: "decoder_a"
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  Layer (type)                 Output Shape              Param #
07/25/2021 13:37:55 VERBOSE  =================================================================
07/25/2021 13:37:55 VERBOSE  input_2 (InputLayer)         [(None, 8, 8, 512)]       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_256_0_conv2d_conv2d  (None, 8, 8, 1024)        4719616
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_256_0_conv2d_leakyre (None, 8, 8, 1024)        0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_256_0_pixelshuffler  (None, 16, 16, 256)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_128_0_conv2d_conv2d  (None, 16, 16, 512)       1180160
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_128_0_conv2d_leakyre (None, 16, 16, 512)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_128_0_pixelshuffler  (None, 32, 32, 128)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_64_0_conv2d_conv2d ( (None, 32, 32, 256)       295168
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_64_0_conv2d_leakyrel (None, 32, 32, 256)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_64_0_pixelshuffler ( (None, 64, 64, 64)        0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  face_out_a_conv2d (Conv2D)   (None, 64, 64, 3)         4803
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  face_out_a (Activation)      (None, 64, 64, 3)         0
07/25/2021 13:37:55 VERBOSE  =================================================================
07/25/2021 13:37:55 VERBOSE  Total params: 6,199,747
07/25/2021 13:37:55 VERBOSE  Trainable params: 6,199,747
07/25/2021 13:37:55 VERBOSE  Non-trainable params: 0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  Model: "decoder_b"
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  Layer (type)                 Output Shape              Param #
07/25/2021 13:37:55 VERBOSE  =================================================================
07/25/2021 13:37:55 VERBOSE  input_3 (InputLayer)         [(None, 8, 8, 512)]       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_256_1_conv2d_conv2d  (None, 8, 8, 1024)        4719616
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_256_1_conv2d_leakyre (None, 8, 8, 1024)        0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_256_1_pixelshuffler  (None, 16, 16, 256)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_128_1_conv2d_conv2d  (None, 16, 16, 512)       1180160
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_128_1_conv2d_leakyre (None, 16, 16, 512)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_128_1_pixelshuffler  (None, 32, 32, 128)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_64_1_conv2d_conv2d ( (None, 32, 32, 256)       295168
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_64_1_conv2d_leakyrel (None, 32, 32, 256)       0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  upscale_64_1_pixelshuffler ( (None, 64, 64, 64)        0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  face_out_b_conv2d (Conv2D)   (None, 64, 64, 3)         4803
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 VERBOSE  face_out_b (Activation)      (None, 64, 64, 3)         0
07/25/2021 13:37:55 VERBOSE  =================================================================
07/25/2021 13:37:55 VERBOSE  Total params: 6,199,747
07/25/2021 13:37:55 VERBOSE  Trainable params: 6,199,747
07/25/2021 13:37:55 VERBOSE  Non-trainable params: 0
07/25/2021 13:37:55 VERBOSE  _________________________________________________________________
07/25/2021 13:37:55 INFO     Loading Trainer from Original plugin...
07/25/2021 13:37:55 VERBOSE  Loading config: '/home/lamakaha/faceswap/config/train.ini'
07/25/2021 13:37:55 VERBOSE  Enabled TensorBoard Logging
2021-07-25 13:38:03.000349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-07-25 13:38:03.286694: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-07-25 13:38:03.740044: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 13:38:03.767359: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 13:38:03.789992: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-07-25 13:38:03.813004: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
07/25/2021 13:38:03 CRITICAL Error caught! Exiting...
07/25/2021 13:38:03 ERROR    Caught exception in thread: '_training_0'
07/25/2021 13:38:06 ERROR    Got Exception on main handler:
Traceback (most recent call last):
  File "/home/lamakaha/faceswap/lib/cli/launcher.py", line 182, in execute_script
    process.process()
  File "/home/lamakaha/faceswap/scripts/train.py", line 190, in process
    self._end_thread(thread, err)
  File "/home/lamakaha/faceswap/scripts/train.py", line 230, in _end_thread
    thread.join()
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/lamakaha/faceswap/lib/multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lamakaha/faceswap/scripts/train.py", line 252, in _training
    raise err
  File "/home/lamakaha/faceswap/scripts/train.py", line 242, in _training
    self._run_training_cycle(model, trainer)
  File "/home/lamakaha/faceswap/scripts/train.py", line 327, in _run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "/home/lamakaha/faceswap/plugins/train/trainer/_base.py", line 193, in train_one_step
    loss = self._model.model.train_on_batch(model_inputs, y=model_targets)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1348, in train_on_batch
    logs = train_function(iterator)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/lamakaha/miniconda3/envs/faceswap/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node original/encoder_1/conv_128_0_conv2d/Conv2D (defined at /faceswap/plugins/train/trainer/_base.py:193) ]] [Op:__inference_train_function_8730]

Function call stack:
train_function

07/25/2021 13:38:06 CRITICAL An unexpected crash has occurred. Crash report written to '/home/lamakaha/faceswap/crash_report.2021.07.25.133803951583.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.

The new crash report is attached.

Attachments
crash_report.2021.07.25.133803951583.log
(65.94 KiB) Downloaded 230 times
User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initializ

Post by torzdf »

You still have Cuda installed:

gpu_cuda: 9.1

The version listed by nvidia-smi is the maximum version supported by your driver.

The version that should be in use is the Conda package cudatoolkit 10.1.243 h6bb024c_0.

Having both a system install and a Conda install can cause conflicts.

Work on getting 9.1 fully removed from your system.
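
To confirm what the Faceswap environment itself provides, something like this should work (a sketch; it assumes the Conda environment is named faceswap):

Code: Select all

# Assumes the Conda environment is named "faceswap"; adjust if yours differs
conda activate faceswap
conda list cudatoolkit   # should show the 10.1.x package mentioned above
conda list cudnn         # should show a cuDNN build matching that toolkit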

My word is final

User avatar
lamakaha
Posts: 4
Joined: Sun Jul 25, 2021 7:11 am

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initializ

Post by lamakaha »

I completely crashed my computer by uninstalling the NVIDIA driver, then ran in circles reinstalling NVIDIA/CUDA.

I managed to get CUDA 10.1 installed using this guide:
https://malukas.lt/blog/cuda-10-1-anaco ... mint-19-3/

I was still getting the same error...

I then edited train.ini, located in /home/lamakaha/faceswap/config, and set

allow_growth = True

Doing this in the UI did not help; maybe I did it in the wrong place. A quick way to check the file itself is shown below.
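
As a sanity check only (the path comes from earlier in this post; which section the option sits under may differ between versions):

Code: Select all

# Confirm the setting actually landed in the config file the trainer reads
grep -n "allow_growth" /home/lamakaha/faceswap/config/train.ini
# Expected to print something like (line number will vary):
# allow_growth = True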

Anyway, it looks like the system is up and running now. Thank you very much for your help!
Now the fun begins!

User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: crash report while training: Failed to get convolution algorithm. This is probably because cuDNN failed to initializ

Post by torzdf »

Glad to hear it.

You shouldn't have any global Cuda install, though. Whilst it may work now, it will probably break again in the future if/when we upgrade.

We install Cuda locally in the Faceswap environment; any global install may conflict with it.

My word is final

Locked