Hi,
I want to start a new Dlight training session, but it crashes when I use two GPUs (2x RTX 2080).
With a single GPU it works fine. Any ideas?
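For comparison, the single-GPU run that works is just the same command with the process pinned to one card. A minimal sketch of how I restrict TensorFlow to GPU 0 (assuming the variable is set before TensorFlow initializes, otherwise it has no effect):

```python
import os

# Hypothetical workaround sketch: expose only the first GPU to TensorFlow,
# to confirm the crash is specific to the multi-GPU code path.
# This must run before TensorFlow/Keras is imported or initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With both GPUs visible (the default), the crash below happens on the first training step.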
Code:
2020-01-24 23:16:13.477575: E tensorflow/stream_executor/cuda/cuda_dnn.cc:82] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(3765): 'cudnnPoolingForward( cudnn.handle(), pooling_desc.handle(), &alpha, src_desc.handle(), input_data.opaque(), &beta, dest_desc.handle(), output_data->opaque())'
01/24/2020 23:16:14 CRITICAL Error caught! Exiting...
01/24/2020 23:16:14 ERROR Caught exception in thread: '_training_0'
Could not parse requirement: -umpy
Could not parse requirement: -pencv-python
01/24/2020 23:16:15 ERROR Got Exception on main handler:
Traceback (most recent call last):
File "C:\Users\denni\faceswap\lib\cli.py", line 128, in execute_script
process.process()
File "C:\Users\denni\faceswap\scripts\train.py", line 159, in process
self._end_thread(thread, err)
File "C:\Users\denni\faceswap\scripts\train.py", line 199, in _end_thread
thread.join()
File "C:\Users\denni\faceswap\lib\multithreading.py", line 121, in join
raise thread.err[1].with_traceback(thread.err[2])
File "C:\Users\denni\faceswap\lib\multithreading.py", line 37, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\denni\faceswap\scripts\train.py", line 224, in _training
raise err
File "C:\Users\denni\faceswap\scripts\train.py", line 214, in _training
self._run_training_cycle(model, trainer)
File "C:\Users\denni\faceswap\scripts\train.py", line 303, in _run_training_cycle
trainer.train_one_step(viewer, timelapse)
File "C:\Users\denni\faceswap\plugins\train\trainer\_base.py", line 316, in train_one_step
raise err
File "C:\Users\denni\faceswap\plugins\train\trainer\_base.py", line 283, in train_one_step
loss[side] = batcher.train_one_batch()
File "C:\Users\denni\faceswap\plugins\train\trainer\_base.py", line 424, in train_one_batch
loss = self._model.predictors[self._side].train_on_batch(model_inputs, model_targets)
File "C:\Users\denni\MiniConda3\envs\faceswap\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "C:\Users\denni\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "C:\Users\denni\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "C:\Users\denni\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "C:\Users\denni\MiniConda3\envs\faceswap\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed
[[{{node replica_1/model_1/encoder/average_pooling2d_1/AvgPool}} = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:1"](training/Adam/gradients/replica_1/model_1/encoder/conv_128_0_conv2d/convolution_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer)]]
[[{{node loss/mul/_1041}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3971_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]