Dual p106-6gb

Post by **abigflea** » Mon Mar 23, 2020 5:15 pm

I have been running slowly but fine on a single P106-6gb mining card.
I bought a second identical card, but it causes a hard Windows lockup as soon as training begins, with no logs.
It loads the training images, shows "Enabled TensorBoard logging", then freezes after a few seconds.

I've used them separately, and both work fine with Faceswap.
I've gone through various iterations of changing drivers and hardware setups, to no avail.
I'm wondering why, of course.

My main card is an RX570; cards 2 & 3 are the mining cards.

Suggestions?

Post by **deephomage** » Mon Mar 23, 2020 8:40 pm

Are you sure your motherboard can support three graphics cards? Hard lockups usually indicate an IRQ/hardware conflict. If you're no longer mining, the simplest solution is to sell the mining cards and use the proceeds to buy an Nvidia 10x or 20x series card with 6 GB of VRAM or more. Hardware advice is here: viewtopic.php?f=16&t=10.

Post by **abigflea** » Mon Mar 23, 2020 9:23 pm

Motherboard checks out. I had 2x RX570 and a 1050Ti before, when I was playing around with mining last year.
I even tried modifying the driver so the cards appear to the OS as 2x 1060 6GB, which works and allows DirectX. Thinking about trying Linux.

Post by **bryanlyon** » Mon Mar 23, 2020 9:30 pm

Chances are this is due to a slow response by your PSU to the sudden demand of 2 GPUs. This is probably creating a "brown out" type situation where a temporary voltage drop makes your system hang. Faceswap (like most GPU compute tasks) can cause a sudden spike in power draw when first starting.

You might want to try limiting the GPU power while training starts and then raising the power limit once it is running. Another option might be to load the GPUs with something like Unigine Heaven before you turn on the compute and then stop the benchmark. Both of these options try to reduce the sudden spike of power draw, but the real solution would be to ensure that your PSU can handle spikes of demand fast enough. (Bigger power supplies actually take longer to respond to spikes than smaller ones, so don't go saying "But my PSU is 1200w" or something like that.)

Hacking the drivers or running Linux won't solve this problem, as it's almost definitely a hardware issue.
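
The "limit the power, start training, then raise it again" idea could be scripted roughly as below. This is only a sketch: it assumes the cards accept a software power limit through `nvidia-smi -pl` (not confirmed for the P106), and the GPU indices, wattages and wait time are hypothetical placeholders. It defaults to a dry run that just prints the commands.

```python
import subprocess
import time


def power_limit_cmd(gpu_index, watts):
    """Build the nvidia-smi command that caps one GPU's power draw."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def staged_power_limits(gpu_indices, low_watts, high_watts):
    """Return (low, high) command lists: cap every GPU first, restore later."""
    low = [power_limit_cmd(i, low_watts) for i in gpu_indices]
    high = [power_limit_cmd(i, high_watts) for i in gpu_indices]
    return low, high


def run_staged(gpu_indices, low_watts, high_watts, settle_seconds, dry_run=True):
    """Apply the low cap, wait while training starts, then raise the cap."""
    low, high = staged_power_limits(gpu_indices, low_watts, high_watts)
    for cmd in low:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # usually needs admin rights
    if not dry_run:
        time.sleep(settle_seconds)  # start the training run in this window
    for cmd in high:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values: two cards at indices 0 and 1, 60 W then 90 W.
    run_staged([0, 1], low_watts=60, high_watts=90, settle_seconds=120)
```

Pass `dry_run=False` only after checking that the printed commands match your cards.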

Post by **abigflea** » Mon Mar 23, 2020 9:45 pm

1000 Watt. I've reverted all the drivers.
I'll give the benchmark idea a shot. Thank you.

Post by **abigflea** » Tue Mar 24, 2020 2:52 am

Because the cards have no display I used "geekbench" to benchmark the GPU for CUDA.
Runs fine on the second card while the first card is running faceswap. (Also gets HOT).
Even overclocked both about 2% to see if I could cause a failure.
I can clearly run both.

Is there a way to manually configure which GPUs are in use?
For instance, someone has 3x 1060 and tells Faceswap to use 2 of them.
How and where does the Faceswap package determine which ones to use or ignore when you select 'use 2 gpu'?
Just trying to narrow the list.
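
One general CUDA-level way to control which cards a process can see is the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below is not a description of how Faceswap itself chooses GPUs. The helper name `restrict_visible_gpus` and the indices are made up for illustration.

```python
import os


def restrict_visible_gpus(indices, env=None):
    """Set CUDA_VISIBLE_DEVICES so only the given GPU indices are visible."""
    env = os.environ if env is None else env
    # Order devices by PCI bus ID so indices match what nvidia-smi reports.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return env["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    # Must run before TensorFlow (or any other CUDA library) is imported.
    print(restrict_visible_gpus([0, 2]))
```

Inside the process the visible cards are renumbered from 0, so with the hypothetical `[0, 2]` above the second card shows up as GPU 1.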

Post by **bryanlyon** » Tue Mar 24, 2020 4:08 am

When I say use the benchmark, I just mean use it for a short time while you start the training process, but you should still use both for Faceswap. It's just there to "clear the hurdle" of the demand spike. You'll probably have an easier time just lowering the power needs.

Post by **abigflea** » Tue Mar 24, 2020 6:05 am

I did reduce the power needs and used the benchmark to ramp things up first. Pretty sure there is no 'brownout' hurdle. Everything is closed, so I have had time today to fool around.

I did notice the GPUs are numbered:
#0 p106
#1 Rx570
#2 p106

Now I am running FaceSwap on GPU 0, folding@home on GPU 1 & 2, no problems.

I'm sure I'll have time in the next couple of days to try shuffling the cards.
Will see if that works w/o making Win10 very angry.
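
The numbering above includes the AMD card, so it is presumably the OS's numbering, not CUDA's. CUDA only enumerates NVIDIA devices. A minimal sketch for listing the NVIDIA-side indices via `nvidia-smi` follows. It assumes GPU names contain no commas, and the helper names are illustrative.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,name,pci.bus_id", "--format=csv,noheader"]


def parse_gpu_list(csv_text):
    """Parse 'index, name, bus_id' lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, bus_id = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "bus_id": bus_id})
    return gpus


def list_nvidia_gpus():
    """Run nvidia-smi and return the parsed list (needs the NVIDIA driver)."""
    out = subprocess.run(QUERY, check=True, capture_output=True, text=True).stdout
    return parse_gpu_list(out)
```

Calling `list_nvidia_gpus()` on the machine in question shows which index each P106 gets and on which PCI bus it sits.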

Post by **abigflea** » Tue Mar 24, 2020 11:05 pm

OK. Did some rearranging and got a log and an error!

Both P106-90 cards show up in Device Manager and are working correctly.

Part of the crash log:

03/24/2020 18:59:57 MainProcess     _training_0     _base           name                      DEBUG    model name: 'dfl_sae'
03/24/2020 18:59:57 MainProcess     _training_0     _base           add_network               DEBUG    name: 'decoder_b', filename: 'dfl_sae_decoder_B.h5'
03/24/2020 18:59:57 MainProcess     _training_0     _base           __init__                  DEBUG    Initializing NNMeta: (filename: 'T:\Goodnight-Alice2\dfl_sae_decoder_B.h5', network_type: 'decoder', side: 'b', network: <keras.engine.training.Model object at 0x000001E1DCAD00C8>, is_output: True
03/24/2020 18:59:57 MainProcess     _training_0     _base           __init__                  DEBUG    Initialized NNMeta
03/24/2020 18:59:57 MainProcess     _training_0     dfl_sae         add_networks              DEBUG    Added networks
03/24/2020 18:59:57 MainProcess     _training_0     _base           load_models               DEBUG    Load model: (swapped: False)
03/24/2020 18:59:57 MainProcess     _training_0     _base           models_exist              DEBUG    Pre-existing models exist: False
03/24/2020 18:59:57 MainProcess     _training_0     _base           name                      DEBUG    model name: 'dfl_sae'
03/24/2020 18:59:57 MainProcess     _training_0     _base           load_models               INFO     Creating new 'dfl_sae' model in folder: 'T:\Goodnight-Alice2'
03/24/2020 18:59:57 MainProcess     _training_0     _base           get_inputs                DEBUG    Getting inputs
03/24/2020 18:59:57 MainProcess     _training_0     _base           get_inputs                DEBUG    Got inputs: [<tf.Tensor 'face_in:0' shape=(?, 128, 128, 3) dtype=float32>]
03/24/2020 18:59:57 MainProcess     _training_0     dfl_sae         build_autoencoders        DEBUG    Initializing model
03/24/2020 18:59:57 MainProcess     _training_0     dfl_sae         build_df_autoencoder      DEBUG    Adding Autoencoder. Side: a
03/24/2020 18:59:57 MainProcess     _training_0     _base           add_predictor             DEBUG    Adding predictor: (side: 'a', model: <keras.engine.training.Model object at 0x000001E1DCAD0B08>)
03/24/2020 18:59:57 MainProcess     _training_0     _base           add_predictor             DEBUG    Converting to multi-gpu: side a
03/24/2020 18:59:57 MainProcess     _training_0     multithreading  run                       DEBUG    Error in thread (_training_0): To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
03/24/2020 18:59:58 MainProcess     MainThread      train           _monitor                  DEBUG    Thread error detected
03/24/2020 18:59:58 MainProcess     MainThread      train           _monitor                  DEBUG    Closed Monitor
03/24/2020 18:59:58 MainProcess     MainThread      train           _end_thread               DEBUG    Ending Training thread
03/24/2020 18:59:58 MainProcess     MainThread      train           _end_thread               CRITICAL Error caught! Exiting...
03/24/2020 18:59:58 MainProcess     MainThread      multithreading  join                      DEBUG    Joining Threads: '_training'
03/24/2020 18:59:58 MainProcess     MainThread      multithreading  join                      DEBUG    Joining Thread: '_training_0'
03/24/2020 18:59:58 MainProcess     MainThread      multithreading  join                      ERROR    Caught exception in thread: '_training_0'
03/24/2020 18:59:58 MainProcess     MainThread      cli             execute_script            ERROR    To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 248, in build
    self.build_autoencoders(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 70, in build_autoencoders
    getattr(self, "build_{}_autoencoder".format(self.architecture))(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 94, in build_df_autoencoder
    self.add_predictor(side, autoencoder)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 326, in add_predictor
    model = multi_gpu_model(model, self.gpus)
  File "C:\Users\AbigFlea\MiniConda3\envs\faceswap\lib\site-packages\keras\utils\multi_gpu_utils.py", line 181, in multi_gpu_model
    available_devices))
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\lib\cli.py", line 128, in execute_script
    process.process()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 159, in process
    self._end_thread(thread, err)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 199, in _end_thread
    thread.join()
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 224, in _training
    raise err
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 212, in _training
    model = self._load_model()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 253, in _load_model
    predict=False)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 23, in __init__
    super().__init__(*args, **kwargs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 126, in __init__
    self.build()
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 257, in build
    raise FaceswapError(str(err)) from err
lib.utils.FaceswapError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
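
The error says two GPUs were requested but TensorFlow only found `/gpu:0`. The sketch below mirrors the wording of that message, not necessarily Keras's exact implementation, and shows one way to compare what is required against what TensorFlow reports. `tensorflow_device_names` assumes TensorFlow 1.x and is only defined here, not called.

```python
def expected_devices(gpus):
    """Device names the error message says are required for `gpus` GPUs."""
    return ["/cpu:0"] + ["/gpu:%d" % i for i in range(gpus)]


def missing_devices(available, gpus):
    """Return the required device names that are absent from `available`."""
    return [name for name in expected_devices(gpus) if name not in available]


def tensorflow_device_names():
    """List devices TensorFlow 1.x can see, normalised to '/gpu:0' style."""
    from tensorflow.python.client import device_lib  # imported lazily

    names = []
    for dev in device_lib.list_local_devices():
        # dev.name looks like '/device:GPU:0'; drop the 'device:' part.
        names.append(dev.name.replace("/device:", "/").lower())
    return names


if __name__ == "__main__":
    # Device list copied from the error message above.
    print(missing_devices(["/cpu:0", "/gpu:0"], gpus=2))
```

Running `missing_devices(tensorflow_device_names(), 2)` inside the faceswap environment would show whether the second card is visible to TensorFlow at all.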

Post by **deephomage** » Wed Mar 25, 2020 12:43 am

The entire crash log is needed for troubleshooting. Please post it here, or in the appropriate Discord channel.

Post by **abigflea** » Wed Mar 25, 2020 1:31 am

_training_0 _base name DEBUG model name: 'dfl_sae'
_training_0 _base add_network DEBUG name: 'decoder_a', filename: 'dfl_sae_decoder_A.h5'
_training_0 _base __init__ DEBUG Initializing NNMeta: (filename: 'T:\Goodnight-Alice2\dfl_sae_decoder_A.h5', network_type: 'decoder', side: 'a', network: <keras.engine.training.Model object at 0x0000025AE5B63308>, is_output: True
_training_0 _base __init__ DEBUG Initialized NNMeta
_training_0 nn_blocks upscale DEBUG inp: Tensor("input_3:0", shape=(?, 16, 16, 512), dtype=float32), filters: 504, kernel_size: 3, use_instance_norm: False, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: upscale_16_1
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5B631C8>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("input_3:0", shape=(?, 16, 16, 512), dtype=float32), filters: 2016, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'upscale_16_1_conv2d', 'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5B631C8>})
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5B631C8>
_training_0 nn_blocks res_block DEBUG inp: Tensor("upscale_16_1_pixelshuffler/Reshape_1:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: residual_32_2
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_2_leakyrelu_0/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_32_2_conv2d_0'})
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5B9A408>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_2_leakyrelu_1/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8>})
_training_0 nn_blocks get_name DEBUG Generating block name: conv2d_32_2
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8> to None
_training_0 nn_blocks res_block DEBUG inp: Tensor("residual_32_2_leakyrelu_3/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: residual_32_3
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_3_leakyrelu_0/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_32_3_conv2d_0'})
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5BB1E48>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_3_leakyrelu_1/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48>})
_training_0 nn_blocks get_name DEBUG Generating block name: conv2d_32_3
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48> to None
_training_0 nn_blocks upscale DEBUG inp: Tensor("residual_32_3_leakyrelu_3/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 252, kernel_size: 3, use_instance_norm: False, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: upscale_32_1
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5BD4108>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_3_leakyrelu_3/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 1008, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'upscale_32_1_conv2d', 'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BD4108>})
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BD4108>
_training_0 nn_blocks res_block DEBUG inp: Tensor("upscale_32_1_pixelshuffler/Reshape_1:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: residual_64_2
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_2_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_2_conv2d_0'})
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5BE0DC8>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_2_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8>})
_training_0 nn_blocks get_name DEBUG Generating block name: conv2d_64_2
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8> to None
_training_0 nn_blocks res_block DEBUG inp: Tensor("residual_64_2_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: residual_64_3
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_3_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_3_conv2d_0'})
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C01C48>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5C06108>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_3_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C06108>})
_training_0 nn_blocks get_name DEBUG Generating block name: conv2d_64_3
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C06108>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5C06108> to None
_training_0 nn_blocks upscale DEBUG inp: Tensor("residual_64_3_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 126, kernel_size: 3, use_instance_norm: False, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: upscale_64_1
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C0DF88>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_3_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'upscale_64_1_conv2d', 'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C0DF88>})
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C0DF88>
_training_0 nn_blocks res_block DEBUG inp: Tensor("upscale_64_1_pixelshuffler/Reshape_1:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: residual_128_2
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_2_leakyrelu_0/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_128_2_conv2d_0'})
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C3CAC8>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_2_leakyrelu_1/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448>})
_training_0 nn_blocks get_name DEBUG Generating block name: conv2d_128_2
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448> to None
_training_0 nn_blocks res_block DEBUG inp: Tensor("residual_128_2_leakyrelu_3/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, kwargs: {})
_training_0 nn_blocks get_name DEBUG Generating block name: residual_128_3
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_3_leakyrelu_0/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_128_3_conv2d_0'})
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C43888>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88>
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_3_leakyrelu_1/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88>})
_training_0 nn_blocks get_name DEBUG Generating block name: conv2d_128_3
_training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88>
_training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88> to None
_training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_3_leakyrelu_3/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 3, kernel_size: 5, strides: (1, 1), padding: same, kwargs: {'activation': 'sigmoid', 'name': 'face_out_128'})
_training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C62DC8>
_training_0 _base add_network DEBUG network_type: 'decoder', side: 'b', network: '<keras.engine.training.Model object at 0x0000025AE5C810C8>', is_output: True
_training_0 _base name DEBUG model name: 'dfl_sae'
_training_0 _base add_network DEBUG name: 'decoder_b', filename: 'dfl_sae_decoder_B.h5'
_training_0 _base __init__ DEBUG Initializing NNMeta: (filename: 'T:\Goodnight-Alice2\dfl_sae_decoder_B.h5', network_type: 'decoder', side: 'b', network: <keras.engine.training.Model object at 0x0000025AE5C810C8>, is_output: True
_training_0 _base __init__ DEBUG Initialized NNMeta
_training_0 dfl_sae add_networks DEBUG Added networks
_training_0 _base load_models DEBUG Load model: (swapped: False)
_training_0 _base models_exist DEBUG Pre-existing models exist: False
_training_0 _base name DEBUG model name: 'dfl_sae'
_training_0 _base load_models INFO Creating new 'dfl_sae' model in folder: 'T:\Goodnight-Alice2'
_training_0 _base get_inputs DEBUG Getting inputs
_training_0 _base get_inputs DEBUG Got inputs: [<tf.Tensor 'face_in:0' shape=(?, 128, 128, 3) dtype=float32>]
_training_0 dfl_sae build_autoencoders DEBUG Initializing model
_training_0 dfl_sae build_df_autoencoder DEBUG Adding Autoencoder. Side: a
_training_0 _base add_predictor DEBUG Adding predictor: (side: 'a', model: <keras.engine.training.Model object at 0x0000025AE5C81B88>)
_training_0 _base add_predictor DEBUG Converting to multi-gpu: side a
_training_0 multithreading run DEBUG Error in thread (_training_0): To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
MainThread train _monitor DEBUG Thread error detected
MainThread train _monitor DEBUG Closed Monitor
MainThread train _end_thread DEBUG Ending Training thread
MainThread train _end_thread CRITICAL Error caught! Exiting...
MainThread multithreading join DEBUG Joining Threads: '_training'
MainThread multithreading join DEBUG Joining Thread: '_training_0'
MainThread multithreading join ERROR Caught exception in thread: '_training_0'
MainThread cli execute_script ERROR To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 248, in build
    self.build_autoencoders(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 70, in build_autoencoders
    getattr(self, "build_{}_autoencoder".format(self.architecture))(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 94, in build_df_autoencoder
    self.add_predictor(side, autoencoder)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 326, in add_predictor
    model = multi_gpu_model(model, self.gpus)
  File "C:\Users\AbigFlea\MiniConda3\envs\faceswap\lib\site-packages\keras\utils\multi_gpu_utils.py", line 181, in multi_gpu_model
    available_devices))
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\lib\cli.py", line 128, in execute_script
    process.process()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 159, in process
    self._end_thread(thread, err)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 199, in _end_thread
    thread.join()
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 224, in _training
    raise err
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 212, in _training
    model = self._load_model()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 253, in _load_model
    predict=False)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 23, in __init__
    super().__init__(*args, **kwargs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 126, in __init__
    self.build()
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 257, in build
    raise FaceswapError(str(err)) from err
lib.utils.FaceswapError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.

============ System Information ============
encoding: cp1252
git_branch: master
git_commits: 4153a7e Tools Restructure (#990)
gpu_cuda: 10.2
gpu_cudnn: No global version found. Check Conda packages for Conda cuDNN
gpu_devices: GPU_0: P106-090, GPU_1: P106-090
gpu_devices_active: GPU_0, GPU_1
gpu_driver: 441.22
gpu_vram: GPU_0: 6077MB, GPU_1: 6077MB
os_machine: AMD64
os_platform: Windows-10-10.0.18362-SP0
os_release: 10
py_command: s:\Users\AbigFlea\faceswap\faceswap.py train -A S:/Extracted/Alice/Extract -ala S:/Extracted/Alice/origional/alignments.fsa -B S:/Extracted/GoodNight/Extracted -alb S:/Extracted/GoodNight/Origional/alignments.fsa -m T:/Goodnight-Alice2 -t dfl-sae -bs 10 -it 1000000 -g 2 -s 100 -ss 25000 -ps 50 -ag -L INFO -gui
py_conda_version: conda 4.8.3
py_implementation: CPython
py_version: 3.7.7
py_virtual_env: True
sys_cores: 8
sys_processor: AMD64 Family 21 Model 2 Stepping 0, AuthenticAMD
sys_ram: Total: 16341MB, Available: 9730MB, Used: 6611MB, Free: 9730MB

=============== Pip Packages ===============
absl-py==0.9.0
asn1crypto==1.3.0
astor==0.8.0
blinker==1.4
cachetools==3.1.1
certifi==2019.11.28
cffi==1.14.0
chardet==3.0.4
click==7.1.1
cloudpickle==1.3.0
cryptography==2.8
cycler==0.10.0
cytoolz==0.10.1
dask==2.12.0
decorator==4.4.2
fastcluster==1.1.26
ffmpy==0.2.2
gast==0.2.2
google-auth==1.11.2
google-auth-oauthlib==0.4.1
google-pasta==0.1.8
grpcio==1.27.2
h5py==2.9.0
idna==2.9
imageio==2.6.1
imageio-ffmpeg==0.4.1
joblib==0.14.1
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==3.1.3
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
networkx==2.4
numpy==1.17.4
nvidia-ml-py3==7.352.1
oauthlib==3.1.0
olefile==0.46
opencv-python==4.1.2.30
opt-einsum==3.1.0
pathlib==1.0.1
Pillow==6.2.1
protobuf==3.11.4
psutil==5.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.7
pycparser==2.20
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyreadline==2.1
PySocks==1.7.1
python-dateutil==2.8.1
pytz==2019.3
PyWavelets==1.1.1
pywin32==227
PyYAML==5.3.1
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
scikit-image==0.16.2
scikit-learn==0.22.1
scipy==1.4.1
six==1.14.0
tensorboard==2.1.0
tensorflow==1.15.0
tensorflow-estimator==1.15.1
termcolor==1.1.0
toolz==0.10.0
toposort==1.5
tornado==6.0.4
tqdm==4.43.0
urllib3==1.25.8
Werkzeug==0.16.1
win-inet-pton==1.1.0
wincertstore==0.2
wrapt==1.12.1

============== Conda Packages ==============

# packages in environment at C:\Users\AbigFlea\MiniConda3\envs\faceswap:
#
# Name Version Build Channel
_tflow_select 2.1.0 gpu
absl-py 0.9.0 py37_0
asn1crypto 1.3.0 py37_0
astor 0.8.0 py37_0
blas 1.0 mkl
blinker 1.4 py37_0
ca-certificates 2020.1.1 0
cachetools 3.1.1 py_0
certifi 2019.11.28 py37_1
cffi 1.14.0 py37h7a1dbc1_0
chardet 3.0.4 py37_1003
click 7.1.1 py_0
cloudpickle 1.3.0 py_0
cryptography 2.8 py37h7a1dbc1_0
cudatoolkit 10.0.130 0
cudnn 7.6.5 cuda10.0_0
cycler 0.10.0 py37_0
cytoolz 0.10.1 py37he774522_0
dask-core 2.12.0 py_0
decorator 4.4.2 py_0
fastcluster 1.1.26 py37he350917_0 conda-forge
ffmpeg 4.2 h6538335_0 conda-forge
ffmpy 0.2.2 pypi_0 pypi
freetype 2.9.1 ha9979f8_1
gast 0.2.2 py37_0
git 2.23.0 h6bb4b03_0
google-auth 1.11.2 py_0
google-auth-oauthlib 0.4.1 py_2
google-pasta 0.1.8 py_0
grpcio 1.27.2 py37h351948d_0
h5py 2.9.0 py37h5e291fa_0
hdf5 1.10.4 h7ebc959_0
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha66f8fd_1
idna 2.9 py_1
imageio 2.6.1 py37_0
imageio-ffmpeg 0.4.1 py_0 conda-forge
intel-openmp 2020.0 166
joblib 0.14.1 py_0
jpeg 9b hb83a4c4_2
keras 2.2.4 0
keras-applications 1.0.8 py_0
keras-base 2.2.4 py37_0
keras-preprocessing 1.1.0 py_1
kiwisolver 1.1.0 py37ha925a31_0
libpng 1.6.37 h2a8f88b_0
libprotobuf 3.11.4 h7bd577a_0
libtiff 4.1.0 h56a325e_0
markdown 3.1.1 py37_0
matplotlib 3.1.1 py37hc8f65d3_0
matplotlib-base 3.1.3 py37h64f37c6_0
mkl 2020.0 166
mkl-service 2.3.0 py37hb782905_0
mkl_fft 1.0.15 py37h14836fe_0
mkl_random 1.1.0 py37h675688f_0
networkx 2.4 py_0
numpy 1.17.4 py37h4320e6b_0
numpy-base 1.17.4 py37hc3f5095_0
nvidia-ml-py3 7.352.1 pypi_0 pypi
oauthlib 3.1.0 py_0
olefile 0.46 py37_0
opencv-python 4.1.2.30 pypi_0 pypi
openssl 1.1.1e he774522_0
opt_einsum 3.1.0 py_0
pathlib 1.0.1 py37_1
pillow 6.2.1 py37hdc69c19_0
pip 20.0.2 py37_1
protobuf 3.11.4 py37h33f27b4_0
psutil 5.7.0 py37he774522_0
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.7 py_0
pycparser 2.20 py_0
pyjwt 1.7.1 py37_0
pyopenssl 19.1.0 py37_0
pyparsing 2.4.6 py_0
pyqt 5.9.2 py37h6538335_2
pyreadline 2.1 py37_1
pysocks 1.7.1 py37_0
python 3.7.7 h60c2a47_0_cpython
python-dateutil 2.8.1 py_0
python_abi 3.7 1_cp37m conda-forge
pytz 2019.3 py_0
pywavelets 1.1.1 py37he774522_0
pywin32 227 py37he774522_1
pyyaml 5.3.1 py37he774522_0
qt 5.9.7 vc14h73c81de_0
requests 2.23.0 py37_0
requests-oauthlib 1.3.0 py_0
rsa 4.0 py_0
scikit-image 0.16.2 py37h47e9c7a_0
scikit-learn 0.22.1 py37h6288b17_0
scipy 1.4.1 py37h9439919_0
setuptools 46.1.1 py37_0
sip 4.19.8 py37h6538335_0
six 1.14.0 py37_0
sqlite 3.31.1 he774522_0
tensorboard 2.1.0 py3_0
tensorflow 1.15.0 gpu_py37hc3743a6_0
tensorflow-base 1.15.0 gpu_py37h1afeea4_0
tensorflow-estimator 1.15.1 pyh2649769_0
tensorflow-gpu 1.15.0 h0d30ee6_0
termcolor 1.1.0 py37_1
tk 8.6.8 hfa6e2cd_0
toolz 0.10.0 py_0
toposort 1.5 py_3 conda-forge
tornado 6.0.4 py37he774522_1
tqdm 4.43.0 py_0
urllib3 1.25.8 py37_0
vc 14.1 h0510ff6_4
vs2015_runtime 14.16.27012 hf0eaf9b_1
werkzeug 0.16.1 py_0
wheel 0.34.2 py37_0
win_inet_pton 1.1.0 py37_0
wincertstore 0.2 py37_0
wrapt 1.12.1 py37he774522_1
xz 5.2.4 h2fa13f4_4
yaml 0.1.7 hc54c509_2
zlib 1.2.11 h62dcd97_3
zstd 1.3.7 h508b16e_0

================= Configs ==================
--------- .faceswap ---------
backend: nvidia

--------- convert.ini ---------

[color.color_transfer]
clip: True
preserve_paper: True

[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0

[color.match_hist]
threshold: 99.0

[mask.box_blend]
type: gaussian
distance: 11.0
radius: 5.0
passes: 1

[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0

[scaling.sharpen]
method: unsharp_mask
amount: 150
radius: 0.3
threshold: 5.0

[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto

[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False

[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3

[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate

--------- extract.ini ---------

[global]
allow_growth: False

[align.fan]
batch-size: 12

[detect.cv2_dnn]
confidence: 50

[detect.mtcnn]
minsize: 20
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
scalefactor: 0.709
batch-size: 8

[detect.s3fd]
confidence: 70
batch-size: 4

[mask.unet_dfl]
batch-size: 8

[mask.vgg_clear]
batch-size: 6

[mask.vgg_obstructed]
batch-size: 2

--------- gui.ini ---------

[global]
fullscreen: False
tab: extract
options_panel_width: 30
console_panel_height: 20
icon_size: 14
font: default
font_size: 9
autosave_last_session: prompt
timeout: 120
auto_load_model_stats: True

--------- train.ini ---------

[global]
coverage: 68.75
mask_type: none
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: False
icnr_init: False
conv_aware_init: False
subpixel_upscaling: False
reflect_padding: False
penalized_mask_loss: True
loss_function: mae
learning_rate: 5e-05

[model.dfl_h128]
lowmem: False

[model.dfl_sae]
input_size: 128
clipnorm: True
architecture: df
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False

[model.dlight]
features: best
details: good
output_size: 256

[model.original]
lowmem: False

[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512

[model.unbalanced]
input_size: 128
lowmem: False
clipnorm: True
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512

[model.villain]
lowmem: False

[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4

Post by **torzdf** » Wed Mar 25, 2020 3:01 pm

The "Allow Growth" option forces single-GPU usage.
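For anyone hitting the same error, here is a rough sketch of a pre-flight check on the command line. It assumes, going only by the `py_command` in the crash report above, that `-g` is followed by the number of GPUs and that `-ag` is the "Allow Growth" switch. The helper names are made up for illustration and are not part of Faceswap.

```python
def requested_gpus(tokens):
    """Return the integer that follows '-g', or 1 if the flag is absent."""
    if "-g" in tokens:
        idx = tokens.index("-g")
        if idx + 1 < len(tokens):
            return int(tokens[idx + 1])
    return 1


def allow_growth_conflict(command):
    """True when the command asks for more than one GPU and also passes '-ag'.

    Assumption: '-ag' means Allow Growth and '-g N' means N GPUs, as the
    flags appear in the py_command line of the crash report.
    """
    tokens = command.split()
    return "-ag" in tokens and requested_gpus(tokens) > 1


# Shortened version of the command from the crash report.
cmd = "faceswap.py train -t dfl-sae -bs 10 -g 2 -ag -L INFO -gui"
if allow_growth_conflict(cmd):
    print("'-ag' is set together with '-g 2': drop one of them")
```

The tokens are matched exactly rather than fed to `argparse`, because `argparse` would read `-gui` as `-g` with the value `ui`.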
