Dual p106-6gb

Talk about Hardware used for Deep Learning
abigflea
Posts: 28
Joined: Sat Feb 22, 2020 10:59 pm
Has thanked: 3 times
Been thanked: 3 times

Dual p106-6gb

Post by abigflea »

I have been running slowly but fine on a single P106-6gb mining card.
I bought a second identical card, but running both causes a hard Windows lockup as soon as training begins, with no logs.
It loads the training images, shows "Enabled TensorBoard logging", and then freezes after a few seconds.

I've used them separately, and both work fine with Faceswap.
I've gone through various iterations of changing drivers and hardware setups, to no avail.
I'm wondering why, of course.

My main card is an RX570; cards 2 and 3 are the mining cards.

Suggestions?
Last edited by abigflea on Tue Mar 24, 2020 7:52 pm, edited 1 time in total.

deephomage
Posts: 26
Joined: Fri Jul 12, 2019 6:09 pm
Answers: 1
Has thanked: 2 times
Been thanked: 6 times

Re: Dual p90-6gb

Post by deephomage »

Are you sure your motherboard can support three graphics cards? Hard lockups usually indicate an IRQ/hardware conflict. If you're no longer mining, the simplest solution is to sell the mining cards and use the proceeds to buy an Nvidia 10-series or 20-series card with 6 GB of VRAM or more. Hardware advice is here: viewtopic.php?f=16&t=10.

Re: Dual p90-6gb

Post by abigflea »

Motherboard checks out. I had 2x RX570 and a 1050Ti in it before, when I was playing around with mining last year.
I even tried the procedure of modifying the driver so the cards just look like 2x 1060 6GB to the OS, which works and enables DirectX. I'm thinking about trying Linux.

bryanlyon
Site Admin
Posts: 325
Joined: Fri Jul 12, 2019 12:49 am
Answers: 27
Location: San Francisco
Has thanked: 3 times
Been thanked: 87 times

Re: Dual p90-6gb

Post by bryanlyon »

Chances are this is due to a slow response by your PSU to the sudden demand of 2 GPUs. This is probably creating a "brown out" type situation where, due to a temporary voltage drop, your system hangs. Faceswap (like most GPU compute tasks) can cause a sudden spike in power draw when first starting.

You might want to try limiting the GPU power while the training run starts and then raising the power limit once it is running. Another option might be to load the GPUs with something like Unigine Heaven before you turn on the compute and then stop the benchmark. Both of these options try to reduce the sudden spike of power draw, but the real solution would be to ensure that your PSU can handle spikes of demand fast enough. (Bigger power supplies actually take longer to respond to spikes than smaller ones, so don't go saying "But my PSU is 1200W" or something like that.)

Hacking the drivers or running Linux won't solve this problem, as it's almost definitely a hardware issue.
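
For anyone who wants to script the "limit power, start training, then raise the limit" idea, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH, that the shell has administrator rights, and that the card and driver allow changing the power limit at all (some boards do not). The wattages and GPU indices are made-up placeholders, not recommended values.

```python
import subprocess


def power_limit_cmd(gpu_index, watts):
    """Build the nvidia-smi command that sets one GPU's power limit in watts."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    # -i selects the GPU by index, -pl sets its power limit.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def set_power_limit(gpu_index, watts, dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = power_limit_cmd(gpu_index, watts)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical values: check the range your own cards report before using any.
LOW_WATTS, NORMAL_WATTS = 80, 120
GPU_INDICES = (0, 1)

# Step 1: cap both cards before launching training.
startup = [set_power_limit(i, LOW_WATTS) for i in GPU_INDICES]
# Step 2 (after training is running): restore the usual limit.
restore = [set_power_limit(i, NORMAL_WATTS) for i in GPU_INDICES]

for cmd in startup + restore:
    print(" ".join(cmd))
```

With `dry_run=True` (the default here) nothing is executed; the commands are only printed so they can be checked first.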

Re: Dual p90-6gb

Post by abigflea »

1000 Watt ;-) I've reverted all the drivers.
I'll give the benchmark idea a shot. Thank you.

Re: Dual p90-6gb

Post by abigflea »

Because the cards have no display outputs, I used Geekbench to benchmark the GPUs for CUDA.
It runs fine on the second card while the first card is running Faceswap. (It also gets HOT.)
I even overclocked both by about 2% to see if I could cause a failure.
I can clearly run both.

Is there a way to manually configure which GPUs are in use?
For instance, someone has 3x 1060 and tells Faceswap to use 2 of them.
How and where does the Faceswap package determine which ones to use or ignore when you select 'use 2 gpu'?
Just trying to narrow the list.
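
One general way to narrow the list, independent of Faceswap, is the CUDA environment variable `CUDA_VISIBLE_DEVICES`. This is only a sketch of that generic mechanism, not a description of how Faceswap itself picks cards. The helper names are made up for illustration, and the variable has to be set before TensorFlow (or anything else that initialises CUDA) is imported.

```python
import os


def visible_devices_value(indices):
    """Return the comma-separated string that CUDA_VISIBLE_DEVICES expects."""
    ordered = []
    for index in indices:
        if index < 0:
            raise ValueError("GPU indices must be non-negative")
        if index not in ordered:  # drop duplicates, keep the given order
            ordered.append(index)
    return ",".join(str(index) for index in ordered)


def restrict_gpus(indices, environ=os.environ):
    """Expose only the listed CUDA devices to processes using this environment."""
    environ["CUDA_VISIBLE_DEVICES"] = visible_devices_value(indices)
    return environ["CUDA_VISIBLE_DEVICES"]


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {}
restrict_gpus([0, 2], demo_env)
print(demo_env)
```

Inside the restricted process the remaining cards are renumbered from 0, so with `0,2` exposed, the program sees them as devices 0 and 1.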

Re: Dual p90-6gb

Post by bryanlyon »

When I say use the benchmark, I just mean use it for a short time while you start the training process, but you should still use both for Faceswap. It's just there to "clear the hurdle" of the demand spike. You'll probably have an easier time lowering the power needs, though.

Re: Dual p90-6gb

Post by abigflea »

I did reduce the power needs and used the benchmark to ramp things up first. I'm pretty sure there is no 'brownout' hurdle. Everything is closed, so I have had time today to fool around.

I did notice the GPUs are numbered:
#0 p106
#1 Rx570
#2 p106

Now I am running Faceswap on GPU 0 and Folding@home on GPUs 1 & 2, no problems.

I'm sure I'll have time in the next couple of days to try shuffling the cards.
We'll see if that works without making Win10 very angry.
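
A note on the numbering: CUDA only enumerates NVIDIA devices, so the indices a CUDA program sees need not match a list that also includes the RX570. A rough way to check the NVIDIA-side numbering is to ask `nvidia-smi` directly. The sketch below assumes `nvidia-smi` is on the PATH; the sample text is illustrative and was not captured from this machine.

```python
import csv
import subprocess

# Query index, name and total memory for every GPU the NVIDIA driver reports.
QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader"]


def parse_gpu_list(text):
    """Turn nvidia-smi CSV output into a list of dicts, one per GPU."""
    gpus = []
    for row in csv.reader(text.strip().splitlines(), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        index, name, memory = (field.strip() for field in row)
        gpus.append({"index": int(index), "name": name, "memory": memory})
    return gpus


def list_nvidia_gpus():
    """Run the query for real (needs an NVIDIA driver; Python 3.7+)."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_list(result.stdout)


# Illustrative sample only, so the parser can be tried without a GPU.
SAMPLE = "0, P106-090, 6077 MiB\n1, P106-090, 6077 MiB\n"
for gpu in parse_gpu_list(SAMPLE):
    print(gpu["index"], gpu["name"], gpu["memory"])
```

If only one card shows up here, the problem sits at the driver level rather than in Faceswap.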

Re: Dual p106-6gb

Post by abigflea »

OK. I did some rearranging and got a log and an error!

Both P106-090 cards show up in Device Manager as working correctly.

Part of the crash log:

03/24/2020 18:59:57 MainProcess     _training_0     _base           name                      DEBUG    model name: 'dfl_sae'
03/24/2020 18:59:57 MainProcess     _training_0     _base           add_network               DEBUG    name: 'decoder_b', filename: 'dfl_sae_decoder_B.h5'
03/24/2020 18:59:57 MainProcess     _training_0     _base           __init__                  DEBUG    Initializing NNMeta: (filename: 'T:\Goodnight-Alice2\dfl_sae_decoder_B.h5', network_type: 'decoder', side: 'b', network: <keras.engine.training.Model object at 0x000001E1DCAD00C8>, is_output: True
03/24/2020 18:59:57 MainProcess     _training_0     _base           __init__                  DEBUG    Initialized NNMeta
03/24/2020 18:59:57 MainProcess     _training_0     dfl_sae         add_networks              DEBUG    Added networks
03/24/2020 18:59:57 MainProcess     _training_0     _base           load_models               DEBUG    Load model: (swapped: False)
03/24/2020 18:59:57 MainProcess     _training_0     _base           models_exist              DEBUG    Pre-existing models exist: False
03/24/2020 18:59:57 MainProcess     _training_0     _base           name                      DEBUG    model name: 'dfl_sae'
03/24/2020 18:59:57 MainProcess     _training_0     _base           load_models               INFO     Creating new 'dfl_sae' model in folder: 'T:\Goodnight-Alice2'
03/24/2020 18:59:57 MainProcess     _training_0     _base           get_inputs                DEBUG    Getting inputs
03/24/2020 18:59:57 MainProcess     _training_0     _base           get_inputs                DEBUG    Got inputs: [<tf.Tensor 'face_in:0' shape=(?, 128, 128, 3) dtype=float32>]
03/24/2020 18:59:57 MainProcess     _training_0     dfl_sae         build_autoencoders        DEBUG    Initializing model
03/24/2020 18:59:57 MainProcess     _training_0     dfl_sae         build_df_autoencoder      DEBUG    Adding Autoencoder. Side: a
03/24/2020 18:59:57 MainProcess     _training_0     _base           add_predictor             DEBUG    Adding predictor: (side: 'a', model: <keras.engine.training.Model object at 0x000001E1DCAD0B08>)
03/24/2020 18:59:57 MainProcess     _training_0     _base           add_predictor             DEBUG    Converting to multi-gpu: side a
03/24/2020 18:59:57 MainProcess     _training_0     multithreading  run                       DEBUG    Error in thread (_training_0): To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
03/24/2020 18:59:58 MainProcess     MainThread      train           _monitor                  DEBUG    Thread error detected
03/24/2020 18:59:58 MainProcess     MainThread      train           _monitor                  DEBUG    Closed Monitor
03/24/2020 18:59:58 MainProcess     MainThread      train           _end_thread               DEBUG    Ending Training thread
03/24/2020 18:59:58 MainProcess     MainThread      train           _end_thread               CRITICAL Error caught! Exiting...
03/24/2020 18:59:58 MainProcess     MainThread      multithreading  join                      DEBUG    Joining Threads: '_training'
03/24/2020 18:59:58 MainProcess     MainThread      multithreading  join                      DEBUG    Joining Thread: '_training_0'
03/24/2020 18:59:58 MainProcess     MainThread      multithreading  join                      ERROR    Caught exception in thread: '_training_0'
03/24/2020 18:59:58 MainProcess     MainThread      cli             execute_script            ERROR    To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 248, in build
    self.build_autoencoders(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 70, in build_autoencoders
    getattr(self, "build_{}_autoencoder".format(self.architecture))(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 94, in build_df_autoencoder
    self.add_predictor(side, autoencoder)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 326, in add_predictor
    model = multi_gpu_model(model, self.gpus)
  File "C:\Users\AbigFlea\MiniConda3\envs\faceswap\lib\site-packages\keras\utils\multi_gpu_utils.py", line 181, in multi_gpu_model
    available_devices))
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\lib\cli.py", line 128, in execute_script
    process.process()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 159, in process
    self._end_thread(thread, err)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 199, in _end_thread
    thread.join()
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 224, in _training
    raise err
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 212, in _training
    model = self._load_model()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 253, in _load_model
    predict=False)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 23, in __init__
    super().__init__(*args, **kwargs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 126, in __init__
    self.build()
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 257, in build
    raise FaceswapError(str(err)) from err
lib.utils.FaceswapError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
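
Reading the error above: training was asked for 2 GPUs, but TensorFlow reported only `/cpu:0` and `/gpu:0`. The sketch below mirrors the check that the message implies, so the same comparison can be run by hand. It is not copied from Keras, the function names are made up, and the optional TensorFlow query assumes `tensorflow.python.client.device_lib` can be imported in the Faceswap environment.

```python
def normalize_device_name(name):
    """Shorten e.g. '/device:GPU:0' to '/gpu:0', the style used in the error."""
    parts = name.lower().replace("/", "").split(":")
    return "/" + ":".join(parts[-2:])


def expected_devices(gpus):
    """Devices needed for a run on `gpus` cards: the CPU plus gpu 0..gpus-1."""
    return ["/cpu:0"] + ["/gpu:%d" % i for i in range(gpus)]


def missing_devices(available, gpus):
    """Return the expected devices that are absent from `available`."""
    seen = {normalize_device_name(name) for name in available}
    return [dev for dev in expected_devices(gpus) if dev not in seen]


def tensorflow_devices():
    """Ask TensorFlow what it can see; returns None if it is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [normalize_device_name(d.name)
            for d in device_lib.list_local_devices()]


# The situation from the log: two GPUs requested, one reported.
reported = ["/cpu:0", "/gpu:0"]
print("missing:", missing_devices(reported, 2))
```

Calling `tensorflow_devices()` inside the Faceswap conda environment and passing the result to `missing_devices` shows whether the second card is visible to TensorFlow at all.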


Re: Dual p106-6gb

Post by deephomage »

The entire crash log is needed for troubleshooting. Please post it here, or in the appropriate Discord channel.
Last edited by bryanlyon on Wed Mar 25, 2020 12:44 am, edited 1 time in total.
Reason: Editted

Re: Dual p106-6gb

Post by abigflea »

03/24/2020 19:12:13 MainProcess _training_0 _base name DEBUG model name: 'dfl_sae'
03/24/2020 19:12:13 MainProcess _training_0 _base add_network DEBUG name: 'decoder_a', filename: 'dfl_sae_decoder_A.h5'
03/24/2020 19:12:13 MainProcess _training_0 _base __init__ DEBUG Initializing NNMeta: (filename: 'T:\Goodnight-Alice2\dfl_sae_decoder_A.h5', network_type: 'decoder', side: 'a', network: <keras.engine.training.Model object at 0x0000025AE5B63308>, is_output: True
03/24/2020 19:12:13 MainProcess _training_0 _base __init__ DEBUG Initialized NNMeta
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks upscale DEBUG inp: Tensor("input_3:0", shape=(?, 16, 16, 512), dtype=float32), filters: 504, kernel_size: 3, use_instance_norm: False, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: upscale_16_1
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5B631C8>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("input_3:0", shape=(?, 16, 16, 512), dtype=float32), filters: 2016, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'upscale_16_1_conv2d', 'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5B631C8>})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5B631C8>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks res_block DEBUG inp: Tensor("upscale_16_1_pixelshuffler/Reshape_1:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: residual_32_2
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_2_leakyrelu_0/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_32_2_conv2d_0'})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5B9A408>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_2_leakyrelu_1/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8>})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: conv2d_32_2
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5BB27C8> to None
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks res_block DEBUG inp: Tensor("residual_32_2_leakyrelu_3/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: residual_32_3
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_3_leakyrelu_0/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_32_3_conv2d_0'})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5BB1E48>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_3_leakyrelu_1/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48>})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: conv2d_32_3
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5BC1D48> to None
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks upscale DEBUG inp: Tensor("residual_32_3_leakyrelu_3/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 252, kernel_size: 3, use_instance_norm: False, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: upscale_32_1
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5BD4108>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_32_3_leakyrelu_3/LeakyRelu:0", shape=(?, 32, 32, 504), dtype=float32), filters: 1008, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'upscale_32_1_conv2d', 'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BD4108>})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BD4108>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks res_block DEBUG inp: Tensor("upscale_32_1_pixelshuffler/Reshape_1:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: residual_64_2
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_2_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_2_conv2d_0'})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5BE0DC8>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_2_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8>})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: conv2d_64_2
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5BEC0C8> to None
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks res_block DEBUG inp: Tensor("residual_64_2_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: residual_64_3
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_3_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_3_conv2d_0'})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C01C48>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5C06108>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_3_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 252, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C06108>})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: conv2d_64_3
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C06108>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5C06108> to None
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks upscale DEBUG inp: Tensor("residual_64_3_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 126, kernel_size: 3, use_instance_norm: False, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: upscale_64_1
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C0DF88>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_64_3_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 252), dtype=float32), filters: 504, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'upscale_64_1_conv2d', 'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C0DF88>})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C0DF88>
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks res_block DEBUG inp: Tensor("upscale_64_1_pixelshuffler/Reshape_1:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, kwargs: {})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: residual_128_2
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_2_leakyrelu_0/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_128_2_conv2d_0'})
03/24/2020 19:12:13 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C3CAC8>
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448>
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_2_leakyrelu_1/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448>})
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: conv2d_128_2
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448>
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5C4C448> to None
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks res_block DEBUG inp: Tensor("residual_128_2_leakyrelu_3/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, kwargs: {})
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: residual_128_3
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_3_leakyrelu_0/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_128_3_conv2d_0'})
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C43888>
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from None to <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88>
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_3_leakyrelu_1/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 126, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88>})
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks get_name DEBUG Generating block name: conv2d_128_3
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Using model specified initializer: <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88>
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks switch_kernel_initializer DEBUG Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x0000025AE5C6DD88> to None
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks conv2d DEBUG inp: Tensor("residual_128_3_leakyrelu_3/LeakyRelu:0", shape=(?, 128, 128, 126), dtype=float32), filters: 3, kernel_size: 5, strides: (1, 1), padding: same, kwargs: {'activation': 'sigmoid', 'name': 'face_out_128'})
03/24/2020 19:12:14 MainProcess _training_0 nn_blocks set_default_initializer DEBUG Set default kernel_initializer to: <keras.initializers.VarianceScaling object at 0x0000025AE5C62DC8>
03/24/2020 19:12:14 MainProcess _training_0 _base add_network DEBUG network_type: 'decoder', side: 'b', network: '<keras.engine.training.Model object at 0x0000025AE5C810C8>', is_output: True
03/24/2020 19:12:14 MainProcess _training_0 _base name DEBUG model name: 'dfl_sae'
03/24/2020 19:12:14 MainProcess _training_0 _base add_network DEBUG name: 'decoder_b', filename: 'dfl_sae_decoder_B.h5'
03/24/2020 19:12:14 MainProcess _training_0 _base __init__ DEBUG Initializing NNMeta: (filename: 'T:\Goodnight-Alice2\dfl_sae_decoder_B.h5', network_type: 'decoder', side: 'b', network: <keras.engine.training.Model object at 0x0000025AE5C810C8>, is_output: True
03/24/2020 19:12:14 MainProcess _training_0 _base __init__ DEBUG Initialized NNMeta
03/24/2020 19:12:14 MainProcess _training_0 dfl_sae add_networks DEBUG Added networks
03/24/2020 19:12:14 MainProcess _training_0 _base load_models DEBUG Load model: (swapped: False)
03/24/2020 19:12:14 MainProcess _training_0 _base models_exist DEBUG Pre-existing models exist: False
03/24/2020 19:12:14 MainProcess _training_0 _base name DEBUG model name: 'dfl_sae'
03/24/2020 19:12:14 MainProcess _training_0 _base load_models INFO Creating new 'dfl_sae' model in folder: 'T:\Goodnight-Alice2'
03/24/2020 19:12:14 MainProcess _training_0 _base get_inputs DEBUG Getting inputs
03/24/2020 19:12:14 MainProcess _training_0 _base get_inputs DEBUG Got inputs: [<tf.Tensor 'face_in:0' shape=(?, 128, 128, 3) dtype=float32>]
03/24/2020 19:12:14 MainProcess _training_0 dfl_sae build_autoencoders DEBUG Initializing model
03/24/2020 19:12:14 MainProcess _training_0 dfl_sae build_df_autoencoder DEBUG Adding Autoencoder. Side: a
03/24/2020 19:12:14 MainProcess _training_0 _base add_predictor DEBUG Adding predictor: (side: 'a', model: <keras.engine.training.Model object at 0x0000025AE5C81B88>)
03/24/2020 19:12:14 MainProcess _training_0 _base add_predictor DEBUG Converting to multi-gpu: side a
03/24/2020 19:12:14 MainProcess _training_0 multithreading run DEBUG Error in thread (_training_0): To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
03/24/2020 19:12:15 MainProcess MainThread train _monitor DEBUG Thread error detected
03/24/2020 19:12:15 MainProcess MainThread train _monitor DEBUG Closed Monitor
03/24/2020 19:12:15 MainProcess MainThread train _end_thread DEBUG Ending Training thread
03/24/2020 19:12:15 MainProcess MainThread train _end_thread CRITICAL Error caught! Exiting...
03/24/2020 19:12:15 MainProcess MainThread multithreading join DEBUG Joining Threads: '_training'
03/24/2020 19:12:15 MainProcess MainThread multithreading join DEBUG Joining Thread: '_training_0'
03/24/2020 19:12:15 MainProcess MainThread multithreading join ERROR Caught exception in thread: '_training_0'
03/24/2020 19:12:15 MainProcess MainThread cli execute_script ERROR To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.
Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 248, in build
    self.build_autoencoders(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 70, in build_autoencoders
    getattr(self, "build_{}_autoencoder".format(self.architecture))(inputs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 94, in build_df_autoencoder
    self.add_predictor(side, autoencoder)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 326, in add_predictor
    model = multi_gpu_model(model, self.gpus)
  File "C:\Users\AbigFlea\MiniConda3\envs\faceswap\lib\site-packages\keras\utils\multi_gpu_utils.py", line 181, in multi_gpu_model
    available_devices))
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "s:\Users\AbigFlea\faceswap\lib\cli.py", line 128, in execute_script
    process.process()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 159, in process
    self._end_thread(thread, err)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 199, in _end_thread
    thread.join()
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "s:\Users\AbigFlea\faceswap\lib\multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 224, in _training
    raise err
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 212, in _training
    model = self._load_model()
  File "s:\Users\AbigFlea\faceswap\scripts\train.py", line 253, in _load_model
    predict=False)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\dfl_sae.py", line 23, in __init__
    super().__init__(*args, **kwargs)
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 126, in __init__
    self.build()
  File "s:\Users\AbigFlea\faceswap\plugins\train\model\_base.py", line 257, in build
    raise FaceswapError(str(err)) from err
lib.utils.FaceswapError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing `gpus`.

============ System Information ============
encoding: cp1252
git_branch: master
git_commits: 4153a7e Tools Restructure (#990)
gpu_cuda: 10.2
gpu_cudnn: No global version found. Check Conda packages for Conda cuDNN
gpu_devices: GPU_0: P106-090, GPU_1: P106-090
gpu_devices_active: GPU_0, GPU_1
gpu_driver: 441.22
gpu_vram: GPU_0: 6077MB, GPU_1: 6077MB
os_machine: AMD64
os_platform: Windows-10-10.0.18362-SP0
os_release: 10
py_command: s:\Users\AbigFlea\faceswap\faceswap.py train -A S:/Extracted/Alice/Extract -ala S:/Extracted/Alice/origional/alignments.fsa -B S:/Extracted/GoodNight/Extracted -alb S:/Extracted/GoodNight/Origional/alignments.fsa -m T:/Goodnight-Alice2 -t dfl-sae -bs 10 -it 1000000 -g 2 -s 100 -ss 25000 -ps 50 -ag -L INFO -gui
py_conda_version: conda 4.8.3
py_implementation: CPython
py_version: 3.7.7
py_virtual_env: True
sys_cores: 8
sys_processor: AMD64 Family 21 Model 2 Stepping 0, AuthenticAMD
sys_ram: Total: 16341MB, Available: 9730MB, Used: 6611MB, Free: 9730MB

=============== Pip Packages ===============
absl-py==0.9.0
asn1crypto==1.3.0
astor==0.8.0
blinker==1.4
cachetools==3.1.1
certifi==2019.11.28
cffi==1.14.0
chardet==3.0.4
click==7.1.1
cloudpickle==1.3.0
cryptography==2.8
cycler==0.10.0
cytoolz==0.10.1
dask==2.12.0
decorator==4.4.2
fastcluster==1.1.26
ffmpy==0.2.2
gast==0.2.2
google-auth==1.11.2
google-auth-oauthlib==0.4.1
google-pasta==0.1.8
grpcio==1.27.2
h5py==2.9.0
idna==2.9
imageio==2.6.1
imageio-ffmpeg==0.4.1
joblib==0.14.1
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==3.1.3
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
networkx==2.4
numpy==1.17.4
nvidia-ml-py3==7.352.1
oauthlib==3.1.0
olefile==0.46
opencv-python==4.1.2.30
opt-einsum==3.1.0
pathlib==1.0.1
Pillow==6.2.1
protobuf==3.11.4
psutil==5.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.7
pycparser==2.20
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyreadline==2.1
PySocks==1.7.1
python-dateutil==2.8.1
pytz==2019.3
PyWavelets==1.1.1
pywin32==227
PyYAML==5.3.1
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
scikit-image==0.16.2
scikit-learn==0.22.1
scipy==1.4.1
six==1.14.0
tensorboard==2.1.0
tensorflow==1.15.0
tensorflow-estimator==1.15.1
termcolor==1.1.0
toolz==0.10.0
toposort==1.5
tornado==6.0.4
tqdm==4.43.0
urllib3==1.25.8
Werkzeug==0.16.1
win-inet-pton==1.1.0
wincertstore==0.2
wrapt==1.12.1

============== Conda Packages ==============
# packages in environment at C:\Users\AbigFlea\MiniConda3\envs\faceswap:
#
# Name Version Build Channel
_tflow_select 2.1.0 gpu
absl-py 0.9.0 py37_0
asn1crypto 1.3.0 py37_0
astor 0.8.0 py37_0
blas 1.0 mkl
blinker 1.4 py37_0
ca-certificates 2020.1.1 0
cachetools 3.1.1 py_0
certifi 2019.11.28 py37_1
cffi 1.14.0 py37h7a1dbc1_0
chardet 3.0.4 py37_1003
click 7.1.1 py_0
cloudpickle 1.3.0 py_0
cryptography 2.8 py37h7a1dbc1_0
cudatoolkit 10.0.130 0
cudnn 7.6.5 cuda10.0_0
cycler 0.10.0 py37_0
cytoolz 0.10.1 py37he774522_0
dask-core 2.12.0 py_0
decorator 4.4.2 py_0
fastcluster 1.1.26 py37he350917_0 conda-forge
ffmpeg 4.2 h6538335_0 conda-forge
ffmpy 0.2.2 pypi_0 pypi
freetype 2.9.1 ha9979f8_1
gast 0.2.2 py37_0
git 2.23.0 h6bb4b03_0
google-auth 1.11.2 py_0
google-auth-oauthlib 0.4.1 py_2
google-pasta 0.1.8 py_0
grpcio 1.27.2 py37h351948d_0
h5py 2.9.0 py37h5e291fa_0
hdf5 1.10.4 h7ebc959_0
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha66f8fd_1
idna 2.9 py_1
imageio 2.6.1 py37_0
imageio-ffmpeg 0.4.1 py_0 conda-forge
intel-openmp 2020.0 166
joblib 0.14.1 py_0
jpeg 9b hb83a4c4_2
keras 2.2.4 0
keras-applications 1.0.8 py_0
keras-base 2.2.4 py37_0
keras-preprocessing 1.1.0 py_1
kiwisolver 1.1.0 py37ha925a31_0
libpng 1.6.37 h2a8f88b_0
libprotobuf 3.11.4 h7bd577a_0
libtiff 4.1.0 h56a325e_0
markdown 3.1.1 py37_0
matplotlib 3.1.1 py37hc8f65d3_0
matplotlib-base 3.1.3 py37h64f37c6_0
mkl 2020.0 166
mkl-service 2.3.0 py37hb782905_0
mkl_fft 1.0.15 py37h14836fe_0
mkl_random 1.1.0 py37h675688f_0
networkx 2.4 py_0
numpy 1.17.4 py37h4320e6b_0
numpy-base 1.17.4 py37hc3f5095_0
nvidia-ml-py3 7.352.1 pypi_0 pypi
oauthlib 3.1.0 py_0
olefile 0.46 py37_0
opencv-python 4.1.2.30 pypi_0 pypi
openssl 1.1.1e he774522_0
opt_einsum 3.1.0 py_0
pathlib 1.0.1 py37_1
pillow 6.2.1 py37hdc69c19_0
pip 20.0.2 py37_1
protobuf 3.11.4 py37h33f27b4_0
psutil 5.7.0 py37he774522_0
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.7 py_0
pycparser 2.20 py_0
pyjwt 1.7.1 py37_0
pyopenssl 19.1.0 py37_0
pyparsing 2.4.6 py_0
pyqt 5.9.2 py37h6538335_2
pyreadline 2.1 py37_1
pysocks 1.7.1 py37_0
python 3.7.7 h60c2a47_0_cpython
python-dateutil 2.8.1 py_0
python_abi 3.7 1_cp37m conda-forge
pytz 2019.3 py_0
pywavelets 1.1.1 py37he774522_0
pywin32 227 py37he774522_1
pyyaml 5.3.1 py37he774522_0
qt 5.9.7 vc14h73c81de_0
requests 2.23.0 py37_0
requests-oauthlib 1.3.0 py_0
rsa 4.0 py_0
scikit-image 0.16.2 py37h47e9c7a_0
scikit-learn 0.22.1 py37h6288b17_0
scipy 1.4.1 py37h9439919_0
setuptools 46.1.1 py37_0
sip 4.19.8 py37h6538335_0
six 1.14.0 py37_0
sqlite 3.31.1 he774522_0
tensorboard 2.1.0 py3_0
tensorflow 1.15.0 gpu_py37hc3743a6_0
tensorflow-base 1.15.0 gpu_py37h1afeea4_0
tensorflow-estimator 1.15.1 pyh2649769_0
tensorflow-gpu 1.15.0 h0d30ee6_0
termcolor 1.1.0 py37_1
tk 8.6.8 hfa6e2cd_0
toolz 0.10.0 py_0
toposort 1.5 py_3 conda-forge
tornado 6.0.4 py37he774522_1
tqdm 4.43.0 py_0
urllib3 1.25.8 py37_0
vc 14.1 h0510ff6_4
vs2015_runtime 14.16.27012 hf0eaf9b_1
werkzeug 0.16.1 py_0
wheel 0.34.2 py37_0
win_inet_pton 1.1.0 py37_0
wincertstore 0.2 py37_0
wrapt 1.12.1 py37he774522_1
xz 5.2.4 h2fa13f4_4
yaml 0.1.7 hc54c509_2
zlib 1.2.11 h62dcd97_3
zstd 1.3.7 h508b16e_0

================= Configs ==================
--------- .faceswap ---------
backend: nvidia

--------- convert.ini ---------

[color.color_transfer]
clip: True
preserve_paper: True

[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0

[color.match_hist]
threshold: 99.0

[mask.box_blend]
type: gaussian
distance: 11.0
radius: 5.0
passes: 1

[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0

[scaling.sharpen]
method: unsharp_mask
amount: 150
radius: 0.3
threshold: 5.0

[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto

[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False

[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3

[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate

--------- extract.ini ---------

[global]
allow_growth: False

[align.fan]
batch-size: 12

[detect.cv2_dnn]
confidence: 50

[detect.mtcnn]
minsize: 20
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
scalefactor: 0.709
batch-size: 8

[detect.s3fd]
confidence: 70
batch-size: 4

[mask.unet_dfl]
batch-size: 8

[mask.vgg_clear]
batch-size: 6

[mask.vgg_obstructed]
batch-size: 2

--------- gui.ini ---------

[global]
fullscreen: False
tab: extract
options_panel_width: 30
console_panel_height: 20
icon_size: 14
font: default
font_size: 9
autosave_last_session: prompt
timeout: 120
auto_load_model_stats: True

--------- train.ini ---------

[global]
coverage: 68.75
mask_type: none
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: False
icnr_init: False
conv_aware_init: False
subpixel_upscaling: False
reflect_padding: False
penalized_mask_loss: True
loss_function: mae
learning_rate: 5e-05

[model.dfl_h128]
lowmem: False

[model.dfl_sae]
input_size: 128
clipnorm: True
architecture: df
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False

[model.dlight]
features: best
details: good
output_size: 256

[model.original]
lowmem: False

[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512

[model.unbalanced]
input_size: 128
lowmem: False
clipnorm: True
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512

[model.villain]
lowmem: False

[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4


torzdf
Posts: 647
Joined: Fri Jul 12, 2019 12:53 am
Answers: 94
Has thanked: 17 times
Been thanked: 130 times

Re: Dual p106-6gb

Post by torzdf »

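The "Allow Growth" option forces single-GPU usage. Your command passes both -ag and -g 2, so only one GPU is made available and the multi_gpu_model call fails with the error above. Turn off Allow Growth if you want to train on both cards.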
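If it helps, here is a rough sketch of a pre-flight check for that flag combination. It is not part of Faceswap; find_gpu_flag_conflict is a made-up helper name, and it assumes that -ag is the Allow Growth switch and -g is the GPU count, as they appear in the py_command line of the crash report.

```python
def find_gpu_flag_conflict(command):
    """Return a warning if the command asks for several GPUs with Allow Growth set.

    Hypothetical helper, not a Faceswap API. Assumes "-ag" means Allow Growth
    and "-g N" is the number of GPUs, as seen in the crash report.
    """
    # Plain whitespace split is enough to spot flags; paths are not interpreted.
    args = command.split()
    allow_growth = "-ag" in args

    gpus = 1  # assume a single GPU when -g is not given
    for index, arg in enumerate(args):
        if arg == "-g" and index + 1 < len(args) and args[index + 1].isdigit():
            gpus = int(args[index + 1])

    if allow_growth and gpus > 1:
        return ("-ag (Allow Growth) is set together with -g {}: "
                "only one GPU will be available".format(gpus))
    return None


if __name__ == "__main__":
    # Flags trimmed from the py_command line in the crash report.
    print(find_gpu_flag_conflict("faceswap.py train -t dfl-sae -bs 10 -g 2 -ag"))
```

The check only looks at the command line, so it says nothing about what the driver or TensorFlow actually detect on the machine.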
My word is final
