Issue using dual P106-100s when training

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
ericpan0513
Posts: 23
Joined: Wed Jul 22, 2020 3:34 am
Answers: 0
Has thanked: 6 times

Issue using dual P106-100s when training

Post by ericpan0513 »

I also had the issue where the window pops up and quickly disappears, and I also fixed it by reinstalling Miniconda and Faceswap.
I have some questions about the new multi-GPU function.
After the update, how can we choose how many GPUs to use? Or if I enable the "distributed" option, does it just use all of the GPUs it detects?

And there's another problem: I enabled the distributed option with two P106-100 GPUs (just wanted to see whether they work with the new multi-GPU strategy). However, one GPU was 100% loaded while the other only got 6%. What's more, the training speed dropped from 27 EGs/s (1 GPU) to 4 EGs/s (2 GPUs). Do you know what's going on?
Thanks.

abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 63 times

Re: Can't use 2 GPUs after latest Faceswap update

Post by abigflea »

Were you using those P106-100s before? Those mining cards have an odd internal architecture.

Distributed enables multi-GPU training, and it will use the GPUs you have not excluded.
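
If you want to limit which cards it grabs, one trick (just a rough sketch, not official Faceswap docs) is to hide the spare card from TensorFlow with the CUDA_VISIBLE_DEVICES environment variable before anything CUDA-related loads, for example:

Code: Select all

import os

# Hedged sketch: keep only GPU index 0 visible (indices as reported by nvidia-smi).
# This must be set before TensorFlow is imported or initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

# TensorFlow (and therefore Faceswap) should now only see the card you left visible.
print(tf.config.list_physical_devices("GPU"))

Setting the variable in the terminal you launch Faceswap from has the same effect. I think newer builds also have an exclude-GPUs setting, but check your own build for the exact name.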

Are you on Linux or Windows?
What was your GPU setup before, and what is it now?
Can you post the crash log?

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

ericpan0513
Posts: 23
Joined: Wed Jul 22, 2020 3:34 am
Answers: 0
Has thanked: 6 times

Re: Can't use 2 GPUs after latest Faceswap update

Post by ericpan0513 »

Code: Select all

============ System Information ============
encoding:            cp950
git_branch:          master
git_commits:         0a25dff model.config - Make convert batchsize a user configurable option. 45d6995 bugfix - Extract - VGG Clear Mask - Fix for TF2. baa2866 bugfix - Update Dependencies - Avoid constantly trying to redownload Tensorflow. 9c5568f Bugfix - Models.dfl_h128. f897562 Set minimum python version to 3.7
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: P106-100, GPU_1: P106-100
gpu_devices_active:  GPU_0, GPU_1
gpu_driver:          432.00
gpu_vram:            GPU_0: 6077MB, GPU_1: 6077MB
os_machine:          AMD64
os_platform:         Windows-10-10.0.18362-SP0
os_release:          10
py_command:          C:\Users\Guan Yi\faceswap/faceswap.py gui
py_conda_version:    conda 4.8.4
py_implementation:   CPython
py_version:          3.7.7
py_virtual_env:      True
sys_cores:           4
sys_processor:       Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
sys_ram:             Total: 16275MB, Available: 10177MB, Used: 6097MB, Free: 10177MB

=============== Pip Packages ===============


============== Conda Packages ==============
# packages in environment at C:\Users\Guan Yi\MiniConda3\envs\faceswap:
#
# Name                    Version                   Build  Channel
absl-py                   0.9.0                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates           2020.6.24                     0
cachetools                4.1.1                    pypi_0    pypi
certifi                   2020.6.20                py37_0
chardet                   3.0.4                    pypi_0    pypi
cudatoolkit               10.1.243             h74a9793_0
cudnn                     7.6.5                cuda10.1_0
cycler                    0.10.0                   py37_0
fastcluster               1.1.26          py37h9b59f54_1    conda-forge
ffmpeg                    4.3.1                ha925a31_0    conda-forge
ffmpy                     0.2.3                    pypi_0    pypi
freetype                  2.10.2               hd328e21_0
gast                      0.3.3                    pypi_0    pypi
git                       2.23.0               h6bb4b03_0
google-auth               1.20.1                   pypi_0    pypi
google-auth-oauthlib      0.4.1                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.31.0                   pypi_0    pypi
h5py                      2.10.0                   pypi_0    pypi
icc_rt                    2019.0.0             h0cc432a_1
icu                       58.2                 ha925a31_3
idna                      2.10                     pypi_0    pypi
imageio                   2.9.0                      py_0
imageio-ffmpeg            0.4.2                      py_0    conda-forge
importlib-metadata        1.7.0                    pypi_0    pypi
intel-openmp              2020.1                      216
joblib                    0.16.0                     py_0
jpeg                      9b                   hb83a4c4_2
keras-preprocessing       1.1.2                    pypi_0    pypi
kiwisolver                1.2.0           py37h74a9793_0
libpng                    1.6.37               h2a8f88b_0
libtiff                   4.1.0                h56a325e_1
lz4-c                     1.9.2                h62dcd97_1
markdown                  3.2.2                    pypi_0    pypi
matplotlib                3.2.2                         0
matplotlib-base           3.2.2           py37h64f37c6_0
mkl                       2020.1                      216
mkl-service               2.3.0           py37hb782905_0
mkl_fft                   1.1.0           py37h45dec08_0
mkl_random                1.1.1           py37h47e9c7a_0
numpy                     1.19.1          py37h5510c5b_0
numpy-base                1.19.1          py37ha3acd2a_0
nvidia-ml-py3             7.352.1                  pypi_0    pypi
oauthlib                  3.1.0                    pypi_0    pypi
olefile                   0.46                     py37_0
opencv-python             4.4.0.42                 pypi_0    pypi
openssl                   1.1.1g               he774522_1
opt-einsum                3.3.0                    pypi_0    pypi
pathlib                   1.0.1                    py37_2
pillow                    7.2.0           py37hcc1f983_0
pip                       20.2.2                   py37_0
protobuf                  3.13.0                   pypi_0    pypi
psutil                    5.7.0           py37he774522_0
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pyparsing                 2.4.7                      py_0
pyqt                      5.9.2           py37h6538335_2
python                    3.7.7                h81c818b_4
python-dateutil           2.8.1                      py_0
python_abi                3.7                    1_cp37m    conda-forge
pywin32                   227             py37he774522_1
qt                        5.9.7           vc14h73c81de_0
requests                  2.24.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.6                      pypi_0    pypi
scikit-learn              0.23.1          py37h25d0782_0
scipy                     1.4.1                    pypi_0    pypi
setuptools                49.6.0                   py37_0
sip                       4.19.8          py37h6538335_0
six                       1.15.0                     py_0
sqlite                    3.32.3               h2a8f88b_0
tensorboard               2.2.2                    pypi_0    pypi
tensorboard-plugin-wit    1.7.0                    pypi_0    pypi
tensorflow-gpu            2.2.0                    pypi_0    pypi
tensorflow-gpu-estimator  2.2.0                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
threadpoolctl             2.1.0             pyh5ca1d4c_0
tk                        8.6.10               he774522_0
tornado                   6.0.4           py37he774522_1
tqdm                      4.48.2                     py_0
urllib3                   1.25.10                  pypi_0    pypi
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.16.27012          hf0eaf9b_3
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.34.2                   py37_0
wincertstore              0.2                      py37_0
wrapt                     1.12.1                   pypi_0    pypi
xz                        5.2.5                h62dcd97_0
zipp                      3.1.0                    pypi_0    pypi
zlib                      1.2.11               h62dcd97_4
zstd                      1.4.5                h04227a9_0

================= Configs ==================

--------- .faceswap ---------
backend: nvidia

--------- convert.ini ---------

[color.color_transfer]
clip: True
preserve_paper: True

[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0

[color.match_hist]
threshold: 99.0

[mask.box_blend]
type: gaussian
distance: 11.0
radius: 5.0
passes: 1

[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0

[scaling.sharpen]
method: unsharp_mask
amount: 150
radius: 0.3
threshold: 5.0

[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto
skip_mux: False

[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False

[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3

[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate

--------- extract.ini ---------

[global]
allow_growth: False

[align.fan]
batch-size: 12

[detect.cv2_dnn]
confidence: 50

[detect.mtcnn]
minsize: 20
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
scalefactor: 0.709
batch-size: 8

[detect.s3fd]
confidence: 70
batch-size: 4

[mask.unet_dfl]
batch-size: 8

[mask.vgg_clear]
batch-size: 6

[mask.vgg_obstructed]
batch-size: 2

--------- gui.ini ---------

[global]
fullscreen: False
tab: extract
options_panel_width: 30
console_panel_height: 20
icon_size: 14
font: default
font_size: 9
autosave_last_session: prompt
timeout: 120
auto_load_model_stats: True

--------- train.ini ---------

[global]
coverage: 100.0
mask_type: vgg-obstructed
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: False
penalized_mask_loss: True
loss_function: mae
icnr_init: False
conv_aware_init: False
optimizer: adam
learning_rate: 5e-05
reflect_padding: False
allow_growth: False
mixed_precision: False
convert_batchsize: 16

[model.dfl_h128]
lowmem: False

[model.dfl_sae]
input_size: 256
clipnorm: True
architecture: liae
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False

[model.dlight]
features: best
details: good
output_size: 384

[model.original]
lowmem: False

[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512

[model.unbalanced]
input_size: 128
lowmem: False
clipnorm: True
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512

[model.villain]
lowmem: False

[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4

This is my system info. However, I couldn't find a crash log (or I don't know where to look).
I'm using Windows 10.
A single P106-100 works fine, even better than a 1060. However, when distributed, it gets worse and one of the cards shows nearly 0% load (according to GPU-Z).
By the way, how can I upload pictures? Can't I just paste them from the web?

abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 63 times

Re: Can't use 2 GPUs after latest Faceswap update

Post by abigflea »

I'm currently doing a clean install of Faceswap on a very clean Windows 10.
Let me see if I can replicate the issue.

The mining cards seem to be fine solo, but they may be problematic otherwise, so 'use at your own risk'.
I have two and will see what happens. I just need to do some testing, including pulling out my current GPU. Give me a bit.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 63 times

Re: Can't use 2 GPUs after latest Faceswap update

Post by abigflea »

ericpan0513 wrote: Thu Aug 20, 2020 6:52 am

And there's another problem: I enabled the distributed option with two P106-100 GPUs (just wanted to see whether they work with the new multi-GPU strategy). However, one GPU was 100% loaded while the other only got 6%. What's more, the training speed dropped from 27 EGs/s (1 GPU) to 4 EGs/s (2 GPUs). Do you know what's going on?
Thanks.

I haven't forgotten you, ericpan0513. I'm pulling GPU cards now and will start testing your situation, which is likely different.
A bit of info I need: do you have any of your GPUs connected through a 1x PCIe extender?
Or are they all plugged directly into your mainboard?
Can you pull up GPU-Z and tell me the reported bus interface and number of shaders? (Mining cards sometimes do odd things here.)
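
If it's easier, you can also pull the bus width and load straight from Python with the nvidia-ml-py3 package that Faceswap installs. Treat this as a rough sketch, and note NVML does not report shader counts, so GPU-Z is still needed for that part:

Code: Select all

import pynvml

# Rough NVML query sketch (nvidia-ml-py3 / pynvml). Shader counts are not
# exposed through NVML, so this only covers the PCIe link and current load.
pynvml.nvmlInit()
for idx in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):  # older bindings return bytes
        name = name.decode()
    width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
    gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"GPU {idx}: {name} - PCIe Gen{gen} x{width} - load {util.gpu}%")
pynvml.nvmlShutdown()

A card that GPU-Z reports as x16 but shows up here as x1 (or the other way around) would be a useful clue.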

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

ericpan0513
Posts: 23
Joined: Wed Jul 22, 2020 3:34 am
Answers: 0
Has thanked: 6 times

Re: Issue using dual P106-100s when training

Post by ericpan0513 »

I've got one GPU plugged into the mainboard, and the other is connected through a 1x PCIe extender.
I have tried training on each GPU individually (on the mainboard and through the extender), and they both worked fine, so maybe it isn't about how they are connected? I've also moved the one on the extender to different slots, but it still has the same issue.

Here are the two log files for the two GPUs, recorded by GPU-Z.
The weird thing is that although the GPU on the extender shows 100% load, its temperature doesn't go up, which maybe means it isn't actually doing any work? Usually when I train on one GPU, the temperature goes up to 65°C. The CPU isn't fully loaded either, so training isn't falling back to it. I'm just confused.

extender log.txt (90.15 KiB)
mainboard log.txt (88.69 KiB)

Hope we can figure it out. Thanks for helping.

abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 63 times

Re: Issue using dual P106-100s when training

Post by abigflea »

Those logs don't show me the number of shaders on the P106s. That can be a huge tell that something is amiss.
Maybe post a screenshot of just GPU-Z.

Those 1x PCIe connectors seem to cause some other issues I can't pin down; it comes down to how the Nvidia drivers work on Linux and Windows. I don't think the devs are in the mood to rewrite the Nvidia drivers and TensorFlow from the ground up.
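
For a rough idea of why a 1x riser hurts distributed training in particular: with MirroredStrategy the cards have to swap (all-reduce) their gradients over PCIe every step, and a Gen3 x1 link is roughly 1 GB/s against about 16 GB/s for x16. A back-of-envelope sketch, where the parameter count is an invented example rather than a DFL-SAE measurement:

Code: Select all

# Back-of-envelope gradient-sync time per training step over PCIe.
# The parameter count is an illustrative assumption, not a real model size.
params = 60e6                       # assumed trainable parameters
sync_bytes = params * 4             # float32 gradients exchanged each step

for link, bytes_per_sec in (("PCIe 3.0 x1", 0.985e9), ("PCIe 3.0 x16", 15.75e9)):
    print(f"{link}: ~{sync_bytes / bytes_per_sec * 1000:.0f} ms per sync")

Hundreds of milliseconds of overhead per step on an x1 riser would easily swallow whatever the second card contributes, which would fit the slowdown you're seeing.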

FYI, I did test with my custom cards, also P106s. They would get about 1.1 EGs/s, which is way 'better' than distributed was in FS 1. Technically they shouldn't work at all.
Although, just like you, individually they work just fine! They get 8 EGs/s each on my typical DFL-SAE model.
A single 1070 gets 20 EGs/s.

Current Nvidia 452 drivers, and Faceswap updated as of 12 hours ago.

Anyway, a screenshot of GPU-Z, and maybe the faceswap.log or crash log generated when you start up, would be handy for seeing what's up. There may still be a chance you can use the new, more efficient FS.

Yes, I tested a lot of different hardware and software configs today.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

ericpan0513
Posts: 23
Joined: Wed Jul 22, 2020 3:34 am
Answers: 0
Has thanked: 6 times

Re: Issue using dual P106-100s when training

Post by ericpan0513 »

It's 1280 unified shaders.
The starting log:

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
08/21/2020 11:47:23 INFO     Log level set to: INFO
08/21/2020 11:47:25 INFO     Model A Directory: C:\Users\Guan Yi\Desktop\train\Trump_hd\Trump_hd_ex
08/21/2020 11:47:25 INFO     Model B Directory: C:\Users\Guan Yi\Desktop\train\Albert_hd\albert_hd_ex
08/21/2020 11:47:25 INFO     Training data directory: C:\Users\Guan Yi\Desktop\train\Model1
08/21/2020 11:47:25 INFO     ===================================================
08/21/2020 11:47:25 INFO       Starting
08/21/2020 11:47:25 INFO       Press 'Stop' to save and quit
08/21/2020 11:47:25 INFO     ===================================================
08/21/2020 11:47:26 INFO     Loading data, this may take a while...
08/21/2020 11:47:26 INFO     Loading Model from Dfl_Sae plugin...
08/21/2020 11:47:26 INFO     Using configuration saved in state file
08/21/2020 11:47:27 INFO     Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
08/21/2020 11:47:34 INFO     Loaded model from disk: 'C:\Users\Guan Yi\Desktop\train\Model1\dfl_sae.h5'
08/21/2020 11:47:34 WARNING  Clipnorm has been selected, but is unsupported when using distributed or mixed_precision training, so has been disabled. If you wish to enable clipnorm, then you must disable these options.
08/21/2020 11:47:34 INFO     Loading Trainer from Original plugin...

Reading training images (A):   0%|          | 0/3707 [00:00<?, ?it/s]
Reading training images (A):   1%|          | 43/3707 [00:00<00:09, 371.24it/s]
Reading training images (A):  10%|▉         | 369/3707 [00:00<00:06, 505.41it/s]
Reading training images (A):  13%|█▎        | 474/3707 [00:00<00:06, 534.17it/s]
Reading training images (A):  26%|██▌       | 950/3707 [00:00<00:03, 728.00it/s]
Reading training images (A):  31%|███       | 1157/3707 [00:00<00:02, 903.46it/s]
Reading training images (A):  37%|███▋      | 1364/3707 [00:00<00:02, 1069.82it/s]
Reading training images (A):  43%|████▎     | 1580/3707 [00:00<00:01, 1260.21it/s]
Reading training images (A):  48%|████▊     | 1792/3707 [00:00<00:01, 1405.39it/s]
Reading training images (A):  55%|█████▍    | 2033/3707 [00:01<00:01, 1605.49it/s]
Reading training images (A):  61%|██████    | 2250/3707 [00:01<00:00, 1740.40it/s]
Reading training images (A):  66%|██████▋   | 2465/3707 [00:01<00:00, 1844.67it/s]
Reading training images (A):  73%|███████▎  | 2690/3707 [00:01<00:00, 1948.71it/s]
Reading training images (A):  79%|███████▊  | 2912/3707 [00:01<00:00, 2021.50it/s]
Reading training images (A):  85%|████████▍ | 3137/3707 [00:01<00:00, 2083.43it/s]
Reading training images (A):  91%|█████████ | 3358/3707 [00:01<00:00, 2058.76it/s]
Reading training images (A):  96%|█████████▋| 3573/3707 [00:01<00:00, 1827.56it/s]


Reading training images (B):   0%|          | 0/4045 [00:00<?, ?it/s]
Reading training images (B):   1%|          | 36/4045 [00:00<00:12, 322.14it/s]
Reading training images (B):   6%|▋         | 254/4045 [00:00<00:08, 432.74it/s]
Reading training images (B):  13%|█▎        | 545/4045 [00:00<00:06, 578.81it/s]
Reading training images (B):  20%|█▉        | 796/4045 [00:00<00:04, 752.34it/s]
Reading training images (B):  24%|██▍       | 964/4045 [00:00<00:03, 898.44it/s]
Reading training images (B):  31%|███       | 1241/4045 [00:00<00:02, 1126.47it/s]
Reading training images (B):  36%|███▌      | 1457/4045 [00:00<00:01, 1314.68it/s]
Reading training images (B):  41%|████      | 1662/4045 [00:00<00:01, 1328.80it/s]
Reading training images (B):  48%|████▊     | 1954/4045 [00:00<00:01, 1587.83it/s]
Reading training images (B):  55%|█████▍    | 2218/4045 [00:01<00:01, 1802.56it/s]
Reading training images (B):  61%|██████    | 2469/4045 [00:01<00:00, 1967.89it/s]
Reading training images (B):  67%|██████▋   | 2706/4045 [00:01<00:00, 1675.41it/s]
Reading training images (B):  77%|███████▋  | 3105/4045 [00:01<00:00, 2002.57it/s]
Reading training images (B):  83%|████████▎ | 3358/4045 [00:01<00:00, 1844.54it/s]
Reading training images (B):  92%|█████████▏| 3711/4045 [00:01<00:00, 2151.79it/s]
Reading training images (B):  98%|█████████▊| 3973/4045 [00:01<00:00, 2107.07it/s]
08/21/2020 11:47:38 INFO     Reading alignments from: 'C:\Users\Guan Yi\Desktop\train\Trump_hd\trump_hd_alignments.fsa'
08/21/2020 11:47:38 INFO     Reading alignments from: 'C:\Users\Guan Yi\Desktop\train\Albert_hd\albert_hd_alignments.fsa'
08/21/2020 11:47:39 WARNING  169 alignments have been removed as their corresponding faces do not exist in the input folder for side B. Run in verbose mode if you wish to see which alignments have been excluded.
08/21/2020 11:47:40 INFO     batch_all_reduce: 78 all-reduces with algorithm = hierarchical_copy, num_packs = 1
08/21/2020 11:47:46 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:46 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:46 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:46 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:46 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:46 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:48 INFO     batch_all_reduce: 78 all-reduces with algorithm = hierarchical_copy, num_packs = 1
08/21/2020 11:47:49 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:49 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:49 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
08/21/2020 11:47:49 INFO     Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).

08/21/2020 11:48:10 INFO     [Saved models] - Average loss since last save: face_a: 0.00428, face_b: 0.00553

Is

Code: Select all

Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')

a problem? They are both under the same replica. Or is that just the way it works?

Attachments
info.png (54.58 KiB)
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 63 times

Re: Issue using dual P106-100s when training

Post by abigflea »

Well, I suspect this will work. *scratches head*
The "MirroredStrategy with devices" line is correct. No problem there: the replica:0 in the device path just refers to the single worker process, and both GPUs sit under it as device:GPU:0 and device:GPU:1.
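
For reference, here's roughly what that strategy amounts to in plain TensorFlow 2. This is a minimal sketch with a toy model and assumes two visible GPUs; it is not Faceswap's actual DFL-SAE code:

Code: Select all

import tensorflow as tf

# Minimal sketch of what the log line describes, assuming TensorFlow 2.x
# and two visible GPUs. The tiny Dense model is a stand-in, not DFL-SAE.
strategy = tf.distribute.MirroredStrategy(
    devices=["/gpu:0", "/gpu:1"],
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
    # Variables built inside the scope are mirrored onto both GPUs; each step
    # the per-GPU gradients are all-reduced before the weights are updated.
    model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                                 tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mae")

print("Replicas in sync:", strategy.num_replicas_in_sync)  # expect 2

The hierarchical_copy all-reduces in your log correspond to that cross_device_ops choice, so that part of the output looks normal too.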

Update your Nvidia drivers to the current version first.
I am using this one: https://www.nvidia.com/download/driverR ... 3246/en-us

If that doesn't work, let's go through the typical steps.

  1. Force-update Windows.

  2. Follow this to remove any possible conflicts with CUDA, Conda, or Python:
    app.php/faqpage?sid=8a113082dbf6d2351b3 ... e7b0b#f1r1

  3. Then install this (it can't hurt; a missing DLL got me the other day):
    Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019

  4. Then run a fresh copy of the installer:
    https://github.com/deepfakes/faceswap/releases/latest/download/faceswap_setup_x64.exe

Everything should be nice and fresh, and we will go from there.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

ericpan0513
Posts: 23
Joined: Wed Jul 22, 2020 3:34 am
Answers: 0
Has thanked: 6 times

Re: Issue using dual P106-100s when training

Post by ericpan0513 »

OK, thanks a lot for your help!
I won't be able to use this computer over the weekend, so I will probably try this next week.
If it works, I will reply here again. :D

abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 63 times

Re: Issue using dual P106-100s when training

Post by abigflea »

I'll be happy to hear what happens.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 63 times

Re: Issue using dual P106-100s when training

Post by abigflea »

It has occurred to me: have you checked the Distributed box?

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

ericpan0513
Posts: 23
Joined: Wed Jul 22, 2020 3:34 am
Answers: 0
Has thanked: 6 times

Re: Issue using dual P106-100s when training

Post by ericpan0513 »

abigflea wrote: Fri Aug 21, 2020 8:34 am

Well, I suspect this will work. *scratches head*
The "MirroredStrategy with devices" line is correct. No problem there: the replica:0 in the device path just refers to the single worker process, and both GPUs sit under it as device:GPU:0 and device:GPU:1.

Update your Nvidia drivers to the current version first.
I am using this one: https://www.nvidia.com/download/driverR ... 3246/en-us

If that doesn't work, let's go through the typical steps.

  1. Force-update Windows.

  2. Follow this to remove any possible conflicts with CUDA, Conda, or Python:
    app.php/faqpage?sid=8a113082dbf6d2351b3 ... e7b0b#f1r1

  3. Then install this (it can't hurt; a missing DLL got me the other day):
    Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019

  4. Then run a fresh copy of the installer:
    https://github.com/deepfakes/faceswap/releases/latest/download/faceswap_setup_x64.exe

Everything should be nice and fresh, and we will go from there.

I couldn't update the Nvidia drivers; I think it's because the P106 is too old or something: "Graphics driver could not find compatible graphics hardware."
I've finished the other steps, but it's still not working. Too bad. Thanks for helping, though.
And yes, my issue only happens when I check the Distributed box: the training speed goes down (compared to one GPU alone), with one GPU at 100% load but its temperature not going up, while the other sits at 6% load, which is really weird.
Anyway, thanks for helping me. Maybe I shouldn't use P106-100s for multi-GPU training. I've already spent a few weeks trying to fix this. :(

Locked