Can't use 2 GPU's after latest Faceswap update

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Can't use 2 GPU's after latest Faceswap update

Post by djandg »

After updating Faceswap to the latest version it would not start. The command box popped up briefly and then disappeared, with insufficient time to see any error message. Nothing in the crash files.
Uninstalled MiniConda and deleted the Faceswap folder. Reinstalled, and now it starts OK.
Love the new manual alignments tool.
BUT
I can no longer use the two Nvidia GPUs for training. I can use either GPU_0 or GPU_1 by selecting the new exclude check box option for GPU_0 or GPU_1, but leaving both unchecked does not make the updated Faceswap use both GPUs: the Windows and Nvidia performance apps show the other GPU at zero use if I leave both check boxes clear. Rolling back to a system backup made before updating Faceswap, with the old "number of GPUs" selection, two GPUs work OK.

The System Info dump from the updated Faceswap indicates both GPUs are detected and active, but it will only run at half the previous two-GPU batch number, as if only one GPU is selected - meaning only one GPU is being addressed?

User avatar
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by abigflea »

I'm surprised to hear this. With the recent updates I can do more with multi-GPU, and my unmatched GPUs are having fewer issues.

Could use a little more info.
What GPU do you have?
Can you try to start it, let it crash, and post the crash log?
Maybe we can suss out this issue.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Unfortunately I won't be able to help much here, as I don't have a Multi-GPU setup. We did test this fairly extensively with [mention]abigflea[/mention] , [mention]pfakanator[/mention], [mention]deephomage[/mention] and others, and it all seems to work as expected (and as abigflea says, results were improved over previous Faceswap).

Outputting and pasting your system information (in the GUI Help>Output System Information) may help to diagnose this.

Also if you get a crash, the crash report will definitely help.

Hopefully one of the above can help shed more light on what you should be expecting.

My word is final

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

There are two people on this thread with a similar issue: myself, the original poster, and ericpan0513.
Don't get the two confused, especially regarding the setup.

I have dual boot and can boot to a backup made before the Faceswap update, where both GPUs get used equally. Booting to the OS after the update, only one GPU is used no matter what I try. Same OS, setup etc.; the reinstall of Faceswap to the latest version is the only difference between the two boot drives.
Strange.

System info below. 2 x 1070 Ti, latest Studio drivers, both detected and active - but with the latest update only one GPU is being used, with no load distribution.
Tried SLI on and off (not removing the SLI bridge, just the software switch), tried the Nvidia Studio and Game Ready drivers, even the default MS driver - no joy. I'm at a loss. Currently using the "old" Faceswap boot drive for training and the "new" one for manual alignments. Not an ideal situation.

No crash log is generated, as training never starts with the batch size previously used with two GPUs - an insufficient GPU memory error. Lowering the batch size to that for one GPU allows it to run on either GPU_0 or GPU_1 using the exclude function, but not both, and obviously at the reduced speed.

Code: Select all

08/20/2020 09:50:33 CRITICAL Error caught! Exiting...
08/20/2020 09:50:33 ERROR    Caught exception in thread: '_training_0'
08/20/2020 09:50:33 ERROR    You do not have enough GPU memory available to train the selected model at the selected settings. You can try a number of things:
blah blah blah

Code: Select all

Sys info 
============ System Information ============
encoding:            cp1252
git_branch:          master
git_commits:         0a25dff model.config - Make convert batchsize a user configurable option
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: GeForce GTX 1070 Ti, GPU_1: GeForce GTX 1070 Ti
gpu_devices_active:  GPU_0, GPU_1
gpu_driver:          452.06
gpu_vram:            GPU_0: 8192MB, GPU_1: 8192MB
os_machine:          AMD64
os_platform:         Windows-10-10.0.19041-SP0
os_release:          10
py_command:          C:\Users\HOME\faceswap/faceswap.py gui
py_conda_version:    conda 4.8.4
py_implementation:   CPython
py_version:          3.8.5
py_virtual_env:      True
sys_cores:           6
sys_processor:       Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
sys_ram:             Total: 16313MB, Available: 11470MB, Used: 4843MB, Free: 11470MB

=============== Pip Packages ===============
absl-py==0.10.0
astunparse==1.6.3
cachetools==4.1.1
certifi==2020.6.20
chardet==3.0.4
cycler==0.10.0
fastcluster==1.1.26
ffmpy==0.2.3
gast==0.3.3
google-auth==1.20.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.31.0
h5py==2.10.0
idna==2.10
imageio @ file:///tmp/build/80754af9/imageio_1594161405741/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1589202782679/work
joblib @ file:///tmp/build/80754af9/joblib_1594236160679/work
Keras-Preprocessing==1.1.2
kiwisolver==1.2.0
Markdown==3.2.2
matplotlib @ file:///C:/ci/matplotlib-base_1592837548929/work
mkl-fft==1.1.0
mkl-random==1.1.1
mkl-service==2.3.0
numpy @ file:///C:/ci/numpy_and_numpy_base_1596215850360/work
nvidia-ml-py3 @ git+https://github.com/deepfakes/nvidia-ml-py3.git@6fc29ac84b32bad877f078cb4a777c1548a00bf6
oauthlib==3.1.0
olefile==0.46
opencv-python==4.4.0.42
opt-einsum==3.3.0
pathlib==1.0.1
Pillow @ file:///C:/ci/pillow_1594298230227/work
protobuf==3.13.0
psutil==5.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==2.4.7
python-dateutil==2.8.1
pywin32==227
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
scikit-learn @ file:///C:/ci/scikit-learn_1592853510272/work
scipy==1.4.1
sip==4.19.13
six==1.15.0
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow-gpu==2.2.0
tensorflow-gpu-estimator==2.2.0
termcolor==1.1.0
threadpoolctl @ file:///tmp/tmp9twdgx9k/threadpoolctl-2.1.0-py3-none-any.whl
tornado==6.0.4
tqdm @ file:///tmp/build/80754af9/tqdm_1596810128862/work
urllib3==1.25.10
Werkzeug==1.0.1
wincertstore==0.2
wrapt==1.12.1

============== Conda Packages ==============
# packages in environment at C:\Users\HOME\MiniConda3\envs\faceswap:
#
# Name                    Version                   Build  Channel
absl-py                   0.10.0                   pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates           2020.6.24                  0
cachetools                4.1.1                 pypi_0    pypi
certifi                   2020.6.20             py38_0
chardet                   3.0.4                 pypi_0    pypi
cudatoolkit               10.1.243          h74a9793_0
cudnn                     7.6.5             cuda10.1_0
cycler                    0.10.0                py38_0
fastcluster               1.1.26       py38hbe40bda_1    conda-forge
ffmpeg                    4.3.1             ha925a31_0    conda-forge
ffmpy                     0.2.3                 pypi_0    pypi
freetype                  2.10.2            hd328e21_0
gast                      0.3.3                 pypi_0    pypi
git                       2.23.0            h6bb4b03_0
google-auth               1.20.1                pypi_0    pypi
google-auth-oauthlib      0.4.1                 pypi_0    pypi
google-pasta              0.2.0                 pypi_0    pypi
grpcio                    1.31.0                pypi_0    pypi
h5py                      2.10.0                pypi_0    pypi
icc_rt                    2019.0.0          h0cc432a_1
icu                       58.2              ha925a31_3
idna                      2.10                  pypi_0    pypi
imageio                   2.9.0                   py_0
imageio-ffmpeg            0.4.2                   py_0    conda-forge
intel-openmp              2020.1                   216
joblib                    0.16.0                  py_0
jpeg                      9b                hb83a4c4_2
keras-preprocessing       1.1.2                 pypi_0    pypi
kiwisolver                1.2.0         py38h74a9793_0
libpng                    1.6.37            h2a8f88b_0
libtiff                   4.1.0             h56a325e_1
lz4-c                     1.9.2             h62dcd97_1
markdown                  3.2.2                 pypi_0    pypi
matplotlib                3.2.2                      0
matplotlib-base           3.2.2         py38h64f37c6_0
mkl                       2020.1                   216
mkl-service               2.3.0         py38hb782905_0
mkl_fft                   1.1.0         py38h45dec08_0
mkl_random                1.1.1         py38h47e9c7a_0
numpy                     1.19.1        py38h5510c5b_0
numpy-base                1.19.1        py38ha3acd2a_0
nvidia-ml-py3             7.352.1               pypi_0    pypi
oauthlib                  3.1.0                 pypi_0    pypi
olefile                   0.46                    py_0
opencv-python             4.4.0.42              pypi_0    pypi
openssl                   1.1.1g            he774522_1
opt-einsum                3.3.0                 pypi_0    pypi
pathlib                   1.0.1                   py_1
pillow                    7.2.0         py38hcc1f983_0
pip                       20.2.2                py38_0
protobuf                  3.13.0                pypi_0    pypi
psutil                    5.7.0         py38he774522_0
pyasn1                    0.4.8                 pypi_0    pypi
pyasn1-modules            0.2.8                 pypi_0    pypi
pyparsing                 2.4.7                   py_0
pyqt                      5.9.2         py38ha925a31_4
python                    3.8.5             he1778fa_0
python-dateutil           2.8.1                   py_0
python_abi                3.8                   1_cp38    conda-forge
pywin32                   227           py38he774522_1
qt                        5.9.7        vc14h73c81de_0
requests                  2.24.0                pypi_0    pypi
requests-oauthlib         1.3.0                 pypi_0    pypi
rsa                       4.6                   pypi_0    pypi
scikit-learn              0.23.1        py38h25d0782_0
scipy                     1.4.1                 pypi_0    pypi
setuptools                49.6.0                py38_0
sip                       4.19.13       py38ha925a31_0
six                       1.15.0                  py_0
sqlite                    3.32.3            h2a8f88b_0
tensorboard               2.2.2                 pypi_0    pypi
tensorboard-plugin-wit    1.7.0                 pypi_0    pypi
tensorflow-gpu            2.2.0                 pypi_0    pypi
tensorflow-gpu-estimator  2.2.0                 pypi_0    pypi
termcolor                 1.1.0                 pypi_0    pypi
threadpoolctl             2.1.0             pyh5ca1d4c_0
tk                        8.6.10            he774522_0
tornado                   6.0.4         py38he774522_1
tqdm                      4.48.2                  py_0
urllib3                   1.25.10               pypi_0    pypi
vc                        14.1              h0510ff6_4
vs2015_runtime            14.16.27012       hf0eaf9b_3
werkzeug                  1.0.1                 pypi_0    pypi
wheel                     0.34.2                py38_0
wincertstore              0.2                   py38_0
wrapt                     1.12.1                pypi_0    pypi
xz                        5.2.5             h62dcd97_0
zlib                      1.2.11            h62dcd97_4
zstd                      1.4.5             h04227a9_0

================= Configs ==================
--------- .faceswap ---------
backend: nvidia

--------- convert.ini ---------
[color.color_transfer]
clip: True
preserve_paper: True

[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0

[color.match_hist]
threshold: 99.0

[mask.box_blend]
type: gaussian
distance: 11.0
radius: 5.0
passes: 1

[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0

[scaling.sharpen]
method: unsharp_mask
amount: 150
radius: 0.3
threshold: 5.0

[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto
skip_mux: False

[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False

[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3

[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate

--------- extract.ini ---------
[global]
allow_growth: False

[align.fan]
batch-size: 12

[detect.cv2_dnn]
confidence: 50

[detect.mtcnn]
minsize: 20
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
scalefactor: 0.709
batch-size: 8

[detect.s3fd]
confidence: 70
batch-size: 4

[mask.unet_dfl]
batch-size: 8

[mask.vgg_clear]
batch-size: 6

[mask.vgg_obstructed]
batch-size: 2

--------- gui.ini ---------
[global]
fullscreen: False
tab: extract
options_panel_width: 30
console_panel_height: 20
icon_size: 14
font: default
font_size: 9
autosave_last_session: prompt
timeout: 120
auto_load_model_stats: True

--------- train.ini ---------
[global]
coverage: 68.75
mask_type: extended
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: False
penalized_mask_loss: True
loss_function: mae
icnr_init: False
conv_aware_init: False
optimizer: adam
learning_rate: 5e-05
reflect_padding: False
allow_growth: False
mixed_precision: False
convert_batchsize: 16

[model.dfl_h128]
lowmem: False

[model.dfl_sae]
input_size: 128
clipnorm: True
architecture: df
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False

[model.dlight]
features: best
details: good
output_size: 256

[model.original]
lowmem: False

[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512

[model.unbalanced]
input_size: 128
lowmem: False
clipnorm: True
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512

[model.villain]
lowmem: False

[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4
Attachments
error.txt
(730 Bytes) Downloaded 342 times
SYS info.txt
(12.88 KiB) Downloaded 349 times
User avatar
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by abigflea »

The lower batch size is expected. With the newer version of Faceswap, the batch handling has changed.
Effectively, a batch of 8 on the old version is a batch of 4 on the new (not exactly - it is also more efficient).
Additionally, the models have changed a bit, but should convert over.

I don't see anything standing out in the log.
Maybe someone else will see something.
In the meantime, I will see if I can replicate your issue in Windows; typically I'm in Linux for FS.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

Quote
"The lower batch size is expected. With the newer version of Faceswap, the batch handling has changed.
Effectively, a batch of 8 on the old version is a batch of 4 on the new (not exactly - it is also more efficient)."

This is important info and not conveyed anywhere.
Running an optimum batch size test does indeed give a batch size about 50% of pre-update. However, the performance monitor also indicates that the second GPU has zero utilisation, so the batch size would be halved anyway, as only one GPU of a two-GPU system is being used? Nvidia cards, as opposed to the other chap's mining cards.


I'm also going to do a clean install of Win10 as a test, but even if successful it's not a real solution. That will take me a bit of time.

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Just to give you some context on this. We cannot use the "old" method for multi-gpu as it is not supported in Tensorflow 2.

We are using the standard method for multiple GPU training under Tensorflow (specifically the "mirror" strategy). You can (and probably should) read up more about it here:
https://www.tensorflow.org/guide/distributed_training

From our end, all we are effectively doing is "flicking a switch" to turn on distributed training, which would suggest to me that the issue is upstream somewhere.
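Conceptually, the mirror strategy replicates the model onto every visible GPU and splits each global batch evenly between the replicas; in TensorFlow 2 the "switch" amounts to building the model inside a tf.distribute.MirroredStrategy scope. A plain-Python sketch of the batch-splitting idea (illustrative only, not Faceswap or TensorFlow source):

```python
# Illustration of how a "mirrored" strategy shares work: each GPU
# (replica) holds a full copy of the model and processes an equal
# shard of the global batch, after which the gradients are combined.

def split_batch(batch, num_replicas):
    """Divide a global batch into equal per-replica shards."""
    shard = len(batch) // num_replicas
    return [batch[i * shard:(i + 1) * shard] for i in range(num_replicas)]

global_batch = list(range(16))           # stand-in for 16 face images
replicas = split_batch(global_batch, 2)  # one shard per GPU

assert len(replicas) == 2                  # two replicas, one per GPU
assert all(len(r) == 8 for r in replicas)  # each GPU sees half the batch
```

If only one replica is ever created (e.g. because the distributed option is not enabled), the whole batch lands on a single GPU, which matches the behaviour described in this thread.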

That being said, I cannot speak for how GPU utilization is reflected under a multi-GPU setup (under Windows or otherwise), as I don't have one to test on; hopefully abigflea will report back soon. It may be worth clicking on the GPUs within Task Manager and checking their Cuda utilization (I'm guessing here).

djandg wrote: Thu Aug 20, 2020 10:55 am

Quote
"The lower batch size is expected. With the newer version of Faceswap, the batch handling has changed.
Effectively, a batch of 8 on the old version is a batch of 4 on the new (not exactly - it is also more efficient)."

This is important info and not conveyed anywhere.

It is reflected in the batch size ToolTip. I will also be updating the Training guide; however, I hope you can appreciate that my time is limited, and fixing these kinds of bugs and getting to the bottom of these issues is my priority at the moment.

My word is final

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

Quite understand time and capacity :)

UPDATE:
Fresh install of Windows 10 and just Faceswap - it still uses only one GPU of the two, and I can swap between the GPUs using the Exclude option. Just not both at the same time.
Looking at the Tensorflow description of mirroring, it mentions NCCL communication over PCI, so I removed the SLI bridge to force PCI communication. No joy.
There is no BIOS update for the board, in case it's a PCI communication address issue, but Sys Info seems to indicate Tensorflow has found both GPUs and therefore should utilise them.

If I had hair, I'd be pulling it out - :lol:

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Can you do me a favour, please?

Could you hit the "Generate" button and post the output when you have it set up for what you would hope is multi-GPU training? I just want to check that everything looks OK in terms of the commands being sent to Faceswap.

Thanks.

My word is final

User avatar
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by abigflea »

djandg wrote: Thu Aug 20, 2020 3:42 pm

Fresh install of Windows 10 and just Faceswap - it still uses only one GPU of the two, and I can swap between the GPUs using the Exclude option. Just not both at the same time.
Looking at the Tensorflow description of mirroring, it mentions NCCL communication over PCI, so I removed the SLI bridge to force PCI communication. No joy.

I was wondering about the SLI bridge.
OK, so far I have been unable to replicate your issue... and boy oh boy, I've tried:
with MP, without MP, clipnorm on/off, different loss functions, various situations of resuming models, yelling.
Torzdf had me try some other things, and I'm continuing with a few more.

All I've learned so far is that a single 2070 with mixed precision on is very fast.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Ultimately, [mention]djandg[/mention], what I'm getting at is: you have definitely enabled this option, right?

Attachment: 7.png (3.39 KiB)

My word is final

User avatar
bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times
Contact:

Re: Can't use 2 GPU's after latest Faceswap update

Post by bryanlyon »

The SLI bridge shouldn't harm anything; in fact, NCCL is designed (by Nvidia) to always use the fastest path. Don't worry about removing the bridge, BUT you might have to turn off SLI in your settings (or at least make sure nothing is using the SLI feature, which may prevent us from using both cards).

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update - Resolved - pass the wet fish.

Post by djandg »

OK, put a dunce's hat on me and stick me in the corner, or hit me with a wet fish, whichever is most effective :lol:
I worked it out at 1am UK time last night and am only posting now.

Upgrading Faceswap to the newest version would initially not allow my multiple GPU system to use more than one GPU when training.

Thanks to comments from Torzdf and Abigflea, the resolution, for similar dullards to me, is:

The "Distribute" box has to be selected on every new training project to allow multiple GPUs to be used. Easily overlooked when you're the sort of person who insists they know the best route from A to B without using a map or GPS! My wife always says "you never ask for directions".

The batch size is also different with the latest FS version, so the "optimum batch size" test (as written in the training documentation) has to be run again for each model chosen; my experience is that the optimum batch is now just over 50% of what it was before.
With Realface on a 2 x GTX 1070 Ti system, it goes from the old optimum batch size of 26 to a new one of 16.
Even with this lower apparent batch size, the EGs/sec goes up from 27 to 35, so more efficient = faster training.
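As a quick sanity check of the figures in this post (the numbers are taken from the paragraphs above; this is purely arithmetic, not Faceswap code):

```python
# Throughput comparison using the figures quoted above
# (EGs/sec = examples per second).

old_batch, old_egs = 26, 27   # pre-update optimum batch and speed
new_batch, new_egs = 16, 35   # post-update optimum batch and speed

speedup = new_egs / old_egs
assert round(speedup, 2) == 1.3   # roughly 30% faster training

# The new optimum batch is just over half the old one, consistent
# with both sides of the model now being fed simultaneously.
assert round(new_batch / old_batch, 2) == 0.62
```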

Ta :)

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

I'm glad we got there. It's always a bit tricky when you're used to a certain workflow and things change.

The reason the batch sizes are different (in case you're interested), is that the models used to be trained in an iterative fashion (i.e. with a batch size of 24, it would train A on 24 images then B on 24 images) so there would only ever be 24 images in the model at the same time.

The architecture has changed a bit now, so we no longer iterate, rather we feed both sides into the model at the same time. So a batch size of 24 will now mean that there are 48 images within the model at the same time, hence the requirement to reduce the batch size.

I thought about the best way to do this, and didn't want to force a requirement that the batch size always be divisible by 2, hence the batch size now needs to be about half of what it used to be (the reality being that if you give it a batch size of 24, the actual batch size is 48).
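The arithmetic above can be sketched as follows (a hypothetical illustration of the memory accounting, not Faceswap code):

```python
# Why the batch size setting needs to be roughly halved: the old
# trainer fed side A then side B in turn, while the new one feeds
# both sides at once, doubling the images held in the model.

def images_in_model(batch_size: int, iterative: bool) -> int:
    """Images occupying the model at one time for a given batch setting."""
    sides_at_once = 1 if iterative else 2
    return batch_size * sides_at_once

old = images_in_model(24, iterative=True)    # old trainer: 24 images
new = images_in_model(24, iterative=False)   # new trainer: 48 images
assert new == 2 * old

# Halving the setting restores the old memory footprint.
assert images_in_model(12, iterative=False) == old
```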

My word is final

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

Thanks for the clarification on batch size. Makes sense, as the image size fed in and the memory available haven't changed.

Sorry to have caused you all some work. Hopefully others who may come across the issue read this and get the message :)

Locked