Can't use 2 GPU's after latest Faceswap update

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Can't use 2 GPU's after latest Faceswap update

Post by djandg »

After updating Faceswap to the latest version it would not start. The command box popped up briefly and then disappeared, with insufficient time to see any error message. Nothing in the crash files.
Uninstalled MiniConda and deleted the Faceswap folder. Reinstalled, and now it starts OK.
Love the new manual alignments tool.
BUT
I can no longer use the two Nvidia GPUs for training. I can use either GPU_0 or GPU_1 by selecting the new exclude check box option for GPU_0 or GPU_1, but leaving both unchecked does not make the updated Faceswap use both GPUs: the Windows and Nvidia performance apps show the other GPU at zero use if I leave both check boxes clear. Rolling back to a system backup made before updating Faceswap, with the old "number of GPUs" selection, two GPUs work OK.

The System Info dump from the updated Faceswap indicates both GPUs are detected and active, but it will only run at half the previous two-GPU batch number, as if only one GPU is selected - meaning only one GPU is being addressed?

User avatar
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by abigflea »

I'm surprised to hear this. With the recent updates I can do more with multi-GPU, and my unmatched GPUs are having fewer issues.

Could use a little more info.
What GPU do you have?
Can you try to start it, let it crash, and post the crash log?
Maybe we can suss out this issue.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Unfortunately I won't be able to help much here, as I don't have a Multi-GPU setup. We did test this fairly extensively with [mention]abigflea[/mention] , [mention]pfakanator[/mention], [mention]deephomage[/mention] and others, and it all seems to work as expected (and as abigflea says, results were improved over previous Faceswap).

Outputting and pasting your system information (in the GUI Help>Output System Information) may help to diagnose this.

Also if you get a crash, the crash report will definitely help.

Hopefully one of the above can help shed more light on what you should be expecting.

My word is final

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

There are two people on this thread with a similar issue: myself, the original poster, and ericpan0513.
Don't get the two confused, especially regarding the setup.

I have dual boot and can boot to a backup made before the Faceswap update, where both GPUs get used equally. Booting to the OS after the update, only one GPU is used no matter what I try. Same OS, setup etc.; the reinstall of Faceswap to the latest version is the only difference between the two boot drives.
Strange.

System info below. 2 x 1070 Ti, latest Studio drivers, both detected and active - but with the latest update only one GPU is being used, with no load distribution.
Tried SLI on and off (not removing the SLI bridge, just the software switch), tried the Nvidia Studio and Game Ready drivers, even the default MS driver - no joy. I'm at a loss. Currently using the "old" Faceswap boot drive for training and the "new" one for manual alignments. Not an ideal situation.

No crash log is generated, as training never starts with the batch size previously used with two GPUs - an insufficient GPU memory error. Lowering the batch size to that for one GPU allows it to run on either GPU_0 or GPU_1 using the exclude function, but not both, and obviously at the reduced speed.

Code: Select all

08/20/2020 09:50:33 CRITICAL Error caught! Exiting...
08/20/2020 09:50:33 ERROR    Caught exception in thread: '_training_0'
08/20/2020 09:50:33 ERROR    You do not have enough GPU memory available to train the selected model at the selected settings. You can try a number of things:
blah blah blah

Code: Select all

Sys info 
============ System Information ============
encoding:            cp1252
git_branch:          master
git_commits:         0a25dff model.config - Make convert batchsize a user configurable option
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: GeForce GTX 1070 Ti, GPU_1: GeForce GTX 1070 Ti
gpu_devices_active:  GPU_0, GPU_1
gpu_driver:          452.06
gpu_vram:            GPU_0: 8192MB, GPU_1: 8192MB
os_machine:          AMD64
os_platform:         Windows-10-10.0.19041-SP0
os_release:          10
py_command:          C:\Users\HOME\faceswap/faceswap.py gui
py_conda_version:    conda 4.8.4
py_implementation:   CPython
py_version:          3.8.5
py_virtual_env:      True
sys_cores:           6
sys_processor:       Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
sys_ram:             Total: 16313MB, Available: 11470MB, Used: 4843MB, Free: 11470MB

=============== Pip Packages ===============
absl-py==0.10.0
astunparse==1.6.3
cachetools==4.1.1
certifi==2020.6.20
chardet==3.0.4
cycler==0.10.0
fastcluster==1.1.26
ffmpy==0.2.3
gast==0.3.3
google-auth==1.20.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.31.0
h5py==2.10.0
idna==2.10
imageio @ file:///tmp/build/80754af9/imageio_1594161405741/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1589202782679/work
joblib @ file:///tmp/build/80754af9/joblib_1594236160679/work
Keras-Preprocessing==1.1.2
kiwisolver==1.2.0
Markdown==3.2.2
matplotlib @ file:///C:/ci/matplotlib-base_1592837548929/work
mkl-fft==1.1.0
mkl-random==1.1.1
mkl-service==2.3.0
numpy @ file:///C:/ci/numpy_and_numpy_base_1596215850360/work
nvidia-ml-py3 @ git+https://github.com/deepfakes/nvidia-ml-py3.git@6fc29ac84b32bad877f078cb4a777c1548a00bf6
oauthlib==3.1.0
olefile==0.46
opencv-python==4.4.0.42
opt-einsum==3.3.0
pathlib==1.0.1
Pillow @ file:///C:/ci/pillow_1594298230227/work
protobuf==3.13.0
psutil==5.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==2.4.7
python-dateutil==2.8.1
pywin32==227
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
scikit-learn @ file:///C:/ci/scikit-learn_1592853510272/work
scipy==1.4.1
sip==4.19.13
six==1.15.0
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow-gpu==2.2.0
tensorflow-gpu-estimator==2.2.0
termcolor==1.1.0
threadpoolctl @ file:///tmp/tmp9twdgx9k/threadpoolctl-2.1.0-py3-none-any.whl
tornado==6.0.4
tqdm @ file:///tmp/build/80754af9/tqdm_1596810128862/work
urllib3==1.25.10
Werkzeug==1.0.1
wincertstore==0.2
wrapt==1.12.1

============== Conda Packages ==============
# packages in environment at C:\Users\HOME\MiniConda3\envs\faceswap:
#
# Name                    Version                   Build  Channel
absl-py                   0.10.0                   pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates           2020.6.24                  0
cachetools                4.1.1                 pypi_0    pypi
certifi                   2020.6.20             py38_0
chardet                   3.0.4                 pypi_0    pypi
cudatoolkit               10.1.243          h74a9793_0
cudnn                     7.6.5             cuda10.1_0
cycler                    0.10.0                py38_0
fastcluster               1.1.26       py38hbe40bda_1    conda-forge
ffmpeg                    4.3.1             ha925a31_0    conda-forge
ffmpy                     0.2.3                 pypi_0    pypi
freetype                  2.10.2            hd328e21_0
gast                      0.3.3                 pypi_0    pypi
git                       2.23.0            h6bb4b03_0
google-auth               1.20.1                pypi_0    pypi
google-auth-oauthlib      0.4.1                 pypi_0    pypi
google-pasta              0.2.0                 pypi_0    pypi
grpcio                    1.31.0                pypi_0    pypi
h5py                      2.10.0                pypi_0    pypi
icc_rt                    2019.0.0          h0cc432a_1
icu                       58.2              ha925a31_3
idna                      2.10                  pypi_0    pypi
imageio                   2.9.0                   py_0
imageio-ffmpeg            0.4.2                   py_0    conda-forge
intel-openmp              2020.1                   216
joblib                    0.16.0                  py_0
jpeg                      9b                hb83a4c4_2
keras-preprocessing       1.1.2                 pypi_0    pypi
kiwisolver                1.2.0         py38h74a9793_0
libpng                    1.6.37            h2a8f88b_0
libtiff                   4.1.0             h56a325e_1
lz4-c                     1.9.2             h62dcd97_1
markdown                  3.2.2                 pypi_0    pypi
matplotlib                3.2.2                      0
matplotlib-base           3.2.2         py38h64f37c6_0
mkl                       2020.1                   216
mkl-service               2.3.0         py38hb782905_0
mkl_fft                   1.1.0         py38h45dec08_0
mkl_random                1.1.1         py38h47e9c7a_0
numpy                     1.19.1        py38h5510c5b_0
numpy-base                1.19.1        py38ha3acd2a_0
nvidia-ml-py3             7.352.1               pypi_0    pypi
oauthlib                  3.1.0                 pypi_0    pypi
olefile                   0.46                    py_0
opencv-python             4.4.0.42              pypi_0    pypi
openssl                   1.1.1g            he774522_1
opt-einsum                3.3.0                 pypi_0    pypi
pathlib                   1.0.1                   py_1
pillow                    7.2.0         py38hcc1f983_0
pip                       20.2.2                py38_0
protobuf                  3.13.0                pypi_0    pypi
psutil                    5.7.0         py38he774522_0
pyasn1                    0.4.8                 pypi_0    pypi
pyasn1-modules            0.2.8                 pypi_0    pypi
pyparsing                 2.4.7                   py_0
pyqt                      5.9.2         py38ha925a31_4
python                    3.8.5             he1778fa_0
python-dateutil           2.8.1                   py_0
python_abi                3.8                   1_cp38    conda-forge
pywin32                   227           py38he774522_1
qt                        5.9.7        vc14h73c81de_0
requests                  2.24.0                pypi_0    pypi
requests-oauthlib         1.3.0                 pypi_0    pypi
rsa                       4.6                   pypi_0    pypi
scikit-learn              0.23.1        py38h25d0782_0
scipy                     1.4.1                 pypi_0    pypi
setuptools                49.6.0                py38_0
sip                       4.19.13       py38ha925a31_0
six                       1.15.0                  py_0
sqlite                    3.32.3            h2a8f88b_0
tensorboard               2.2.2                 pypi_0    pypi
tensorboard-plugin-wit    1.7.0                 pypi_0    pypi
tensorflow-gpu            2.2.0                 pypi_0    pypi
tensorflow-gpu-estimator  2.2.0                 pypi_0    pypi
termcolor                 1.1.0                 pypi_0    pypi
threadpoolctl             2.1.0             pyh5ca1d4c_0
tk                        8.6.10            he774522_0
tornado                   6.0.4         py38he774522_1
tqdm                      4.48.2                  py_0
urllib3                   1.25.10               pypi_0    pypi
vc                        14.1              h0510ff6_4
vs2015_runtime            14.16.27012       hf0eaf9b_3
werkzeug                  1.0.1                 pypi_0    pypi
wheel                     0.34.2                py38_0
wincertstore              0.2                   py38_0
wrapt                     1.12.1                pypi_0    pypi
xz                        5.2.5             h62dcd97_0
zlib                      1.2.11            h62dcd97_4
zstd                      1.4.5             h04227a9_0

================= Configs ==================
--------- .faceswap ---------
backend: nvidia

--------- convert.ini ---------
[color.color_transfer]
clip: True
preserve_paper: True

[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0

[color.match_hist]
threshold: 99.0

[mask.box_blend]
type: gaussian
distance: 11.0
radius: 5.0
passes: 1

[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0

[scaling.sharpen]
method: unsharp_mask
amount: 150
radius: 0.3
threshold: 5.0

[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto
skip_mux: False

[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False

[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3

[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate

--------- extract.ini ---------
[global]
allow_growth: False

[align.fan]
batch-size: 12

[detect.cv2_dnn]
confidence: 50

[detect.mtcnn]
minsize: 20
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
scalefactor: 0.709
batch-size: 8

[detect.s3fd]
confidence: 70
batch-size: 4

[mask.unet_dfl]
batch-size: 8

[mask.vgg_clear]
batch-size: 6

[mask.vgg_obstructed]
batch-size: 2

--------- gui.ini ---------
[global]
fullscreen: False
tab: extract
options_panel_width: 30
console_panel_height: 20
icon_size: 14
font: default
font_size: 9
autosave_last_session: prompt
timeout: 120
auto_load_model_stats: True

--------- train.ini ---------
[global]
coverage: 68.75
mask_type: extended
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: False
penalized_mask_loss: True
loss_function: mae
icnr_init: False
conv_aware_init: False
optimizer: adam
learning_rate: 5e-05
reflect_padding: False
allow_growth: False
mixed_precision: False
convert_batchsize: 16

[model.dfl_h128]
lowmem: False

[model.dfl_sae]
input_size: 128
clipnorm: True
architecture: df
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False

[model.dlight]
features: best
details: good
output_size: 256

[model.original]
lowmem: False

[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512

[model.unbalanced]
input_size: 128
lowmem: False
clipnorm: True
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512

[model.villain]
lowmem: False

[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4
Attachments
error.txt
(730 Bytes) Downloaded 342 times
SYS info.txt
(12.88 KiB) Downloaded 349 times
User avatar
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by abigflea »

The lower batch size is expected. With the newer version of Faceswap, the batch handling has changed.
Effectively, a batch of 8 on the old version is a batch of 4 on the new (not exactly - it is also more efficient).
Additionally, the models have changed a bit, but should convert over.

I don't see anything standing out in the log.
Maybe someone else will see something.
In the meantime, I will see if I can replicate your issue in Windows; typically I'm in Linux for FS.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

Quote
"The lower batch size is expected. With the newer version of Faceswap, the batch handling has changed.
Effectively, a batch of 8 on the old version is a batch of 4 on the new (not exactly - it is also more efficient)."

This is important info and not conveyed anywhere.
Running an optimum batch size test does indeed give a batch size about 50% of pre-update. However, the performance monitor also indicates that the second GPU has zero utilisation, so the batch size would be halved anyway, as only one GPU of a two-GPU system is being used? Nvidia cards, as opposed to the other chap's mining cards.


I'm also going to do a clean install of Win10 as a test, but even if successful it's not a real solution. That will take me a bit of time.

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Just to give you some context on this. We cannot use the "old" method for multi-gpu as it is not supported in Tensorflow 2.

We are using the standard method for multiple GPU training under Tensorflow (specifically the "mirror" strategy). You can (and probably should) read up more about it here:
https://www.tensorflow.org/guide/distributed_training

From our end, all we are effectively doing is "flicking a switch" to turn on distributed training, which would suggest to me that the issue is upstream somewhere.
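Conceptually, the mirror strategy replicates the model onto every visible GPU and splits each global batch evenly between the replicas; in TensorFlow 2 the "switch" amounts to building the model inside a tf.distribute.MirroredStrategy scope. A plain-Python sketch of the batch-splitting idea (illustrative only, not Faceswap or TensorFlow source):

```python
# Illustration of how a "mirrored" strategy shares work: each GPU
# (replica) holds a full copy of the model and processes an equal
# shard of the global batch, after which the gradients are combined.

def split_batch(batch, num_replicas):
    """Divide a global batch into equal per-replica shards."""
    shard = len(batch) // num_replicas
    return [batch[i * shard:(i + 1) * shard] for i in range(num_replicas)]

global_batch = list(range(16))           # stand-in for 16 face images
replicas = split_batch(global_batch, 2)  # one shard per GPU

assert len(replicas) == 2                  # two replicas, one per GPU
assert all(len(r) == 8 for r in replicas)  # each GPU sees half the batch
```

If only one replica is ever created (e.g. because the distributed option is not enabled), the whole batch lands on a single GPU, which matches the behaviour described in this thread.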

That being said, I cannot speak for how GPU utilization is reflected under a multi-GPU setup (under Windows or otherwise), as I don't have one to test on; hopefully abigflea will report back soon. It may be worth clicking on the GPUs within Task Manager and checking their Cuda utilization (I'm guessing here).

djandg wrote: Thu Aug 20, 2020 10:55 am

Quote
"The lower batch size is expected. With the newer version of Faceswap, the batch handling has changed.
Effectively, a batch of 8 on the old version is a batch of 4 on the new (not exactly - it is also more efficient)."

This is important info and not conveyed anywhere.

It is reflected in the batch size ToolTip. I will also be updating the Training guide; however, I hope you can appreciate that my time is limited, and fixing these kinds of bugs and getting to the bottom of these issues is my priority at the moment.

My word is final

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

Quite understand time and capacity :)

UPDATE:
Fresh install of Windows 10 and just Faceswap - it still uses only one GPU of the two, and I can swap between the GPUs using the Exclude option. Just not both at the same time.
Looking at the Tensorflow description of mirroring, it mentions NCCL communication over PCI, so I removed the SLI bridge to force PCI communication. No joy.
There is no BIOS update for the board, in case it's a PCI communication address issue, but Sys Info seems to indicate Tensorflow has found both GPUs and therefore should utilise them.

If I had hair, I'd be pulling it out - :lol:

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Can you do me a favour, please?

Could you hit the "Generate" button and post the output when you have it set up for what you would hope is multi-GPU training? I just want to check that everything looks OK in terms of the commands being sent to Faceswap.

Thanks.

My word is final

User avatar
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by abigflea »

djandg wrote: Thu Aug 20, 2020 3:42 pm

Fresh install of Windows 10 and just Faceswap - it still uses only one GPU of the two, and I can swap between the GPUs using the Exclude option. Just not both at the same time.
Looking at the Tensorflow description of mirroring, it mentions NCCL communication over PCI, so I removed the SLI bridge to force PCI communication. No joy.

I was wondering about the SLI bridge.
OK, so far I have been unable to replicate your issue... and boy oh boy, I've tried:
with MP, without MP, clipnorm on/off, different loss functions, various situations of resuming models, yelling.
Torzdf had me try some other things, and I'm continuing with a few more.

All I've learned so far is that a single 2070 with mixed precision on is very fast.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

Ultimately, [mention]djandg[/mention], what I'm getting at is: you have definitely enabled this option, right?

Attachment: 7.png (3.39 KiB)

My word is final

User avatar
bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times
Contact:

Re: Can't use 2 GPU's after latest Faceswap update

Post by bryanlyon »

The SLI bridge shouldn't harm anything; in fact, NCCL is designed (by Nvidia) to always use the fastest path. Don't worry about removing the bridge, BUT you might have to turn off SLI in your settings (or at least make sure nothing is using the SLI feature, which may prevent us from using both cards).

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update - Resolved - pass the wet fish.

Post by djandg »

OK, put a dunce's hat on me and stick me in the corner, or hit me with a wet fish, whichever is most effective :lol:
I worked it out at 1am UK time last night and am only posting now.

Upgrading Faceswap to the newest version would initially not allow my multiple GPU system to use more than one GPU when training.

Thanks to comments from Torzdf and Abigflea, the resolution, for similar dullards to me, is:

The "Distribute" box has to be selected on every new training project to allow multiple GPUs to be used. Easily overlooked when you're the sort of person who insists they know the best route from A to B without using a map or GPS! My wife always says "you never ask for directions".

The batch size is also different with the latest FS version, so the "optimum batch size" test (as written in the training documentation) has to be run again for each model chosen; my experience is that the optimum batch is now just over 50% of what it was before.
With Realface on a 2 x GTX 1070 Ti system, it goes from the old optimum batch size of 26 to a new one of 16.
Even with this lower apparent batch size, the EGs/sec goes up from 27 to 35, so more efficient = faster training.
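As a quick sanity check of the figures in this post (the numbers are taken from the paragraphs above; this is purely arithmetic, not Faceswap code):

```python
# Throughput comparison using the figures quoted above
# (EGs/sec = examples per second).

old_batch, old_egs = 26, 27   # pre-update optimum batch and speed
new_batch, new_egs = 16, 35   # post-update optimum batch and speed

speedup = new_egs / old_egs
assert round(speedup, 2) == 1.3   # roughly 30% faster training

# The new optimum batch is just over half the old one, consistent
# with both sides of the model now being fed simultaneously.
assert round(new_batch / old_batch, 2) == 0.62
```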

Ta :)

User avatar
torzdf
Posts: 2689
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by torzdf »

I'm glad we got there. It's always a bit tricky when you're used to a certain workflow and things change.

The reason the batch sizes are different (in case you're interested), is that the models used to be trained in an iterative fashion (i.e. with a batch size of 24, it would train A on 24 images then B on 24 images) so there would only ever be 24 images in the model at the same time.

The architecture has changed a bit now, so we no longer iterate, rather we feed both sides into the model at the same time. So a batch size of 24 will now mean that there are 48 images within the model at the same time, hence the requirement to reduce the batch size.

I thought about the best way to do this, and didn't want to force a requirement that the batch size always be divisible by 2, hence the batch size now needs to be about half of what it used to be (the reality being that if you give it a batch size of 24, the actual batch size is 48).
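The arithmetic above can be sketched as follows (a hypothetical illustration of the memory accounting, not Faceswap code):

```python
# Why the batch size setting needs to be roughly halved: the old
# trainer fed side A then side B in turn, while the new one feeds
# both sides at once, doubling the images held in the model.

def images_in_model(batch_size: int, iterative: bool) -> int:
    """Images occupying the model at one time for a given batch setting."""
    sides_at_once = 1 if iterative else 2
    return batch_size * sides_at_once

old = images_in_model(24, iterative=True)    # old trainer: 24 images
new = images_in_model(24, iterative=False)   # new trainer: 48 images
assert new == 2 * old

# Halving the setting restores the old memory footprint.
assert images_in_model(12, iterative=False) == old
```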

My word is final

User avatar
djandg
Posts: 43
Joined: Mon Dec 09, 2019 7:00 pm
Has thanked: 4 times
Been thanked: 2 times

Re: Can't use 2 GPU's after latest Faceswap update

Post by djandg »

Thanks for the clarification on batch size. Makes sense, as the image size fed in and the memory available haven't changed.

Sorry to have caused you all some work. Hopefully others who may come across the issue read this and get the message :)

Locked