Very slow training performance - same hardware

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
User avatar
derad
Posts: 4
Joined: Sat Sep 26, 2020 2:51 pm

Very slow training performance - same hardware

Post by derad »

Hi all,

I'm running training on an RTX 3080 16GB laptop card, i7-11800H, 32 GB RAM, with Dfaker, batch size of 28.

I'm only seeing 5.6 EGs/sec during a 12 hour training run.

When I've run in the past on the same hardware, I've seen well over 10x that. I recently had to reformat and reinstall Windows, and now I'm seeing this really slow training.

I have confirmed that the GPU is enabled (or at least, it's not checked to be disabled under Global Options). What else should I be looking into?

Thank you!

Edit: system info.

Code: Select all

mation ============
encoding:            cp1252
git_branch:          master
git_commits:         09c7d8a Merge branch 'staging' of https://github.com/deepfakes/faceswap into staging
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: NVIDIA GeForce RTX 3080 Laptop GPU
gpu_devices_active:  GPU_0
gpu_driver:          471.75
gpu_vram:            GPU_0: 16384MB
os_machine:          AMD64
os_platform:         Windows-10-10.0.19043-SP0
os_release:          10
py_command:          C:\Users\aaron\faceswap/faceswap.py gui
py_conda_version:    conda 4.11.0
py_implementation:   CPython
py_version:          3.8.12
py_virtual_env:      True
sys_cores:           16
sys_processor:       Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
sys_ram:             Total: 32429MB, Available: 15169MB, Used: 17259MB, Free: 15169MB

=============== Pip Packages ===============
absl-py==0.15.0
astunparse==1.6.3
cachetools==4.2.4
certifi==2021.10.8
charset-normalizer==2.0.10
clang==5.0
cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work
fastcluster==1.1.26
ffmpy==0.2.3
flatbuffers==1.12
gast==0.4.0
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.43.0
h5py==3.1.0
idna==3.3
imageio @ file:///tmp/build/80754af9/imageio_1617700267927/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1629987409325/work
importlib-metadata==4.10.0
joblib @ file:///tmp/build/80754af9/joblib_1635411271373/work
keras==2.6.0
Keras-Preprocessing==1.1.2
kiwisolver @ file:///C:/ci/kiwisolver_1612282606037/work
Markdown==3.3.6
matplotlib @ file:///C:/ci/matplotlib-base_1592837548929/work
mkl-fft==1.3.0
mkl-random==1.1.1
mkl-service==2.3.0
numpy @ file:///C:/ci/numpy_and_numpy_base_1603466732592/work
nvidia-ml-py==11.495.46
oauthlib==3.1.1
olefile @ file:///Users/ktietz/demo/mc3/conda-bld/olefile_1629805411829/work
opencv-python==4.5.5.62
opt-einsum==3.3.0
Pillow==8.4.0
protobuf==3.19.3
psutil @ file:///C:/ci/psutil_1612298324802/work
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing @ file:///tmp/build/80754af9/pyparsing_1635766073266/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
pywin32==302
requests==2.27.1
requests-oauthlib==1.3.0
rsa==4.8
scikit-learn @ file:///C:/ci/scikit-learn_1641891148727/work
scipy @ file:///C:/ci/scipy_1616703433439/work
sip==4.19.13
six==1.15.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow-estimator==2.6.0
tensorflow-gpu==2.6.2
termcolor==1.1.0
threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work
tornado @ file:///C:/ci/tornado_1606942392901/work
tqdm @ file:///tmp/build/80754af9/tqdm_1635330843403/work
typing-extensions==3.7.4.3
urllib3==1.26.8
Werkzeug==2.0.2
wincertstore==0.2
wrapt==1.12.1
zipp==3.7.0

============== Conda Packages ==============
# packages in environment at C:\Users\aaron\MiniConda3\envs\faceswap:
#
# Name                    Version                   Build  Channel
absl-py                   0.15.0                   pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates 2021.10.26 haa95532_2
cachetools 4.2.4 pypi_0 pypi certifi 2021.10.8 py38haa95532_2
charset-normalizer 2.0.10 pypi_0 pypi clang 5.0 pypi_0 pypi cycler 0.11.0 pyhd3eb1b0_0
fastcluster 1.1.26 py38h5d928e2_3 conda-forge ffmpeg 4.3.1 ha925a31_0 conda-forge ffmpy 0.2.3 pypi_0 pypi flatbuffers 1.12 pypi_0 pypi freetype 2.10.4 hd328e21_0
gast 0.4.0 pypi_0 pypi git 2.32.0 haa95532_1
google-auth 1.35.0 pypi_0 pypi google-auth-oauthlib 0.4.6 pypi_0 pypi google-pasta 0.2.0 pypi_0 pypi grpcio 1.43.0 pypi_0 pypi h5py 3.1.0 pypi_0 pypi icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha925a31_3
idna 3.3 pypi_0 pypi imageio 2.9.0 pyhd3eb1b0_0
imageio-ffmpeg 0.4.5 pyhd8ed1ab_0 conda-forge importlib-metadata 4.10.0 pypi_0 pypi intel-openmp 2021.4.0 haa95532_3556
joblib 1.1.0 pyhd3eb1b0_0
jpeg 9d h2bbff1b_0
keras 2.6.0 pypi_0 pypi keras-preprocessing 1.1.2 pypi_0 pypi kiwisolver 1.3.1 py38hd77b12b_0
libpng 1.6.37 h2a8f88b_0
libtiff 4.2.0 hd0e1b90_0
libwebp 1.2.0 h2bbff1b_0
lz4-c 1.9.3 h2bbff1b_1
markdown 3.3.6 pypi_0 pypi matplotlib 3.2.2 0
matplotlib-base 3.2.2 py38h64f37c6_0
mkl 2020.2 256
mkl-service 2.3.0 py38h196d8e1_0
mkl_fft 1.3.0 py38h46781fe_0
mkl_random 1.1.1 py38h47e9c7a_0
numpy 1.19.2 py38hadc3359_0
numpy-base 1.19.2 py38ha3acd2a_0
nvidia-ml-py 11.495.46 pypi_0 pypi oauthlib 3.1.1 pypi_0 pypi olefile 0.46 pyhd3eb1b0_0
opencv-python 4.5.5.62 pypi_0 pypi openssl 1.1.1l h2bbff1b_0
opt-einsum 3.3.0 pypi_0 pypi pillow 8.4.0 py38hd45dc43_0
pip 21.2.2 py38haa95532_0
protobuf 3.19.3 pypi_0 pypi psutil 5.8.0 py38h2bbff1b_1
pyasn1 0.4.8 pypi_0 pypi pyasn1-modules 0.2.8 pypi_0 pypi pyparsing 3.0.4 pyhd3eb1b0_0
pyqt 5.9.2 py38ha925a31_4
python 3.8.12 h6244533_0
python-dateutil 2.8.2 pyhd3eb1b0_0
python_abi 3.8 2_cp38 conda-forge pywin32 302 py38h827c3e9_1
qt 5.9.7 vc14h73c81de_0
requests 2.27.1 pypi_0 pypi requests-oauthlib 1.3.0 pypi_0 pypi rsa 4.8 pypi_0 pypi scikit-learn 1.0.2 py38hf11a4ad_0
scipy 1.6.2 py38h14eb087_0
setuptools 58.0.4 py38haa95532_0
sip 4.19.13 py38ha925a31_0
six 1.15.0 pypi_0 pypi sqlite 3.37.0 h2bbff1b_0
tensorboard 2.6.0 pypi_0 pypi tensorboard-data-server 0.6.1 pypi_0 pypi tensorboard-plugin-wit 1.8.1 pypi_0 pypi tensorflow-estimator 2.6.0 pypi_0 pypi tensorflow-gpu 2.6.2 pypi_0 pypi termcolor 1.1.0 pypi_0 pypi threadpoolctl 2.2.0 pyh0d69192_0
tk 8.6.11 h2bbff1b_0
tornado 6.1 py38h2bbff1b_0
tqdm 4.62.3 pyhd3eb1b0_1
typing-extensions 3.7.4.3 pypi_0 pypi urllib3 1.26.8 pypi_0 pypi vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
werkzeug 2.0.2 pypi_0 pypi wheel 0.37.1 pyhd3eb1b0_0
wincertstore 0.2 py38haa95532_2
wrapt 1.12.1 pypi_0 pypi xz 5.2.5 h62dcd97_0
zipp 3.7.0 pypi_0 pypi zlib 1.2.11 h8cc25b3_4
zstd 1.4.9 h19a0ad4_0 ================= Configs ================== --------- .faceswap --------- backend: nvidia --------- convert.ini --------- [color.color_transfer] clip: True preserve_paper: True [color.manual_balance] colorspace: HSV balance_1: 0.0 balance_2: 0.0 balance_3: 0.0 contrast: 0.0 brightness: 0.0 [color.match_hist] threshold: 99.0 [mask.box_blend] type: gaussian distance: 11.0 radius: 5.0 passes: 8 [mask.mask_blend] type: gaussian kernel_size: 3 passes: 8 threshold: 4 erosion: 0.0 [scaling.sharpen] method: unsharp_mask amount: 150 radius: 0.3 threshold: 5.0 [writer.ffmpeg] container: mp4 codec: libx264 crf: 23 preset: medium tune: None profile: auto level: auto skip_mux: False [writer.gif] fps: 25 loop: 0 palettesize: 256 subrectangles: False [writer.opencv] format: png draw_transparent: False jpg_quality: 75 png_compress_level: 0 [writer.pillow] format: png draw_transparent: False optimize: False gif_interlace: True jpg_quality: 75 png_compress_level: 3 tif_compression: tiff_deflate --------- extract.ini --------- [global] allow_growth: True [align.fan] batch-size: 12 [detect.cv2_dnn] confidence: 50 [detect.mtcnn] minsize: 20 scalefactor: 0.709 batch-size: 8 threshold_1: 0.6 threshold_2: 0.7 threshold_3: 0.7 [detect.s3fd] confidence: 70 batch-size: 4 [mask.bisenet_fp] batch-size: 8 include_ears: False include_hair: False include_glasses: True [mask.unet_dfl] batch-size: 8 [mask.vgg_clear] batch-size: 6 [mask.vgg_obstructed] batch-size: 2 --------- gui.ini --------- [global] fullscreen: False tab: extract options_panel_width: 30 console_panel_height: 20 icon_size: 14 font: default font_size: 9 autosave_last_session: prompt timeout: 120 auto_load_model_stats: True --------- train.ini --------- [global] centering: face coverage: 87.5 icnr_init: False conv_aware_init: False optimizer: adam learning_rate: 5e-05 epsilon_exponent: -7 reflect_padding: False allow_growth: True mixed_precision: False nan_protection: True convert_batchsize: 16 [global.loss] loss_function: ssim mask_loss_function: mse l2_reg_term: 100 eye_multiplier: 3 mouth_multiplier: 2 penalized_mask_loss: True mask_type: extended mask_blur_kernel: 3 mask_threshold: 4 learn_mask: False [model.dfaker] output_size: 128 [model.dfl_h128] lowmem: False [model.dfl_sae] input_size: 128 clipnorm: True architecture: df autoencoder_dims: 0 encoder_dims: 42 decoder_dims: 21 multiscale_decoder: False [model.dlight] features: best details: good output_size: 256 [model.original] lowmem: False [model.phaze_a] output_size: 128 shared_fc: None enable_gblock: True split_fc: True split_gblock: False split_decoders: False enc_architecture: fs_original enc_scaling: 40 enc_load_weights: True bottleneck_type: dense bottleneck_norm: None bottleneck_size: 1024 bottleneck_in_encoder: True fc_depth: 1 fc_min_filters: 1024 fc_max_filters: 1024 fc_dimensions: 4 fc_filter_slope: -0.5 fc_dropout: 0.0 fc_upsampler: upsample2d fc_upsamples: 1 fc_upsample_filters: 512 fc_gblock_depth: 3 fc_gblock_min_nodes: 512 fc_gblock_max_nodes: 512 fc_gblock_filter_slope: -0.5 fc_gblock_dropout: 0.0 dec_upscale_method: subpixel dec_norm: None dec_min_filters: 64 dec_max_filters: 512 dec_filter_slope: -0.45 dec_res_blocks: 1 dec_output_kernel: 5 dec_gaussian: True dec_skip_last_residual: True freeze_layers: keras_encoder load_layers: encoder fs_original_depth: 4 fs_original_min_filters: 128 fs_original_max_filters: 1024 mobilenet_width: 1.0 mobilenet_depth: 1 mobilenet_dropout: 0.001 [model.realface] input_size: 64 output_size: 128 dense_nodes: 1536 complexity_encoder: 128 complexity_decoder: 512 [model.unbalanced] input_size: 128 lowmem: False clipnorm: True nodes: 1024 complexity_encoder: 128 complexity_decoder_a: 384 complexity_decoder_b: 512 [model.villain] lowmem: False [trainer.original] preview_images: 14 zoom_amount: 5 rotation_range: 10 shift_range: 5 flip_chance: 50 color_lightness: 30 color_ab: 8 color_clahe_chance: 50 color_clahe_max_size: 4
User avatar
bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times
Contact:

Re: Very slow training performance - same hardware

Post by bryanlyon »

It sounds like it's not using the GPU.

app.php/faqpage#f1r1

User avatar
derad
Posts: 4
Joined: Sat Sep 26, 2020 2:51 pm

Re: Very slow training performance - same hardware

Post by derad »

Hi, thanks for that. I followed the steps exactly and Faceswap stopped launching entirely now.

I was having some other system issues so I reformatted and reinstalled the machine. Installed everything from scratch. Ensured I had the most up-to-date drivers. Still won't start, the CMD prompt launches and then it all closes.

Any thoughts as to next steps or how to identify the issue?

User avatar
derad
Posts: 4
Joined: Sat Sep 26, 2020 2:51 pm

Re: Very slow training performance - same hardware

Post by derad »

I read from other posts on here how to launch myself from CMD and here's the error it's throwing:

Code: Select all

Setting Faceswap backend to NVIDIA
Traceback (most recent call last):
  File "C:\Users\XXX\faceswap/faceswap.py", line 6, in <module>
    from lib.cli import args as cli_args
  File "C:\Users\XXX\faceswap\lib\cli\args.py", line 22, in <module>
    _GPUS = GPUStats().cli_devices
  File "C:\Users\XXX\faceswap\lib\gpu_stats.py", line 78, in __init__
    self._driver = self._get_driver()
  File "C:\Users\XXX\faceswap\lib\gpu_stats.py", line 273, in _get_driver
    driver = pynvml.nvmlSystemGetDriverVersion().decode("utf-8")
AttributeError: 'str' object has no attribute 'decode'
User avatar
torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Very slow training performance - same hardware

Post by torzdf »

This bug has been fixed. Please delete your faceswap folder + re-run the installer.

My word is final

Locked