Background: I run faceswap in the cloud, and my very old image recently stopped working because the s3fd models were moved around. That forced my hand, so I'm trying to set up NVIDIA drivers from scratch on a GCE instance with a T4.
It's actually pretty easy to bring up an instance and install the basic NVIDIA drivers. However, I can't find any global "cuda" libraries anywhere, and if I look for the CUDA uninstaller (or run the apt commands to uninstall it), nothing happens.
When I run nvidia-smi, it clearly shows that it sees the T4.
When I get around to running faceswap, it says "Setting Faceswap backend to NVIDIA" which is a good start.
... but when I start training, it outputs the following warning and runs incredibly slowly:
WARNING Mixed precision compatibility check (mixed_float16): WARNING
The dtype policy mixed_float16 may run slowly because this machine does not have a GPU. Only Nvidia GPUs with compute capability of at least 7.0 run quickly with mixed_float16.
If you will use compatible GPU(s) not attached to this host, e.g. by running a multi-worker model, you can ignore this warning. This message will only be logged once
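For reference, this warning comes from TensorFlow's mixed precision check rather than from the driver, so one quick test is to ask TensorFlow directly what it can see from inside the faceswap environment. A minimal sketch, assuming TensorFlow 2.x is importable in that environment; `visible_gpus` is a helper name made up for this example:

```python
def visible_gpus():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices("GPU") returns an empty list when TensorFlow cannot use any GPU
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow sees no GPU (it will fall back to the CPU)")
    else:
        print("TensorFlow sees:", gpus)
```

An empty list here while nvidia-smi still shows the T4 would suggest the driver is fine and the problem sits between TensorFlow and the CUDA/cuDNN libraries.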
Here's my sysinfo, since that's what the cool kids are posting.
Note that no global CUDA is found, which I think is the thing that trips most people up. The sysinfo also has no problem seeing that there IS a GPU, so... help?
============ System Information ============
encoding: UTF-8
git_branch: Not Found
git_commits: Not Found
gpu_cuda: No global version found. Check Conda packages for Conda Cuda
gpu_cudnn: No global version found. Check Conda packages for Conda cuDNN
gpu_devices: GPU_0: Tesla T4
gpu_devices_active: GPU_0
gpu_driver: 495.46
gpu_vram: GPU_0: 15109MB
os_machine: x86_64
os_platform: Linux-5.13.0-1033-gcp-x86_64-with-glibc2.31
os_release: 5.13.0-1033-gcp
py_command: /home/(((redacted_username)))/faceswap/faceswap.py
py_conda_version: conda 4.12.0
py_implementation: CPython
py_version: 3.9.12
py_virtual_env: True
sys_cores: 4
sys_processor: x86_64
sys_ram: Total: 14992MB, Available: 14411MB, Used: 280MB, Free: 11950MB
=============== Pip Packages ===============
absl-py==1.1.0
astunparse==1.6.3
cachetools==5.2.0
certifi==2022.6.15
charset-normalizer==2.0.12
cloudpickle==2.1.0
cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work
decorator==5.1.1
dm-tree==0.1.7
fastcluster @ file:///home/conda/feedstock_root/build_artifacts/fastcluster_1649783242764/work
ffmpy==0.2.3
flatbuffers==2.0
fonttools==4.25.0
gast==0.5.3
google-auth==2.8.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.47.0
h5py==3.7.0
idna==3.3
imageio @ file:///tmp/build/80754af9/imageio_1617700267927/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1649960641006/work
importlib-metadata==4.12.0
joblib @ file:///tmp/build/80754af9/joblib_1635411271373/work
keras==2.8.0
Keras-Preprocessing==1.1.2
kiwisolver @ file:///opt/conda/conda-bld/kiwisolver_1653292039266/work
libclang==14.0.1
Markdown==3.3.7
matplotlib @ file:///tmp/build/80754af9/matplotlib-suite_1647441664166/work
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186066731/work
mkl-service==2.4.0
munkres==1.1.4
numpy @ file:///opt/conda/conda-bld/numpy_and_numpy_base_1652801679809/work
nvidia-ml-py==11.510.69
oauthlib==3.2.0
opencv-python==4.6.0.66
opt-einsum==3.3.0
packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work
Pillow==9.0.1
protobuf==3.19.4
psutil @ file:///tmp/build/80754af9/psutil_1612297992929/work
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing @ file:///tmp/build/80754af9/pyparsing_1635766073266/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
requests==2.28.0
requests-oauthlib==1.3.1
rsa==4.8
scikit-learn @ file:///tmp/build/80754af9/scikit-learn_1642617106979/work
scipy @ file:///tmp/build/80754af9/scipy_1641555004408/work
sip==4.19.13
six @ file:///tmp/build/80754af9/six_1644875935023/work
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow-estimator==2.8.0
tensorflow-gpu==2.8.2
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-probability==0.16.0
termcolor==1.1.0
threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work
tornado @ file:///tmp/build/80754af9/tornado_1606942317143/work
tqdm @ file:///opt/conda/conda-bld/tqdm_1650891076910/work
typing_extensions @ file:///opt/conda/conda-bld/typing_extensions_1647553014482/work
urllib3==1.26.9
Werkzeug==2.1.2
wrapt==1.14.1
zipp==3.8.0
============== Conda Packages ==============
# packages in environment at /home/(((redacted_username)))/miniconda3:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
brotlipy 0.7.0 py39h27cfd23_1003
ca-certificates 2022.3.29 h06a4308_1
certifi 2021.10.8 py39h06a4308_2
cffi 1.15.0 py39hd667e15_1
charset-normalizer 2.0.4 pyhd3eb1b0_0
colorama 0.4.4 pyhd3eb1b0_0
conda 4.12.0 py39h06a4308_0
conda-content-trust 0.1.1 pyhd3eb1b0_0
conda-package-handling 1.8.1 py39h7f8727e_0
cryptography 36.0.0 py39h9ce1e76_0
idna 3.3 pyhd3eb1b0_0
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libstdcxx-ng 9.3.0 hd4cf53a_17
ncurses 6.3 h7f8727e_2
openssl 1.1.1n h7f8727e_0
pip 21.2.4 py39h06a4308_0
pycosat 0.6.3 py39h27cfd23_0
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 22.0.0 pyhd3eb1b0_0
pysocks 1.7.1 py39h06a4308_0
python 3.9.12 h12debd9_0
readline 8.1.2 h7f8727e_1
requests 2.27.1 pyhd3eb1b0_0
ruamel_yaml 0.15.100 py39h27cfd23_0
setuptools 61.2.0 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.38.2 hc218d9a_0
tk 8.6.11 h1ccaba5_0
tqdm 4.63.0 pyhd3eb1b0_0
tzdata 2022a hda174b7_0
urllib3 1.26.8 pyhd3eb1b0_0
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.12 h7f8727e_1
================= Configs ==================
--------- convert.ini ---------
[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0
[color.color_transfer]
clip: True
preserve_paper: True
[color.match_hist]
threshold: 99.0
[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto
skip_mux: False
[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False
[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3
[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate
[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0
erosion_top: 0.0
erosion_bottom: 0.0
erosion_left: 0.0
erosion_right: 0.0
[scaling.sharpen]
method: gaussian
amount: 150
radius: 0.3
threshold: 5.0
--------- train.ini ---------
[global]
centering: face
coverage: 87.5
icnr_init: False
conv_aware_init: False
optimizer: adam
learning_rate: 5e-05
epsilon_exponent: -7
autoclip: False
reflect_padding: False
allow_growth: False
mixed_precision: False
nan_protection: True
convert_batchsize: 16
[global.loss]
loss_function: ssim
loss_function_2: mse
loss_weight_2: 100
loss_function_3: none
loss_weight_3: 0
loss_function_4: none
loss_weight_4: 0
mask_loss_function: mse
eye_multiplier: 3
mouth_multiplier: 2
penalized_mask_loss: True
mask_type: extended
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: False
[model.villain]
lowmem: False
[model.dlight]
features: best
details: good
output_size: 256
[model.original]
lowmem: False
[model.dfl_sae]
input_size: 128
architecture: df
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False
[model.unbalanced]
input_size: 128
lowmem: False
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512
[model.phaze_a]
output_size: 128
shared_fc: none
enable_gblock: True
split_fc: True
split_gblock: False
split_decoders: False
enc_architecture: fs_original
enc_scaling: 7
enc_load_weights: True
bottleneck_type: dense
bottleneck_norm: none
bottleneck_size: 1024
bottleneck_in_encoder: True
fc_depth: 1
fc_min_filters: 1024
fc_max_filters: 1024
fc_dimensions: 4
fc_filter_slope: -0.5
fc_dropout: 0.0
fc_upsampler: upsample2d
fc_upsamples: 1
fc_upsample_filters: 512
fc_gblock_depth: 3
fc_gblock_min_nodes: 512
fc_gblock_max_nodes: 512
fc_gblock_filter_slope: -0.5
fc_gblock_dropout: 0.0
dec_upscale_method: subpixel
dec_upscales_in_fc: 0
dec_norm: none
dec_min_filters: 64
dec_max_filters: 512
dec_slope_mode: full
dec_filter_slope: -0.45
dec_res_blocks: 1
dec_output_kernel: 5
dec_gaussian: True
dec_skip_last_residual: True
freeze_layers: keras_encoder
load_layers: encoder
fs_original_depth: 4
fs_original_min_filters: 128
fs_original_max_filters: 1024
fs_original_use_alt: False
mobilenet_width: 1.0
mobilenet_depth: 1
mobilenet_dropout: 0.001
mobilenet_minimalistic: False
[model.dfl_h128]
lowmem: False
[model.dfaker]
output_size: 128
[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512
[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4
--------- .faceswap ---------
backend: nvidia
--------- extract.ini ---------
[global]
allow_growth: False
[detect.s3fd]
confidence: 70
batch-size: 4
[detect.mtcnn]
minsize: 20
scalefactor: 0.709
batch-size: 8
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
[detect.cv2_dnn]
confidence: 50
[mask.vgg_obstructed]
batch-size: 2
[mask.unet_dfl]
batch-size: 8
[mask.vgg_clear]
batch-size: 6
[mask.bisenet_fp]
batch-size: 8
weights: faceswap
include_ears: False
include_hair: False
include_glasses: True
[align.fan]
batch-size: 12
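Worth noting about the dump above: the Conda package list is for the base environment at miniconda3 and contains no cudatoolkit or cudnn entries. Whether that is the actual cause is not confirmed here, but a rough way to check if the CUDA libraries can be loaded at all is to try opening them with ctypes. A minimal sketch; the library file names are an assumption based on what TensorFlow 2.8 builds for CUDA 11.x and cuDNN 8 usually look for, and `probe_libraries` is a hypothetical helper:

```python
import ctypes

# Assumed file names for CUDA 11.x / cuDNN 8; adjust to match your TensorFlow build.
CUDA_LIBS = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]


def probe_libraries(names):
    """Map each library name to True if the dynamic loader can open it, else False."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the library is not on the loader's search path
            results[name] = False
    return results


if __name__ == "__main__":
    for name, found in probe_libraries(CUDA_LIBS).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same Python that launches faceswap, since the loader search path can differ between environments.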
One really strange thing: when I run "sudo nvidia-smi", it does display a CUDA version, and it's not the version installed by faceswap (which is 11.2, IIRC). But I can't find any other CUDA installs, and apparently neither can faceswap's sysinfo.
Sat Jun 25 21:28:54 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   66C    P8    12W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
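One thing that may explain the mismatch: as far as I know, the CUDA version in the nvidia-smi header is the highest CUDA version the installed driver supports, not a toolkit installed on the machine. So 11.5 there and 11.2 in the faceswap environment would not conflict by themselves. A small sketch that parses the header line above and compares it with the 11.2 figure (taken from the post and unverified); the helper names are made up for illustration:

```python
import re

# Header line copied from the nvidia-smi output above
HEADER = "| NVIDIA-SMI 495.46 Driver Version: 495.46 CUDA Version: 11.5 |"


def parse_smi_header(line):
    """Return (driver_version, cuda_version) from an nvidia-smi header line, or None."""
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def version_tuple(text):
    """Turn '11.5' into (11, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


driver, driver_cuda = parse_smi_header(HEADER)
toolkit = "11.2"  # assumption: the CUDA version faceswap installs, per the post
print("driver:", driver, "supports CUDA up to:", driver_cuda)
print("toolkit", toolkit, "within driver support:",
      version_tuple(driver_cuda) >= version_tuple(toolkit))
```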