Faceswap installer broken on GCE (and possibly elsewhere)?

Want to use Faceswap in The Cloud? This is not directly supported by the Devs, but you may find community support here


Replicon
Posts: 40
Joined: Mon Mar 22, 2021 4:24 pm
Been thanked: 1 time

Post by Replicon »

I created a freshly installed image using my normal flow (basically as shown in the instructions here).

The installation process itself looked weird: it "failed with initial frozen solve" and, after recovering, spent a long time downloading about 1.5GB of CUDA packages.

The install succeeded, but when I ran an extract, it eventually crashed with:

tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: initialization error

Did something change with the package repos? I'm not too familiar with Conda, but it definitely looked like the install process wasn't behaving right. I tried this twice, and it misbehaved the same way both times.

Some relevant info from the crash log follows; I can give you more, but it likely wouldn't help:

============ System Information ============
encoding:            UTF-8
git_branch:          Not Found
git_commits:         Not Found
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: Tesla T4
gpu_devices_active:  GPU_0
gpu_driver:          440.100
gpu_vram:            GPU_0: 15109MB
os_machine:          x86_64
os_platform:         Linux-5.4.0-1021-gcp-x86_64-with-glibc2.17
os_release:          5.4.0-1021-gcp
py_command:          /home/[[REDACTED]]/faceswap/faceswap.py extract -i /home/[[REDACTED]].mp4 -o /home/[[REDACTED]] -D s3fd -A fan -nm hist -rf 8 -min 0 -l 0.4 -sz 512 -een 3 -si 0 -L INFO
py_conda_version:    Conda is used, but version not found
py_implementation:   CPython
py_version:          3.8.11
py_virtual_env:      True
sys_cores:           4
sys_processor:       x86_64
sys_ram:             Total: 15005MB, Available: 14245MB, Used: 482MB, Free: 13759MB

=============== Pip Packages ===============
absl-py==0.13.0
astunparse==1.6.3
cachetools==4.2.2
certifi==2021.5.30
charset-normalizer==2.0.6
clang==5.0
cycler==0.10.0
fastcluster==1.1.26
ffmpy==0.2.3
flatbuffers==1.12
gast==0.4.0
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.40.0
h5py==3.1.0
idna==3.2
imageio @ file:///tmp/build/80754af9/imageio_1617700267927/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1629987409325/work
joblib @ file:///tmp/build/80754af9/joblib_1613502643832/work
keras==2.6.0
Keras-Preprocessing==1.1.2
kiwisolver @ file:///tmp/build/80754af9/kiwisolver_1612282420641/work
Markdown==3.3.4
matplotlib @ file:///tmp/build/80754af9/matplotlib-base_1592846008246/work
mkl-fft==1.3.0
mkl-random==1.1.1
mkl-service==2.3.0
numpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1603570489231/work
nvidia-ml-py==11.470.66
oauthlib==3.1.1
olefile @ file:///Users/ktietz/demo/mc3/conda-bld/olefile_1629805411829/work
opencv-python==4.5.3.56
opt-einsum==3.3.0
Pillow @ file:///tmp/build/80754af9/pillow_1625655817137/work
protobuf==3.18.0
psutil @ file:///tmp/build/80754af9/psutil_1612298023621/work
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing @ file:///home/linux1/recipes/ci/pyparsing_1610983426697/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
scikit-learn @ file:///tmp/build/80754af9/scikit-learn_1621370412049/work
scipy @ file:///tmp/build/80754af9/scipy_1616703172749/work
sip==4.19.13
six==1.15.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow-estimator==2.6.0
tensorflow-gpu==2.6.0
termcolor==1.1.0
threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work
tornado @ file:///tmp/build/80754af9/tornado_1606942300299/work
tqdm @ file:///tmp/build/80754af9/tqdm_1631818572807/work
typing-extensions==3.7.4.3
urllib3==1.26.6
Werkzeug==2.0.1
wrapt==1.12.1

============== Conda Packages ==============
Could not get package list
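
For anyone comparing several of these crash logs, the two fields that matter most here (gpu_driver and the tensorflow-gpu version) can be pulled out with a few lines of Python. This is only a sketch: parse_crash_log is a hypothetical helper, not part of faceswap, and it assumes the log layout shown above ("=== Section ===" headers, "key: value" system lines, "name==version" pip lines).

```python
import re


def parse_crash_log(text):
    """Return (system_info, pip_packages) dicts from a crash log.

    Hypothetical helper, not part of faceswap. Assumes the layout of
    the log quoted above.
    """
    system_info, pip_packages = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Section headers look like "====== System Information ======"
        header = re.fullmatch(r"=+\s*(.+?)\s*=+", line)
        if header:
            section = header.group(1)
            continue
        if section == "System Information" and ":" in line:
            # Split on the first colon only, so "GPU_0: Tesla T4" stays whole
            key, _, value = line.partition(":")
            system_info[key.strip()] = value.strip()
        elif section == "Pip Packages" and "==" in line:
            # Lines of the form "name @ file:///..." carry no version; skip them
            name, _, version = line.partition("==")
            pip_packages[name.strip()] = version.strip()
    return system_info, pip_packages


sample = """\
============ System Information ============
gpu_devices:         GPU_0: Tesla T4
gpu_driver:          440.100
py_version:          3.8.11

=============== Pip Packages ===============
keras==2.6.0
tensorflow-gpu==2.6.0
"""

info, pkgs = parse_crash_log(sample)
print(info["gpu_driver"], pkgs["tensorflow-gpu"])
```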


Post by Replicon »

OK.

So I intercepted the installer, and added a

git checkout c7d85f89e69c74e97bf7485b064c07487d31faae

in the right place.

This is to roll back the Tensorflow 2.6 change, which looked like a really suspicious potential root cause.

And it works again!
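
For anyone scripting the same rollback, a minimal sketch of the pinning step follows. The commit hash is the one quoted above; the function names and the clone location are assumptions for illustration, and the post does not say where in the installer the checkout was actually added.

```python
import subprocess

# Commit hash quoted in the post above (the state before the Tensorflow 2.6 change)
PINNED_COMMIT = "c7d85f89e69c74e97bf7485b064c07487d31faae"


def checkout_cmd(repo_dir, commit=PINNED_COMMIT):
    """Build the git command that pins the clone in repo_dir to commit."""
    # "git -C <dir>" runs git as if it had been started inside <dir>
    return ["git", "-C", str(repo_dir), "checkout", commit]


def pin_checkout(repo_dir, commit=PINNED_COMMIT):
    """Run the checkout; raises CalledProcessError if git fails."""
    subprocess.run(checkout_cmd(repo_dir, commit), check=True)
```

Calling pin_checkout on the faceswap clone (for example a "faceswap" directory under the home directory, which is an assumed location) should happen after the clone and before the installer resolves its Python dependencies, since the point is to get the older requirements.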

I was wrong about the "failed frozen..." stuff, which happens regardless, but I was right that it didn't need to download 1.5GB of CUDA libraries, which leads me to think that stuff was already set up on the GCE image, with compatible drivers.

It may be that the GCE base image needs to be updated, with drivers and libraries that make everything compatible again, so fresh installs can keep getting the latest and greatest.


torzdf
Posts: 1526
Joined: Fri Jul 12, 2019 12:53 am
Answers: 126
Has thanked: 54 times
Been thanked: 288 times

Post by torzdf »

This kind of thing doesn't surprise me too much. Unfortunately the workaround to getting 30xx support into faceswap is a bit messy, so I'm not surprised there are conflicts.

@pfakanator may be able to update this image, or (hopefully) this issue will resolve in time when Conda get 30xx support working properly with their builds and I can move our install back to a less hacky solution.

My word is final



Post by Replicon »

Hey, welcome back!

I'm not sure I understand where we're at right now. What's the current "hacky" solution? I'm not aware of running RTX 30xx on the cloud, but I don't know much about GPU lingo... I see it says it's RTX-capable, and maybe that's what you're referring to.

Are we talking about two separate problems? If Conda figures out their build issues, will the latest faceswap versions (with tensorflow 2.6) start working on the provided GCE image? Or is it just that the latest versions of the software used by faceswap are simply not backwards compatible with the aging drivers on the GCE image? Even before this, one thing I had to do with my GCE install was disable auto-updates, because as soon as things got updated (or I ran update/upgrade and rebooted), it stopped detecting the video card.

The answer might just be to install the latest drivers on a fresh image every time, as part of the overall faceswap install.

While it's probably a bit more complicated than literally running "apt-get install <short-list-of-packages>" on a clean base, I expect it to be really well documented, or nobody would use it.
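
On the "aging drivers" question, here is a rough way to check whether a driver is new enough for a given CUDA release. Nobody in this thread has confirmed that this is the cause, so treat it as one plausible explanation. The table is an assumption taken from memory of NVIDIA's CUDA release notes (minimum Linux driver per toolkit, using the minor-version compatibility floor for 11.x), and the claim that the official tensorflow-gpu 2.6 builds target CUDA 11.2 comes from TensorFlow's tested build configurations. Verify both before relying on them.

```python
# Assumed minimum Linux driver versions (check NVIDIA's CUDA release notes):
#   CUDA 10.2             -> 440.33
#   CUDA 11.x (minor-version compatibility floor) -> 450.80.02
MIN_LINUX_DRIVER = {
    "10.2": (440, 33),
    "11.2": (450, 80, 2),
}


def parse_version(text):
    """Turn '440.100' into (440, 100) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver, cuda):
    """True if the driver version meets the assumed minimum for that CUDA release."""
    return parse_version(driver) >= MIN_LINUX_DRIVER[cuda]


# Driver version reported in the crash log earlier in the thread
driver = "440.100"
for cuda in ("10.2", "11.2"):
    print(cuda, driver_supports(driver, cuda))
```

If those minimums are right, the 440.100 driver on the image is fine for CUDA 10.2 but too old for CUDA 11.x, which would fit with the rollback working and the Tensorflow 2.6 install failing at cudaGetDevice().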



Post by torzdf »

Honestly, I don't know, sorry. I don't use GCE/cloud at all.
