
Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Mon Sep 20, 2021 10:39 pm
by Replicon

I created a freshly installed image using my normal flow (basically as shown in the instructions here).

The installation process itself looked weird: it "failed with initial frozen solve", and after recovering, it spent a long time downloading roughly 1.5GB of CUDA packages.

The install succeeded, but when I ran an extract, it eventually crashed with:

Code:

tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: initialization error

Did something change with the package repos? I'm not too familiar with Conda, but it definitely looked like the install process wasn't behaving right. I tried this twice, and it misbehaved the same way both times.

Some relevant info from the crash log follows; I can give you more, but it likely wouldn't help:

Code:

============ System Information ============
encoding:            UTF-8
git_branch:          Not Found
git_commits:         Not Found
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: Tesla T4
gpu_devices_active:  GPU_0
gpu_driver:          440.100
gpu_vram:            GPU_0: 15109MB
os_machine:          x86_64
os_platform:         Linux-5.4.0-1021-gcp-x86_64-with-glibc2.17
os_release:          5.4.0-1021-gcp
py_command:          /home/[[REDACTED]]/faceswap/faceswap.py extract -i /home/[[REDACTED]].mp4 -o /home/[[REDACTED]] -D s3fd -A fan -nm hist -rf 8 -min 0 -l 0.4 -sz 512 -een 3 -si 0 -L INFO
py_conda_version:    Conda is used, but version not found
py_implementation:   CPython
py_version:          3.8.11
py_virtual_env:      True
sys_cores:           4
sys_processor:       x86_64
sys_ram:             Total: 15005MB, Available: 14245MB, Used: 482MB, Free: 13759MB

=============== Pip Packages ===============
absl-py==0.13.0
astunparse==1.6.3
cachetools==4.2.2
certifi==2021.5.30
charset-normalizer==2.0.6
clang==5.0
cycler==0.10.0
fastcluster==1.1.26
ffmpy==0.2.3
flatbuffers==1.12
gast==0.4.0
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.40.0
h5py==3.1.0
idna==3.2
imageio @ file:///tmp/build/80754af9/imageio_1617700267927/work
imageio-ffmpeg @ file:///home/conda/feedstock_root/build_artifacts/imageio-ffmpeg_1629987409325/work
joblib @ file:///tmp/build/80754af9/joblib_1613502643832/work
keras==2.6.0
Keras-Preprocessing==1.1.2
kiwisolver @ file:///tmp/build/80754af9/kiwisolver_1612282420641/work
Markdown==3.3.4
matplotlib @ file:///tmp/build/80754af9/matplotlib-base_1592846008246/work
mkl-fft==1.3.0
mkl-random==1.1.1
mkl-service==2.3.0
numpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1603570489231/work
nvidia-ml-py==11.470.66
oauthlib==3.1.1
olefile @ file:///Users/ktietz/demo/mc3/conda-bld/olefile_1629805411829/work
opencv-python==4.5.3.56
opt-einsum==3.3.0
Pillow @ file:///tmp/build/80754af9/pillow_1625655817137/work
protobuf==3.18.0
psutil @ file:///tmp/build/80754af9/psutil_1612298023621/work
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing @ file:///home/linux1/recipes/ci/pyparsing_1610983426697/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
scikit-learn @ file:///tmp/build/80754af9/scikit-learn_1621370412049/work
scipy @ file:///tmp/build/80754af9/scipy_1616703172749/work
sip==4.19.13
six==1.15.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow-estimator==2.6.0
tensorflow-gpu==2.6.0
termcolor==1.1.0
threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work
tornado @ file:///tmp/build/80754af9/tornado_1606942300299/work
tqdm @ file:///tmp/build/80754af9/tqdm_1631818572807/work
typing-extensions==3.7.4.3
urllib3==1.26.6
Werkzeug==2.0.1
wrapt==1.12.1

============== Conda Packages ==============
Could not get package list
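The gpu_driver line (440.100) may be the key detail. As far as I know, TensorFlow 2.6's tested GPU build configuration is CUDA 11.2 with cuDNN 8.1, NVIDIA lists 450.80.02 as the minimum Linux driver for CUDA 11.x, and the 440 driver branch only goes up to CUDA 10.2. That would be one plausible explanation for the cudaGetDevice() initialization error, not a confirmed diagnosis. A rough sketch to compare the installed driver against that floor, assuming GNU sort (for -V); the 450.80.02 threshold is an assumption worth double-checking against NVIDIA's current release notes:

```shell
#!/usr/bin/env bash
# Sketch: is the installed NVIDIA driver new enough for a CUDA 11.x build?
# ASSUMPTION: 450.80.02 is the minimum Linux driver for CUDA 11.x
# (NVIDIA minor-version compatibility). Verify against NVIDIA's docs.
MIN_CUDA11_DRIVER="450.80.02"

# Returns 0 if dotted version $1 >= dotted version $2 (uses GNU sort -V).
version_at_least() {
    local have="$1" need="$2"
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Prints a one-line verdict for the given driver version string.
check_driver() {
    local driver="$1"
    if version_at_least "$driver" "$MIN_CUDA11_DRIVER"; then
        echo "driver $driver: OK for CUDA 11.x"
    else
        echo "driver $driver: too old for CUDA 11.x (need >= $MIN_CUDA11_DRIVER)"
    fi
}
```

On the VM itself you could feed it the live value with `check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`. For the 440.100 driver in this log it prints `driver 440.100: too old for CUDA 11.x (need >= 450.80.02)`.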

Re: Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Tue Sep 21, 2021 3:07 pm
by Replicon

OK.

So I intercepted the installer and added a

Code:

git checkout c7d85f89e69c74e97bf7485b064c07487d31faae

in the right place.

This rolls back the TensorFlow 2.6 change, which looked like a really suspicious potential root cause.
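For reference, the rollback amounts to a detached-HEAD checkout of a known-good commit right after cloning. A small sketch that does the same and then checks that HEAD actually landed on that commit; clone_and_pin is a made-up helper name, not something in the faceswap installer:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: clone a repo, pin it to one commit, and verify the pin.
clone_and_pin() {
    local url="$1" dest="$2" commit="$3"
    git clone --quiet "$url" "$dest"
    # Detached-HEAD checkout of the known-good commit
    git -C "$dest" checkout --quiet "$commit"
    # Fail (non-zero return) unless HEAD resolves to the same full hash
    [ "$(git -C "$dest" rev-parse HEAD)" = "$(git -C "$dest" rev-parse "$commit^{commit}")" ]
}
```

Usage would look like `clone_and_pin "$DL_FACESWAP" "$DIR_FACESWAP" c7d85f89e69c74e97bf7485b064c07487d31faae`, with the two variables being whatever the installer already uses for the repo URL and target directory.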

And it works again!

I was wrong about the "failed frozen..." stuff, which happens regardless, but I was right that it didn't need to download 1.5GB of CUDA libraries... which leads me to think those were already set up on the GCE image, along with compatible drivers.
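One way to test the "already set up on the image" hunch is to look at which CUDA runtime libraries the system linker knows about. The sketch below (list_cudart is a hypothetical helper) filters `ldconfig -p` output for versioned libcudart names. It only sees system-wide libraries: anything the installer put inside the Conda environment won't appear, so an empty result doesn't mean CUDA is missing from faceswap's environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldconfig -p` output on stdin and print each
# distinct versioned CUDA runtime library name (e.g. libcudart.so.X.Y).
# Unversioned "libcudart.so" entries are skipped.
list_cudart() {
    grep -o 'libcudart\.so\.[0-9][0-9.]*' | sort -uV
}
```

On the VM: `ldconfig -p | list_cudart`.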

It may be that the GCE base image needs to be updated, with drivers and libraries that make everything compatible again, so fresh installs can keep getting the latest and greatest.


Re: Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Sat Oct 09, 2021 12:42 pm
by torzdf

This kind of thing doesn't surprise me too much. Unfortunately, the workaround for getting 30xx support into faceswap is a bit messy, so conflicts like this aren't unexpected.

@pfakanator may be able to update this image, or (hopefully) this issue will resolve in time when Conda get 30xx support working properly with their builds and I can move our install back to a less hacky solution.


Re: Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Wed Oct 13, 2021 2:48 pm
by Replicon

Hey, welcome back!

I'm not sure I understand where we're at right now. What's the current "hacky" solution? I'm not aware of running RTX 30xx on the cloud, but I don't know much about GPU lingo... I see it says it's RTX-capable, and maybe that's what you're referring to.

Are we talking about two separate problems? If Conda sorts out their build issues, will the latest faceswap versions (with TensorFlow 2.6) start working on the provided GCE image, or are the latest versions of the software faceswap depends on simply not backwards compatible with the aging drivers on the GCE image? Even before this, I had to disable auto-updates on my GCE install, because as soon as things get updated (or I run update/upgrade and reboot), it stops detecting the video card.
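If the driver on the image came from apt packages (an assumption; it may instead have been installed with NVIDIA's .run installer), a narrower alternative to turning off all auto-updates is to hold just the driver packages. This sketch only selects package names starting with nvidia- or libnvidia-; the real names on the GCE image may differ. Holding them also won't stop kernel upgrades, which can break the driver on their own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read package names (one per line) on stdin and print
# the ones that look like NVIDIA driver packages.
# ASSUMPTION: driver packages are named nvidia-* or libnvidia-*.
nvidia_packages() {
    grep -E '^(lib)?nvidia-' || true   # no matches is not an error
}
```

Possible usage: `dpkg-query -W -f='${Package}\n' | nvidia_packages | xargs -r sudo apt-mark hold`, and `sudo apt-mark unhold <package>` when you do want to upgrade a package.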

The answer might just be to install the latest drivers on a fresh image every time, as part of the overall faceswap install.

While it's probably a bit more complicated than literally running "apt-get install <short-list-of-packages>" on a clean base, I'd expect the driver setup for these GPU instances to be really well documented, or nobody would use them.


Re: Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Tue Oct 19, 2021 11:14 am
by torzdf

Honestly, I don't know, sorry. I don't use GCE/cloud at all.


Re: Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Sun Oct 24, 2021 6:40 am
by nigelwang

Hi, I'm running into the same issue. Can you be more specific about how to fix it? I'm not sure how to change the installer.
Thank you!!


Re: Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Tue Dec 07, 2021 9:02 pm
by Replicon

nigelwang wrote: Sun Oct 24, 2021 6:40 am

Hi, I'm running into the same issue. Can you be more specific about how to fix it? I'm not sure how to change the installer.
Thank you!!

Haven't logged in in a while, so I'm only seeing this now.

If you inspect the setup script on the image, all it does is download and run the install script.

If you download and inspect the install script, you'll see it downloads the latest faceswap from GitHub using the git CLI.

The code change I linked broke faceswap on the provided cloud base image, possibly because of driver incompatibilities.

So, to temporarily mitigate the issue, you can hack up the setup script to grab the state of the faceswap codebase from just BEFORE the breaking change (which is the git checkout command I linked).

I honestly don't remember exactly how and where I made the change, but I really just replaced or added a git command to make it so it's using that older code.

Like I said, the more correct fix is probably to update the NVIDIA drivers on the base image. Unfortunately, a basic 'upgrade' just breaks it: the NVIDIA driver stops being detected at all, and I haven't yet figured out how to upgrade without completely breaking the image. It could be as simple as running dist-upgrade instead of upgrade, but... meh, I dunno, I didn't play around with it much. If I'm bored for a couple of days with nothing to do, maybe I'll try to script a base image setup from scratch that produces an image that works consistently, even through upgrades.
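One common way a driver "disappears" after an upgrade (an assumption here, not something confirmed in this thread) is that a new kernel gets installed but the nvidia kernel module is never rebuilt for it. If the driver was installed with DKMS, you can check before rebooting. The sketch below (nvidia_built_for is a hypothetical helper) reads `dkms status` output and accepts both the older "nvidia, VERSION, KERNEL, ARCH: installed" format and the newer "nvidia/VERSION, KERNEL, ARCH: installed" one:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if `dkms status` output on stdin shows an
# nvidia module installed for kernel $1, non-zero otherwise.
# ASSUMPTION: the NVIDIA driver was installed via DKMS.
nvidia_built_for() {
    local kernel="$1"
    grep -E '^nvidia[,/]' | grep -F ", $kernel," | grep -q ': installed'
}
```

For the running kernel: `dkms status | nvidia_built_for "$(uname -r)"`. After an upgrade but before rebooting, pass the newly installed kernel's version instead; if it fails, rebooting into that kernel will probably leave you without a working driver.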


Re: Faceswap installer broken on GCE (and possibly elsewhere)?

Posted: Tue Dec 07, 2021 9:41 pm
by Replicon

It just so happens I wanted to kick one off, so I SSHed into my image to find my old setup script.

When you edit your setup script, you want to change the clone_faceswap function to look something like this:

Code:

clone_faceswap() {
    # Clone the faceswap repo
    delete_faceswap
    info "Downloading Faceswap..."
    yellow
    git clone "$DL_FACESWAP" "$DIR_FACESWAP"
    # Temporary hack: pin to the commit before the TensorFlow 2.6 change
    (cd "$DIR_FACESWAP" && git checkout c7d85f89e69c74e97bf7485b064c07487d31faae)
}
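Since the question upthread was how to apply this without hand-editing, here's a hedged sketch that patches a downloaded copy of the install script with awk. pin_installer is a hypothetical helper, and it assumes the clone line contains `git clone "$DL_FACESWAP" "$DIR_FACESWAP"` exactly as in the clone_faceswap function above. It refuses to touch the file if that line isn't found, and won't append the checkout twice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Known-good commit from this thread (before the TensorFlow 2.6 change)
PIN_COMMIT="c7d85f89e69c74e97bf7485b064c07487d31faae"

# Hypothetical helper: append a pinned checkout to the installer's clone line.
# ASSUMPTION: the line contains exactly: git clone "$DL_FACESWAP" "$DIR_FACESWAP"
pin_installer() {
    local script="$1"
    local clone='git clone "$DL_FACESWAP" "$DIR_FACESWAP"'
    local add="; (cd \"\$DIR_FACESWAP\" && git checkout $PIN_COMMIT)"

    if grep -qF "git checkout $PIN_COMMIT" "$script"; then
        return 0  # already pinned; don't append twice
    fi
    if ! grep -qF "$clone" "$script"; then
        echo "clone line not found in $script" >&2
        return 1
    fi
    local tmp
    tmp="$(mktemp)"
    # index() is a plain substring match, so no regex escaping is needed
    awk -v clone="$clone" -v add="$add" 'index($0, clone) { $0 = $0 add } { print }' "$script" > "$tmp"
    cat "$tmp" > "$script"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}
```

You'd run `pin_installer` on the downloaded install script (whatever it's called on your image) and then run the patched script as usual.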

Again, this is just a hack to get going again. If you end up finding out the right incantation to properly upgrade drivers on the image without hosing everything, please do reply here. :)