Faceswap 3 open for testing

Discussions about research, Faceswapping and things that don't fit in the other categories here.


User avatar
teskki
Posts: 1
Joined: Sun Sep 29, 2024 7:44 am

Re: Faceswap 3 open for testing

Post by teskki »

Hi, I need help with this installation process... I tried installing Faceswap 3 on my Kali Linux and this is what I am getting:

(base) ┌──(tejiri㉿kali)-[~]
└─$ git clone https://github.com/deepfakes/faceswap.git fs3
Cloning into 'fs3'...
remote: Enumerating objects: 14675, done.
remote: Counting objects: 100% (1062/1062), done.
remote: Compressing objects: 100% (624/624), done.
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
error: 4480 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

User avatar
torzdf
Posts: 2761
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 141 times
Been thanked: 649 times

Re: Faceswap 3 open for testing

Post by torzdf »

@ianstephens Thanks for testing. Unfortunately I have been very busy of late, so not much time for development.

I have made a note of your feedback and will investigate as soon as I get an opportunity.

My word is final

User avatar
torzdf
Posts: 2761
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 141 times
Been thanked: 649 times

Re: Faceswap 3 open for testing

Post by torzdf »

teskki wrote: Sun Sep 29, 2024 8:02 am

Hi, I need help with this installation process... I tried installing Faceswap 3 on my Kali Linux and this is what I am getting:

(base) ┌──(tejiri㉿kali)-[~]
└─$ git clone https://github.com/deepfakes/faceswap.git fs3
Cloning into 'fs3'...
remote: Enumerating objects: 14675, done.
remote: Counting objects: 100% (1062/1062), done.
remote: Compressing objects: 100% (624/624), done.
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
error: 4480 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

This is a network connection issue. It looks like your ISP is cutting off your connection before the download completes.

You can try either:

  • Using an internet connection elsewhere
  • Trying to enable a VPN (sometimes works for this issue)
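
If neither of those works, a shallow clone followed by an unshallow fetch can sometimes get past connections that keep dropping mid-transfer (standard git options, not something I have needed to test for this case):

Code: Select all

git clone --depth 1 https://github.com/deepfakes/faceswap.git fs3
cd fs3
git fetch --unshallow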

My word is final

User avatar
Ryzen1988
Posts: 59
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

Re: Faceswap 3 open for testing

Post by Ryzen1988 »

Hey wow, am I late to the party, damn. It gives me this strange error even though torchvision is already installed. I have attached a crash report.
All other masks besides BiSeNet seem to work perfectly.

Nevermind, fixed it by uninstalling torch and reinstalling a different version, now 2.3.0.

Initialized: BiSeNet
ImportError: Exception encountered when calling UpSampling2D.call().

The torchvision package is necessary to use resize with the torch backend. Please install torchvision.

Arguments received by UpSampling2D.call():
• inputs=torch.Tensor(shape=torch.Size([8, 32, 32, 5]), dtype=float32)

On a more cheerful note, the manual tool feels a lot quicker and a bit more robust. :D Really looking forward to trying out the new ConvNeXt and the new AdamW optimizer.
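
For anyone hitting the same thing, the reinstall I did was along these lines (the torchvision version to pair with torch 2.3.0 is my assumption; adjust if pip complains about the pairing):

Code: Select all

pip uninstall -y torch torchvision
pip install torch==2.3.0 torchvision==0.18.0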

Attachments
crash_report.2024.11.16.234045270882.log
(46.96 KiB) Downloaded 3346 times
Last edited by Ryzen1988 on Sun Nov 17, 2024 10:28 am, edited 5 times in total.
User avatar
Ryzen1988
Posts: 59
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

Re: Faceswap 3 open for testing

Post by Ryzen1988 »

I have some errors that appear when trying to train a model:

Code: Select all

C:\Users\Admin\.conda\envs\fs3\Lib\site-packages\keras\src\backend\torch\nn.py:416: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ..\aten\src\ATen\native\cudnn\Conv_v8.cpp:919.)
outputs = tnn.conv2d(
C:\Users\Admin\.conda\envs\fs3\Lib\site-packages\keras\src\backend\torch\nn.py:416: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at ..\aten\src\ATen\native\Convolution.cpp:1032.)
outputs = tnn.conv2d(
C:\Users\Admin\.conda\envs\fs3\Lib\site-packages\torch\autograd\graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ..\aten\src\ATen\native\cudnn\Conv_v8.cpp:919.)
return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
User avatar
Playstation5fanatic
Posts: 1
Joined: Mon Dec 23, 2024 2:26 pm

Re: Faceswap 3 open for testing

Post by Playstation5fanatic »

Ryzen1988 wrote: Sat Nov 16, 2024 10:47 pm

Hey wow, am I late to the party, damn. It gives me this strange error even though torchvision is already installed. I have attached a crash report.
All other masks besides BiSeNet seem to work perfectly.

Nevermind, fixed it by uninstalling torch and reinstalling a different version, now 2.3.0.

Initialized: BiSeNet
ImportError: Exception encountered when calling UpSampling2D.call().

The torchvision package is necessary to use resize with the torch backend. Please install torchvision.

Arguments received by UpSampling2D.call():
• inputs=torch.Tensor(shape=torch.Size([8, 32, 32, 5]), dtype=float32)

On a more cheerful note, the manual tool feels a lot quicker and a bit more robust. :D Really looking forward to trying out the new ConvNeXt and the new AdamW optimizer.

Hey! I'm having the same issue. What exactly did you do to fix it? I tried uninstalling torchvision and reinstalling an older version: same issue. Then I updated Faceswap via the GUI and I'm still having the same issue.

Any help would be appreciated!

User avatar
spewfy
Posts: 1
Joined: Sun Jan 19, 2025 4:17 am

Re: Faceswap 3 open for testing

Post by spewfy »

Problem with Alignments

Testing on Linux only, I have a problem with the alignments file. All frames are missing alignments, so the convert process isn't working. Testing the alignments with tools.py, I can identify the issue:

python faceswap.py extract -i test.mp4 -o output -p test.fsa -A fan -D s3fd -L VERBOSE
01/19/2025 12:08:52 INFO Aligner filtered: (features (1.00): 7)
01/19/2025 12:08:52 INFO Writing alignments to: '/home/ubuntu/fs3/test.fsa'
01/19/2025 12:08:52 INFO -------------------------
01/19/2025 12:08:52 INFO Images found: 0
01/19/2025 12:08:52 INFO Faces detected: 601
01/19/2025 12:08:52 INFO -------------------------
01/19/2025 12:08:52 INFO Process Successfully Completed. Shutting Down...

python tools.py alignments -j extract -a test.fsa -r test.mp4 -c extract -L VERBOSE
01/19/2025 12:10:55 INFO [FRAMES DATA]
01/19/2025 12:10:55 VERBOSE Video exists at: '/home/ubuntu/fs3/test.mp4'
01/19/2025 12:10:55 INFO Loading video frames from /home/ubuntu/fs3/test.mp4
01/19/2025 12:10:56 VERBOSE 601 items loaded
01/19/2025 12:10:56 INFO [EXTRACT FACES]
01/19/2025 12:10:56 VERBOSE Creating output folder at '/home/ubuntu/fs3/extract'
01/19/2025 12:11:31 INFO 601 face(s) extracted

python tools.py alignments -j missing-alignments -a test.fsa -r extract -L VERBOSE
01/19/2025 12:13:15 INFO [FRAMES DATA]
01/19/2025 12:13:15 VERBOSE Folder exists at '/home/ubuntu/fs3/extract'
01/19/2025 12:13:15 INFO Loading file list from /home/ubuntu/fs3/extract
01/19/2025 12:13:15 VERBOSE 601 items loaded
01/19/2025 12:13:15 INFO [CHECK FRAMES]
01/19/2025 12:13:15 INFO -----------------------------------------------
01/19/2025 12:13:15 INFO --- Frames missing from alignments file (601)
01/19/2025 12:13:15 INFO -----------------------------------------------
01/19/2025 12:13:15 INFO test_000001_0.png
01/19/2025 12:13:15 INFO test_000002_0.png
01/19/2025 12:13:15 INFO test_000003_0.png
01/19/2025 12:13:15 INFO test_000004_0.png
01/19/2025 12:13:15 INFO test_000005_0.png
....

Please let me know how to get around this.

Thank you

User avatar
BCD16
Posts: 3
Joined: Thu Jan 23, 2025 9:47 am

Re: Faceswap 3 open for testing

Post by BCD16 »

Hello, M4 Max MacBook Pro user here.

Installed successfully in a miniforge3 virtual environment, without any errors.

Training starts, and after a couple of seconds this warning shows up:

/miniforge3/envs/fs3/lib/python3.11/site-packages/keras/src/backend/torch/optimizers/torch_adam.py:35: UserWarning: The operator 'aten::_foreach_mul.Scalar' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/temp/anaconda/conda-bld/pytorch_1711403251597/work/aten/src/ATen/mps/MPSFallback.mm:13.)
torch._foreach_mul_(m_list, self.beta_1)

And the preview pictures look like a rainbow. I waited for 30 minutes and the loss is not going down. Sometimes training ends straight away with a NaN Protection error.

I see that your script installs pytorch 2.3.1; I tried all versions between 2.2.2 and 2.3.1 and nothing changes. I changed the optimizer to Lion and others, same result.

Same result with or without mixed precision enabled.

Edit: OK, now I have it working. I forced an install of pytorch 2.6.0 in the conda env and overrode the pytorch version check in the /cli/launcher.py file.
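
Roughly what I ran inside the fs3 env (the exact package pins are my assumption; the stock macOS wheels already include MPS support):

Code: Select all

pip install --upgrade torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0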

Training now works with FP16 enabled, but it is significantly slower than the tensorflow backend.

In my faceswap2 environment I have been using tensorflow 2.13 and tensorflow-metal 1.0.0, and I am really happy with its FP16 speed with the legacy Adam optimizer. This beta version still needs a lot of improvement.

Last edited by BCD16 on Wed Feb 26, 2025 7:23 pm, edited 1 time in total.
User avatar
burnout_
Posts: 1
Joined: Sat Jan 25, 2025 3:54 am

Re: Faceswap 3 open for testing

Post by burnout_ »

Bug when installing for Nvidia users. Windows 10; username removed from the paths below.

(fs3) C:\Users\\>cd fs3

(fs3) C:\Users\\fs3>git checkout fs3
branch 'fs3' set up to track 'origin/fs3'.
Switched to a new branch 'fs3'

(fs3) C:\Users\\fs3>conda activate fs3

(fs3) C:\Users\\fs3>python setup.py --installer --nvidia
INFO Running without root/admin privileges
INFO The tool provides tips for installation and installs required python packages
INFO Setup in Windows 10
INFO Installed Python: 3.11.11 64bit
INFO Running in Conda
INFO Running in a Virtual Environment
INFO Encoding: cp1252
INFO Installed pip: 24.2
INFO Installing packaging
packaging-24.2 | 200 KB | ███████████████████████████████████ | 100%
INFO Faceswap config written to: C:\Users\\fs3\config\.faceswap
INFO Keras config written to: C:\Users\\.keras\keras.json
INFO Installing Required Python Packages. This may take some time...
INFO Installing pywinpty>=2.0.2
ERROR Unable to install package: pywinpty>=2.0.2. Process aborted

User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

Redundant, removed for space.

Last edited by DeliciousCaramels on Tue Mar 25, 2025 4:06 am, edited 1 time in total.
User avatar
BCD16
Posts: 3
Joined: Thu Jan 23, 2025 9:47 am

Re: Faceswap 3 open for testing

Post by BCD16 »

Now I have switched from Apple Silicon to an Asus ROG G16 Zephyrus notebook with an RTX 4090 Max-Q (80 watt power limit). I installed this beta and force-installed a newer torch (2.6.0+cu126) and keras (3.3.3).

Now, with a batch size of 160 and the original DFLH128 trainer, I can reach up to 255 EGs/sec at only 80 watts of GPU power (mixed precision enabled). That is almost 3x faster than the old FS2 tensorflow backend. This is insane. With 2000 input images on each of the A and B sides, I am almost done in 24 hours. I could give it another 24 hours with "No warp" checked and it comes close to perfection.

Keep up the good work! If anyone is interested, I may upload my fully working conda environment and fs3 app folder. I think it will work for any Nvidia CUDA user under Windows. PM me. Thanks again!
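
If I do share it, it will most likely just be a standard conda export plus the app folder, something along these lines:

Code: Select all

conda env export -n fs3 > fs3_env.yml
conda env create -f fs3_env.yml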

Last edited by BCD16 on Wed Mar 19, 2025 8:06 am, edited 2 times in total.
User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

Redundant, removed for space.

Last edited by DeliciousCaramels on Tue Mar 25, 2025 3:52 am, edited 3 times in total.
User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

Alright so these windows users must be making you lose your mind!

I have made this work now after much trial and error.

Firstly, I follow the basic instructions, but before I run the install script I install pytorch into the environment via this command:

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia

The version of numpy packaged with this is 1.24.3, which causes an error. If, before running the install script, I try installing numpy 1.26.4 via conda as per the requirements, it installs a different version of pytorch with "cpu" in the name, and that won't train. So I install 1.26.4 via pip instead.

I check that I have functioning cuda by running the script from this page
https://phphe.com/blog/install-pytorch-manually
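
(The check itself boils down to something like this one-liner; torch.version.cuda will just print None if you have accidentally ended up with a CPU-only build:)

Code: Select all

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"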

Matplotlib needs to be installed manually.

It all installs fine and I can train!

If I train a model started with Conv Aware init, I get this warning:

Code: Select all

C:\Users\admin\MiniConda3\envs\fs3\Lib\site-packages\keras\src\backend\torch\nn.py:416: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\Convolution.cpp:1032.)
outputs = tnn.conv2d(

Training iteration speed seems slower in comparison to FS2. Running my default max settings (phaze-a) I get about 22 EGs/s in FS2; in FS3 I'm getting about 9 EGs/s.
I get roughly 4000 iterations per hour in FS2 and 2500 in FS3, training with the same settings.

It could be that there is some issue with the way I have installed? Googling indicates the warning is due to a mismatch between cuDNN versions, but all my CUDA packages come from the same source.
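
For anyone who wants to compare against their conda cuDNN package, the cuDNN build that torch actually sees can be printed with a standard torch call:

Code: Select all

python -c "import torch; print(torch.backends.cudnn.version())"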

Here are my packages and system settings etc., let me know if anything stands out as wrong.

Code: Select all


============ System Information ============
backend:             nvidia
encoding:            cp1252
git_branch:          fs3
git_commits:         8073752 Merge branch 'master' into fs3 | cbaad14 Bugfix: Linux installer - pin git to < 2.45 | 6fe300e pin numpy to < 2.0
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: NVIDIA GeForce RTX 3070 Laptop GPU
gpu_devices_active:  GPU_0
gpu_driver:          565.90
gpu_vram:            GPU_0: 8192MB (133MB free)
os_machine:          AMD64
os_platform:         Windows-10-10.0.19045-SP0
os_release:          10
py_conda_version:    conda 25.3.0
py_implementation:   CPython
py_version:          3.11.11
py_virtual_env:      True
sys_cores:           16
sys_processor:       Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
sys_ram:             Total: 32547MB, Available: 13585MB, Used: 18961MB, Free: 13585MB

=============== Pip Packages ===============
absl-py==2.2.0
Brotli @ file:///C:/b/abs_c415aux9ra/croot/brotli-split_1736182803933/work
certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1739515848642/work/certifi
charset-normalizer @ file:///croot/charset-normalizer_1721748349566/work
colorama==0.4.6
contourpy @ file:///C:/b/abs_f2u2o_8s9g/croot/contourpy_1732540071787/work
cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work
fastcluster==1.2.6
ffmpy==0.5.0
filelock @ file:///C:/b/abs_f2gie28u58/croot/filelock_1700591233643/work
fonttools @ file:///C:/b/abs_4crkswws2h/croot/fonttools_1737040078745/work
gmpy2 @ file:///C:/b/abs_d8ki0o0h97/croot/gmpy2_1738085498525/work
grpcio==1.71.0
h5py==3.13.0
idna @ file:///C:/b/abs_aad84bnnw5/croot/idna_1714398896795/work
imageio==2.37.0
imageio-ffmpeg==0.6.0
Jinja2 @ file:///C:/b/abs_920kup4e6u/croot/jinja2_1741711580669/work
joblib==1.4.2
keras==3.3.3
kiwisolver @ file:///C:/b/abs_faf90xet7a/croot/kiwisolver_1737040915779/work
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe @ file:///C:/b/abs_a0ma7ge0jc/croot/markupsafe_1738584052792/work
matplotlib==3.10.0
mdurl==0.1.2
mkl-fft==1.3.1
mkl-random @ file:///C:/ci_311/mkl_random_1676481991689/work
mkl-service==2.4.0
ml_dtypes==0.5.1
mpmath @ file:///C:/b/abs_7833jrbiox/croot/mpmath_1690848321154/work
namex==0.0.8
networkx @ file:///C:/b/abs_b054htfn9t/croot/networkx_1737043671910/work
numexpr==2.10.2
numpy==1.26.4
nvidia-ml-py==12.570.86
opencv-python==4.11.0.86
optree==0.14.1
packaging @ file:///C:/b/abs_3by6s2fa66/croot/packaging_1734472138782/work
pillow @ file:///C:/b/abs_b50vowcrzo/croot/pillow_1738010273782/work
protobuf==6.30.1
psutil==7.0.0
Pygments==2.19.1
pyparsing @ file:///C:/b/abs_40z8gyj9wi/croot/pyparsing_1731445739241/work
PyQt6==6.7.1
PyQt6_sip @ file:///C:/b/abs_28s7k4h_hl/croot/pyqt-split_1740498234166/work/pyqt_sip
PySocks @ file:///C:/ci_311/pysocks_1676425991111/work
python-dateutil @ file:///C:/b/abs_3au_koqnbs/croot/python-dateutil_1716495777160/work
pywin32==310
pywinpty==2.0.15
PyYAML @ file:///C:/b/abs_14xkfs39bx/croot/pyyaml_1728657968772/work
requests @ file:///C:/b/abs_c3508vg8ez/croot/requests_1731000584867/work
rich==13.9.4
scikit-learn==1.6.1
scipy==1.15.2
sip @ file:///C:/b/abs_5cto136kse/croot/sip_1738856220313/work
six @ file:///tmp/build/80754af9/six_1644875935023/work
sympy @ file:///C:/b/abs_b4u17p23yg/croot/sympy_1738108511395/work
tensorboard==2.19.0
tensorboard-data-server==0.7.2
threadpoolctl==3.6.0
torch==2.3.1
torchaudio==2.3.1
torchvision==0.18.1
tornado @ file:///C:/b/abs_7cyu943ybx/croot/tornado_1733960510898/work
tqdm==4.67.1
typing_extensions @ file:///C:/b/abs_0ffjxtihug/croot/typing_extensions_1734714875646/work
unicodedata2 @ file:///C:/b/abs_dfnftvxi4k/croot/unicodedata2_1736543771112/work
urllib3 @ file:///C:/b/abs_7bst06lizn/croot/urllib3_1737133657081/work
Werkzeug==3.1.3
win-inet-pton @ file:///C:/ci_311/win_inet_pton_1676425458225/work

============== Conda Packages ==============
# packages in environment at C:\Users\daniel\MiniConda3\envs\fs3:
#
# Name                    Version                   Build  Channel
absl-py                   2.2.0                    pypi_0    pypi
blas                      1.0                         mkl  
brotli-python 1.0.9 py311h5da7b33_9
bzip2 1.0.8 h2bbff1b_6
ca-certificates 2025.2.25 haa95532_0
certifi 2025.1.31 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.3.2 pyhd3eb1b0_0
colorama 0.4.6 pypi_0 pypi
contourpy 1.3.1 py311h214f63a_0
cuda-cccl 12.8.90 0 nvidia
cuda-cccl_win-64 12.8.90 0 nvidia
cuda-cudart 12.1.105 0 nvidia
cuda-cudart-dev 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-libraries-dev 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvrtc-dev 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.8.90 0 nvidia
cuda-opencl-dev 12.8.90 0 nvidia
cuda-profiler-api 12.8.90 0 nvidia
cuda-runtime 12.1.0 0 nvidia
cuda-version 12.8 3 nvidia
cycler 0.11.0 pyhd3eb1b0_0
fastcluster 1.2.6 pypi_0 pypi
ffmpy 0.5.0 pypi_0 pypi
filelock 3.13.1 py311haa95532_0
fonttools 4.55.3 py311h827c3e9_0
freetype 2.12.1 ha860e81_0
git 2.45.2 haa95532_1
gmp 6.3.0 h537511b_0
gmpy2 2.2.1 py311h827c3e9_0
grpcio 1.71.0 pypi_0 pypi
h5py 3.13.0 pypi_0 pypi
icu 73.1 h6c2663c_0
idna 3.7 py311haa95532_0
imageio 2.37.0 pypi_0 pypi
imageio-ffmpeg 0.6.0 pypi_0 pypi
intel-openmp 2021.4.0 haa95532_3556
jinja2 3.1.6 py311haa95532_0
joblib 1.4.2 pypi_0 pypi
jpeg 9e h827c3e9_3
keras 3.3.3 pypi_0 pypi
khronos-opencl-icd-loader 2024.05.08 h8cc25b3_0
kiwisolver 1.4.8 py311h5da7b33_0
krb5 1.20.1 h5b6d351_0
lcms2 2.16 hb4a4139_0
lerc 4.0.0 h5da7b33_0
libcublas 12.1.0.26 0 nvidia
libcublas-dev 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufft-dev 11.0.2.4 0 nvidia
libcurand 10.3.9.90 0 nvidia
libcurand-dev 10.3.9.90 0 nvidia
libcusolver 11.4.4.55 0 nvidia
libcusolver-dev 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libcusparse-dev 12.0.2.55 0 nvidia
libdeflate 1.22 h5bf469e_0
libffi 3.4.4 hd77b12b_1
libjpeg-turbo 2.0.0 h196d8e1_0
libnpp 12.0.2.50 0 nvidia
libnpp-dev 12.0.2.50 0 nvidia
libnvjitlink 12.1.105 0 nvidia
libnvjitlink-dev 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libnvjpeg-dev 12.1.1.14 0 nvidia
libpng 1.6.39 h8cc25b3_0
libpq 17.4 h70ee33d_0
libtiff 4.5.1 h44ae7cf_1
libuv 1.48.0 h827c3e9_0
libwebp-base 1.3.2 h3d04722_1
libzlib 1.2.13 hcfcfb64_4 conda-forge
libzlib-wapi 1.2.13 hcfcfb64_4 conda-forge
lz4-c 1.9.4 h2bbff1b_1
markdown 3.7 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 3.0.2 py311h827c3e9_0
matplotlib 3.10.0 py311haa95532_0
matplotlib-base 3.10.0 py311he19b0ae_0
mdurl 0.1.2 pypi_0 pypi
mkl 2021.4.0 haa95532_640
mkl-service 2.4.0 py311h2bbff1b_0
mkl_fft 1.3.1 py311h743a336_0
mkl_random 1.2.2 py311heda8569_0
ml-dtypes 0.5.1 pypi_0 pypi
mpc 1.3.1 h827c3e9_0
mpfr 4.2.1 h56c3642_0
mpmath 1.3.0 py311haa95532_0
namex 0.0.8 pypi_0 pypi
networkx 3.4.2 py311haa95532_0
numexpr 2.10.2 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-ml-py 12.570.86 pypi_0 pypi
opencv-python 4.11.0.86 pypi_0 pypi
openjpeg 2.5.2 hae555c5_0
openssl 3.1.0 hcfcfb64_3 conda-forge
optree 0.14.1 pypi_0 pypi
packaging 24.2 py311haa95532_0
pillow 11.1.0 py311h096bfcc_0
pip 25.0 py311haa95532_0
protobuf 6.30.1 pypi_0 pypi
psutil 7.0.0 pypi_0 pypi
pygments 2.19.1 pypi_0 pypi
pyparsing 3.2.0 py311haa95532_0
pyqt 6.7.1 py311h5da7b33_0
pyqt6-sip 13.9.1 py311h827c3e9_0
pysocks 1.7.1 py311haa95532_0
python 3.11.11 h4607a30_0
python-dateutil 2.9.0post0 py311haa95532_2
pytorch 2.3.1 py3.11_cuda12.1_cudnn8_0 pytorch
pytorch-cuda 12.1 hde6ce7c_6 pytorch
pytorch-mutex 1.0 cuda pytorch
pywin32 310 pypi_0 pypi
pyyaml 6.0.2 py311h827c3e9_0
qtbase 6.7.2 h0804d20_1
qtdeclarative 6.7.2 h5da7b33_0
qtsvg 6.7.2 hf2fb9eb_0
qttools 6.7.2 h0de5f00_0
qtwebchannel 6.7.2 h5da7b33_0
qtwebsockets 6.7.2 h5da7b33_0
requests 2.32.3 py311haa95532_1
rich 13.9.4 pypi_0 pypi
scikit-learn 1.6.1 pypi_0 pypi
scipy 1.15.2 pypi_0 pypi
setuptools 75.8.0 py311haa95532_0
sip 6.10.0 py311h5da7b33_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.45.3 h2bbff1b_0
sympy 1.13.3 py311haa95532_1
tensorboard 2.19.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
threadpoolctl 3.6.0 pypi_0 pypi
tk 8.6.14 h0416ee5_0
torchaudio 2.3.1 pypi_0 pypi
torchvision 0.18.1 pypi_0 pypi
tornado 6.4.2 py311h827c3e9_0
tqdm 4.67.1 pypi_0 pypi
typing_extensions 4.12.2 py311haa95532_0
tzdata 2025a h04d1e81_0
ucrt 10.0.22621.0 h57928b3_1 conda-forge
unicodedata2 15.1.0 py311h827c3e9_1
urllib3 2.3.0 py311haa95532_0
vc 14.42 haa95532_4
vs2015_runtime 14.42.34433 he0abc0d_4
werkzeug 3.1.3 pypi_0 pypi
wheel 0.45.1 py311haa95532_0
win_inet_pton 1.1.0 py311haa95532_0
xz 5.6.4 h4754444_1
yaml 0.2.5 he774522_0
zlib 1.2.13 hcfcfb64_4 conda-forge
zlib-wapi 1.2.13 hcfcfb64_4 conda-forge
zstd 1.5.6 h8880b57_0

On the plus side, I have a model running what is in effect the dny256 preset with an efficientnetv2/b3 encoder. At 150k iterations running mixed precision I got a NaN in FS2 (4 learning rate, AdaBelief, -13 epsilon), and even after rolling back to 130k iterations I couldn't get the training speed over 1.5 with MP disabled; it would NaN after about 100 iterations, so I abandoned the model.

Loaded the 150k iteration version into FS3 with MP and a 4e5 learning rate; so far no issues after 1k iterations, so there seems to be some benefit to the new back end.
**Edit: As you would expect, despite there being no NaNs, the model is ruined, and after 5000 iterations the images look cursed, purple and green. Interesting though that no NaNs occurred; could it be that the NaN protection is not working in FS3?

**Edit two: Started a new model using efficientnetb3 with the RMSprop optimiser (a paper states it's the most accurate with b3? Give it a go!)
Epsilon at -4, learning rate 4e5 and MP enabled; got a NaN in less than 10k iterations. Wild!

Another thing: it appears there is some error with DFL-SAE model compatibility. I created a new model at 256 res with all other settings standard, batch size 4. Training speed is extremely slow, about 1 iteration every 3 seconds, and exiting training causes a crash. I can investigate / provide more info on this, but I would suspect it relates to an incorrect package?

Edit three: Thinking about it, there are a lot more options for the optimizers. I get much better training rates (about 18, and I can run a batch size of 8) with RMSprop, which I believe is because AdaBelief is relatively VRAM heavy. Is it plausible that the slower training rate with my "default" settings is due to the additional optimizer options?

Edit four: RMSprop needs some more research; nothing but NaNs, and the rate of actual change is woeful. 10k iterations was my best go today, and it looked nothing like the results I have been getting with AdaBelief/EfficientNet, not even to the point of the "hollowed out" faces.
I have tried everything from epsilon exponents up to -8 and learning rates down to 1, and turned on gradient clipping. So I instead tried AdamW; after a mere 1k iterations I had the hollowed-out/pumpkin-head looking faces, and after the 1250 iteration model update I have the blown-out/bright red and yellow faces. Is this model collapsing already?

A lot of interesting differences going on here in general. I hope all users get a chance to play with this and hopefully someone with a little more computing power than I have finds some cool stuff :)

Last edited by DeliciousCaramels on Tue Mar 25, 2025 1:52 pm, edited 8 times in total.
User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

Just to help people not have to dig through my post, here are the installation instructions that worked for me on Windows 10 with the Nvidia backend.

Follow torzdf's instructions; when you get past the part where you activate your Conda environment (i.e. you are in your "fs3" environment), run the following commands before you run the install script.

Code: Select all

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install matplotlib
pip install numpy==1.26.4
conda install pywinpty

Then run the install script.

I just tested this on a system that doesn't have any other Faceswap installs, i.e. a brand-new Conda install etc., and it works.

Good luck all and have fun!

User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

It seems there may be some issues. I'm not certain at this point, as obviously this is a very customisable piece of software, but I think they relate to VRAM allocation.

1) Installed FS3 on my "extraction" box (it has an 11GB 1080 Ti). FS2 will run parallel extraction with a batch size of 7 for S3FD and 12 for FAN align; this assigns about 9.1GB of VRAM to do so.
In FS3 I can only run 3/10, and it will only assign about 5.9GB of VRAM. I also tested both using identical settings (i.e. I tried extraction in FS2 with 3/10 batch sizes) and the actual extraction itself is about 10-15% slower in FS3.

2) Conversely, FS3 training will assign more VRAM. I have an 8GB 3070 in my "training" laptop and FS3 will assign just under 7.3GB of VRAM at max, compared to 7.1GB in FS2. This does allow me to run slightly higher batch sizes.

3) Every model I have made leads to crashes/NaNs, and it happens quickly. So far the longest-lasting model was 35k iterations; however, the model itself started gaining loss after 15k iterations, and at 27k the B face loss jumped immediately from 0.3 to 2.0! I am just playing with settings, loss functions etc. On the current model I am training, the loss has again started to climb after 10k iterations.

Of course in all cases I am running mixed precision. Perhaps tomorrow I will start a model without it and see what happens.

*Edit:
Well, it didn't take long. To avoid any potential issues with phaze-a settings I started a new model, DFL-SAE at 160 with the DF arch. Without MP I can only get a batch size of 2, which is lower than what I could get on my old laptop with a 6GB 1070 (it could run at 3), and I got a NaN warning within 768 iterations.

*Edit 2:
OK, well, problem solved. I'm pretty sure that LPIPS is an issue here, which makes sense given that this environment has newer CUDA toolkits relative to my FS2 install. I will start a new phaze-a with just MS-SSIM and MAE and see how it goes.

*Edit 3:
Training VRAM allocation is significantly better; I can run my regular model at a batch size of 10 where in FS2 I can only do 8. It will actually allocate up to 7.7GB of my 8GB of VRAM; very impressed. Since dumping LPIPS I have 60k iterations on a new model today and no problems at all, getting a faster training rate than ever before: 28 EGs/s on phaze-a / effnetv2-b3.

Extraction speed issue and the LPIPS problem aside, I highly recommend people install this. Excellent work!!!

Last edited by DeliciousCaramels on Thu Mar 27, 2025 9:54 am, edited 4 times in total.
User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

I thought I would try a quick convert to see how the new model is looking, and I came up with an error. I tried to run preview mode, and its output was slightly more verbose:

Code: Select all

Traceback (most recent call last):
  File "D:\fs3\lib\convert.py", line 203, in process
    image = self._patch_image(item)
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\fs3\lib\convert.py", line 267, in _patch_image
    new_image, background = self._get_new_image(predicted, frame_size)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\fs3\lib\convert.py", line 361, in _get_new_image
    predicted_mask = new_face[:, :, -1] if new_face.shape[2] == 4 else None
                                           ~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range

I'm not running a predicted mask at all; I don't run learn mask because I found it incompatible with "conv aware init".

User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

I've been playing around again. I found on another model that the reason I was getting NaNs using LPIPS was that some of the faces were not well aligned. I understand that alignment doesn't have anything to do with training, but removing about 500 images that had messed-up alignments (even though the faces appeared masked OK, the obscuring or angling of the face meant the aligner output was just a blob of points) allowed me to continue training it, so I thought I'd give FS3 another go.

Currently, running the dataset that was previously causing immediate NaNs, minus the poorly aligned images, trains with no problems at all! So there must have been some images that, for whatever reason, just make the encoder give up.

I've started new models to test back to back on FS2 and FS3; with the same settings I can run FS3 at 7 i/s vs FS2's 4. I'm going to let FS3 run this model as long as possible now to see if I get NaNs as it progresses.

User avatar
BCD16
Posts: 3
Joined: Thu Jan 23, 2025 9:47 am

Re: Faceswap 3 open for testing

Post by BCD16 »

Hello everyone. I am adding my own Phaze-A 192px preset below. I wrote this preset on the basis of DeepFaceLab's original SAEHD LIAE-UDT preset. Save it as a .json file and load it in Phaze-A. Use 100% face coverage while training.

Use the Bisenet-Fp masker for both A and B faces during extraction. Make sure you use the same masker while training, with the "Warp To Landmarks" option enabled.

Make sure the Mask type on the Settings > Train > Global > Loss page is set to "Bisenet-Fp Face".

I am getting extremely good results. My GPU is an RTX 4090, and I can train at 180 EGs/sec with a batch size of 64 (mixed precision enabled). That is extremely good for a 192px preset.
Below are some packages I force-upgraded that work well with the FS3 beta branch (for NVIDIA RTX 3000 and 4000 series these should work); see the install command sketched after the list.

keras 3.3.3 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-ml-py 12.570.86 pypi_0 pypi
torch 2.8.0.dev20250327+cu128 pypi_0 pypi
torchaudio 2.6.0.dev20250401+cu128 pypi_0 pypi
torchvision 0.22.0.dev20250401+cu128 pypi_0 pypi
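
These are nightly builds from the cu128 index; if you want to try the same, the install is roughly the following (the index URL follows the usual PyTorch nightly pattern, which is my assumption here):

Code: Select all

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128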

Code: Select all

{
  "output_size": 192,
  "shared_fc": "half",
  "enable_gblock": false,
  "split_fc": true,
  "split_gblock": false,
  "split_decoders": false,
  "enc_architecture": "fs_original",
  "enc_scaling": 13,
  "enc_load_weights": false,
  "bottleneck_type": "dense",
  "bottleneck_norm": "none",
  "bottleneck_size": 256,
  "bottleneck_in_encoder": false,
  "fc_depth": 1,
  "fc_min_filters": 512,
  "fc_max_filters": 512,
  "fc_dimensions": 8,
  "fc_filter_slope": -0.5,
  "fc_dropout": 0.0,
  "fc_upsampler": "subpixel",
  "fc_upsamples": 1,
  "fc_upsample_filters": 512,
  "fc_gblock_depth": 3,
  "fc_gblock_min_nodes": 512,
  "fc_gblock_max_nodes": 512,
  "fc_gblock_filter_slope": -0.5,
  "fc_gblock_dropout": 0.0,
  "dec_upscale_method": "subpixel",
  "dec_upscales_in_fc": 0,
  "dec_norm": "none",
  "dec_min_filters": 64,
  "dec_max_filters": 512,
  "dec_slope_mode": "full",
  "dec_filter_slope": -0.33,
  "dec_res_blocks": 1,
  "dec_output_kernel": 1,
  "dec_gaussian": false,
  "dec_skip_last_residual": false,
  "freeze_layers": "encoder",
  "load_layers": "encoder",
  "fs_original_depth": 4,
  "fs_original_min_filters": 64,
  "fs_original_max_filters": 512,
  "fs_original_use_alt": false,
  "mobilenet_width": 1.0,
  "mobilenet_depth": 1,
  "mobilenet_dropout": 0.001,
  "__filetype": "faceswap_preset",
  "__section": "train|model|phaze_a"
}
User avatar
DeliciousCaramels
Posts: 11
Joined: Wed Mar 05, 2025 7:57 am

Re: Faceswap 3 open for testing

Post by DeliciousCaramels »

I decided to play around a bit with extract to see if I could generate some useful testing information.

I noted something interesting. In FS2 I can extract in parallel, using S3FD/FAN, with batch sizes of 5/14 respectively and a refeed setting of 4. I apply masks after extraction, as I like to clean up the alignments first.

In FS3, it will only extract in parallel at 1/5; I cannot increase either the detector or the aligner batch size even a single point higher.
In single-process mode, I can bump FS2 to 5/16, which does give me more stability for longer videos.

However, in FS3 I can run detect at a batch size of 6, but the aligner will run out of memory/crash if I set it above 5.
Interestingly, when I set single process in FS3, the process starts much faster than FS2; it begins detection within 10 seconds, whereas FS2 generally takes a good 30 or 40 seconds.

It seems to me from testing that the FAN aligner isn't working well with the new backend and I think it is something to do with VRAM allocation.

I did some A/B testing using parallel processing at FS3's maximum setting of 1/5, and just let both extract the first 400 frames of the same video, all other settings equal.

FS2 extraction took 1:50, during which time it consumed 6.7GB of VRAM.

FS3 extraction took 2:59, during which time it consumed 4.6GB of VRAM.

Further, I ran the same test in single-process mode with the detector at a batch size of 5.

FS2 detected 400 frames in 23 seconds, again consuming 6.7GB of VRAM.

FS3 took 18 seconds and consumed only 6GB of VRAM. So the detector works better!

I hope this is helpful.

Last edited by DeliciousCaramels on Tue Apr 29, 2025 12:33 pm, edited 1 time in total.