Alright, so these Windows users must be making you lose your mind!
I've got this working now after much trial and error.
Firstly, I follow the basic instructions, but before running the install script I install PyTorch into the environment with this command -
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
The version of numpy packaged with this is 1.24.3, which causes an error. Installing numpy 1.26.4 (as per the requirements) via conda before running the install script pulls in a different PyTorch build with "cpu" in the name, which won't train. So instead, I install numpy 1.26.4 via pip.
I check that I have functioning CUDA by running the script from this page:
https://phphe.com/blog/install-pytorch-manually
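For a quicker sanity check than that full script, a minimal snippet (assuming only that PyTorch is importable in the active environment) is:

```python
# Minimal CUDA sanity check. Assumes PyTorch is installed in the
# active conda environment; falls back gracefully if it isn't.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if cuda_ok else "none"
except ImportError:
    cuda_ok, device_name = False, "torch not installed"

print("CUDA available:", cuda_ok)
print("Device:", device_name)
```

If this prints False with the conda-installed build above, the wrong (CPU) PyTorch variant has likely been pulled in.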
Matplotlib also needs to be installed manually.
After that, everything installs fine and I can train!
If I train a model started with Conv Aware initialization, I get this warning:
Code:
C:\Users\admin\MiniConda3\envs\fs3\Lib\site-packages\keras\src\backend\torch\nn.py:416: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\Convolution.cpp:1032.)
outputs = tnn.conv2d(
Training iteration speed seems slower than FS2. Running my default max settings (Phaze-A), I get about 22 EGs/s in FS2 versus about 9 EGs/s in FS3.
That works out to roughly 4000 iterations per hour in FS2 and 2500 in FS3, training with the same settings.
Could there be some issue with the way I've installed? Googling suggests the warning is due to a cuDNN version mismatch, but all my CUDA packages come from the same source.
Here are my packages and system settings; let me know if anything stands out as wrong.
Code:
============ System Information ============
backend: nvidia
encoding: cp1252
git_branch: fs3
git_commits: 8073752 Merge branch 'master' into fs3 | cbaad14 Bugfix: Linux installer - pin git to < 2.45 | 6fe300e pin numpy to < 2.0
gpu_cuda: No global version found. Check Conda packages for Conda Cuda
gpu_cudnn: No global version found. Check Conda packages for Conda cuDNN
gpu_devices: GPU_0: NVIDIA GeForce RTX 3070 Laptop GPU
gpu_devices_active: GPU_0
gpu_driver: 565.90
gpu_vram: GPU_0: 8192MB (133MB free)
os_machine: AMD64
os_platform: Windows-10-10.0.19045-SP0
os_release: 10
py_conda_version: conda 25.3.0
py_implementation: CPython
py_version: 3.11.11
py_virtual_env: True
sys_cores: 16
sys_processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
sys_ram: Total: 32547MB, Available: 13585MB, Used: 18961MB, Free: 13585MB
=============== Pip Packages ===============
absl-py==2.2.0
Brotli @ file:///C:/b/abs_c415aux9ra/croot/brotli-split_1736182803933/work
certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1739515848642/work/certifi
charset-normalizer @ file:///croot/charset-normalizer_1721748349566/work
colorama==0.4.6
contourpy @ file:///C:/b/abs_f2u2o_8s9g/croot/contourpy_1732540071787/work
cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work
fastcluster==1.2.6
ffmpy==0.5.0
filelock @ file:///C:/b/abs_f2gie28u58/croot/filelock_1700591233643/work
fonttools @ file:///C:/b/abs_4crkswws2h/croot/fonttools_1737040078745/work
gmpy2 @ file:///C:/b/abs_d8ki0o0h97/croot/gmpy2_1738085498525/work
grpcio==1.71.0
h5py==3.13.0
idna @ file:///C:/b/abs_aad84bnnw5/croot/idna_1714398896795/work
imageio==2.37.0
imageio-ffmpeg==0.6.0
Jinja2 @ file:///C:/b/abs_920kup4e6u/croot/jinja2_1741711580669/work
joblib==1.4.2
keras==3.3.3
kiwisolver @ file:///C:/b/abs_faf90xet7a/croot/kiwisolver_1737040915779/work
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe @ file:///C:/b/abs_a0ma7ge0jc/croot/markupsafe_1738584052792/work
matplotlib==3.10.0
mdurl==0.1.2
mkl-fft==1.3.1
mkl-random @ file:///C:/ci_311/mkl_random_1676481991689/work
mkl-service==2.4.0
ml_dtypes==0.5.1
mpmath @ file:///C:/b/abs_7833jrbiox/croot/mpmath_1690848321154/work
namex==0.0.8
networkx @ file:///C:/b/abs_b054htfn9t/croot/networkx_1737043671910/work
numexpr==2.10.2
numpy==1.26.4
nvidia-ml-py==12.570.86
opencv-python==4.11.0.86
optree==0.14.1
packaging @ file:///C:/b/abs_3by6s2fa66/croot/packaging_1734472138782/work
pillow @ file:///C:/b/abs_b50vowcrzo/croot/pillow_1738010273782/work
protobuf==6.30.1
psutil==7.0.0
Pygments==2.19.1
pyparsing @ file:///C:/b/abs_40z8gyj9wi/croot/pyparsing_1731445739241/work
PyQt6==6.7.1
PyQt6_sip @ file:///C:/b/abs_28s7k4h_hl/croot/pyqt-split_1740498234166/work/pyqt_sip
PySocks @ file:///C:/ci_311/pysocks_1676425991111/work
python-dateutil @ file:///C:/b/abs_3au_koqnbs/croot/python-dateutil_1716495777160/work
pywin32==310
pywinpty==2.0.15
PyYAML @ file:///C:/b/abs_14xkfs39bx/croot/pyyaml_1728657968772/work
requests @ file:///C:/b/abs_c3508vg8ez/croot/requests_1731000584867/work
rich==13.9.4
scikit-learn==1.6.1
scipy==1.15.2
sip @ file:///C:/b/abs_5cto136kse/croot/sip_1738856220313/work
six @ file:///tmp/build/80754af9/six_1644875935023/work
sympy @ file:///C:/b/abs_b4u17p23yg/croot/sympy_1738108511395/work
tensorboard==2.19.0
tensorboard-data-server==0.7.2
threadpoolctl==3.6.0
torch==2.3.1
torchaudio==2.3.1
torchvision==0.18.1
tornado @ file:///C:/b/abs_7cyu943ybx/croot/tornado_1733960510898/work
tqdm==4.67.1
typing_extensions @ file:///C:/b/abs_0ffjxtihug/croot/typing_extensions_1734714875646/work
unicodedata2 @ file:///C:/b/abs_dfnftvxi4k/croot/unicodedata2_1736543771112/work
urllib3 @ file:///C:/b/abs_7bst06lizn/croot/urllib3_1737133657081/work
Werkzeug==3.1.3
win-inet-pton @ file:///C:/ci_311/win_inet_pton_1676425458225/work
============== Conda Packages ==============
# packages in environment at C:\Users\daniel\MiniConda3\envs\fs3:
#
# Name Version Build Channel
absl-py 2.2.0 pypi_0 pypi
blas 1.0 mkl
brotli-python 1.0.9 py311h5da7b33_9
bzip2 1.0.8 h2bbff1b_6
ca-certificates 2025.2.25 haa95532_0
certifi 2025.1.31 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.3.2 pyhd3eb1b0_0
colorama 0.4.6 pypi_0 pypi
contourpy 1.3.1 py311h214f63a_0
cuda-cccl 12.8.90 0 nvidia
cuda-cccl_win-64 12.8.90 0 nvidia
cuda-cudart 12.1.105 0 nvidia
cuda-cudart-dev 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-libraries-dev 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvrtc-dev 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.8.90 0 nvidia
cuda-opencl-dev 12.8.90 0 nvidia
cuda-profiler-api 12.8.90 0 nvidia
cuda-runtime 12.1.0 0 nvidia
cuda-version 12.8 3 nvidia
cycler 0.11.0 pyhd3eb1b0_0
fastcluster 1.2.6 pypi_0 pypi
ffmpy 0.5.0 pypi_0 pypi
filelock 3.13.1 py311haa95532_0
fonttools 4.55.3 py311h827c3e9_0
freetype 2.12.1 ha860e81_0
git 2.45.2 haa95532_1
gmp 6.3.0 h537511b_0
gmpy2 2.2.1 py311h827c3e9_0
grpcio 1.71.0 pypi_0 pypi
h5py 3.13.0 pypi_0 pypi
icu 73.1 h6c2663c_0
idna 3.7 py311haa95532_0
imageio 2.37.0 pypi_0 pypi
imageio-ffmpeg 0.6.0 pypi_0 pypi
intel-openmp 2021.4.0 haa95532_3556
jinja2 3.1.6 py311haa95532_0
joblib 1.4.2 pypi_0 pypi
jpeg 9e h827c3e9_3
keras 3.3.3 pypi_0 pypi
khronos-opencl-icd-loader 2024.05.08 h8cc25b3_0
kiwisolver 1.4.8 py311h5da7b33_0
krb5 1.20.1 h5b6d351_0
lcms2 2.16 hb4a4139_0
lerc 4.0.0 h5da7b33_0
libcublas 12.1.0.26 0 nvidia
libcublas-dev 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufft-dev 11.0.2.4 0 nvidia
libcurand 10.3.9.90 0 nvidia
libcurand-dev 10.3.9.90 0 nvidia
libcusolver 11.4.4.55 0 nvidia
libcusolver-dev 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libcusparse-dev 12.0.2.55 0 nvidia
libdeflate 1.22 h5bf469e_0
libffi 3.4.4 hd77b12b_1
libjpeg-turbo 2.0.0 h196d8e1_0
libnpp 12.0.2.50 0 nvidia
libnpp-dev 12.0.2.50 0 nvidia
libnvjitlink 12.1.105 0 nvidia
libnvjitlink-dev 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libnvjpeg-dev 12.1.1.14 0 nvidia
libpng 1.6.39 h8cc25b3_0
libpq 17.4 h70ee33d_0
libtiff 4.5.1 h44ae7cf_1
libuv 1.48.0 h827c3e9_0
libwebp-base 1.3.2 h3d04722_1
libzlib 1.2.13 hcfcfb64_4 conda-forge
libzlib-wapi 1.2.13 hcfcfb64_4 conda-forge
lz4-c 1.9.4 h2bbff1b_1
markdown 3.7 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 3.0.2 py311h827c3e9_0
matplotlib 3.10.0 py311haa95532_0
matplotlib-base 3.10.0 py311he19b0ae_0
mdurl 0.1.2 pypi_0 pypi
mkl 2021.4.0 haa95532_640
mkl-service 2.4.0 py311h2bbff1b_0
mkl_fft 1.3.1 py311h743a336_0
mkl_random 1.2.2 py311heda8569_0
ml-dtypes 0.5.1 pypi_0 pypi
mpc 1.3.1 h827c3e9_0
mpfr 4.2.1 h56c3642_0
mpmath 1.3.0 py311haa95532_0
namex 0.0.8 pypi_0 pypi
networkx 3.4.2 py311haa95532_0
numexpr 2.10.2 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-ml-py 12.570.86 pypi_0 pypi
opencv-python 4.11.0.86 pypi_0 pypi
openjpeg 2.5.2 hae555c5_0
openssl 3.1.0 hcfcfb64_3 conda-forge
optree 0.14.1 pypi_0 pypi
packaging 24.2 py311haa95532_0
pillow 11.1.0 py311h096bfcc_0
pip 25.0 py311haa95532_0
protobuf 6.30.1 pypi_0 pypi
psutil 7.0.0 pypi_0 pypi
pygments 2.19.1 pypi_0 pypi
pyparsing 3.2.0 py311haa95532_0
pyqt 6.7.1 py311h5da7b33_0
pyqt6-sip 13.9.1 py311h827c3e9_0
pysocks 1.7.1 py311haa95532_0
python 3.11.11 h4607a30_0
python-dateutil 2.9.0post0 py311haa95532_2
pytorch 2.3.1 py3.11_cuda12.1_cudnn8_0 pytorch
pytorch-cuda 12.1 hde6ce7c_6 pytorch
pytorch-mutex 1.0 cuda pytorch
pywin32 310 pypi_0 pypi
pyyaml 6.0.2 py311h827c3e9_0
qtbase 6.7.2 h0804d20_1
qtdeclarative 6.7.2 h5da7b33_0
qtsvg 6.7.2 hf2fb9eb_0
qttools 6.7.2 h0de5f00_0
qtwebchannel 6.7.2 h5da7b33_0
qtwebsockets 6.7.2 h5da7b33_0
requests 2.32.3 py311haa95532_1
rich 13.9.4 pypi_0 pypi
scikit-learn 1.6.1 pypi_0 pypi
scipy 1.15.2 pypi_0 pypi
setuptools 75.8.0 py311haa95532_0
sip 6.10.0 py311h5da7b33_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.45.3 h2bbff1b_0
sympy 1.13.3 py311haa95532_1
tensorboard 2.19.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
threadpoolctl 3.6.0 pypi_0 pypi
tk 8.6.14 h0416ee5_0
torchaudio 2.3.1 pypi_0 pypi
torchvision 0.18.1 pypi_0 pypi
tornado 6.4.2 py311h827c3e9_0
tqdm 4.67.1 pypi_0 pypi
typing_extensions 4.12.2 py311haa95532_0
tzdata 2025a h04d1e81_0
ucrt 10.0.22621.0 h57928b3_1 conda-forge
unicodedata2 15.1.0 py311h827c3e9_1
urllib3 2.3.0 py311haa95532_0
vc 14.42 haa95532_4
vs2015_runtime 14.42.34433 he0abc0d_4
werkzeug 3.1.3 pypi_0 pypi
wheel 0.45.1 py311haa95532_0
win_inet_pton 1.1.0 py311haa95532_0
xz 5.6.4 h4754444_1
yaml 0.2.5 he774522_0
zlib 1.2.13 hcfcfb64_4 conda-forge
zlib-wapi 1.2.13 hcfcfb64_4 conda-forge
zstd 1.5.6 h8880b57_0
On the plus side, I have a model running what is in effect the dny256 preset with an EfficientNetV2-B3 encoder. At 150k iterations running mixed precision it got a NaN in FS2 (learning rate 4e-5, AdaBelief, epsilon exponent -13), and even after rolling back to 130k iterations I couldn't run the learning rate over 1.5e-5 with MP disabled; it would NaN after about 100 iterations, so I abandoned the model.
I loaded the 150k-iteration version into FS3 with MP and a 4e-5 learning rate; so far no issues after 1k iterations, so there seems to be some benefit to the new back end.
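One hedged guess about the mixed-precision NaNs: an epsilon exponent of -13 is far below anything float16 can represent, so if the optimizer's epsilon ever passes through half precision it silently underflows to zero and the division in the update step can blow up. A quick demonstration with NumPy:

```python
import numpy as np

# float16 can't represent anything smaller than ~6e-8 (smallest subnormal),
# so an epsilon of 1e-13 silently underflows to zero in half precision.
eps_fp32 = np.float32(1e-13)
eps_fp16 = np.float16(1e-13)

print(eps_fp32)                    # small but non-zero in fp32
print(eps_fp16)                    # 0.0 -- the epsilon has vanished
print(np.finfo(np.float16).tiny)   # smallest normal float16, ~6.1e-5
```

Whether faceswap actually casts the epsilon to fp16 under MP is an assumption on my part, but it would explain why a -4 exponent behaves differently from -13.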
**Edit: As you would expect, despite there being no NaNs, the model is ruined; after 5000 iterations the images look cursed, purple and green. Interesting that no NaNs occurred, though. Could it be that the NaN protection is not working in FS3?
**Edit two: Started a new model using EfficientNet-B3 with the RMSprop optimizer (a paper states it's the most accurate with B3? Give it a go!).
Epsilon exponent at -4, learning rate 4e-5 and MP enabled; got a NaN in under 10k iterations. Wild!
Another thing: there appears to be some error with DFL-SAE model compatibility. I created a new model at 256 res with all other settings standard, batch size 4. Training is extremely slow, about 1 iteration every 3 seconds, and exiting training causes a crash. I can investigate / provide more info on this, but I suspect it relates to an incorrect package?
Edit three: Thinking about it, there are a lot more options for the optimizers. I get much better training rates (about 18 EGs/s, and I can run a batch size of 8) with RMSprop, which I believe is because AdaBelief is relatively VRAM-heavy. Is it plausible that the slower training rate with my "default" settings is due to the additional optimizer options?
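The VRAM gap between optimizers is easy to ballpark: each per-parameter state buffer costs 4 bytes per parameter in fp32, and RMSprop (non-centered, no momentum) keeps one buffer where Adam-family optimizers like AdaBelief keep two. A rough sketch (buffer counts are the textbook ones, not measured from faceswap, and the 60M parameter count is a made-up example):

```python
# Rough per-parameter optimizer state cost in fp32 (4 bytes per buffer).
# RMSprop keeps one accumulator; Adam/AdamW/AdaBelief keep two moments.
STATE_BUFFERS = {"sgd": 0, "rmsprop": 1, "adam": 2, "adamw": 2, "adabelief": 2}

def optimizer_state_mb(n_params: int, optimizer: str) -> float:
    """Approximate optimizer state memory in MB for n_params fp32 parameters."""
    return n_params * 4 * STATE_BUFFERS[optimizer] / 1024**2

# e.g. a hypothetical 60M-parameter model:
for opt in ("rmsprop", "adabelief"):
    print(opt, round(optimizer_state_mb(60_000_000, opt), 1), "MB")
```

On an 8GB card, a few hundred MB of extra optimizer state can plausibly be the difference between batch size 4 and 8.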
Edit four: RMSprop needs more research; nothing but NaNs, and the rate of actual change is woeful. 10k iterations was my best run today, and it looked nothing like the results I have been getting with AdaBelief/EfficientNet, not even reaching the "hollowed out" faces.
I've tried everything from epsilon exponents up to -8 and learning rates down to 1e-5, and turned on gradient clipping. So I tried AdamW instead; after a mere 1k iterations I had the hollowed-out / pumpkin-head-looking faces, and after the 1250-iteration model update I have the blown-out, bright red and yellow faces. Is this model collapsing already?
A lot of interesting differences going on here in general. I hope all users get a chance to play with this, and hopefully someone with a little more computing power than I have finds some cool stuff.