Out of Memory & Python Crash
Posted: Thu Aug 29, 2019 4:32 pm
by reddc143c
Hi,
I'm trying to train my first model, but I'm running into the following error message (followed by a Python crash).
Code:
2019-08-29 16:19:54.889887: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 12.32G (13231885056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-08-29 16:19:55.732980: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 11.09G (11908696064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-08-29 16:19:56.563352: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 9.98G (10717826048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
The machine has an NVIDIA Tesla V100 (16GB), so I'm not sure why it's failing to allocate. Any help would be appreciated.
Thanks
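A quick arithmetic check on the three log lines above may help when reading them. This is only a sketch: the byte counts are copied from the log, and the 16384 MiB total is an assumption based on the card being a 16GB V100.

```python
# Byte counts copied from the three "failed to allocate" log lines.
attempts = [13231885056, 11908696064, 10717826048]

# Assumed total VRAM for a 16GB card, expressed in bytes (16384 MiB).
total_vram = 16384 * 1024**2


def to_gib(n_bytes):
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024**3


for n_bytes in attempts:
    share = n_bytes / total_vram
    print(f"{to_gib(n_bytes):.2f} GiB requested ({share:.1%} of total VRAM)")

# Ratio of each retry to the attempt before it.
ratios = [later / earlier for earlier, later in zip(attempts, attempts[1:])]
print([round(r, 3) for r in ratios])
```

Each retry asks for roughly 90% of the previous request, which looks like the allocator backing off step by step rather than the model needing exactly those amounts. That is a reading of the numbers only; nothing in this thread confirms it.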
Re: Out of Memory & Python Crash
Posted: Thu Aug 29, 2019 4:50 pm
by torzdf
Please provide the crash report from your faceswap folder. There isn't enough to go on here.
Re: Out of Memory & Python Crash
Posted: Thu Aug 29, 2019 7:51 pm
by reddc143c
Unfortunately, I don't think it even gets to the point of generating a crash report, since Python itself crashes (unless I'm looking in the wrong spot? Should it be in the install directory, e.g. C:\Users\[user]\faceswap?).
Here's a Windows event log entry for the Python crash:
Code:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Application Error" />
<EventID Qualifiers="0">1000</EventID>
<Level>2</Level>
<Task>100</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2019-08-29T16:16:39.387881700Z" />
<EventRecordID>969</EventRecordID>
<Channel>Application</Channel>
<Computer>nv-window-server-2016-425-31-v201904160130</Computer>
<Security />
</System>
<EventData>
<Data>python.exe</Data>
<Data>3.6.9150.1013</Data>
<Data>5d409475</Data>
<Data>ucrtbase.dll</Data>
<Data>10.0.14393.2879</Data>
<Data>5c89e9e2</Data>
<Data>c0000409</Data>
<Data>000000000006e92e</Data>
<Data>1164</Data>
<Data>01d55e851c08b53e</Data>
<Data>C:\Users\[user]\MiniConda3\envs\faceswap\python.exe</Data>
<Data>C:\Windows\System32\ucrtbase.dll</Data>
<Data>4beb4a3e-1656-4ca1-91d6-f6e655315b46</Data>
<Data />
<Data />
</EventData>
</Event>
And the system info:
Code:
============ System Information ============
encoding: cp1252
git_branch: master
git_commits: 1c3b9d9 Bugfix: Clean font list selection for config
gpu_cuda: No global version found. Check Conda packages for Conda Cuda
gpu_cudnn: No global version found. Check Conda packages for Conda cuDNN
gpu_devices: GPU_0: Tesla V100-SXM2-16GB
gpu_devices_active: GPU_0
gpu_driver: 425.31
gpu_vram: GPU_0: 16384MB
os_machine: AMD64
os_platform: Windows-10-10.0.14393-SP0
os_release: 10
py_command: C:\Users\[user]\faceswap/faceswap.py gui
py_conda_version: conda 4.7.11
py_implementation: CPython
py_version: 3.6.9
py_virtual_env: True
sys_cores: 4
sys_processor: Intel64 Family 6 Model 79 Stepping 0, GenuineIntel
sys_ram: Total: 16383MB, Available: 14681MB, Used: 1701MB, Free: 14681MB
=============== Pip Packages ===============
absl-py==0.7.1
astor==0.8.0
certifi==2019.6.16
cloudpickle==1.2.1
cycler==0.10.0
cytoolz==0.10.0
dask==2.3.0
decorator==4.4.0
fastcluster==1.1.25
ffmpy==0.2.2
gast==0.2.2
grpcio==1.16.1
h5py==2.9.0
imageio==2.5.0
imageio-ffmpeg==0.3.0
joblib==0.13.2
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==2.2.2
mkl-fft==1.0.14
mkl-random==1.0.2
mkl-service==2.0.2
networkx==2.3
numpy==1.16.2
nvidia-ml-py3==7.352.1
olefile==0.46
opencv-python==4.1.0.25
pathlib==1.0.1
Pillow==6.1.0
protobuf==3.8.0
psutil==5.6.3
pyparsing==2.4.2
pyreadline==2.1
python-dateutil==2.8.0
pytz==2019.2
PyWavelets==1.0.3
pywin32==223
PyYAML==5.1.2
scikit-image==0.15.0
scikit-learn==0.21.2
scipy==1.3.1
six==1.12.0
tensorboard==1.14.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0
termcolor==1.1.0
toolz==0.10.0
toposort==1.5
tornado==6.0.3
tqdm==4.32.1
Werkzeug==0.15.5
wincertstore==0.2
wrapt==1.11.2
============== Conda Packages ==============
# packages in environment at C:\Users\[user]\MiniConda3\envs\faceswap:
#
# Name Version Build Channel
_tflow_select 2.1.0 gpu
absl-py 0.7.1 py36_0
astor 0.8.0 py36_0
blas 1.0 mkl
ca-certificates 2019.5.15 1
certifi 2019.6.16 py36_1
cloudpickle 1.2.1 py_0
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
cycler 0.10.0 py36h009560c_0
cytoolz 0.10.0 py36he774522_0
dask-core 2.3.0 py_0
decorator 4.4.0 py36_1
fastcluster 1.1.25 py36h830ac7b_1000 conda-forge
ffmpeg 4.2 h6538335_0 conda-forge
ffmpy 0.2.2 pypi_0 pypi
freetype 2.9.1 ha9979f8_1
gast 0.2.2 py36_0
grpcio 1.16.1 py36h351948d_1
h5py 2.9.0 py36h5e291fa_0
hdf5 1.10.4 h7ebc959_0
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha66f8fd_1
imageio 2.5.0 py36_0
imageio-ffmpeg 0.3.0 py_0 conda-forge
intel-openmp 2019.4 245
joblib 0.13.2 py36_0
jpeg 9b hb83a4c4_2
keras 2.2.4 0
keras-applications 1.0.8 py_0
keras-base 2.2.4 py36_0
keras-preprocessing 1.1.0 py_1
kiwisolver 1.1.0 py36ha925a31_0
libmklml 2019.0.5 0
libpng 1.6.37 h2a8f88b_0
libprotobuf 3.8.0 h7bd577a_0
libtiff 4.0.10 hb898794_2
markdown 3.1.1 py36_0
matplotlib 2.2.2 py36had4c4a9_2
mkl 2019.4 245
mkl-service 2.0.2 py36he774522_0
mkl_fft 1.0.14 py36h14836fe_0
mkl_random 1.0.2 py36h343c172_0
networkx 2.3 py_0
numpy 1.16.2 py36h19fb1c0_0
numpy-base 1.16.2 py36hc3f5095_0
nvidia-ml-py3 7.352.1 pypi_0 pypi
olefile 0.46 py36_0
opencv-python 4.1.0.25 pypi_0 pypi
openssl 1.1.1c he774522_1
pathlib 1.0.1 py36_1
pillow 6.1.0 py36hdc69c19_0
pip 19.2.2 py36_0
protobuf 3.8.0 py36h33f27b4_0
psutil 5.6.3 py36he774522_0
pyparsing 2.4.2 py_0
pyqt 5.9.2 py36h6538335_2
pyreadline 2.1 py36_1
python 3.6.9 h5500b2f_0
python-dateutil 2.8.0 py36_0
pytz 2019.2 py_0
pywavelets 1.0.3 py36h8c2d366_1
pywin32 223 py36hfa6e2cd_1
pyyaml 5.1.2 py36he774522_0
qt 5.9.7 vc14h73c81de_0
scikit-image 0.15.0 py36ha925a31_0
scikit-learn 0.21.2 py36h6288b17_0
scipy 1.3.1 py36h29ff71c_0
setuptools 41.0.1 py36_0
sip 4.19.8 py36h6538335_0
six 1.12.0 py36_0
sqlite 3.29.0 he774522_0
tensorboard 1.14.0 py36he3c9ec2_0
tensorflow 1.14.0 gpu_py36h305fd99_0
tensorflow-base 1.14.0 gpu_py36h55fc52a_0
tensorflow-estimator 1.14.0 py_0
tensorflow-gpu 1.14.0 h0d30ee6_0
termcolor 1.1.0 py36_1
tk 8.6.8 hfa6e2cd_0
toolz 0.10.0 py_0
toposort 1.5 py_3 conda-forge
tornado 6.0.3 py36he774522_0
tqdm 4.32.1 py_0
vc 14.1 h0510ff6_4
vs2015_runtime 14.15.26706 h3a45250_4
werkzeug 0.15.5 py_0
wheel 0.33.4 py36_0
wincertstore 0.2 py36h7fe50ca_0
wrapt 1.11.2 py36he774522_0
xz 5.2.4 h2fa13f4_4
yaml 0.1.7 hc54c509_2
zlib 1.2.11 h62dcd97_3
zstd 1.3.7 h508b16e_0
Is there anything else I could provide that might help?
Re: Out of Memory & Python Crash
Posted: Thu Aug 29, 2019 8:48 pm
by bryanlyon
In the faceswap folder, either the crash log or faceswap.log if no crash log exists.
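If it helps to locate those files, here is a minimal sketch that lists candidate logs in a folder, newest first. The `faceswap.log` name comes from the reply above; the `crash_report*` pattern and the `find_logs` helper are assumptions for illustration, so adjust them to whatever actually appears in your folder.

```python
from pathlib import Path


def find_logs(faceswap_dir):
    """Return candidate log files in faceswap_dir, newest first."""
    folder = Path(faceswap_dir)
    if not folder.is_dir():
        return []
    # "faceswap.log" is named in the thread; "crash_report*" is an assumed
    # naming pattern for crash reports and may need adjusting.
    candidates = list(folder.glob("crash_report*")) + list(folder.glob("faceswap.log"))
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Hypothetical install location; replace with your own faceswap folder.
    for path in find_logs(Path.home() / "faceswap"):
        print(path)
```

If the list comes back empty, the folder being searched is probably not the one faceswap was launched from.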
Re: Out of Memory & Python Crash
Posted: Thu Aug 29, 2019 9:46 pm
by torzdf
Ultimately, if FS says you're out of VRAM, you're out of VRAM.
Without knowing the specifics of the model you're training, its settings, the batch size, it is impossible to give you any more than this.
See:
https://faceswap.dev/forum/app.php/faqpage#f3r10
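One way to see how much VRAM is actually free before training starts is to ask the NVIDIA driver directly. The sketch below assumes the `pynvml` module from the `nvidia-ml-py3` package is installed; `describe_vram` and `query_gpu` are hypothetical helper names, and the script simply reports that NVML is unavailable if the module or driver is missing.

```python
def describe_vram(total_bytes, used_bytes):
    """Summarise VRAM usage in MiB from raw byte counts."""
    mib = 1024**2
    free_bytes = total_bytes - used_bytes
    return (
        f"total={total_bytes // mib}MiB "
        f"used={used_bytes // mib}MiB "
        f"free={free_bytes // mib}MiB"
    )


def query_gpu(index=0):
    """Return (total, used) bytes for one GPU, or None if NVML is unavailable."""
    try:
        import pynvml  # assumed to be provided by the nvidia-ml-py3 package
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return info.total, info.used
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    result = query_gpu()
    print(describe_vram(*result) if result else "NVML not available")
```

If a large share of the card shows as used before faceswap is even launched, another process is probably holding the memory.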
Re: Out of Memory & Python Crash
Posted: Fri Aug 30, 2019 4:44 pm
by reddc143c
Oh, found it.
Code:
08/29/2019 16:19:49 MainProcess MainThread logger log_setup INFO Log level set to: INFO
08/29/2019 16:19:51 MainProcess MainThread train get_images INFO Model A Directory: C:\Users\[user]\Desktop\faceswap\faces\bjm
08/29/2019 16:19:51 MainProcess MainThread train get_images INFO Model B Directory: C:\Users\[user]\Desktop\faceswap\faces\jrt
08/29/2019 16:19:51 MainProcess MainThread train process INFO Training data directory: C:\Users\[user]\Desktop\faceswap\models
08/29/2019 16:19:51 MainProcess MainThread train monitor INFO ===================================================
08/29/2019 16:19:51 MainProcess MainThread train monitor INFO Starting
08/29/2019 16:19:51 MainProcess MainThread train monitor INFO Press 'Terminate' to save and quit
08/29/2019 16:19:51 MainProcess MainThread train monitor INFO ===================================================
08/29/2019 16:19:52 MainProcess training_0 train training INFO Loading data, this may take a while...
08/29/2019 16:19:52 MainProcess training_0 plugin_loader _import INFO Loading Model from Original plugin...
08/29/2019 16:19:52 MainProcess training_0 _base load WARNING No existing state file found. Generating.
08/29/2019 16:19:52 MainProcess training_0 deprecation_wrapper __getattr__ WARNING From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.\n
08/29/2019 16:19:52 MainProcess training_0 deprecation_wrapper __getattr__ WARNING From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.\n
08/29/2019 16:19:52 MainProcess training_0 deprecation_wrapper __getattr__ WARNING From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.\n
08/29/2019 16:19:52 MainProcess training_0 deprecation_wrapper __getattr__ WARNING From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.\n
08/29/2019 16:19:52 MainProcess training_0 deprecation_wrapper __getattr__ WARNING From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:181: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.\n
Re: Out of Memory & Python Crash
Posted: Fri Aug 30, 2019 4:47 pm
by reddc143c
torzdf wrote: ↑Thu Aug 29, 2019 9:46 pm
Ultimately if FS says you're out of VRAM, you're out of VRAM.
Without knowing the specifics of the model you're training, its settings, the batch size, it is impossible to give you any more than this.
See:
https://faceswap.dev/forum/app.php/faqpage#f3r10
I get that, I'm just curious why. I am using the default settings (Original trainer, batch size of 32; I tried 16 as well). I ran this on my local PC with a GTX 1070 and it runs fine, but I decided to try a cloud instance with a more powerful GPU (and more VRAM).
Re: Out of Memory & Python Crash
Posted: Fri Aug 30, 2019 11:53 pm
by bryanlyon
This information is unfortunately insufficient. We'd really need a full log of the crash or operation to diagnose.
The only thing I can say is that it's probably your system itself: a 4-core CPU (probably only dual-core with hyperthreading) and 16GB of system RAM (with 14GB free) is probably simply not enough to push a V100 in any way, shape, or form. In addition, V100 drivers on Windows are not really very robust.
For advice on CPUs, please see our hardware guide at https://faceswap.dev/forum/viewtopic.php?f=16&t=10 . I believe your current setup is at fault, but we would really need a crash log to see why.