Out of Memory & Python Crash

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here.

Locked
reddc143c
Posts: 4
Joined: Thu Aug 29, 2019 4:27 pm

Out of Memory & Python Crash

Post by reddc143c »

Hi,

I'm trying to train my first model, but I'm running into the following error message, after which Python crashes.

Code:

2019-08-29 16:19:54.889887: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 12.32G (13231885056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-08-29 16:19:55.732980: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 11.09G (11908696064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-08-29 16:19:56.563352: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 9.98G (10717826048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

The machine has an NVIDIA Tesla V100 (16GB), so I'm not sure why it's failing to allocate. Any help would be appreciated.

Thanks
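
The three allocation failures quoted in this post can be read programmatically. The sketch below is only an illustration, assuming the log lines keep the format shown above; `parse_failed_allocations` is a hypothetical helper, not part of faceswap or TensorFlow. It converts the byte counts to GiB and compares each request with the one before it.

```python
import re

# Log lines copied from the post above.
LOG = """\
2019-08-29 16:19:54.889887: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 12.32G (13231885056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-08-29 16:19:55.732980: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 11.09G (11908696064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-08-29 16:19:56.563352: E tensorflow/stream_executor/cuda/cuda_driver:828] failed to allocate 9.98G (10717826048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
"""


def parse_failed_allocations(log_text):
    """Hypothetical helper: return the requested byte counts, in log order."""
    pattern = re.compile(r"failed to allocate \S+ \((\d+) bytes\)")
    return [int(match.group(1)) for match in pattern.finditer(log_text)]


requests = parse_failed_allocations(LOG)
for previous, current in zip([None] + requests, requests):
    gib = current / 2**30  # bytes -> GiB
    note = "" if previous is None else f" ({current / previous:.2f}x the previous request)"
    print(f"{gib:.2f} GiB requested{note}")
```

Each retry asks for roughly 90% of the previous request, and even the smallest request (about 10 GiB) fails on a card that reports 16384MB of VRAM.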

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Out of Memory & Python Crash

Post by torzdf »

Please provide the crash report from your faceswap folder. There isn't enough to go on here.

My word is final

Re: Out of Memory & Python Crash

Post by reddc143c »

Unfortunately, I don't think it even gets to the point of generating a crash report, since Python itself crashes (unless I'm looking in the wrong spot? Should it be in the install directory, e.g. C:\Users\[user]\faceswap?).

Here's the Windows event log entry for the Python crash:

Code:

- <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
- <System>
  <Provider Name="Application Error" /> 
  <EventID Qualifiers="0">1000</EventID> 
  <Level>2</Level> 
  <Task>100</Task> 
  <Keywords>0x80000000000000</Keywords> 
  <TimeCreated SystemTime="2019-08-29T16:16:39.387881700Z" /> 
  <EventRecordID>969</EventRecordID> 
  <Channel>Application</Channel> 
  <Computer>nv-window-server-2016-425-31-v201904160130</Computer> 
  <Security /> 
  </System>
- <EventData>
  <Data>python.exe</Data> 
  <Data>3.6.9150.1013</Data> 
  <Data>5d409475</Data> 
  <Data>ucrtbase.dll</Data> 
  <Data>10.0.14393.2879</Data> 
  <Data>5c89e9e2</Data> 
  <Data>c0000409</Data> 
  <Data>000000000006e92e</Data> 
  <Data>1164</Data> 
  <Data>01d55e851c08b53e</Data> 
  <Data>C:\Users\[user]\MiniConda3\envs\faceswap\python.exe</Data> 
  <Data>C:\Windows\System32\ucrtbase.dll</Data> 
  <Data>4beb4a3e-1656-4ca1-91d6-f6e655315b46</Data> 
  <Data /> 
  <Data /> 
  </EventData>
  </Event>

And the system info:

Code:

============ System Information ============
encoding:            cp1252
git_branch:          master
git_commits:         1c3b9d9 Bugfix: Clean font list selection for config
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: Tesla V100-SXM2-16GB
gpu_devices_active:  GPU_0
gpu_driver:          425.31
gpu_vram:            GPU_0: 16384MB
os_machine:          AMD64
os_platform:         Windows-10-10.0.14393-SP0
os_release:          10
py_command:          C:\Users\[user]\faceswap/faceswap.py gui
py_conda_version:    conda 4.7.11
py_implementation:   CPython
py_version:          3.6.9
py_virtual_env:      True
sys_cores:           4
sys_processor:       Intel64 Family 6 Model 79 Stepping 0, GenuineIntel
sys_ram:             Total: 16383MB, Available: 14681MB, Used: 1701MB, Free: 14681MB

=============== Pip Packages ===============
absl-py==0.7.1
astor==0.8.0
certifi==2019.6.16
cloudpickle==1.2.1
cycler==0.10.0
cytoolz==0.10.0
dask==2.3.0
decorator==4.4.0
fastcluster==1.1.25
ffmpy==0.2.2
gast==0.2.2
grpcio==1.16.1
h5py==2.9.0
imageio==2.5.0
imageio-ffmpeg==0.3.0
joblib==0.13.2
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==2.2.2
mkl-fft==1.0.14
mkl-random==1.0.2
mkl-service==2.0.2
networkx==2.3
numpy==1.16.2
nvidia-ml-py3==7.352.1
olefile==0.46
opencv-python==4.1.0.25
pathlib==1.0.1
Pillow==6.1.0
protobuf==3.8.0
psutil==5.6.3
pyparsing==2.4.2
pyreadline==2.1
python-dateutil==2.8.0
pytz==2019.2
PyWavelets==1.0.3
pywin32==223
PyYAML==5.1.2
scikit-image==0.15.0
scikit-learn==0.21.2
scipy==1.3.1
six==1.12.0
tensorboard==1.14.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0
termcolor==1.1.0
toolz==0.10.0
toposort==1.5
tornado==6.0.3
tqdm==4.32.1
Werkzeug==0.15.5
wincertstore==0.2
wrapt==1.11.2

============== Conda Packages ==============
# packages in environment at C:\Users\[user]\MiniConda3\envs\faceswap:
#
# Name                    Version                   Build  Channel
_tflow_select             2.1.0                       gpu  
absl-py 0.7.1 py36_0
astor 0.8.0 py36_0
blas 1.0 mkl
ca-certificates 2019.5.15 1
certifi 2019.6.16 py36_1
cloudpickle 1.2.1 py_0
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
cycler 0.10.0 py36h009560c_0
cytoolz 0.10.0 py36he774522_0
dask-core 2.3.0 py_0
decorator 4.4.0 py36_1
fastcluster 1.1.25 py36h830ac7b_1000 conda-forge
ffmpeg 4.2 h6538335_0 conda-forge
ffmpy 0.2.2 pypi_0 pypi
freetype 2.9.1 ha9979f8_1
gast 0.2.2 py36_0
grpcio 1.16.1 py36h351948d_1
h5py 2.9.0 py36h5e291fa_0
hdf5 1.10.4 h7ebc959_0
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha66f8fd_1
imageio 2.5.0 py36_0
imageio-ffmpeg 0.3.0 py_0 conda-forge
intel-openmp 2019.4 245
joblib 0.13.2 py36_0
jpeg 9b hb83a4c4_2
keras 2.2.4 0
keras-applications 1.0.8 py_0
keras-base 2.2.4 py36_0
keras-preprocessing 1.1.0 py_1
kiwisolver 1.1.0 py36ha925a31_0
libmklml 2019.0.5 0
libpng 1.6.37 h2a8f88b_0
libprotobuf 3.8.0 h7bd577a_0
libtiff 4.0.10 hb898794_2
markdown 3.1.1 py36_0
matplotlib 2.2.2 py36had4c4a9_2
mkl 2019.4 245
mkl-service 2.0.2 py36he774522_0
mkl_fft 1.0.14 py36h14836fe_0
mkl_random 1.0.2 py36h343c172_0
networkx 2.3 py_0
numpy 1.16.2 py36h19fb1c0_0
numpy-base 1.16.2 py36hc3f5095_0
nvidia-ml-py3 7.352.1 pypi_0 pypi
olefile 0.46 py36_0
opencv-python 4.1.0.25 pypi_0 pypi
openssl 1.1.1c he774522_1
pathlib 1.0.1 py36_1
pillow 6.1.0 py36hdc69c19_0
pip 19.2.2 py36_0
protobuf 3.8.0 py36h33f27b4_0
psutil 5.6.3 py36he774522_0
pyparsing 2.4.2 py_0
pyqt 5.9.2 py36h6538335_2
pyreadline 2.1 py36_1
python 3.6.9 h5500b2f_0
python-dateutil 2.8.0 py36_0
pytz 2019.2 py_0
pywavelets 1.0.3 py36h8c2d366_1
pywin32 223 py36hfa6e2cd_1
pyyaml 5.1.2 py36he774522_0
qt 5.9.7 vc14h73c81de_0
scikit-image 0.15.0 py36ha925a31_0
scikit-learn 0.21.2 py36h6288b17_0
scipy 1.3.1 py36h29ff71c_0
setuptools 41.0.1 py36_0
sip 4.19.8 py36h6538335_0
six 1.12.0 py36_0
sqlite 3.29.0 he774522_0
tensorboard 1.14.0 py36he3c9ec2_0
tensorflow 1.14.0 gpu_py36h305fd99_0
tensorflow-base 1.14.0 gpu_py36h55fc52a_0
tensorflow-estimator 1.14.0 py_0
tensorflow-gpu 1.14.0 h0d30ee6_0
termcolor 1.1.0 py36_1
tk 8.6.8 hfa6e2cd_0
toolz 0.10.0 py_0
toposort 1.5 py_3 conda-forge
tornado 6.0.3 py36he774522_0
tqdm 4.32.1 py_0
vc 14.1 h0510ff6_4
vs2015_runtime 14.15.26706 h3a45250_4
werkzeug 0.15.5 py_0
wheel 0.33.4 py36_0
wincertstore 0.2 py36h7fe50ca_0
wrapt 1.11.2 py36he774522_0
xz 5.2.4 h2fa13f4_4
yaml 0.1.7 hc54c509_2
zlib 1.2.11 h62dcd97_3
zstd 1.3.7 h508b16e_0

Is there anything else I could provide that might help?

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: Out of Memory & Python Crash

Post by bryanlyon »

Look in the faceswap folder. We need either the crash log or, if no crash log exists, faceswap.log.
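
A minimal sketch of that lookup, for anyone unsure which file to post. It assumes crash reports are files whose names start with `crash_report` (an assumption about the naming, not something stated in this thread); `find_logs` is a hypothetical helper and not part of faceswap.

```python
from pathlib import Path


def find_logs(faceswap_dir):
    """Return crash reports if any exist, otherwise faceswap.log if present."""
    folder = Path(faceswap_dir)
    # Assumption: crash report file names start with "crash_report".
    crash_reports = sorted(folder.glob("crash_report*"))
    if crash_reports:
        return crash_reports
    fallback = folder / "faceswap.log"
    return [fallback] if fallback.is_file() else []


# "." means the current directory; replace it with your faceswap folder.
for log_file in find_logs("."):
    print(log_file)
```

An empty result means neither file exists in that folder, which suggests the wrong directory rather than a missing log.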

Re: Out of Memory & Python Crash

Post by torzdf »

Ultimately, if FS says you're out of VRAM, you're out of VRAM.

Without knowing the specifics of the model you're training, its settings, and the batch size, it is impossible to tell you any more than this.

See:
https://faceswap.dev/forum/app.php/faqpage#f3r10
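
One way to check how much VRAM is actually free before training starts is to ask the driver. This is a rough sketch, assuming `nvidia-smi` is installed and on the PATH; `parse_gpu_memory` and `query_gpu_memory` are hypothetical helpers, and the sample row uses the total from the system info in this thread with a made-up "used" figure purely for illustration.

```python
import subprocess

# Real nvidia-smi query flags; with "nounits" the memory values are plain MiB numbers.
QUERY = ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Parse 'name, total, used' rows (MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name,
                     "total_mib": int(total),
                     "used_mib": int(used),
                     "free_mib": int(total) - int(used)})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, stdout=subprocess.PIPE, check=True,
                            universal_newlines=True)
    return parse_gpu_memory(result.stdout)


# Illustrative input only: the "used" value is hypothetical, not measured.
sample = "Tesla V100-SXM2-16GB, 16384, 1024\n"
print(parse_gpu_memory(sample))
```

If `free_mib` is already low before faceswap starts, something else on the machine is holding VRAM.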

Re: Out of Memory & Python Crash

Post by reddc143c »

Oh, found it.

Code:

08/29/2019 16:19:49 MainProcess     MainThread      logger          log_setup                 INFO     Log level set to: INFO
08/29/2019 16:19:51 MainProcess     MainThread      train           get_images                INFO     Model A Directory: C:\Users\[user]\Desktop\faceswap\faces\bjm
08/29/2019 16:19:51 MainProcess     MainThread      train           get_images                INFO     Model B Directory: C:\Users\[user]\Desktop\faceswap\faces\jrt
08/29/2019 16:19:51 MainProcess     MainThread      train           process                   INFO     Training data directory: C:\Users\[user]\Desktop\faceswap\models
08/29/2019 16:19:51 MainProcess     MainThread      train           monitor                   INFO     ===================================================
08/29/2019 16:19:51 MainProcess     MainThread      train           monitor                   INFO       Starting
08/29/2019 16:19:51 MainProcess     MainThread      train           monitor                   INFO       Press 'Terminate' to save and quit
08/29/2019 16:19:51 MainProcess     MainThread      train           monitor                   INFO     ===================================================
08/29/2019 16:19:52 MainProcess     training_0      train           training                  INFO     Loading data, this may take a while...
08/29/2019 16:19:52 MainProcess     training_0      plugin_loader   _import                   INFO     Loading Model from Original plugin...
08/29/2019 16:19:52 MainProcess     training_0      _base           load                      WARNING  No existing state file found. Generating.
08/29/2019 16:19:52 MainProcess     training_0      deprecation_wrapper __getattr__               WARNING  From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.\n
08/29/2019 16:19:52 MainProcess     training_0      deprecation_wrapper __getattr__               WARNING  From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.\n
08/29/2019 16:19:52 MainProcess     training_0      deprecation_wrapper __getattr__               WARNING  From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.\n
08/29/2019 16:19:52 MainProcess     training_0      deprecation_wrapper __getattr__               WARNING  From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.\n
08/29/2019 16:19:52 MainProcess     training_0      deprecation_wrapper __getattr__               WARNING  From C:\Users\[user]\MiniConda3\envs\faceswap\lib\site-packages\keras\backend\tensorflow_backend.py:181: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.\n
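
The excerpt above contains only INFO and WARNING entries, and its last entry is stamped 16:19:52, before the first allocation failure at 16:19:54 in the original post. A small sketch for tallying log levels, assuming the whitespace-separated column layout shown above; `count_levels` is a hypothetical helper, not a faceswap function.

```python
from collections import Counter


def count_levels(log_text):
    """Count log lines per level, assuming the column layout shown above."""
    counts = Counter()
    for line in log_text.splitlines():
        # Columns: date, time, process, thread, module, function, level, message
        fields = line.split(None, 7)
        if len(fields) >= 7:
            counts[fields[6]] += 1
    return counts


# Two lines copied from the excerpt above.
sample = (
    "08/29/2019 16:19:49 MainProcess     MainThread      logger          log_setup                 INFO     Log level set to: INFO\n"
    "08/29/2019 16:19:52 MainProcess     training_0      _base           load                      WARNING  No existing state file found. Generating.\n"
)
print(count_levels(sample))
```

A log with no ERROR or CRITICAL lines, like this one, does not record the failure itself.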

Re: Out of Memory & Python Crash

Post by reddc143c »

torzdf wrote: Thu Aug 29, 2019 9:46 pm

Ultimately, if FS says you're out of VRAM, you're out of VRAM.

Without knowing the specifics of the model you're training, its settings, and the batch size, it is impossible to tell you any more than this.

See:
https://faceswap.dev/forum/app.php/faqpage#f3r10

I get that, I'm just curious why. I am using the default settings (Original trainer, batch size 32; I tried 16 as well). It runs fine on my local PC with a GTX 1070, but I decided to try it on a cloud instance with a more powerful GPU that has more VRAM.

Re: Out of Memory & Python Crash

Post by bryanlyon »

This information is unfortunately insufficient. We'd really need a full log of the crash or operation to diagnose.

The only thing I can say is that it's probably your system itself: a 4-core CPU (probably only dual-core with hyperthreading) and 16GB of system RAM (with 14GB free) are probably simply not enough to push a V100 in any way, shape, or form. In addition, V100 drivers on Windows are not really very robust.

For advice on CPUs, please see our hardware guide (https://faceswap.dev/forum/viewtopic.php?f=16&t=10). I believe your current setup is at fault, but we would really need a crash log to see why.
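
The figures referred to here come from the System Information block posted earlier in the thread. As a rough sketch, they can be pulled out like this; `parse_system_info` and `first_megabytes` are hypothetical helpers, and the parsing assumes the `key: value` layout of that block.

```python
import re

# Lines copied from the System Information block earlier in the thread.
SYSTEM_INFO = """\
gpu_devices:         GPU_0: Tesla V100-SXM2-16GB
gpu_vram:            GPU_0: 16384MB
sys_cores:           4
sys_ram:             Total: 16383MB, Available: 14681MB, Used: 1701MB, Free: 14681MB
"""


def parse_system_info(text):
    """Split 'key: value' lines into a dict (values kept as strings)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def first_megabytes(value):
    """Return the first '<number>MB' figure found in a value, as an int."""
    return int(re.search(r"(\d+)MB", value).group(1))


info = parse_system_info(SYSTEM_INFO)
cores = int(info["sys_cores"])
ram_mb = first_megabytes(info["sys_ram"])
vram_mb = first_megabytes(info["gpu_vram"])
print(f"{cores} cores reported, {ram_mb} MB system RAM, {vram_mb} MB VRAM")
```

On this instance the total system RAM is about the same size as the GPU's VRAM.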