OOM out of memory during convert but not training

Posted: Tue Apr 14, 2020 10:27 pm
by superjj
I've trained my model to about 91k iterations with no crashes. But when I try to convert, faceswap crashes with an OOM error in the logs. Conversion seems to work when I select a small range of frames to convert, maybe 15 frames at a time. But that's barely half a second of video.

Has anyone else trained fine, but ended up with OOM crashes during conversion?

I'm using a GTX 1650 Super (4 GB), and training on the Dlight model with resource-saving options turned on.

Thanks!

Re: OOM out of memory during convert but not training

Posted: Wed Apr 15, 2020 9:28 am
by torzdf
I have never seen this before, sadly.

Re: OOM out of memory during convert but not training

Posted: Wed Apr 15, 2020 7:59 pm
by bryanlyon
Were you having to use Allow Growth during training? If so, you might be running into a strange issue we've noticed on some people's setups.

Re: OOM out of memory during convert but not training

Posted: Wed Apr 15, 2020 9:08 pm
by PLAY-911
superjj wrote:
Tue Apr 14, 2020 10:27 pm
I've trained my model to about 91k iterations with no crashes. But when I try to convert, faceswap crashes with an OOM error in the logs. Conversion seems to work when I select a small range of frames to convert, maybe 15 frames at a time. But that's barely half a second of video.

Has anyone else trained fine, but ended up with OOM crashes during conversion?

I'm using a GTX 1650 Super 4gb, and training on the Dlight model with resource-saving options turned on.

Thanks!
Are you on Windows? I had problems with the virtual memory assigned by Windows.

Re: OOM out of memory during convert but not training

Posted: Thu Apr 16, 2020 12:14 am
by superjj
bryanlyon wrote:
Wed Apr 15, 2020 7:59 pm
Were you having to use Allow Growth during training? In which case, you might be running into a weird issue we've noticed on some people's setups.
Yes, I had Allow Growth turned on during training.

Re: OOM out of memory during convert but not training

Posted: Thu Apr 16, 2020 9:52 am
by torzdf
Make sure you select "Allow Growth" for convert too.

Re: OOM out of memory during convert but not training

Posted: Sun May 17, 2020 2:07 am
by mgolvach
Just in case it helps, I had a similar situation. Training with DFL-SAE at 128px (max I could do) was working fine, but conversion gave me the error:

Resource exhausted: OOM when allocating tensor with shape[16,130,130,126] and type float...

I had turned on "allow growth" for conversion, but found I did not have "allow growth" checked for training. Though it seemed counterintuitive, I turned off (unchecked) "allow growth" for conversion, and that solved the problem.

I think, essentially, you need to be consistent with the "allow growth" option between training and conversion. If you train with it on (or off), you must do the same for conversion.

This may not be the case for everyone. I'm sure more GPU power would also help solve the problem ;)

Thanks for this board's wealth of information and help!

Mike

Re: OOM out of memory during convert but not training

Posted: Mon May 18, 2020 6:36 pm
by bryanlyon
Allow_growth does not affect your model in any way; it only changes how TensorFlow allocates GPU memory. You are likely running into a different issue. But we recommend leaving allow_growth off unless it's absolutely necessary to get Faceswap running on your system.
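
For anyone curious, here is a minimal sketch of what the "Allow Growth" toggle corresponds to in TensorFlow 1.x (the TF version faceswap used at the time); how faceswap wires this up internally is my assumption, but the TensorFlow options themselves are real:

```python
def make_session(allow_growth=False):
    """Build a TF 1.x session whose GPU allocator mirrors the toggle.

    With allow_growth=False (TensorFlow's default), nearly all GPU
    memory is pre-allocated up front at session start; with True,
    TensorFlow starts small and grows allocations on demand, which
    can leave a 4 GB card fragmented differently between a training
    run and a convert run.
    """
    import tensorflow as tf  # assumption: TF 1.x API (tf.ConfigProto)
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = allow_growth
    return tf.Session(config=config)
```

On TensorFlow 2.x the equivalent is `tf.config.experimental.set_memory_growth(gpu, True)`, set per visible GPU before any tensors are allocated.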