
Again: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Posted: Fri Jul 03, 2020 9:32 am
by congo

Hi everybody,

I have run into this error (Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR) before, but back then I could resolve it by setting TF_FORCE_GPU_ALLOW_GROWTH to true.
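For readers who have not used this setting, here is a minimal sketch of one way to set it from Python. The function name enable_gpu_memory_growth is made up for illustration; the only real name here is the TF_FORCE_GPU_ALLOW_GROWTH environment variable, and the sketch assumes it is set before TensorFlow is imported in the same process.

```python
import os


def enable_gpu_memory_growth():
    """Hypothetical helper: ask TensorFlow to grow GPU memory on demand.

    Assumption: this runs before TensorFlow is imported, since the
    variable is read when the GPU is first initialised.
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]


enable_gpu_memory_growth()

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Setting the variable in the shell before launching the training script should have the same effect. Note that it only changes how memory is allocated; it does not make a model fit that needs more VRAM than the card has.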

Everything worked fine, but now I wanted to raise the face dimensions in DFL-SAE from 128 to 256. I left the batch size at 16 and, as one would expect, got a GPU out-of-memory error. I reduced the batch size to 8, which was still not enough GPU memory. When I reduced it to 4, I got the error above (allow growth is set). I reduced the batch size further and also tried reducing the face dimensions to 192, to no avail.
DFL-SAE-128 works fine, and I have just started training with Villain, which seems to work fine too. Any idea why higher dimensions in SAE do not work? I can provide the error log file if necessary.


Re: Again: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Posted: Sat Jul 04, 2020 8:39 am
by torzdf

Most likely a memory issue. You are probably on the edge of what your GPU can handle, hence the error.

If other models train without this error, then I would say that is definitely the cause.

You can try Memory Saving Gradients and/or Optimizer Savings to try to fit it in VRAM.
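As a rough illustration of the "on the edge" point, the sketch below compares the settings from the first post under one simplifying assumption: that activation memory scales with batch size times the number of pixels per face. The function relative_activation_load is hypothetical, and the estimate ignores model weights and optimizer state, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```python
def relative_activation_load(batch_size, face_size, ref_batch=16, ref_size=128):
    """Activation load relative to the known-good setup (batch 16, 128 px).

    Assumption: activations scale with batch_size * face_size ** 2.
    Weights and optimizer state are deliberately left out.
    """
    return (batch_size * face_size ** 2) / (ref_batch * ref_size ** 2)


# Settings mentioned in the thread: (batch size, face dimensions)
for batch, size in [(16, 128), (16, 256), (8, 256), (4, 256), (4, 192)]:
    load = relative_activation_load(batch, size)
    print(f"batch {batch:>2} at {size} px: {load:.2f}x the baseline")
```

Under this assumption, batch 4 at 256 px lands at about the same activation load as the working batch 16 at 128 px, and batch 4 at 192 px lands below it. Since those runs still failed, the extra memory is presumably going to parts of the model that do not shrink with batch size, which is where the weight and optimizer related options may help.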


Re: Again: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Posted: Tue Jul 14, 2020 10:19 am
by congo

I had the opportunity to run it on a machine with more VRAM and it worked, so it seems you were right.