Again: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR


Post by congo »

Hi everybody,

I have run into this error (Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR) before, but back then it was resolved by setting TF_FORCE_GPU_ALLOW_GROWTH to true.
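For readers unfamiliar with that setting, here is a minimal sketch of one way to set it from Python. It assumes the variable is set before TensorFlow is imported (it can equally be set in the shell environment). The helper name `enable_gpu_memory_growth` is made up for illustration; only the `TF_FORCE_GPU_ALLOW_GROWTH` variable itself comes from TensorFlow.

```python
import os


def enable_gpu_memory_growth() -> None:
    """Ask TensorFlow to allocate GPU memory on demand instead of up front.

    Hypothetical helper for illustration only. TensorFlow reads this
    variable when it initialises the GPU, so it has to be set before
    tensorflow is imported.
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


enable_gpu_memory_growth()
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

This only changes how memory is reserved. It does not reduce how much memory the model actually needs.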

Everything worked fine, but now I wanted to raise the face dimensions in DFL-SAE from 128 to 256. I left the batch size at 16 and, as one would expect, got a GPU out-of-memory error. I reduced the batch size to 8, but there was still not enough GPU memory. When I reduced it to 4, I got the error above (Allow Growth is set). I reduced the batch size further and also tried lowering the face dimensions to 192, to no avail.
DFL-SAE at 128 works fine, and I have just started training with Villain, which seems to work fine, too. Any idea why higher dimensions in SAE do not work? I can provide the error log file if necessary.

Re: Again: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Post by torzdf » Sat Jul 04, 2020 8:39 am

This is most likely a memory issue. You are probably right on the edge of what your GPU can handle, hence the error.

If other models train without this error, then I would say that is definitely the cause.

You can try Memory Saving Gradients and/or Optimizer Savings to try to fit the model in VRAM.
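As a rough illustration of why the larger face size is so much heavier, the sketch below compares the settings mentioned in this thread. It rests on a back-of-the-envelope assumption that activation memory grows roughly with batch size times the square of the face dimension. The function name is hypothetical, and the estimate ignores weights, optimizer state and any layers whose size grows with resolution, so real usage at higher dimensions will be larger than it suggests.

```python
def relative_activation_cost(batch_size: int, face_dim: int,
                             base_batch: int = 16, base_dim: int = 128) -> float:
    """Input pixels per batch, relative to a baseline setting.

    Hypothetical estimate: assumes activation memory scales with
    batch_size * face_dim**2 and ignores everything else.
    """
    return (batch_size * face_dim ** 2) / (base_batch * base_dim ** 2)


# Settings tried in this thread: (batch size, face dimension)
for batch_size, face_dim in [(16, 128), (16, 256), (8, 256), (4, 256), (4, 192)]:
    ratio = relative_activation_cost(batch_size, face_dim)
    print(f"batch {batch_size:>2} @ {face_dim}px: {ratio:.2f}x the baseline")
```

By this estimate alone, batch size 4 at 256 would cost about the same as the working 128 setup. That it still fails suggests the costs left out of the sketch, such as the larger model itself, may take up a significant share of VRAM.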




Re: Again: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Post by congo »

I had the opportunity to run it on a machine with more VRAM and it worked, so it seems you were right.

