Again: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR


Post by congo »

Hi everybody,

I have run into this error (Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR) before, and back then it was resolved by setting TF_FORCE_GPU_ALLOW_GROWTH to true.

Everything worked fine until I tried to raise the face dimensions in DFL-SAE from 128 to 256. I left the batch size at 16 and, as one would expect, got a GPU out-of-memory error. I reduced the batch size to 8, which still did not fit in GPU memory. When I reduced it to 4, I got the error above (Allow Growth is set). I reduced the batch size further and also tried lowering the face dimensions to 192, to no avail.
DFL-SAE-128 works fine, and I have just started training with Villain, which seems to work fine too. Any idea why higher dimensions in SAE do not work? I can provide the error log file if necessary.
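
For readers landing here from a search: the TF_FORCE_GPU_ALLOW_GROWTH workaround mentioned in this post can be sketched roughly as follows. This is a minimal illustration, not how Faceswap itself applies the setting; the helper name `enable_gpu_memory_growth` is made up for this example, and the only assumption is that the variable is set before TensorFlow initialises the GPU.

```python
import os


def enable_gpu_memory_growth() -> None:
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    Hypothetical helper for illustration only. It has to run before
    TensorFlow initialises the GPU, so call it before `import tensorflow`.
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


enable_gpu_memory_growth()

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Note that this only changes how memory is reserved. It does not make a model fit that needs more VRAM than the card has.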

by torzdf » Sat Jul 04, 2020 8:39 am

This is most likely a memory issue. You are probably right at the edge of what your GPU can handle, hence the error.

If other models train without this error, then I would say that is definitely the cause.

You can try enabling Memory Saving Gradients and/or Optimizer Savings to fit the model in VRAM.
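
To see why dropping the batch size may not be enough, here is a rough back-of-the-envelope sketch. It only counts input values per batch (batch size × height × width × channels) for the settings tried in this thread. It is not a real VRAM estimate, and the function name `input_values_per_batch` is made up for this example.

```python
def input_values_per_batch(batch_size: int, face_dim: int, channels: int = 3) -> int:
    """Number of input values in one batch of square RGB face images."""
    return batch_size * face_dim * face_dim * channels


# (batch size, face dimension) combinations mentioned in the thread
configs = [(16, 128), (16, 256), (8, 256), (4, 256), (4, 192)]
baseline = input_values_per_batch(16, 128)

for batch_size, face_dim in configs:
    ratio = input_values_per_batch(batch_size, face_dim) / baseline
    print(f"batch {batch_size:>2} @ {face_dim}px: {ratio:.2f}x the baseline input size")
```

Batch size 4 at 256px feeds the same number of input values as batch size 16 at 128px, yet it still failed. Model weights and optimizer state do not shrink with the batch size, so if the model itself grows with the face dimensions (an assumption here, not confirmed in the thread), that fixed part alone could be what pushes the GPU over the edge.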


My word is final


Post by congo »

I had the opportunity to run it on a machine with more VRAM and it worked, so it seems you were right.
