Random TensorFlow error crashing my training

cnwamw
Posts: 1
Joined: Sun Mar 26, 2023 12:50 am

Post by cnwamw »

I've only gotten this error once, but it came after a long stretch of training:

tensorflow/core/common_runtime/device/device_event_mgr.cc:221] Unexpected Event status: 1
Is this just a random error that can occur, or is there a way to prevent it? (I'd rather not come back after a day away from my PC and find that it crashed an hour after I left.)

No other error messages were in the console.

torzdf
Posts: 2636
Joined: Fri Jul 12, 2019 12:53 am
Answers: 156
Has thanked: 128 times
Been thanked: 614 times

Re: Random TensorFlow error crashing my training

Post by torzdf »

From a bit of googling around, this issue appears to be driver related. As it has only happened once for you, I would not worry too much. If it starts happening more consistently, try updating your GPU drivers.
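If you want to note which driver version you are on before and after an update, something like the following might help. It is only a rough sketch and assumes an NVIDIA GPU with the `nvidia-smi` tool on your PATH, which the original post does not confirm. The function names `parse_driver_versions` and `query_driver_versions` are made up for illustration.

```python
import shutil
import subprocess


def parse_driver_versions(smi_output):
    """Return one driver version string per non-empty line of nvidia-smi output."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def query_driver_versions():
    """Ask nvidia-smi for each GPU's driver version; return None if that fails."""
    # Assumption: an NVIDIA GPU with nvidia-smi installed and on PATH.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_driver_versions(result.stdout)


if __name__ == "__main__":
    versions = query_driver_versions()
    if versions is None:
        print("nvidia-smi not available; check the driver version another way")
    else:
        for index, version in enumerate(versions):
            print(f"GPU {index}: driver {version}")
```

Comparing the printed version against the latest release from your GPU vendor tells you whether an update is available. It will not tell you whether the driver caused the crash.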

