TensorFlow error happens randomly while training



Post by semaj4712 »

I am getting the following error:

2023-10-15 12:28:59.400559: F tensorflow/core/common_runtime/device/device_event_mgr.cc:221] Unexpected Event status: 1

I have no idea how to fix it.

This happens at random points during training: sometimes an hour in, sometimes after 20 minutes. However, if a run gets past 4 to 5 hours, it seems to keep running fine.

My drivers are up to date, and my CUDA is up to date.

I am not the first person with this issue, but I cannot find an actual solution:
https://github.com/tensorflow/tensorflow/issues/46247
https://forum.faceswap.dev/viewtopic.php?t=2591


Re: TensorFlow error happens randomly while training

Post by bryanlyon »

As the previous message said, this is a driver issue. It has nothing to do with Faceswap, and there is nothing we can do to prevent or work around it.

From what I've been able to find, though, it is almost certainly an overclocking issue. Chances are your card is factory overclocked and is hitting instability because of it. Clock it back to Nvidia's recommendations and I think you'll see this message disappear completely.
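
To see what clocks the card is actually running at while training, one option is to poll `nvidia-smi` from a small script and note the values around the time of a crash. The sketch below is only an illustration, not part of Faceswap: the helper names (`parse_clock_line`, `read_gpu_clocks`) are made up for this example, and it assumes `nvidia-smi` is on the PATH and that your driver supports the `clocks.gr`, `clocks.max.gr`, `clocks.mem` and `clocks.max.mem` query fields. The maximum it reports is the card's own limit (including any factory overclock), so Nvidia's reference clocks still have to be looked up separately.

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; see `nvidia-smi --help-query-gpu`.
QUERY_FIELDS = "name,clocks.gr,clocks.max.gr,clocks.mem,clocks.max.mem"


def _to_mhz(value):
    """Convert a numeric field to int; return None for values like '[N/A]'."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_clock_line(line):
    """Parse one CSV row of nvidia-smi output (hypothetical helper)."""
    # rsplit keeps the name intact even if it happens to contain a comma.
    name, gr, max_gr, mem, max_mem = line.rsplit(",", 4)
    return {
        "name": name.strip(),
        "graphics_mhz": _to_mhz(gr),
        "max_graphics_mhz": _to_mhz(max_gr),
        "memory_mhz": _to_mhz(mem),
        "max_memory_mhz": _to_mhz(max_mem),
    }


def read_gpu_clocks():
    """Run nvidia-smi once and return one dict per GPU (hypothetical helper)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [
        parse_clock_line(line)
        for line in result.stdout.splitlines()
        if line.strip()
    ]
```

Calling `read_gpu_clocks()` every few seconds in a separate terminal while training runs, and printing the result, should show whether the graphics clock sits near its maximum when the error appears. Actually lowering the clocks is done in your card vendor's tuning tool or the driver, not in this script.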
