
Training freeze at random iteration - HRESULT failed with 0x887a0005

Posted: Sat Mar 04, 2023 7:59 pm
by lighting

Can't find a similar topic by error code, so I'm starting a new one.
Model: Original, all settings at default. After some number of iterations (it may be 36 or 506, but rarely more than 1000) the process freezes. It can be stopped with the "Stop" button (with no response to the stop signal), or it stops by itself after some timeout. In the second case I see this error message in the console:

Code: Select all

03/04/2023 17:49:39 INFO     Loading data, this may take a while...
03/04/2023 17:49:39 INFO     Loading Model from Original plugin...
03/04/2023 17:49:39 INFO     Using configuration saved in state file
03/04/2023 17:49:40 INFO     Loaded model from disk: 'C:\Users\user\Documents\faceswap\model\original.h5'
03/04/2023 17:49:41 INFO     Loading Trainer from Original plugin...

03/04/2023 17:49:48 INFO     [Saved models] - Average loss since last save: face_a: 0.05167, face_b: 0.04069

03/04/2023 17:49:49 INFO     [Preview Updated]

03/04/2023 17:50:40 INFO     [Saved models] - Average loss since last save: face_a: 0.05577, face_b: 0.04558

03/04/2023 17:50:40 INFO     [Preview Updated]

2023-03-04 18:10:39.223119: F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0005: readback_heap->Map(0, nullptr, &readback_heap_data)
Process exited.

I tried reinstalling faceswap with no visible effect. Googling the error code gives a few Stack Overflow topics with no useful results for me.
Graphics card: AMD RX 580 8 GB. Any idea why this happened? With a stop every few hundred iterations, it's almost impossible to reach any training result.


Re: Training freeze at random iteration - HRESULT failed with 0x887a0005

Posted: Mon Mar 06, 2023 10:01 am
by torzdf

Unfortunately, this is a timeout within DirectML and comes directly from the TensorFlow-DirectML plugin. There are some mitigation steps on the TensorFlow-DirectML GitHub:

https://github.com/microsoft/tensorflow ... imeouts.md


Re: Training freeze at random iteration - HRESULT failed with 0x887a0005

Posted: Tue Mar 07, 2023 9:24 pm
by lighting

Thank you for the link. That how-to doesn't actually solve the situation, but it helps make the freezes less frequent by reducing the batch size to 4.


Re: Training freeze at random iteration - HRESULT failed with 0x887a0005

Posted: Thu Mar 16, 2023 11:53 pm
by bryanlyon

One of the suggestions at the link Torzdf posted is to disable the timeout. The timeout exists to prevent your GPU from freezing. The suggested value of 10 seconds seems like a good idea: it should still catch anything that hangs for longer than that, while allowing kernels that take longer than the default 2 seconds to run.
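For reference, the timeout discussed here is Windows TDR (Timeout Detection and Recovery), which is controlled by registry values under `GraphicsDrivers`. A minimal sketch of applying the 10-second delay mentioned above, assuming an elevated (Administrator) command prompt and the standard `TdrDelay` key documented by Microsoft (this is a general TDR tweak, not faceswap-specific; a reboot is required for it to take effect):

```shell
:: Raise the GPU timeout from the 2-second default to 10 seconds.
:: Run from an elevated command prompt, then reboot.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 10 /f
```

As always with registry edits, note the previous value (or export the key first with `reg export`) so the change can be reverted if it causes problems.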


Re: Training freeze at random iteration - HRESULT failed with 0x887a0005

Posted: Tue Mar 21, 2023 8:24 pm
by lighting

Yes, I read that, and my delay is now set to 60 seconds. I did this before creating this thread. As I said earlier, what really helps in my case is changing the batch size.