ERROR Caught exception in thread: '_training'

Post by kocheko »

I hope someone can help, otherwise three days of training will go in the trash. I saved the project, and now when I open it, wait for it to load, and click TRAIN, this happens:

03/21/2024 15:39:24 INFO     Mixed precision compatibility check (mixed_float16): OK\nYour GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: NVIDIA GeForce RTX 3060, compute capability 8.6
03/21/2024 15:39:24 INFO     Enabling Mixed Precision Training.
03/21/2024 15:39:24 CRITICAL Error caught! Exiting...
03/21/2024 15:39:24 ERROR    Caught exception in thread: '_training'
03/21/2024 15:39:27 ERROR    Got Exception on main handler:
Traceback (most recent call last):
  File "C:\Users\kocheko\faceswap\lib\cli\launcher.py", line 225, in execute_script
    process.process()
  File "C:\Users\kocheko\faceswap\scripts\train.py", line 209, in process
    self._end_thread(thread, err)
  File "C:\Users\kocheko\faceswap\scripts\train.py", line 249, in _end_thread
    thread.join()
  File "C:\Users\kocheko\faceswap\lib\multithreading.py", line 224, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "C:\Users\kocheko\faceswap\lib\multithreading.py", line 100, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\kocheko\faceswap\scripts\train.py", line 274, in _training
    raise err
  File "C:\Users\kocheko\faceswap\scripts\train.py", line 259, in _training
    model = self._load_model()
  File "C:\Users\kocheko\faceswap\scripts\train.py", line 290, in _load_model
    model.build()
  File "C:\Users\kocheko\faceswap\plugins\train\model\_base\model.py", line 255, in build
    model = self.io.load()
  File "C:\Users\kocheko\faceswap\plugins\train\model\_base\io.py", line 147, in load
    model = kmodels.load_model(self.filename, compile=False)
  File "C:\Users\kocheko\MiniConda3\envs\faceswap\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\kocheko\MiniConda3\envs\faceswap\lib\site-packages\h5py\_hl\files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "C:\Users\kocheko\MiniConda3\envs\faceswap\lib\site-packages\h5py\_hl\files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 102, in h5py.h5f.open
OSError: Unable to synchronously open file (bad object header version number)
03/21/2024 15:39:27 CRITICAL An unexpected crash has occurred. Crash report written to 'C:\Users\kocheko\faceswap\crash_report.2024.03.21.153924976721.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.

Is the project lost? It is at almost 700,000 iterations.

by torzdf » Fri Mar 22, 2024 12:40 am

Not lost, but you will need to roll back.

This error means that your model file is corrupted. Most likely training was interrupted during a save.
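If you want to confirm which files are affected before rolling back, a minimal sketch along these lines may help. It simply tries to open each file with h5py, the same library that raises the OSError in the traceback. The function names are made up for illustration, and the assumption that backups end in ".h5.bk" should be checked against what is actually in your model folder.

```python
from pathlib import Path


def is_readable_hdf5(path):
    """Return True if h5py can open the file, False if opening raises OSError."""
    import h5py  # third-party; assumed to be installed in the faceswap environment

    try:
        with h5py.File(str(path), "r"):
            return True
    except OSError as err:
        # Corrupt or truncated files fail here, as in the traceback above.
        print(f"{path}: {err}")
        return False


def check_folder(folder):
    """Map each .h5 / .h5.bk file name in the folder to whether it opens.

    Assumption: model files end in ".h5" and their backups in ".h5.bk".
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.name.endswith((".h5", ".h5.bk")):
            results[path.name] = is_readable_hdf5(path)
    return results
```

Run it from the faceswap conda environment, passing your model folder to check_folder. A file that opens is not guaranteed to be complete, but a file that fails to open is certainly not usable.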

You have two options: either restore from a backup (you can use the model tool for this), or delete your model folder, rename your latest snapshot folder to your model folder's name, and continue from there.
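The second option (swapping the latest snapshot in for the model folder) could be scripted roughly as follows. This is only a sketch: the function name and the "_corrupt" suffix are made up, and it moves the old model folder aside instead of deleting it, which is a more cautious variation on the steps described above. Doing the same thing by hand in a file manager works just as well.

```python
from pathlib import Path


def restore_from_snapshot(model_dir, snapshot_dir):
    """Set the corrupted model folder aside and put the snapshot in its place.

    Returns the path the old model folder was moved to.
    """
    model_dir = Path(model_dir)
    snapshot_dir = Path(snapshot_dir)
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"snapshot folder not found: {snapshot_dir}")

    # Assumption: keep the old folder under a new name rather than deleting it,
    # so nothing is lost if the wrong snapshot was picked.
    set_aside = model_dir.with_name(model_dir.name + "_corrupt")
    if set_aside.exists():
        raise FileExistsError(f"{set_aside} already exists; rename or remove it first")

    model_dir.rename(set_aside)
    # The snapshot now takes over the model folder's original name.
    snapshot_dir.rename(model_dir)
    return set_aside
```

Close faceswap before running this so no files in either folder are in use. Once training resumes correctly from the snapshot, the set-aside folder can be deleted.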

I suggest you check the timestamps of the backups (in the model folder) and the last snapshot, to decide which is the best way to go.
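Comparing the timestamps can be done in a file manager, or with a short script like the sketch below, which lists the most recently modified files in each folder. The function names are illustrative only, and the folder paths you pass in are your own.

```python
from datetime import datetime
from pathlib import Path


def newest_files(folder, limit=5):
    """Return up to `limit` (modified time, file name) pairs, newest first."""
    entries = [
        (datetime.fromtimestamp(p.stat().st_mtime), p.name)
        for p in Path(folder).iterdir()
        if p.is_file()
    ]
    return sorted(entries, reverse=True)[:limit]


def print_comparison(model_dir, snapshot_dir):
    """Print the newest files of both folders so they can be compared by eye."""
    for folder in (model_dir, snapshot_dir):
        print(folder)
        for modified, name in newest_files(folder):
            print(f"  {modified:%Y-%m-%d %H:%M:%S}  {name}")
```

Whichever of the backup files or the snapshot is newer loses fewer iterations when you roll back to it.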

Last edited by torzdf on Fri Mar 22, 2024 12:38 am, edited 1 time in total.