Crashes on training

If training is failing to start and you are not receiving an error message telling you what to do, tell us about it here.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
hullo
Posts: 16
Joined: Wed Aug 23, 2023 8:30 am
Been thanked: 1 time

Crashes on training

Post by hullo »

I've tried the Realface and Dlight models so far. Realface crashes within a minute; Dlight crashes after a few hours. I'm on a 2022 base-model Mac Studio, and the fans never kicked in, so I don't think the machine is being overwhelmed. Here's what the terminal output looked like. It seems there are some issues with my setup, but I need someone to translate :D


Last login: Wed Oct 11 09:33:41 on ttys000
/Users/joshua/faceswap/faceswap_gui_launcher.command ; exit;
joshua@Joshuas-Mac-Studio ~ % /Users/joshua/faceswap/faceswap_gui_launcher.command ; exit;
Setting Faceswap backend to APPLE_SILICON
Metal device set to: Apple M1 Max

systemMemory: 32.00 GB
maxCacheSize: 10.67 GB

2023-10-11 15:36:18.830134: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-10-11 15:36:18.830260: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
10/11/2023 15:36:18 INFO     Log level set to: INFO
2023-10-11 15:36:31.016 python[36065:5736012] +[CATransaction synchronize] called within transaction
2023-10-11 15:36:39.976 python[36065:5736012] +[CATransaction synchronize] called within transaction
2023-10-11 15:36:46.476 python[36065:5736012] +[CATransaction synchronize] called within transaction
WARNING:tensorflow:From /Users/joshua/faceswap/lib/gui/analysis/event_reader.py:532: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
2023-10-11 15:38:14.679 python[36065:5736012] +[CATransaction synchronize] called within transaction
2023-10-11 15:39:50.291 python[36065:5736012] +[CATransaction synchronize] called within transaction
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
Fatal Python error: PyEval_RestoreThread: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: initialized

Thread 0x000000017793f000 (most recent call first):
  File "/Users/joshua/faceswap/lib/gui/wrapper.py", line 389 in _read_stderr
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 953 in run
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000000176933000 (most recent call first):
  File "/Users/joshua/faceswap/lib/gui/custom_widgets.py", line 417 in __call__
  File "/Users/joshua/faceswap/lib/gui/custom_widgets.py", line 260 in write
  File "/Users/joshua/faceswap/lib/gui/wrapper.py", line 371 in _read_stdout
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 953 in run
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00000001dda16080 (most recent call first):
  File "/Users/joshua/faceswap/lib/training/preview_cv.py", line 43 in add_image
  File "/Users/joshua/faceswap/lib/gui/utils/image.py", line 116 in load
  File "/Users/joshua/faceswap/lib/gui/display_command.py", line 140 in display_item_set
  File "/Users/joshua/faceswap/lib/gui/display_page.py", line 266 in _update_page
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 839 in callit
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1921 in __call__
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1349 in update_idletasks
  File "/Users/joshua/faceswap/lib/training/preview_tk.py", line 382 in set_image
  File "/Users/joshua/faceswap/lib/training/preview_tk.py", line 818 in _update_image
  File "/Users/joshua/faceswap/lib/training/preview_tk.py", line 914 in _display_preview
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 839 in callit
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1921 in __call__
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1458 in mainloop
  File "/Users/joshua/faceswap/scripts/gui.py", line 183 in process
  File "/Users/joshua/faceswap/lib/cli/launcher.py", line 225 in execute_script
  File "/Users/joshua/faceswap/faceswap.py", line 52 in _main
  File "/Users/joshua/faceswap/faceswap.py", line 56 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_osx, psutil._psutil_posix, tensorflow.python.framework.fast_tensor_util, charset_normalizer.md, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, PIL._imaging, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, numexpr.interpreter, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, 
_moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, matplotlib._image, matplotlib.backends._tkagg, PIL._imagingtk (total: 117)
/Users/joshua/faceswap/faceswap_gui_launcher.command: line 4: 36065 Abort trap: 6           python "/Users/joshua/faceswap/faceswap.py" gui

Saving session.../Users/joshua/anaconda3/envs/faceswap/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

...copying shared history...
...saving history...truncating history files...
...completed.

[Process completed]

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Crashes on training

Post by torzdf »

I have not seen this crash before, and I would imagine that it is being caused by Tensorflow-metal (from Apple) or another library that we use, purely based on this:
https://stackoverflow.com/questions/667 ... alled-with

My word is final

hullo

Re: Crashes on training

Post by hullo »

EDIT: nvm. I misread python "3.10" as 3.1.

Gonna try installing the latest tensorflow-metal.

EDIT 2: I now realize tensorflow and tensorflow-metal are two different packages, and I'm already on the latest (or near-latest) tensorflow-metal.

Last edited by hullo on Sun Oct 15, 2023 5:31 am, edited 2 times in total.
torzdf

Re: Crashes on training

Post by torzdf »

The version of Tensorflow-Metal you use is important: it has to match the version of Tensorflow in use. Faceswap is currently on Tensorflow 2.10, which requires Tensorflow-Metal 0.6.0.
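To see which versions are actually installed in the faceswap conda environment, a small Python script can read the package metadata. This is a minimal sketch under assumptions: the only pairing it knows about is the one stated above (Tensorflow 2.10 with Tensorflow-Metal 0.6.x), and the distribution names `tensorflow`, `tensorflow-macos` and `tensorflow-metal` are assumptions about how the packages were installed. `EXPECTED_METAL`, `major_minor`, `installed_version` and `pairing_ok` are hypothetical helpers, not part of Faceswap.

```python
from importlib import metadata
from typing import Optional

# Assumption: the only pairing taken from this thread is TF 2.10 -> tensorflow-metal 0.6.x.
# Any other Tensorflow version is reported as "not in table".
EXPECTED_METAL = {"2.10": "0.6"}


def major_minor(version: str) -> str:
    """Reduce a version string such as '2.10.0' to '2.10'."""
    return ".".join(version.split(".")[:2])


def installed_version(*names: str) -> Optional[str]:
    """Return the version of the first installed distribution in names, or None."""
    for name in names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed under this name, try the next one
    return None


def pairing_ok(tf_version: str, metal_version: str) -> bool:
    """True only if the major.minor of both versions matches the known pairing."""
    expected = EXPECTED_METAL.get(major_minor(tf_version))
    return expected is not None and major_minor(metal_version) == expected


if __name__ == "__main__":
    # Assumption: on Apple Silicon, Tensorflow may be installed as "tensorflow-macos".
    tf = installed_version("tensorflow", "tensorflow-macos")
    metal = installed_version("tensorflow-metal")
    print(f"tensorflow: {tf}, tensorflow-metal: {metal}")
    if tf and metal:
        print("pairing OK" if pairing_ok(tf, metal) else "pairing mismatched or not in table")
```

Run it with `python` from inside the activated faceswap environment so it inspects the same packages Faceswap loads.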

The only other thing I can suggest is doing the macOS equivalent of this:
https://forum.faceswap.dev/app.php/faqpage#f1r1
