Session Stats no longer appearing after a few hours of training

WhichWayToTheExit
Posts: 5
Joined: Wed Aug 26, 2020 1:36 am

Session Stats no longer appearing after a few hours of training

Post by WhichWayToTheExit »

I am currently training my first model (your tutorials on Extraction and Training have been invaluable, thank you), and after getting a few hours into a training session, I'm finding that when I go to periodically check on progress, the data table on the "Analysis" tab no longer refreshes or loads. I'm about 5 hours and 100k iterations in.

I have no problem letting it train further anyway, but it's fun/informative to pull up the graph and check how much my Loss is still decreasing to give me the warm fuzzy that the model is still improving (which of course it is, I know 5 hours is not really that long to train).

Any thoughts on why it would stop refreshing? Stats down in the application status bar at the bottom still update, and Previews still update perfectly. But the Analysis pane seems to just be an empty table with the headers. Oddly, though, sometimes I think I see a couple of rows of data flash into the table and then disappear quickly before I can even make out any text.

And for the record, it WAS refreshing earlier in the training session.

Thank you for any suggestions. And I've got to say, your software is incredible, as is your support of your user community. Thank you again!

In case it helps, a dump of my "Output System Information":

Code: Select all

============ System Information ============
encoding:            cp1252
git_branch:          master
git_commits:         1363fa8 lib.image - More information on image read errors
gpu_cuda:            No global version found. Check Conda packages for Conda Cuda
gpu_cudnn:           No global version found. Check Conda packages for Conda cuDNN
gpu_devices:         GPU_0: GeForce GTX 1080 Ti
gpu_devices_active:  GPU_0
gpu_driver:          442.74
gpu_vram:            GPU_0: 11264MB
os_machine:          AMD64
os_platform:         Windows-10-10.0.18362-SP0
os_release:          10
py_command:          C:\faceswap/faceswap.py gui
py_conda_version:    conda 4.8.4
py_implementation:   CPython
py_version:          3.8.5
py_virtual_env:      True
sys_cores:           8
sys_processor:       Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
sys_ram:             Total: 32697MB, Available: 21880MB, Used: 10817MB, Free: 21880MB

=============== Pip Packages ===============


============== Conda Packages ==============
# packages in environment at C:\Users\xxxxx\MiniConda3\envs\faceswap-nvidia:
#
# Name                    Version                   Build  Channel
absl-py                   0.10.0                   pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates 2020.6.24 0
cachetools 4.1.1 pypi_0 pypi
certifi 2020.6.20 py38_0
chardet 3.0.4 pypi_0 pypi
cudatoolkit 10.1.243 h74a9793_0
cudnn 7.6.5 cuda10.1_0
cycler 0.10.0 py38_0
fastcluster 1.1.26 py38hbe40bda_1 conda-forge
ffmpeg 4.3.1 ha925a31_0 conda-forge
ffmpy 0.2.3 pypi_0 pypi
freetype 2.10.2 hd328e21_0
gast 0.3.3 pypi_0 pypi
git 2.23.0 h6bb4b03_0
google-auth 1.20.1 pypi_0 pypi
google-auth-oauthlib 0.4.1 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.31.0 pypi_0 pypi
h5py 2.10.0 pypi_0 pypi
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha925a31_3
idna 2.10 pypi_0 pypi
imageio 2.9.0 py_0
imageio-ffmpeg 0.4.2 py_0 conda-forge
intel-openmp 2020.1 216
joblib 0.16.0 py_0
jpeg 9b hb83a4c4_2
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.2.0 py38h74a9793_0
libpng 1.6.37 h2a8f88b_0
libtiff 4.1.0 h56a325e_1
lz4-c 1.9.2 h62dcd97_1
markdown 3.2.2 pypi_0 pypi
matplotlib 3.2.2 0
matplotlib-base 3.2.2 py38h64f37c6_0
mkl 2020.1 216
mkl-service 2.3.0 py38hb782905_0
mkl_fft 1.1.0 py38h45dec08_0
mkl_random 1.1.1 py38h47e9c7a_0
numpy 1.19.1 py38h5510c5b_0
numpy-base 1.19.1 py38ha3acd2a_0
nvidia-ml-py3 7.352.1 pypi_0 pypi
oauthlib 3.1.0 pypi_0 pypi
olefile 0.46 py_0
opencv-python 4.4.0.42 pypi_0 pypi
openssl 1.1.1g he774522_1
opt-einsum 3.3.0 pypi_0 pypi
pathlib 1.0.1 py_1
pillow 7.2.0 py38hcc1f983_0
pip 20.2.2 py38_0
protobuf 3.13.0 pypi_0 pypi
psutil 5.7.0 py38he774522_0
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pyparsing 2.4.7 py_0
pyqt 5.9.2 py38ha925a31_4
python 3.8.5 he1778fa_0
python-dateutil 2.8.1 py_0
python_abi 3.8 1_cp38 conda-forge
pywin32 227 py38he774522_1
qt 5.9.7 vc14h73c81de_0
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.6 pypi_0 pypi
scikit-learn 0.23.1 py38h25d0782_0
scipy 1.4.1 pypi_0 pypi
setuptools 49.6.0 py38_0
sip 4.19.13 py38ha925a31_0
six 1.15.0 py_0
sqlite 3.33.0 h2a8f88b_0
tensorboard 2.2.2 pypi_0 pypi
tensorboard-plugin-wit 1.7.0 pypi_0 pypi
tensorflow-gpu 2.2.0 pypi_0 pypi
tensorflow-gpu-estimator 2.2.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
threadpoolctl 2.1.0 pyh5ca1d4c_0
tk 8.6.10 he774522_0
tornado 6.0.4 py38he774522_1
tqdm 4.48.2 py_0
urllib3 1.25.10 pypi_0 pypi
vc 14.1 h0510ff6_4
vs2015_runtime 14.16.27012 hf0eaf9b_3
werkzeug 1.0.1 pypi_0 pypi
wheel 0.34.2 py38_0
wincertstore 0.2 py38_0
wrapt 1.12.1 pypi_0 pypi
xz 5.2.5 h62dcd97_0
zlib 1.2.11 h62dcd97_4
zstd 1.4.5 h04227a9_0

================= Configs ==================
--------- .faceswap ---------
backend: nvidia

--------- convert.ini ---------
[color.color_transfer]
clip: True
preserve_paper: True
[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0
[color.match_hist]
threshold: 99.0
[mask.box_blend]
type: gaussian
distance: 11.0
radius: 5.0
passes: 1
[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0
[scaling.sharpen]
method: unsharp_mask
amount: 150
radius: 0.3
threshold: 5.0
[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto
skip_mux: False
[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False
[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3
[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate

--------- extract.ini ---------
[global]
allow_growth: False
[align.fan]
batch-size: 12
[detect.cv2_dnn]
confidence: 50
[detect.mtcnn]
minsize: 20
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
scalefactor: 0.709
batch-size: 8
[detect.s3fd]
confidence: 70
batch-size: 4
[mask.unet_dfl]
batch-size: 8
[mask.vgg_clear]
batch-size: 6
[mask.vgg_obstructed]
batch-size: 2

--------- gui.ini ---------
[global]
fullscreen: False
tab: extract
options_panel_width: 30
console_panel_height: 20
icon_size: 14
font: default
font_size: 9
autosave_last_session: prompt
timeout: 120
auto_load_model_stats: True

--------- train.ini ---------
[global]
coverage: 68.75
icnr_init: True
conv_aware_init: True
optimizer: adam
learning_rate: 5e-05
reflect_padding: False
allow_growth: False
mixed_precision: False
convert_batchsize: 16
[global.loss]
loss_function: ssim
mask_loss_function: mse
l2_reg_term: 100
penalized_mask_loss: True
mask_type: extended
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: False
[model.dfl_h128]
lowmem: False
[model.dfl_sae]
input_size: 128
clipnorm: True
architecture: df
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False
[model.dlight]
features: best
details: good
output_size: 256
[model.original]
lowmem: False
[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512
[model.unbalanced]
input_size: 128
lowmem: False
clipnorm: True
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512
[model.villain]
lowmem: False
[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4

torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Session Stats no longer appearing after a few hours of training

Post by torzdf »

I haven't seen this myself, and it's hard to troubleshoot, as training needs to get quite far along before the issue may appear. However, it may be that you just need to leave the tab active for a bit.

When developing Faceswap 2.0, I made some changes to how stats were collated. Previously, at every save iteration, all stats were being collated and the graph and analysis tab were getting updated. This, unfortunately, had the effect of making the GUI unresponsive, as the amount of data to be collated grows significantly the longer training runs.

Now stats are only collated if the tab which holds the stats is active. This, unfortunately, means there may be a significant delay between clicking on a tab and receiving the stats, but it also means the GUI shouldn't become unresponsive.
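
To illustrate the trade-off described above, here is a minimal sketch of "collate only while the tab is active". It is not Faceswap's actual code: the class `LazyStats`, its method names, and the summary fields are all hypothetical, and a simple mean stands in for whatever the real collation does.

```python
class LazyStats:
    """Hypothetical stats holder: record cheaply, summarise only while viewed."""

    def __init__(self):
        self._losses = []        # raw loss values, appended at every save
        self._summary = None     # cached summary; None means "stale"
        self.tab_active = False  # set by the (hypothetical) GUI on tab switch

    def on_save(self, new_losses):
        """Called at each save iteration with the losses logged since the last one."""
        self._losses.extend(new_losses)
        self._summary = None
        if self.tab_active:      # only pay the collation cost when someone is looking
            self._collate()

    def on_tab_selected(self):
        """Called when the user opens the stats tab; this is where the delay appears."""
        self.tab_active = True
        if self._summary is None:
            self._collate()
        return self._summary

    def on_tab_deselected(self):
        """Called when the user leaves the stats tab; collation stops again."""
        self.tab_active = False

    def _collate(self):
        """Stand-in for the expensive step: summarise everything recorded so far."""
        count = len(self._losses)
        mean = sum(self._losses) / count if count else 0.0
        self._summary = {"iterations": count, "mean_loss": mean}


stats = LazyStats()
stats.on_save([0.50, 0.40])          # tab hidden: nothing is collated yet
summary = stats.on_tab_selected()    # first view collates everything so far
print(summary)
```

The first view after a long stretch with the tab hidden has to process everything recorded in the meantime, which is the delay mentioned above.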

My word is final


WhichWayToTheExit
Posts: 5
Joined: Wed Aug 26, 2020 1:36 am

Re: Session Stats no longer appearing after a few hours of training

Post by WhichWayToTheExit »

Thanks very much for your response.

Based on the additional background you explained, I think you're probably right that it might just be a heavy-workload sort of situation.

As I mentioned, it would display early in the training session, but when I came back after 5+ hours/100k iterations, it wouldn't display when I activated the tab. But if it now only collates those stats when I activate the tab, and it suddenly has 5+ hours' worth of data to aggregate, that probably takes ages, all while it's trying to process the training as well. And that's an acceptable trade-off to be honest, because the rest of the GUI IS entirely responsive, which is far preferable.

Okay, well thanks again for the input, I'm not really worried about it. Everything else is working great.

Thanks again for the response.

dheinz70
Posts: 42
Joined: Sat Aug 15, 2020 2:43 am
Has thanked: 4 times

Log and graph weirdness

Post by dheinz70 »

The Analysis tab shows more iterations than the status bar.

Also, the graph crashes or doesn't respond if you change smoothing and hit the refresh button.

[Attachment: Screenshot from 2020-10-11 20-20-01.png]

torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Log and graph weirdness

Post by torzdf »

I'll take a look. Thanks. There are some bugs in the latest stats code.

My word is final


RisingZen
Posts: 2
Joined: Wed Oct 14, 2020 6:01 pm

Re: Log and graph weirdness

Post by RisingZen »

This is usually (also) the case when the UI freezes and I let the model train for some more time before I shut the program down.

The model wouldn't be updated, but the state file would. So from then on, the additional time and iterations, which are lost from the model, are carried over to the following sessions.
Last edited by RisingZen on Wed Oct 14, 2020 7:11 pm, edited 1 time in total.

cosmico
Posts: 95
Joined: Sat Jan 18, 2020 6:32 pm
Has thanked: 13 times
Been thanked: 31 times

Re: Log and graph weirdness

Post by cosmico »

I've been having similar issues but I can expand on the original post.
Let's say I train 50,000 iterations at 100eg/s, then stop. Then I decide to train another 50,000 at 100eg/s and stop again. At the end of the second stop, that second training session will be completely gone from the Analysis tab, but the faces will still be better trained than they were after the first training session. So it's recognizing the training even if it doesn't keep a record of it. THEN, when I train a third time with the exact same settings, it will suddenly jump to 200eg/s. It will slowly get lower and lower until I hit 200k overall, at which point it returns to normal. It's almost like it didn't keep a record but acknowledges the training happened, so now that it is keeping a record, it's explaining this mysterious extra training as "you just had double the eg/s for this current session" until it can properly account for all that mysterious training.

impost3r
Posts: 7
Joined: Sat Oct 31, 2020 2:53 pm
Has thanked: 2 times

Re: Log and graph weirdness

Post by impost3r »

I have the same issue as the OP.

It appears to me that viewing the Analysis page is what breaks it. Once viewed, it stops updating for the current session.


torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Log and graph weirdness

Post by torzdf »

There are bugs in the stats at the moment. I will go in and fix at some point, but as they are not mission critical I haven't yet plucked up the enthusiasm to go in and poke around.

My word is final


dheinz70
Posts: 42
Joined: Sat Aug 15, 2020 2:43 am
Has thanked: 4 times

Re: Log and graph weirdness

Post by dheinz70 »

The two bugs I've seen:

Changing the smoothing from 0.9 causes the stats to crash.

It shows more iterations than the session has done. Hope that helps.


wentdot
Posts: 8
Joined: Thu Nov 12, 2020 2:30 pm
Has thanked: 3 times

Re: Log and graph weirdness

Post by wentdot »

impost3r wrote: Sat Oct 31, 2020 3:00 pm

I have the same issue as the OP.

It appears to me that viewing the Analysis page is what breaks it. Once viewed, it stops updating for the current session.

This is my experience as well.


popomist
Posts: 1
Joined: Sat Jan 16, 2021 7:49 am

Getting an error each preview update

Post by popomist »

Hello, I am getting the following errors every time the preview is updated. Is this normal behavior? Does anyone have any clue what could be the cause? Thank you.

Code: Select all

Exception in Tkinter callback
TypeError: float() argument must be a string or a number, not 'list'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\pokem\faceswap\lib\gui\stats.py", line 376, in _cache_data
    loss = np.array(loss, dtype="float32")
ValueError: setting an array element with a sequence.

During handling of the above exception, another exception occurred:

TypeError: float() argument must be a string or a number, not 'list'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\pokem\MiniConda3\envs\faceswap\lib\tkinter\__init__.py", line 1883, in __call__
    return self.func(*args)
  File "C:\Users\pokem\faceswap\lib\gui\display_graph.py", line 300, in refresh
    self.calcs = self.thread.get_result()  # Terminate the LongRunningTask object
  File "C:\Users\pokem\faceswap\lib\gui\utils.py", line 1151, in get_result
    raise self.err[1].with_traceback(self.err[2])
  File "C:\Users\pokem\faceswap\lib\gui\utils.py", line 1122, in run
    retval = self._target(*self._args, **self._kwargs)
  File "C:\Users\pokem\faceswap\lib\gui\stats.py", line 862, in refresh
    self._get_raw()
  File "C:\Users\pokem\faceswap\lib\gui\stats.py", line 925, in _get_raw
    loss_dict = Session.get_loss(self._session_id)
  File "C:\Users\pokem\faceswap\lib\gui\stats.py", line 168, in get_loss
    loss_dict = self._tb_logs.get_loss(session_id=session_id, is_training=self._is_training)
  File "C:\Users\pokem\faceswap\lib\gui\stats.py", line 514, in get_loss
    for sess, info in self._from_cache(session_id, is_training).items():
  File "C:\Users\pokem\faceswap\lib\gui\stats.py", line 484, in _from_cache
    self._cache_data(session_id, is_training=is_training)
  File "C:\Users\pokem\faceswap\lib\gui\stats.py", line 384, in _cache_data
    loss = np.array(loss[:-1], dtype="float32")
ValueError: setting an array element with a sequence.

torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Getting an error each preview update

Post by torzdf »

It's a GUI error (which probably means the graph won't show).

Sometimes this goes away if you close Faceswap, re-open it and restart training.

It's not a priority as it doesn't directly affect the training of the model, but I have tagged this with bug to remind me to look at this at some point.
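
For anyone trying to read the traceback: the `TypeError` / `ValueError` pair is what NumPy raises when it is asked to build a `float32` array from a list in which at least one element is itself a list. The thread does not say how the loss list ends up in that shape, so the sketch below only shows the failure mode, using plain Python and made-up values. The helper `find_nested_entries` is hypothetical and is not part of Faceswap.

```python
def find_nested_entries(values):
    """Split values into plain floats and the indices of entries float() rejects."""
    floats, bad_indices = [], []
    for index, value in enumerate(values):
        try:
            floats.append(float(value))
        except TypeError:
            # float() refuses a list, which is the first error in the traceback
            bad_indices.append(index)
    return floats, bad_indices


# Made-up loss values: the third entry is a list instead of a single number.
sample = [0.031, 0.029, [0.028, 0.027], 0.026]
clean, bad = find_nested_entries(sample)
print(clean, bad)
```

A check like this only locates the offending entries; it says nothing about why they were logged that way.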

My word is final


14XXX88
Posts: 4
Joined: Thu Mar 25, 2021 12:42 pm

program crashes with this error message in the console

Post by 14XXX88 »

This error happened while training a Dfaker model:

Code: Select all

Exception in Tkinter callback
TypeError: float() argument must be a string or a number, not 'list'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\ALECT\faceswap\lib\gui\stats.py", line 378, in _cache_data
    loss = np.array(loss, dtype="float32")
ValueError: setting an array element with a sequence.

During handling of the above exception, another exception occurred:

TypeError: float() argument must be a string or a number, not 'list'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\ALECT\MiniConda3\envs\faceswap\lib\tkinter\__init__.py", line 1892, in __call__
    return self.func(*args)
  File "C:\Users\ALECT\MiniConda3\envs\faceswap\lib\tkinter\__init__.py", line 814, in callit
    func(*args)
  File "C:\Users\ALECT\faceswap\lib\gui\display_graph.py", line 300, in refresh
    self.calcs = self.thread.get_result()  # Terminate the LongRunningTask object
  File "C:\Users\ALECT\faceswap\lib\gui\utils.py", line 1595, in get_result
    raise self.err[1].with_traceback(self.err[2])
  File "C:\Users\ALECT\faceswap\lib\gui\utils.py", line 1566, in run
    retval = self._target(*self._args, **self._kwargs)
  File "C:\Users\ALECT\faceswap\lib\gui\stats.py", line 864, in refresh
    self._get_raw()
  File "C:\Users\ALECT\faceswap\lib\gui\stats.py", line 927, in _get_raw
    loss_dict = Session.get_loss(self._session_id)
  File "C:\Users\ALECT\faceswap\lib\gui\stats.py", line 168, in get_loss
    loss_dict = self._tb_logs.get_loss(session_id=session_id, is_training=self._is_training)
  File "C:\Users\ALECT\faceswap\lib\gui\stats.py", line 516, in get_loss
    for sess, info in self._from_cache(session_id, is_training).items():
  File "C:\Users\ALECT\faceswap\lib\gui\stats.py", line 486, in _from_cache
    self._cache_data(session_id, is_training=is_training)
  File "C:\Users\ALECT\faceswap\lib\gui\stats.py", line 386, in _cache_data
    loss = np.array(loss[:-1], dtype="float32")
ValueError: setting an array element with a sequence.

torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: program crashes with this error message in the console

Post by torzdf »

Yeah. It's a GUI bug. Doesn't impact training, just stops the graph updating for that session.

I'll fix it one day.

My word is final


torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Log and graph weirdness

Post by torzdf »

Ok, the underlying bug that effectively breaks graphing/analysis during training should now be fixed.

This isn't to say there are not other bugs (either new or old) with the graphing, but that specific error, which stops the stats updating, should be fixed.

I'm closing off this thread. For new graphing/stats bugs please start a new thread.

My word is final

