Training slowdown ClipV

Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

Training slowdown ClipV

Post by Ryzen1988 »

So I was trying to do a good training run with ClipV (a slightly modified 448 preset).
Normally I estimate how many iterations will run in 24 hours by taking the count from the first 10 minutes and multiplying it by 6 x 24.
The number was a big disappointment, so I started to investigate.

It seems that training slows down dramatically within the first 30 minutes.
I know Task Manager is not an ideal tool for measuring GPU usage, but it's fairly clear that something is causing a slowdown.
This is the first time I have noticed this, but it now seems to happen every time.

For example, 863 iterations are completed in the first 5 minutes of training.
Between minutes 25 and 30, only 386 iterations are completed in the same 5-minute span.
Is anyone else experiencing this issue with ClipV?
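
As a rough illustration of the arithmetic in this post, the sketch below extrapolates a short measurement window to 24 hours and compares the two 5-minute counts quoted above. The function name is made up for this example and is not part of faceswap.

```python
MINUTES_PER_DAY = 24 * 60


def projected_per_day(iterations, window_minutes):
    """Extrapolate an iteration count from a short window to 24 hours."""
    return iterations * MINUTES_PER_DAY / window_minutes


# Counts quoted in the post, each measured over a 5-minute window.
early = 863  # minutes 0-5
late = 386   # minutes 25-30

print(f"Projected from early window: {projected_per_day(early, 5):,.0f} iterations/day")
print(f"Projected from late window:  {projected_per_day(late, 5):,.0f} iterations/day")
print(f"Late throughput is {late / early:.0%} of early throughput")
```

With these numbers, an estimate based only on the first minutes overstates the sustained rate by more than a factor of two, so it is safer to measure again once the run has been going for half an hour or so.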

Attachments
gpu usage.JPG
gpu usage2.JPG
gpu usage3.JPG
Last edited by Ryzen1988 on Sat Sep 16, 2023 12:47 pm, edited 1 time in total.
torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Training slowdown ClipV

Post by torzdf »

No, but if you are having strange issues, then I recommend downgrading CUDA/cuDNN:
viewtopic.php?p=9271&hilit=conda+remove ... lkit#p9271

My word is final

Re: Training slowdown ClipV

Post by Ryzen1988 »

I feel silly, that fixed it.
My brain just got hung up on the fact that it was only happening with ClipV :?

Re: Training slowdown ClipV

Post by Ryzen1988 »

I was premature: with your suggested solution the training speed decays more slowly, but it still decays.
Look at the log. The model is saved every 1000 iterations, and the gap between saves starts at roughly 5 minutes and keeps slowly increasing.

09/18/2023 18:23:29 INFO [Saved model] - Average loss since last save: face_a: 0.08979, face_b: 0.26748
09/18/2023 18:23:31 INFO [Preview Updated]
09/18/2023 18:27:57 INFO [Saved model] - Average loss since last save: face_a: 0.11090, face_b: 0.21795
09/18/2023 18:27:59 INFO [Preview Updated]
09/18/2023 18:32:35 INFO [Saved model] - Average loss since last save: face_a: 0.10716, face_b: 0.21045
09/18/2023 18:32:37 INFO [Preview Updated]
09/18/2023 18:38:00 INFO [Saved model] - Average loss since last save: face_a: 0.10275, face_b: 0.20592
09/18/2023 18:38:03 INFO [Preview Updated]
09/18/2023 18:43:45 INFO [Saved model] - Average loss since last save: face_a: 0.10358, face_b: 0.20465
09/18/2023 18:43:47 INFO [Preview Updated]
09/18/2023 18:51:02 INFO [Saved model] - Average loss since last save: face_a: 0.10256, face_b: 0.20235
09/18/2023 18:51:05 INFO [Preview Updated]
09/18/2023 19:00:06 INFO [Saved model] - Average loss since last save: face_a: 0.09863, face_b: 0.20354
09/18/2023 19:00:09 INFO [Preview Updated]
09/18/2023 19:11:28 INFO [Saved model] - Average loss since last save: face_a: 0.10003, face_b: 0.20090
09/18/2023 19:11:32 INFO [Preview Updated]
09/18/2023 19:24:04 INFO [Saved model] - Average loss since last save: face_a: 0.10050, face_b: 0.19862
09/18/2023 19:24:07 INFO [Preview Updated]
09/18/2023 19:37:21 INFO [Saved model] - Average loss since last save: face_a: 0.09910, face_b: 0.19630
09/18/2023 19:37:25 INFO [Preview Updated]
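
To put numbers on the trend, here is a minimal sketch that reads the timestamps of the "[Saved model]" lines and reports the time between consecutive saves. It assumes the log format shown above (an `MM/DD/YYYY HH:MM:SS` timestamp in the first 19 characters) and the 1000-iteration save interval mentioned in this post. The helper name `save_intervals` is made up for this example and is not part of faceswap.

```python
from datetime import datetime

SAVE_EVERY = 1000  # iterations between saves, as stated in the post

# Saved-model lines copied from the log above, plus one preview line to show filtering.
LOG = """\
09/18/2023 18:23:29 INFO [Saved model] - Average loss since last save: face_a: 0.08979, face_b: 0.26748
09/18/2023 18:23:31 INFO [Preview Updated]
09/18/2023 18:27:57 INFO [Saved model] - Average loss since last save: face_a: 0.11090, face_b: 0.21795
09/18/2023 18:32:35 INFO [Saved model] - Average loss since last save: face_a: 0.10716, face_b: 0.21045
09/18/2023 18:38:00 INFO [Saved model] - Average loss since last save: face_a: 0.10275, face_b: 0.20592
09/18/2023 18:43:45 INFO [Saved model] - Average loss since last save: face_a: 0.10358, face_b: 0.20465
09/18/2023 18:51:02 INFO [Saved model] - Average loss since last save: face_a: 0.10256, face_b: 0.20235
09/18/2023 19:00:06 INFO [Saved model] - Average loss since last save: face_a: 0.09863, face_b: 0.20354
09/18/2023 19:11:28 INFO [Saved model] - Average loss since last save: face_a: 0.10003, face_b: 0.20090
09/18/2023 19:24:04 INFO [Saved model] - Average loss since last save: face_a: 0.10050, face_b: 0.19862
09/18/2023 19:37:21 INFO [Saved model] - Average loss since last save: face_a: 0.09910, face_b: 0.19630
"""


def save_intervals(log_text):
    """Return the seconds elapsed between consecutive '[Saved model]' lines."""
    stamps = []
    for line in log_text.splitlines():
        if "[Saved model]" not in line:
            continue  # skip preview updates and anything else
        # The first 19 characters hold the timestamp, e.g. "09/18/2023 18:23:29".
        stamps.append(datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


intervals = save_intervals(LOG)
for seconds in intervals:
    rate = SAVE_EVERY / seconds
    print(f"{seconds // 60:2d}m {seconds % 60:02d}s between saves  ({rate:.2f} it/s)")
```

On this log the gap between saves grows from 268 seconds to 797 seconds, so throughput drops to roughly a third of its starting value in a little over an hour.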

Last edited by Ryzen1988 on Mon Sep 18, 2023 5:39 pm, edited 1 time in total.

Re: Training slowdown ClipV

Post by torzdf »

Honestly, I don't know then. This is not behaviour I have witnessed.

Re: Training slowdown ClipV

Post by Ryzen1988 »

I have narrowed it down a bit. Currently I'm training with Clipv_vit-l-14, and this seems to run without any issue.
So far the problem only happens when choosing clipv-farl-b-16-64.
The choice of loss function doesn't affect the slowdown.

I will keep A/B testing to see if anything else influences the slowdown.

Last edited by Ryzen1988 on Wed Sep 20, 2023 10:08 am, edited 1 time in total.

Re: Training slowdown ClipV

Post by Ryzen1988 »

OK, I think I have narrowed it down to a specific setup.
It happens with Clip farl B and Clip L when building a model with Hybrid for upscale and upsample and ICNR init activated.
When replacing these with, for example, resize and upscale dyn with ICNR init off, there is no slowdown, all other things remaining the same.
I still have to narrow it down further because the native preset uses subpixel, so I think it is probably ICNR init in combination with Clip that causes the issue (it is something I automatically turn on with subpixel or hybrid out of habit).
