Training slowdown ClipV

Posted: Sat Sep 16, 2023 12:45 pm
by Ryzen1988

So I was trying to do a good training run with ClipV (slightly modified 448 preset).
Normally I estimate how many iterations can run in 24h by taking the count from the first 10 minutes and multiplying it by 6 x 24.
The result was a big disappointment, so I started to investigate.

It seems that within the first 30 minutes the training slows down dramatically.
Now I know Task Manager is not an ideal tool for GPU usage, but it's fairly clear something is causing a slowdown.
This is the first time I've noticed this, but it seems to be happening 100% of the time now.

For example, in the first 5 minutes of training 863 iterations get completed.
Between minutes 25 and 30, only 386 iterations get completed in the same 5-minute span.
Anyone else experiencing this issue with ClipV?
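
A rough sketch of that arithmetic in Python, using only the two 5-minute counts quoted above. The function and variable names are made up for illustration, and the projection simply assumes the measured rate would hold for a full day:

```python
# Iteration counts reported above, each measured over a 5-minute window.
early_iters = 863    # minutes 0-5
late_iters = 386     # minutes 25-30
window_minutes = 5

def projected_per_day(iters, minutes):
    """Extrapolate a measured window to 24 hours (assumes a constant rate)."""
    return iters * (24 * 60) // minutes

early_projection = projected_per_day(early_iters, window_minutes)
late_projection = projected_per_day(late_iters, window_minutes)
slowdown = 1 - late_iters / early_iters  # fraction of speed lost

print(early_projection, late_projection, f"{slowdown:.0%}")
```

So an estimate based on the first few minutes overstates the daily total by more than a factor of two once the slowdown has set in.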


Re: Training slowdown ClipV

Posted: Sun Sep 17, 2023 6:47 pm
by torzdf

No, but if you are having strange issues, then I recommend downgrading Cuda/cuDNN:
viewtopic.php?p=9271&hilit=conda+remove ... lkit#p9271


Re: Training slowdown ClipV

Posted: Mon Sep 18, 2023 2:15 pm
by Ryzen1988

I feel silly, that fixed it.
My brain just got hung up on the fact that it was only happening with ClipV :?


Re: Training slowdown ClipV

Posted: Mon Sep 18, 2023 5:35 pm
by Ryzen1988

I was premature. With your suggested solution the training speed decays more slowly, but it still decays.
Look at the log: the model is saved every 1000 iterations, and the interval between saves starts at around 5 minutes and keeps slowly increasing.

09/18/2023 18:23:29 INFO [Saved model] - Average loss since last save: face_a: 0.08979, face_b: 0.26748
09/18/2023 18:23:31 INFO [Preview Updated]
09/18/2023 18:27:57 INFO [Saved model] - Average loss since last save: face_a: 0.11090, face_b: 0.21795
09/18/2023 18:27:59 INFO [Preview Updated]
09/18/2023 18:32:35 INFO [Saved model] - Average loss since last save: face_a: 0.10716, face_b: 0.21045
09/18/2023 18:32:37 INFO [Preview Updated]
09/18/2023 18:38:00 INFO [Saved model] - Average loss since last save: face_a: 0.10275, face_b: 0.20592
09/18/2023 18:38:03 INFO [Preview Updated]
09/18/2023 18:43:45 INFO [Saved model] - Average loss since last save: face_a: 0.10358, face_b: 0.20465
09/18/2023 18:43:47 INFO [Preview Updated]
09/18/2023 18:51:02 INFO [Saved model] - Average loss since last save: face_a: 0.10256, face_b: 0.20235
09/18/2023 18:51:05 INFO [Preview Updated]
09/18/2023 19:00:06 INFO [Saved model] - Average loss since last save: face_a: 0.09863, face_b: 0.20354
09/18/2023 19:00:09 INFO [Preview Updated]
09/18/2023 19:11:28 INFO [Saved model] - Average loss since last save: face_a: 0.10003, face_b: 0.20090
09/18/2023 19:11:32 INFO [Preview Updated]
09/18/2023 19:24:04 INFO [Saved model] - Average loss since last save: face_a: 0.10050, face_b: 0.19862
09/18/2023 19:24:07 INFO [Preview Updated]
09/18/2023 19:37:21 INFO [Saved model] - Average loss since last save: face_a: 0.09910, face_b: 0.19630
09/18/2023 19:37:25 INFO [Preview Updated]
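
To put numbers on it, here is a minimal Python sketch that measures the gap between consecutive "Saved model" lines. The timestamps are copied from the log above; the helper name is made up, and the it/s figure assumes exactly 1000 iterations per save and includes whatever time the save and preview themselves take:

```python
from datetime import datetime

# "Saved model" timestamps copied from the log above.
save_times = [
    "09/18/2023 18:23:29", "09/18/2023 18:27:57", "09/18/2023 18:32:35",
    "09/18/2023 18:38:00", "09/18/2023 18:43:45", "09/18/2023 18:51:02",
    "09/18/2023 19:00:06", "09/18/2023 19:11:28", "09/18/2023 19:24:04",
    "09/18/2023 19:37:21",
]

def save_intervals(stamps):
    """Return the seconds elapsed between consecutive saves."""
    parsed = [datetime.strptime(s, "%m/%d/%Y %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

intervals = save_intervals(save_times)
for secs in intervals:
    # Assumes one save per 1000 iterations, as stated in the post.
    print(f"{secs // 60}m{secs % 60:02d}s  {1000 / secs:.2f} it/s")
```

The gap grows from 268 seconds to 797 seconds, so roughly a threefold slowdown in a little over an hour.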


Re: Training slowdown ClipV

Posted: Wed Sep 20, 2023 9:59 am
by torzdf

Honestly, I don't know then. This is not behaviour I have witnessed.


Re: Training slowdown ClipV

Posted: Wed Sep 20, 2023 10:06 am
by Ryzen1988

I have narrowed it down a bit. Currently I'm training with Clipv_vit-l-14 and this seems to run without any issue.
So far the problem only happens when choosing clipv-farl-b-16-64.
The choice of loss function doesn't influence the slowdown.

I will keep A/B testing to see if anything else influences the slowdown.


Re: Training slowdown ClipV

Posted: Mon Sep 25, 2023 10:55 am
by Ryzen1988

Okay, I think I have narrowed it down to a specific setup.
It happens with both Clip farl B and Clip L when building a model with Hybrid for upscale and upsample and LCNR int activated.
When swapping those out, for example for resize and upscale dyn with LCNR int off, there are no slowdown issues, all other things remaining the same.
I have to narrow it down further because the native preset uses subpixel, so I think it is probably LCNR int in combination with Clip that causes the issue (something I automatically turn on with subpixel or hybrid out of habit).
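
One way to organise the remaining A/B runs is to list every combination of the suspect options and test them one at a time. This is only a sketch: the dictionary keys and values are plain labels taken from this thread, not actual config keys, and the choice of three factors is an assumption about what still needs testing:

```python
from itertools import product

# Hypothetical labels for the options under suspicion, not real config names.
factors = {
    "encoder": ["clipv-farl-b-16-64", "Clipv_vit-l-14"],
    "upscaler": ["hybrid", "subpixel"],
    "lcnr_int": [True, False],
}

# One dict per run, covering every combination of the options above.
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for number, run in enumerate(runs, start=1):
    print(number, run)
```

That gives 8 runs. If the slowdown shows up only in the runs with LCNR int on, regardless of upscaler, that would support the suspicion above.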