Locating Bottleneck between Training Iterations

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

ianstephens
Posts: 67
Joined: Sun Feb 14, 2021 7:20 pm
Has thanked: 9 times

Locating Bottleneck between Training Iterations

Post by ianstephens »

We are noticing what appears to be some kind of bottleneck between training iterations.

We've got a brand new 3090 FE installed in our test machine. When running the StoJo model, for example, we are noticing peaks and troughs in GPU usage between iterations.

As an iteration runs, GPU usage peaks to ~90%+ and then drops back before peaking again at the next iteration. Of course, there will be a slight delay between iterations, but this seems excessive.

[Attachment: iterations.png]

Where is the bottleneck? Our CPU is mostly idle during training - it's not taxed at all.

Is there any way to reduce the delay between iterations? Essentially, the GPU could be pushed a lot harder, but we can't work out where the delay is coming from. All of the training images are in the system file cache (RAM), so it isn't even reading from disk.
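To put a number on the gaps, one option is to log utilization samples (e.g. by polling nvidia-smi once per second) and measure how long the GPU stays below a threshold between peaks. A minimal sketch; the trace here is synthetic, and the threshold is an assumption:

```python
# Hypothetical sketch: measure how long the GPU sits idle between
# iteration peaks. Real samples could come from polling
# `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`
# once per tick; here we just analyze a list of percentage samples.

def idle_gaps(samples, threshold=20):
    """Return lengths (in ticks) of runs where utilization < threshold."""
    gaps, run = [], 0
    for util in samples:
        if util < threshold:
            run += 1
        elif run:
            gaps.append(run)
            run = 0
    if run:
        gaps.append(run)
    return gaps

# Synthetic trace: two busy peaks separated by idle periods.
trace = [5, 90, 95, 92, 3, 2, 4, 88, 91, 1, 0]
print(idle_gaps(trace))  # -> [1, 3, 2]
```

With a real trace, comparing the gap lengths at different batch sizes would show exactly how the delay scales.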

Any advice or pointers are greatly appreciated.



Re: Locating Bottleneck between Training Iterations

Post by ianstephens »

Update/Test:

Lowering the batch size from 21 to 6 gives much faster cycling through iterations, with hardly any delay between them.

It still doesn't max out the GPU (also confirmed using nvidia-smi), but it cycles through faster.

[Attachment: batch.png]

So higher batch sizes create a longer delay between iterations. But why? Where is the hold-up?

Perhaps the delay is the time it takes to send the entire batch of images across the system into the GPU for each iteration? Maybe our hardware is holding us back, perhaps CPU clock speed? We have an abundance of cores (2x 12-core Xeon), but the clock speed is only 2.7 GHz.
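For scale, here is a back-of-the-envelope estimate of the raw host-to-GPU copy time. The image size (256x256 RGB float32) and the effective PCIe 3.0 x16 bandwidth (~12 GB/s) are assumptions, not measurements:

```python
# Back-of-envelope sketch (all numbers are assumptions, not measured):
# how long should the host-to-GPU copy of one batch actually take?

IMG_BYTES = 256 * 256 * 3 * 4   # assumed 256x256 RGB float32 image
PCIE_BW = 12e9                  # ~12 GB/s effective, PCIe 3.0 x16

def copy_ms(batch_size):
    """Estimated transfer time for one batch, in milliseconds."""
    return batch_size * IMG_BYTES / PCIE_BW * 1e3

print(round(copy_ms(21), 2))  # -> 1.38 (ms, batch of 21)
print(round(copy_ms(6), 2))   # -> 0.39 (ms, batch of 6)
```

If the raw copy really is in the low-millisecond range, a delay this noticeable would more likely come from CPU-side batch preparation (decoding/augmentation), which also grows with batch size.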


torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Locating Bottleneck between Training Iterations

Post by torzdf »

I will need to look into and test this. My initial assumption, though, would be memory copies.

My word is final



Re: Locating Bottleneck between Training Iterations

Post by ianstephens »

Agreed - getting the batch of images to the GPU seems to be the slow part. Perhaps there are different/faster methods that could be coded; I'm no expert.

Thank you for taking a look into this. It would be super if we could make full use of the GPU and really heat things up with the crunching.

Here is some additional graphing from our other monitoring software. You can see the drops/peaks between the iterations.

[Attachment: graphing-new.jpg]


Re: Locating Bottleneck between Training Iterations

Post by ianstephens »

We are noticing the GPU usage drops between iterations in Linux too.

[Attachment: iterations-linux.jpg]

This is of course expected behavior to some extent (each batch has to be loaded into GPU memory between iterations), and it's the loading of the batch images that seems to cause it.

But I was thinking there may be a better way. Perhaps running several GPU threads (for example two, each with half the batch size) so that GPU usage is sustained across iterations.
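As an illustration of the idea (not faceswap's actual code), the usual pattern is to prepare the next batch on a background thread while the current one trains, so loading overlaps with compute. A toy sketch, with sleeps standing in for the real work:

```python
# Illustrative sketch (not faceswap's actual code): prepare the next
# batch on a background thread while the "GPU" works on the current
# one, so per-batch loading overlaps with compute.
import queue
import threading
import time

def producer(q, n_batches):
    for i in range(n_batches):
        time.sleep(0.01)   # stand-in for loading/augmenting a batch
        q.put(i)
    q.put(None)            # sentinel: no more batches

def train(n_batches=5):
    q = queue.Queue(maxsize=2)  # buffer at most 2 batches ahead
    threading.Thread(target=producer, args=(q, n_batches),
                     daemon=True).start()
    done = []
    while (batch := q.get()) is not None:
        time.sleep(0.01)   # stand-in for the GPU's training step
        done.append(batch)
    return done

print(train())  # -> [0, 1, 2, 3, 4]
```

With this overlap, the consumer only ever waits for the first batch; after that, loading hides behind the training step instead of adding to it.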

My thought is simply that the GPU could be utilized more, especially with the new-gen 30XX series. We want to keep the GPUs at 100% processing capacity.

Just a thought; you are the experts and I'm sure you'll think of a solution :) We're going to send our third donation to the project via PayPal tomorrow.

Many thanks,

Ian

