We are seeing what looks like a bottleneck between training iterations.
We have a brand-new 3090 FE installed in our test machine. When running the StoJo model, for example, GPU usage peaks and then settles between iterations.
As each iteration runs, GPU usage peaks at 90+% and then drops back before peaking again at the next iteration. Of course, there will be a slight delay between iterations, but this seems excessive.
Where is the bottleneck? Our CPU is mostly idle during training; it's not taxed at all.
Is there any way to shorten the delay between iterations, i.e. to get each iteration loaded faster? Essentially, the GPU could be pushed a lot harder, but we can't find out where this delay is coming from. All of the training images are already in the system file cache (RAM), so it's not even reading from disk.
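One way we could narrow this down is to time the two phases of each iteration separately: preparing the batch versus running the training step. Below is a minimal sketch of that idea, not the trainer's actual code. `profile_iterations`, `fake_load_batch` and `fake_train_step` are hypothetical names made up for illustration; the real batch-loading and step functions would have to be substituted. It also assumes the training step blocks until the GPU work has finished, otherwise the split would be misleading.

```python
import time


def profile_iterations(load_batch, train_step, n_iters=10):
    """Return (seconds spent preparing batches, seconds spent in training steps)."""
    load_s = 0.0
    step_s = 0.0
    for _ in range(n_iters):
        t0 = time.perf_counter()
        batch = load_batch()      # CPU side: decode, augment, assemble the batch
        t1 = time.perf_counter()
        train_step(batch)         # GPU side: assumed to block until the step is done
        t2 = time.perf_counter()
        load_s += t1 - t0
        step_s += t2 - t1
    return load_s, step_s


# Hypothetical stand-ins so the sketch runs on its own; replace with the real calls.
def fake_load_batch():
    time.sleep(0.02)              # pretend batch preparation takes 20 ms
    return [0] * 8


def fake_train_step(batch):
    time.sleep(0.01)              # pretend the GPU step takes 10 ms


if __name__ == "__main__":
    load_s, step_s = profile_iterations(fake_load_batch, fake_train_step, n_iters=5)
    total = load_s + step_s
    print(f"batch prep: {load_s:.3f}s ({100 * load_s / total:.0f}% of iteration time)")
    print(f"train step: {step_s:.3f}s ({100 * step_s / total:.0f}% of iteration time)")
```

If batch preparation turned out to take a large share of each iteration, that would suggest the GPU is waiting on data preparation even though overall CPU usage looks low, for example because the work runs on a single core.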
Any advice or pointers are greatly appreciated.