This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.
Please mark any answers that fixed your problems so others can find the solutions.
I can say that yes, FS can EASILY saturate 4 GB/sec if you're using distributed. It has to sync the entire gradient set across the cards every batch, as well as the actual model. Basically, assume that one entire card's GPU RAM gets sent to the other card every single batch. At 27 EGs/sec at BS 16 you're syncing across the cards about 1.7x per second, which makes perfect sense to me.
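To make that arithmetic concrete, here's a back-of-envelope sketch. The parameter count is a made-up placeholder (not Faceswap's actual model size); the throughput figures are the ones reported later in this thread:

```python
# Rough estimate of inter-GPU sync traffic for distributed training.
grad_bytes = 4 * 500e6        # ASSUMPTION: ~500M float32 values synced per batch
egs_per_sec = 25.1            # distributed throughput reported in this thread
batch_size = 16
syncs_per_sec = egs_per_sec / batch_size        # one gradient sync per batch
traffic_gb_s = grad_bytes * syncs_per_sec / 1e9  # GB/s across the link
print(f"{syncs_per_sec:.2f} syncs/s, ~{traffic_gb_s:.1f} GB/s across the link")
```

Even with that modest guess at model size, the sync traffic lands in the low GB/s range, which is exactly where a slow PCI-E bus starts to hurt.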
There are really only 2 possibilities: drivers and hardware. Hardware does seem the most reasonable. Can you tell me EXACTLY what model of motherboard you have? If it's not a motherboard designed for multiple GPUs, it's quite possible that the 2nd GPU is running off the southbridge, which could SEVERELY impact operation. You MIGHT be able to use an NVLink bridge to improve the card-to-card communication, but I see conflicting information on whether the 2060 Super supports NVLink.
Drivers can be VERY finicky when using low-end cards in multi-GPU systems. Several users can tell you how they've struggled with multiple GPUs while a single GPU worked fine. I would recommend trying the driver included in the direct download of CUDA, as that tends to be the most stable for me, BUT others have reported the opposite, and it requires removing any existing drivers, which could cause other issues.
The fact it drops down to x8/x8 tells me it is probably mostly hardware. Just thought it was weird that running two cards is almost exactly half as productive.
(Attachment: Screenshot from 2020-10-19 18-49-07.png)
Yes, this motherboard only has PCI-E Gen 2. That is a HUGE bottleneck and is almost certainly the cause of your problems. I'd say that unless you can get NVLink working, you're only going to get reasonable speeds from 1x GPU. If you're willing to spend some money, an NVLink bridge would move that GPU-to-GPU communication off the PCI-E bus, but it may or may not work on your cards.
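For a sense of scale, here's a quick calculation of the theoretical per-direction bandwidth (standard PCI-E figures, before protocol overhead, so real throughput is lower):

```python
# Theoretical PCI-E bandwidth per lane, per direction, in GB/s.
# Gen 2: 5 GT/s with 8b/10b encoding -> 0.5 GB/s per lane.
# Gen 3: 8 GT/s with 128b/130b encoding -> ~0.985 GB/s per lane.
per_lane_gb_s = {2: 0.5, 3: 0.985}
lanes = 8                      # two cards usually force the slots down to x8/x8
peak = per_lane_gb_s[2] * lanes
print(f"PCIe Gen 2 x{lanes}: ~{peak:.0f} GB/s peak per direction")
```

So an x8 Gen 2 slot tops out around 4 GB/s in theory, right at the saturation point described earlier in the thread, and real-world throughput will be meaningfully below that.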
I think it's important to note that your CPU is more than powerful enough and you might even be able to run 2 separate trainings at once (one on each card).
But yeah, that motherboard is definitely the cause of your issues.
Still having the issue. I checked the specs, and my motherboard has 2 PCI Express 2.0 x16 slots. If I train on either GPU alone it screams. If I use distributed, the performance is awful. Training on 1 GPU is twice as fast as training on two. So...
A batch of 8 on Villain on a single GPU is giving me 27.6 EGs/sec.
A batch of 16 on distributed (I assume it trains 8 on each) gives me 25.1 EGs/sec. Shouldn't it be roughly double the EGs of a single, minus some overhead?
It takes about 10 mins to start on distributed, 2 mins to start on a single GPU.
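Putting those numbers side by side (a quick sanity check on the reported figures, not Faceswap code):

```python
# How far the reported numbers fall short of ideal 2-GPU scaling.
single = 27.6          # EGs/sec, batch 8 on one GPU
distributed = 25.1     # EGs/sec, batch 16 split across two GPUs
efficiency = distributed / (2 * single)   # 1.0 would be perfect scaling
print(f"{efficiency:.0%} of ideal scaling")
```

Healthy multi-GPU setups typically land somewhere near 90% of ideal; anything under 50% means the second card is actively costing you throughput.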
Damn, even I am having a similar kind of issue. I have searched all over the internet and have even posted in a number of threads on different forums, but no solution seems to work. I am really frustrated; can anyone here help me resolve this issue? I am very tired of it now.
It's likely the same problem as the previous person in this thread. Multi-GPU training requires a massive amount of communication between the GPUs and so depends heavily on your hardware for speed. If you want us to try to diagnose it, start with your speeds and as much information as possible about your hardware configuration so we can see what might be the problem.
My 2 cents on using two GPUs is that you really, really need a high-end motherboard. I upgraded to a Ryzen 7 and an X570 mobo, and dual GPUs are still flaky. I'm using Linux, and I get maybe a 20% increase in EGs/sec using dual 2060 Supers over just using one. SO... 2 GPUs with a batch of 16 only run 20% more EGs/sec than a single card running a batch of 8. In my book that really isn't enough to justify buying a second card.
My suggestion to anyone who doesn't have a VERY high-end computer is to spend your money on ONE card, especially if this is just a hobby. YMMV.