15 to 25 minutes? I would say no, that's not normal. A few minutes at most.
Distributed with Dual 2060 supers
Re: Distributed with Dual 2060 supers
After further testing, it looks like all my problems come from the DFL-SAE model. It will only train on GPU1; training on GPU0 or distributed fails.
Training Villain with a batch of 16 right now distributed.
Thanks for all the help.
Re: Distributed with Dual 2060 supers
Ok, I really don't know what's going on.
I just upgraded; I now have 2x 2070 and 2x 1070.
The 2070s running distributed are no problem.
FYI, I had a somewhat unrelated driver issue that snowballed when I tried to fix it (purge and reinstall).
Long story short, fresh install of Ubuntu.
They train like a charm. Nvidia 450 drivers.
Those "failed to allocate X.XG" messages are just the framework trying to allocate chunks of memory.
Leaving Allow Growth on won't hurt anything. It's an issue with the NVIDIA drivers (and maybe TF), and as Bryan and torzdf have said, there is 'no rhyme or reason for the requirement'; after 8 months I 100% agree.
I'll fool around with the other models and see if I can recreate your crashes.
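For context, faceswap's "Allow growth" option roughly corresponds to TensorFlow's per-GPU memory-growth setting. A minimal sketch of what that does under the hood, assuming TF 2.x (the exact plumbing inside faceswap may differ):

```python
# Minimal sketch: what "Allow growth" roughly corresponds to in TensorFlow 2.x.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    # Grab VRAM incrementally instead of reserving the whole card up front;
    # this is the behavior that sidesteps those "failed to allocate X.XG" messages.
    tf.config.experimental.set_memory_growth(gpu, True)
```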
I dunno what I'm doing
2x RTX 3090 : RTX 3080 : RTX 2060 : 2x RTX 2080 Super : Ghetto 1060
Re: Distributed with Dual 2060 supers
Due to snapping off the SATA connector on my HDD, I'm on a fresh install of Ubuntu 20.04 with the 450 driver as well.
My only problem is the DFL-SAE model. (Allow growth checked everywhere it is an option)
All other models seem to work fine.
Re: Distributed with Dual 2060 supers
Revisiting this post, I was wondering if you had it sorted?
I have noticed the setup time on distributed takes longer if you are using slower PCIe slots.
For example, if one card is running at x16 and the other is running at x4.
In my case, I had rearranged my hardware configuration, and one card was at PCIe x8 (typical) and the other was at PCIe x4. I noticed a big slowdown.
Switching back so that the 2x 2070 were both at PCIe x8 gave a very reasonable 2 min delay before training begins.
Re: Distributed with Dual 2060 supers
Still having the issue. I checked the specs, and my MB has 2 PCI Express 2.0 x16 slots. If I train on either GPU alone, it screams. If I use distributed, the performance is awful: training on one GPU is twice as fast as training on two. So...
A batch of 8 on Villain on a single GPU gives me 27.6 EGs/sec.
A batch of 16 distributed (I assume it trains 8 on each) gives me 25.1 EGs/sec. Shouldn't it be roughly double the EGs of a single GPU, minus some overhead?
It takes about 10 mins to start distributed, 2 mins to start on a single GPU.
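As a sanity check on those numbers, a quick back-of-the-envelope calculation (assuming the distributed batch really is split 8 per GPU):

```python
# Rough scaling check using the numbers quoted above.
single_egs = 27.6       # EGs/sec, batch 8 on one GPU
distributed_egs = 25.1  # EGs/sec, batch 16 across two GPUs

ideal_egs = 2 * single_egs                 # perfect scaling would double throughput
efficiency = distributed_egs / ideal_egs   # fraction of ideal actually achieved

print(f"ideal: {ideal_egs:.1f} EGs/sec, actual: {distributed_egs} "
      f"({efficiency:.0%} scaling efficiency)")
```

Under 50% efficiency means distributed is strictly worse than one card; healthy two-card setups are reported later in this thread at around 160% of a single GPU.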
Re: Distributed with Dual 2060 supers
I would suspect it should be higher.
I usually get about 160% of a single GPU's EGs on Linux.
Ok, now I'm really curious.
Look in GPU-Z (Windows) or nvtop (Linux) and see what speeds they are actually communicating at over their respective PCIe slots during training.
It may say PCIe x16 Gen 2, or PCIe x4 Gen 3, something like that. The generation it communicates at can go up and down; I think it's some power-saving feature.
I may be chasing a wild goose, but I'm curious.
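If you don't want to keep nvtop open, the same link information can be pulled from the command line. A sketch assuming the standard NVIDIA driver's nvidia-smi tool is installed:

```shell
# Show the PCIe generation/width each GPU is *currently* linked at vs. its maximum.
# A current value below the max usually means power saving or a slower physical slot.
nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.width.current,pcie.link.gen.max,pcie.link.width.max --format=csv
```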
Re: Distributed with Dual 2060 supers
Hmm, you might be on to something. This is showing only PCIe x8 when they are both in use. The specs on my MB show 2x x16...
Now with just GPU1 doing the work....
Still 8x
Re: Distributed with Dual 2060 supers
x8 doesn't surprise me, but only Gen 2 does.
Mine all run at PCIe x8 Gen 3. (Or if I put in just a single card, then the top one will run at x16.)
You need an extra-fancy MB to support Gen 3 at PCIe x16 on all slots.
Re: Distributed with Dual 2060 supers
The MSI site says 2x x16. Other sites show 1x x16 and 1x x8, which would explain the drop down to x8. Well, I'm going to take out one of the cards and see if the single runs at x16.
-edit-
Yep, one card shows 16x.
Time to start saving up for a Ryzen.....
Re: Distributed with Dual 2060 supers
If it's that, then yeah, get yourself a nice B550 or X570 chipset that screams on PCIe.
I'm going to test this real quick; I can force mine to x4.
Re: Distributed with Dual 2060 supers
PCIe 2.0 x8 should be somewhere near 4 GB/sec. Does faceswap really use that much?
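The 4 GB/s ballpark checks out, though the unit is gigabytes, not gigabits. A quick sketch of where the figure comes from, using the standard PCIe 2.0 signaling numbers (5 GT/s per lane with 8b/10b encoding):

```python
# Back-of-the-envelope usable bandwidth for a PCIe 2.0 x8 link.
transfers_per_sec = 5e9          # PCIe 2.0: 5 GT/s per lane
# 8b/10b encoding: 8 data bits carried per 10 bits on the wire.
bytes_per_lane = transfers_per_sec * 8 / 10 / 8   # -> 500 MB/s usable per lane

lanes = 8
total_gb_per_sec = bytes_per_lane * lanes / 1e9

print(f"PCIe 2.0 x{lanes}: {total_gb_per_sec:.1f} GB/s per direction")  # 4.0 GB/s
```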
Re: Distributed with Dual 2060 supers
Ok, this was for some reason a painful test. It doesn't help with the startup question so much, but:
Question: are you increasing your batch size when using distributed? It should allow a roughly 85% higher batch, and EGs/sec goes up. I'm grasping at straws with this one.
Test results: Villain model, 2x 2070, batch=26, 950 iterations per test.
PCIe lanes @ Gen    Startup delay    EGs/sec
--------------------------------------------
8x8 @ Gen 3         131 sec          60.4
4x4 @ Gen 3         145 sec          51.8
4x4 @ Gen 2         144 sec          42.0
4x4 @ Gen 1         144 sec          29.7
So Gen speed and lane count didn't seem to impact distributed startup time, but they sure do slow down training.
Training will be limited by the slowest card.
Sure, different MB chipsets/models/settings/voodoo/temperature throttling will change the above numbers.
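Pairing those measured EGs/sec with the approximate per-direction link bandwidth of each config makes the trend easy to see. The per-lane figures below are the commonly quoted usable rates (my assumption, not measured here): Gen 1 ~250 MB/s, Gen 2 ~500 MB/s, Gen 3 ~985 MB/s per lane.

```python
# Approximate usable PCIe bandwidth per lane, per direction (MB/s).
PER_LANE_MB = {1: 250, 2: 500, 3: 985}

tests = [  # (lanes per card, PCIe gen, measured EGs/sec from the table above)
    (8, 3, 60.4),
    (4, 3, 51.8),
    (4, 2, 42.0),
    (4, 1, 29.7),
]

for lanes, gen, egs in tests:
    bw = lanes * PER_LANE_MB[gen] / 1000  # GB/s per card
    print(f"x{lanes} Gen {gen}: ~{bw:.1f} GB/s -> {egs} EGs/sec")
```

Throughput falls monotonically with link bandwidth, but not proportionally, which fits the idea that GPU compute still does most of the work and the bus only adds overhead.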
Re: Distributed with Dual 2060 supers
Yes, I do lower to 80% of what one card can handle. It isn't just a startup issue: depending on the model, it can take 5-10 mins to start, and it just runs really slowly once it starts. In terms of EGs/sec, I'm getting better performance from one GPU doing half as many at a time.
I'm watching nvtop with the single 2060, Villain batch of 10, and I see RX/TX in the 200 MB/s range, getting 32 EGs/sec.
Will test a batch of 16 on both...
Re: Distributed with Dual 2060 supers
Did a couple thousand iterations to test.
A single GPU with a batch of 10, dual with a batch of 16: I'm getting better EGs/sec from the single GPU. It took 4 mins to start distributed, 1.5 mins on a single GPU.
I watched nvtop and never saw the RX/TX pegged at the theoretical 4 GB/sec limit of PCIe 2.0 x8. I did see a few instances where it approached 4 GB/sec, but it never stayed there for more than a blink of an eye.
I don't think I'm overloading the FSB or northbridge. Any ideas?
Re: Distributed with Dual 2060 supers
With your setup, a 4 min startup vs my 2.3 min startup sounds reasonable. I have an X470 chipset.
RX/TX doesn't sit at the max rate nonstop every time nvtop displays a sample; sure, it jumps around.
Although most of mine looked like this.
I need to be careful about what I'm willing to claim.
If I personally had two computers with absolutely identical software setups, I might conclude that training speed on distributed is being hindered by hardware.
When I tried 2x 1070 distributed with one on a PCIe x1 slot, it took forever to start training, and training was then very slow (5 EGs/sec). If I used a single card on the PCIe x1 slot... it wasn't terrible. In fact, faster than distributed.
A positive thing I can mention: if you're training at, say, 30 EGs/sec with batch 10 vs 30 EGs/sec with batch 20, the higher-batch model should learn better: fewer iterations/less time to get to the same quality.
You seem to be using faceswap fine. Do you feel there are any other possible software considerations we could be missing before reaching a hardware conclusion?
Do you think your cards are throttling due to heat? Mine don't go over 71°C.
Re: Distributed with Dual 2060 supers
I added Coolbits to my xorg.conf so I could control the fans on the cards. I cranked them all to 100% and the cards ran at 55°C. Same slow results on distributed. The NVIDIA control panel lists 93°C as the slowdown temp. A batch of 12 gave me 23 EGs/sec.
I'm going to test small batches. If it is a hardware bottleneck, I'll run a batch of 2 distributed and a batch of 2 single for a while.
If it is the hardware, such a small batch should stay well underneath the limits of my machine.
Re: Distributed with Dual 2060 supers
2000 iterations with a batch of 2 each, on single and distributed. I doubt it was ever enough data to clog the pipes. My feeling is there is some hardware limitation, but I suspect there is something else going on too.
Re: Distributed with Dual 2060 supers
Another test with the same results. 2000 iterations on Original: single batch 150, distributed batch 300. Distributed is almost exactly half as efficient.
Watching nvtop, the stats never pegged and held; they averaged about 70% of what the GPUs could handle. So it is unlikely anything was overloaded.