To get the benefit of multiple GPUs, you would want to increase the batch size (i.e. if you are training with BS 64 on 1 GPU, you'd want to train with BS 128 on 2 GPUs).
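The scaling rule above is just "keep the per-GPU batch size constant and multiply by the GPU count." A quick sketch (the numbers are example values, not a recommendation):

```python
# Linear batch-size scaling: the per-GPU batch stays fixed,
# so the global batch grows with the number of GPUs.
per_gpu_batch = 64

for num_gpus in (1, 2, 4):
    global_batch = per_gpu_batch * num_gpus
    print(f"{num_gpus} GPU(s) -> global batch size {global_batch}")
```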
So, if I run the training with batch size 64 on one GPU, then with 4 of the same GPUs I should increase the batch size to 256, right?
But I found that the maximum batch size setting is only 256, which means that even if I had, say, 5 or 10 GPUs on a single machine, I still couldn't improve the speed beyond what 4 GPUs give?
Hope you can answer, thanks!
You can manually enter any number; the slider only stops there because it's a "normal" limit. If you need 512 you can enter 512, but remember that models stop learning details at very high batch sizes.
OK, I see, thank you!
I thought that with multiple GPUs, using batch size 256 on 4 GPUs would keep the same detail as batch size 64 on 1 GPU, so this is wrong, right?
If it's wrong, why is that? Doesn't the model get split across the GPUs, with each one using a batch size of 64 (4 GPUs, BS=256)?
Training has two passes, one forward and one backward. The forward pass gets split across all GPUs, but the backward update happens once, with all the batches at once. As you increase the batch size, training gets faster since more images go into each backward pass, but it also becomes noisier as the gradients interfere with each other. Splitting across multiple GPUs doesn't solve this interference issue.
Batch sizes that are too small have their own problems as well. This is why we normally recommend batch sizes between 8 and 128.
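The split-forward / single-update idea above can be sketched with a toy pure-Python example (no real GPUs involved; the model, data, and shard counts are all made up for illustration). The point is that averaging the 4 per-GPU gradients gives the same update as one gradient over the full global batch of 256, which is why 4 GPUs at BS 64 behaves like BS 256 on one GPU, not like BS 64:

```python
import math

def grad_shard(w, shard):
    # Per-"GPU" gradient of mean squared error for the toy model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

w = 0.5
data = [(x, 3.0 * x) for x in range(1, 257)]  # global batch of 256 samples

# "Forward" work is split: 4 simulated GPUs, 64 samples each.
shards = [data[i::4] for i in range(4)]
per_gpu_grads = [grad_shard(w, s) for s in shards]

# ...but the update happens ONCE, with the averaged gradient.
avg_grad = sum(per_gpu_grads) / len(per_gpu_grads)

# Identical (up to float rounding) to one gradient over all 256 samples.
full_grad = grad_shard(w, data)
print(f"averaged grad {avg_grad:.4f} vs full-batch grad {full_grad:.4f}")
```

So the effective batch size for the gradient step is 256, and any noise/interference at BS 256 is present regardless of how many GPUs the forward pass was split across.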