how does multi GPU training work

Post by police.bike »

I am using vast to run training on multiple GPUs.

I am curious how it works.

Say I choose a 2-GPU machine and set the batch size (bs) to 64.
Does it put 32 on one GPU and the other 32 on the second?

Or does bs mean 64 on each GPU?

I was surprised that a machine with 2x 11 GB NVIDIA GPUs gives fewer iterations than my machine with a single 6 GB NVIDIA GPU.



Re: how does multi GPU training work

Post by torzdf »

The batch is split between GPUs.

To get the benefit of multiple GPUs, you would want to increase the batch size (e.g. if you are training at BS 64 on 1 GPU, you'd want to train at BS 128 on 2 GPUs).
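
As a rough illustration of the arithmetic, here is a minimal sketch of how one batch could be divided among GPUs. The helper `split_batch` is hypothetical and written only for this example; it is not taken from the training code, and a real framework may handle uneven splits differently.

```python
from typing import List


def split_batch(global_batch: int, num_gpus: int) -> List[int]:
    """Return how many samples each GPU receives when one batch is split.

    Hypothetical helper for illustration only: it spreads the batch as
    evenly as possible and gives any remainder to the first GPUs.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if global_batch < num_gpus:
        raise ValueError("batch size must be at least the number of GPUs")
    base, extra = divmod(global_batch, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


# BS 64 on 2 GPUs: each GPU only sees half of the batch.
print(split_batch(64, 2))
# BS 128 on 2 GPUs: each GPU does the same work as BS 64 on a single GPU.
print(split_batch(128, 2))
```

So the batch size you enter is the total, and the per-GPU share shrinks as you add GPUs unless you raise it.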



Re: how does multi GPU training work

Post by ericpan0513 »

So if I run training with batch size 64 on one GPU, and I have 4 identical GPUs, I should increase the batch size to 256, right?
But I found that the batch size setting only goes up to 256. Does that mean that even with 5 or 10 GPUs in a single machine, I can't get any more speed than I would with 4 GPUs?
I hope you can answer, thanks!


Re: how does multi GPU training work

Post by bryanlyon »

You can manually enter any number; the slider only stops there because that is a "normal" limit. If you need 512 you can enter 512, but remember that models stop learning details at very high batch sizes.


Re: how does multi GPU training work

Post by ericpan0513 »

OK, I see, thank you!
I thought that with multiple GPUs, using batch size 256 on 4 GPUs would give the same detail as batch size 64 on 1 GPU. So that is wrong, right?
If it's wrong, why does this happen? Isn't the batch split into parts, with each GPU using a batch size of 64 (4 GPUs, BS 256)?


Re: how does multi GPU training work

Post by bryanlyon »

Training has two passes, one forward and one backward. The forward pass is split across all GPUs, but the backward pass happens once for the whole batch. As you increase the batch size, training gets faster since each backward pass covers more images, but it also becomes noisier as the gradients interfere with each other. Splitting across multiple GPUs doesn't solve this interference issue.
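
To make that concrete, here is a toy sketch in plain Python. It is not the actual training code: the "devices" are just list slices, the model is a one-weight linear fit, and every function name is made up for this example. It only shows that splitting the work leaves the resulting update the same as running the whole batch on one device.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def grad_sum(w: float, samples: List[Sample]) -> float:
    """Sum over the samples of d/dw (w*x - y)**2, i.e. 2*(w*x - y)*x."""
    return sum(2.0 * (w * x - y) * x for x, y in samples)


def single_device_grad(w: float, batch: List[Sample]) -> float:
    """Mean gradient when the whole batch is processed in one place."""
    return grad_sum(w, batch) / len(batch)


def multi_device_grad(w: float, batch: List[Sample], num_devices: int) -> float:
    """Mean gradient when the batch is sharded across simulated devices."""
    # The per-sample work is split: each simulated device gets one shard.
    shard_size = -(-len(batch) // num_devices)  # ceiling division
    shards = [batch[i:i + shard_size] for i in range(0, len(batch), shard_size)]
    # The update still combines every sample in the full batch.
    return sum(grad_sum(w, shard) for shard in shards) / len(batch)


batch = [(float(i), 3.0 * i + 1.0) for i in range(1, 9)]  # 8 toy samples
g_one = single_device_grad(0.5, batch)
g_four = multi_device_grad(0.5, batch, num_devices=4)

# Same update either way: sharding changes where the work runs,
# not how many images are mixed into a single update.
print(abs(g_one - g_four) < 1e-9)
```

In this sketch, BS 256 on 4 GPUs behaves like BS 256 on one big GPU, not like four independent BS 64 runs.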

Batch sizes that are too small also have their problems. This is why we normally recommend batch sizes between 8 and 128.
