If I have 4 GPUs, do I need to divide batch_size by 4 to get the same effective batch size as training on 1 GPU?
I noticed that the memory allocated on each GPU in distributed mode is the same as when training in single-GPU mode. So I assume the actual batch size with 4 GPUs in distributed mode is four times the single-GPU batch size. That is, if I set the batch size to 16 and train the model on 4 GPUs, my effective batch size is 16 x 4 = 64.
I just want to confirm whether the script automatically divides the batch size across the GPUs.
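To illustrate what I mean, here is a minimal sketch of my understanding, assuming plain PyTorch DDP launched with torchrun (one process per GPU); the name per_gpu_batch_size is my own, and if the script uses a different framework (e.g. a Trainer wrapper) the behavior may differ:

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Assumption: launched via `torchrun --nproc_per_node=4 script.py`,
# so there is one process per GPU.
dist.init_process_group("nccl")
world_size = dist.get_world_size()  # 4 with 4 GPUs

dataset = TensorDataset(torch.randn(1024, 10))

per_gpu_batch_size = 16  # the batch_size passed to the DataLoader

# DistributedSampler splits the *dataset* across processes, but the
# DataLoader's batch_size is still per process. Each GPU therefore
# loads 16 samples per step, and the effective global batch per
# optimizer step is per_gpu_batch_size * world_size.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=per_gpu_batch_size, sampler=sampler)

print(f"effective global batch size: {per_gpu_batch_size * world_size}")  # 64
```

If that understanding is right, the batch size is not auto-divided, and I would need to set batch_size to 4 per GPU to reproduce a single-GPU batch size of 16. Is that correct?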