I'm trying to use faceswap on a 6-GPU machine (6 x NVIDIA GTX 1070, 8 GB each) with a 9th-gen Intel i7 CPU and 24 GB of RAM.
The results are really strange. Here are the average training speeds as I change the number of GPUs, with a batch size of 100:
1 GPU: 2.5 iterations/s
2 GPUs: 1.2 iterations/s
3 GPUs: 0.4 iterations/s
4 GPUs: process hangs
5 GPUs: same, process hangs
6 GPUs: not even tested ...
I wasn't expecting those kinds of results. Any clue about what's happening here?
So if I understood this correctly, multiple GPUs are only useful when you also increase the batch size, right?
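For context, here's a minimal sketch of why that is, using plain Keras with tf.distribute.MirroredStrategy (this is illustrative only, not faceswap's internals, and the model, data and batch numbers are made up): with data-parallel training the global batch is sliced across the cards, so keeping the batch size fixed while adding GPUs just means each card does less work per step while the per-step synchronization cost stays.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
per_gpu_batch = 16                           # hypothetical per-card batch size
global_batch = per_gpu_batch * strategy.num_replicas_in_sync

with strategy.scope():
    # Placeholder model, not a faceswap model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Keras slices each global batch across the replicas. Scaling global_batch
# with the GPU count keeps the per-card workload constant; a fixed global
# batch only shrinks it, and sync overhead eats the gains.
xs = tf.random.normal([1024, 32])
ys = tf.random.normal([1024, 1])
model.fit(xs, ys, batch_size=global_batch, epochs=1)
```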
In the training guide the following is stated:
Higher batch sizes will train faster, but will lead to higher generalization. Lower batch sizes will train slower, but will distinguish differences between faces better. Adjusting the batch size at various stages of training can help.
So basically you can achieve better results using lower batch sizes; it will just be slower? Has anyone experimented with different batch sizes and could perhaps offer some guidance on how long you should train at each batch size to get optimal results? Of course this might be affected by a number of things, but any idea what the baseline might be?
Yes, experimentation has been done. We recommend training at the largest batch size you can until you stop seeing improvements; then you can do a fit train at a lower batch size. See other discussions here on how to do fit training.
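In case the schedule is unclear, here's a minimal Keras sketch of the idea (the model, data, batch sizes and epoch counts are all placeholders, and this is not faceswap's actual training code):

```python
import tensorflow as tf

# Placeholder model and data, just to show the two-stage schedule
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

xs = tf.random.normal([4096, 32])
ys = tf.random.normal([4096, 1])

# Stage 1: largest batch size that fits in VRAM; run until loss plateaus.
model.fit(xs, ys, batch_size=128, epochs=10)

# Stage 2: "fit train" - continue with the SAME weights at a lower batch
# size, trading speed for a finer-grained gradient signal per face.
model.fit(xs, ys, batch_size=16, epochs=5)
```

The key point is that the second stage continues from the first stage's weights rather than starting a fresh model.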