At the time of writing, Faceswap doesn't offer a built-in way to find your maximum batch size. We'll use a technique called binary search to manually find the highest batch size you can train a given model with on your GPU.
That's just a fancy way of saying we'll halve the range of possible batch sizes with every test, so it only takes a handful of training attempts to find the highest it'll go for you.
The maximum batch size you can train on depends on a few things:
Your chosen model for the training.
Your GPU's Video RAM.
Your other hardware may factor into it, but to a lesser extent.
So if you start training on a different model, your max batch size is probably going to be different.
Finding the highest batch size
1. Visualise the possible batch sizes as numbers between 1 (LOW) and 128 (HIGH).
2. Set batch size to the middle of the range, rounding down.
3. Start training.
4. Did it crash?
Yes - Decrease HIGH to batch size - 1, then repeat from Step 2.
No - Increase LOW to batch size + 1, then repeat from Step 2.
5. Repeat until there are no untested batch sizes left in your range. The highest batch size that didn't crash your training is your maximum batch size for that model on your hardware.
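The steps above can be sketched in Python. This is only an illustration of the search logic, not something Faceswap provides: `find_max_batch_size` and its `crashes` argument are hypothetical names, and `crashes` stands in for "start training at this batch size and see whether it falls over", which you would still do by hand.

```python
from typing import Callable, Optional


def find_max_batch_size(
    crashes: Callable[[int], bool], low: int = 1, high: int = 128
) -> Optional[int]:
    """Binary-search the largest batch size in [low, high] that does not crash.

    `crashes` is a hypothetical stand-in for a manual training attempt;
    it is not a Faceswap API.
    """
    best = None  # largest batch size seen to train without crashing
    while low <= high:
        batch_size = (low + high) // 2  # middle of the range, rounded down
        if crashes(batch_size):
            high = batch_size - 1  # too big: rule out this size and everything above
        else:
            best = batch_size
            low = batch_size + 1  # stable: only larger sizes are still worth testing
    return best


# Simulated GPU that copes with batch sizes up to 22 (an assumed figure).
max_bs = find_max_batch_size(lambda bs: bs > 22)
print(max_bs)
```

If every size in the range crashes, the sketch returns `None`, which would mean the model does not fit on the GPU at any batch size in that range.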
Should I use the highest batch size possible?
Probably not. Best to use 1 or 2 lower than your absolute maximum.
kvrooman wrote: Increasing batch sizes from low numbers will provide speedups to a certain point. If you push the batch size to the absolute limit, Tensorflow will switch out fast but memory intensive calculation methods for slower but memory efficient versions to prevent OOM crashes... Keep increasing bs past that though and you eventually get the crash. So the fastest batch size is slightly before OOM crashes. Also, there are slight quality degradations at very large batch sizes. ( 100+ bs )
Confused? Here's an example!
Start with a range between 1 (LOW) and 128 (HIGH).
Set batch size to 64, start training... crash.
Shrink your range to 1 - 63.
Set batch size to 32, start training... crash.
Shrink your range to 1 - 31.
Set batch size to 16, start training... stable!
Shrink your range to 17 - 31.
Set batch size to 24, start training... crash.
Shrink your range to 17 - 23.
Set batch size to 20, start training... stable!
Shrink your range to 21 - 23.
Set batch size to 22, start training... stable!
Shrink your range to 23.
Set batch size to 23, start training... crash.
Your maximum batch size is 22.
You should probably use 20 or 21.
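If you want to double-check the arithmetic in this walkthrough, here is a small simulation that replays it. It assumes a made-up GPU whose real limit is 22, and `trace_search` is a hypothetical helper written for this example; nothing here actually runs training.

```python
def trace_search(limit, low=1, high=128):
    """Return the (batch_size, outcome) pairs tried during the search.

    `limit` is an assumed stand-in for the largest batch size the GPU can handle.
    """
    steps = []
    while low <= high:
        batch_size = (low + high) // 2  # middle of the range, rounded down
        if batch_size > limit:  # simulated crash
            steps.append((batch_size, "crash"))
            high = batch_size - 1
        else:
            steps.append((batch_size, "stable"))
            low = batch_size + 1
    return steps


steps = trace_search(limit=22)
for batch_size, outcome in steps:
    print(f"Set batch size to {batch_size}, start training... {outcome}")
```

The batch sizes it tries are 64, 32, 16, 24, 20, 22 and 23, the same sequence as in the example above.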