[Tip] Finding your highest batch size


Post by Surrogator »

At the time of writing, Faceswap doesn't offer an out-of-the-box way to figure out your maximum batch size. We'll use a technique called binary search to manually find the highest batch size you can achieve when training a given model on your GPU.

That's just a fancy way of saying we'll use the fewest steps required to find the highest batch size that works for you.

The maximum batch size you can train on depends on a couple of things:

  • Your chosen model for the training.

  • Your GPU's Video RAM.

  • Your other hardware may factor into it, but to a lesser extent.

So if you start training on a different model, your max batch size is probably going to be different.

Finding the highest batch size

  1. Visualise the possible batch sizes as numbers between 1 (LOW) and 128 (HIGH).

  2. Set batch size to the middle of the range.

  3. Start training.

  4. Did it crash?

    1. Yes - Decrease HIGH to batch size - 1, then repeat from Step 2.

    2. No - Note this batch size as your best so far, then increase LOW to batch size + 1 and repeat from Step 2.

  5. Repeat until LOW passes HIGH. The last batch size that didn't crash is your maximum batch size for that model on your hardware (see the code sketch after this list).
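
If you'd rather see the search in code, here's a minimal Python sketch. It assumes a hypothetical train_attempt(batch_size) helper that returns True when a short training run at that batch size is stable and False when it crashes with an OOM; Faceswap doesn't provide such a helper, so in practice each "attempt" is you starting training by hand and watching what happens.

def find_max_batch_size(train_attempt, low=1, high=128):
    """Binary-search for the largest batch size that trains without crashing."""
    best = None
    while low <= high:
        mid = (low + high) // 2  # Step 2: middle of the range
        if train_attempt(mid):
            best = mid           # stable: remember it...
            low = mid + 1        # ...and search the upper half
        else:
            high = mid - 1       # crashed: search the lower half
    return best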

Should I use the highest batch size possible?
Probably not. It's best to use a batch size 1 or 2 lower than your absolute maximum.

kvrooman wrote:

Increasing batch sizes from low numbers will provide speedups to a certain point. If you push the batch size to the absolute limit, Tensorflow will switch out fast but memory intensive calculation methods for slower but memory efficient versions to prevent OOM crashes... Keep increasing bs past that though and you eventually get the crash. So the fastest batch size is slightly before OOM crashes. Also, there are slight quality degradations at very large batch sizes. ( 100+ bs )

Confused? Here's an example!

  • Start with a range between 1 (LOW) and 128 (HIGH).

  • Set batch size to 64, start training... crash. :shock:

  • Shrink your range to 1 - 63.

  • Set batch size to 32, start training... crash. :shock:

  • Shrink your range to 1 - 31.

  • Set batch size to 16, start training... stable! :mrgreen:

  • Shrink your range to 17 - 31.

  • Set batch size to 24, start training... crash. :shock:

  • Shrink your range to 17 - 23.

  • Set batch size to 20, start training... stable! :mrgreen:

  • Shrink your range to 21 - 23.

  • Set batch size to 22, start training... stable! :mrgreen:

  • Shrink your range to 23.

  • Set batch size to 23, start training... crash. :shock:

  • Your maximum batch size is 22.

  • You should probably use 20 or 21.
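
Feeding the sketch above a pretend GPU that can handle at most a batch size of 22 reproduces this exact trace:

# Pretend anything above 22 OOMs, as in the walkthrough above.
print(find_max_batch_size(lambda bs: bs <= 22))  # -> 22

It pins down the answer in seven attempts, matching the log2(128) = 7 bound that binary search guarantees.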



Re: [Guide] Finding your highest batch size

Post by bryanlyon »

This is not recommended. Or if you do find your maximum BS you should add a "buffer" of a few extra samples to make sure you don't OOM at some random point.


Re: [Guide] Finding your highest batch size

Post by Surrogator »

bryanlyon wrote: Tue Sep 17, 2019 3:09 pm

you should add a "buffer" of a few extra samples to make sure you don't OOM at some random point.

Did you read the guide?


Re: [Guide] Finding your highest batch size

Post by bryanlyon »

Yes, and it is still wrong. Just because it doesn't OOM right away does not mean it won't later. Garbage collection on the GPU is not perfect, and what works at one moment may crash later. Don't push to the limit, and don't judge by the error messages either, as they only account for what has been allocated immediately. GPU usage also varies with numerous factors, from whether or not you're running any other applications to whether Windows is feeling particularly greedy at that moment.

To be safe, it's best not to optimize for the largest BS you can handle and instead focus on your data; unless you are off by a huge amount, the gains you get will be minimal or even net negative.


Re: [Guide] Finding your highest batch size

Post by Surrogator »

bryanlyon wrote: Tue Sep 17, 2019 3:09 pm

This is not recommended.

What isn't? Finding the highest batch size? How can you find a batch size safely under the limit, if you don't know what the limit is? This guide helps you find your system's limit.

bryanlyon wrote: Tue Sep 17, 2019 3:09 pm

you should add a "buffer" of a few extra samples to make sure you don't OOM at some random point.

This guide also recommends you do exactly that.

bryanlyon wrote: Tue Sep 17, 2019 3:31 pm

it is still wrong.

Take the guide at face value: it teaches you how to find your limit, and it is not wrong in how it tells you to do that. The guide doesn't tell you that you must train at your highest possible batch size, which is why I asked if you had even read the guide.


Re: [Guide] Finding your highest batch size

Post by andenixa »

From experience I recommend the following sequence:
128, 92, 64, 48, 32, 22, 16, 12, 8, 4
If the value you found, e.g. 48, gives you an OOM within a few hours, reduce it by 15%; that should be your safe value.
Taking a literal binary search approach is overkill.

PS: I don't recommend batch sizes of more than 48 (multiply that by the number of GPUs you have, for example 96 for dual-GPU).
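
In code, andenixa's ladder approach might look something like this (again just a sketch, reusing the hypothetical train_attempt helper from the sketch earlier in the thread):

LADDER = [128, 92, 64, 48, 32, 22, 16, 12, 8, 4]

def find_stable_batch_size(train_attempt, ladder=LADDER):
    """Walk the ladder from the top; return the first batch size that trains."""
    for bs in ladder:
        if train_attempt(bs):
            return bs
    return None

def with_safety_margin(bs):
    """If bs still OOMs after a few hours, shave 15% off it."""
    return max(1, round(bs * 0.85))

print(with_safety_margin(48))  # -> 41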


Re: [Tip] Finding your highest batch size

Post by AndrewB »

I have an RTX 2070S with 8GB of VRAM. For Dfaker, bs=64 is fine, but SAE and RealFace crash at bs=16. All settings are at their defaults. Is that OK for 8GB?


Re: [Tip] Finding your highest batch size

Post by torzdf »

Yes, they are both heavy models. Just lower the batch size until it runs.
