I have been training on my CPU for more than a month now with different image sets on three different models:
Original - fastest EGs/sec at batch size = 12
Dlight (128 in/out) - fastest EGs/sec at batch size = 9
RealFace (64 in / 128 out) - fastest EGs/sec at batch size = 5
What I have seen (not once but many times) is that if I take half of the fastest EGs/sec batch size, e.g. in my case:
Original - half of fastest EGs/sec batch = 6
Dlight - half of fastest EGs/sec batch = 4 (rounded down, since 4.5 is not possible)
RealFace - half of fastest EGs/sec batch = 2 (rounded down, since 2.5 is not possible)
then the model can easily handle a learning rate of 0.0001, and the previews become almost perfectly sharp.
Also, I have always trained with a learning rate of 6.5e-5 at my fastest EGs/sec batch size, and the preview was never blurry; it always came out sharp.
My Observations:
Fastest EGs/sec batch size - works well with learning rates of 6.5e-5 and below
Half of fastest EGs/sec batch size - works well with learning rates of 0.0001 and below
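To make this rule of thumb concrete, here is a minimal Python sketch of it. The max_stable_lr helper is purely hypothetical - it just encodes my observed thresholds as code, it is not anything from faceswap itself:

```python
# Hypothetical helper encoding my observed rule of thumb
# (thresholds come from my CPU runs, nothing official):

def max_stable_lr(batch_size, fastest_batch):
    """Highest learning rate that stayed sharp in my previews."""
    half_batch = fastest_batch // 2  # integer halving: 9 -> 4, 5 -> 2
    if batch_size <= half_batch:
        return 1e-4    # half of the fastest batch handled 0.0001
    return 6.5e-5      # the fastest batch handled 6.5e-5 and below

for name, fastest in [("Original", 12), ("Dlight", 9), ("RealFace", 5)]:
    print(f"{name}: fastest batch {fastest} -> LR <= {max_stable_lr(fastest, fastest)}")
    print(f"{name}: half batch {fastest // 2} -> LR <= {max_stable_lr(fastest // 2, fastest)}")
```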
Most of my experiments were with the RealFace model, and this is what I got with CPU training:
Half of fastest EGs/sec batch (size 2) - works well with a 0.0001 learning rate
Batch size 3 - works well with an 8.5e-5 learning rate
Fastest EGs/sec batch (size 5) - works well with learning rates of 6.5e-5 and below
So from these numbers, it seems like the learning rate is inversely proportional to the batch size: the larger the batch, the lower the learning rate it can tolerate.
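As a rough sanity check of that inverse relationship: if LR really scaled as 1/batch, then lr * batch would be constant across runs. Running the arithmetic on my own RealFace data points:

```python
# If LR is inversely proportional to batch size, lr * batch should be
# roughly constant across runs.
realface_runs = [(2, 1.0e-4), (3, 8.5e-5), (5, 6.5e-5)]  # (batch, highest stable LR)

for batch, lr in realface_runs:
    print(f"batch {batch}: lr * batch = {lr * batch:.2e}")

# batch 2: lr * batch = 2.00e-04
# batch 3: lr * batch = 2.55e-04
# batch 5: lr * batch = 3.25e-04
```

The products are in the same ballpark but not exactly constant, so "a larger batch needs a lower learning rate" may be the safer way to state it than a strict inverse proportion.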
Whenever I tried a learning rate of 7e-5 or above with my fastest EGs/sec batch size, the preview window got blurrier and blurrier, and the loss values just swung around in the upper region.
During all this training, the Eye and Mouth multipliers were set to 1.
This same relationship held for both the Original and Dlight models, where I was able to try slightly higher batch sizes than with the RealFace model.
But I am only talking about the preview window here, as I am still experimenting with different settings and have not yet made an actual swap video.
Also, I have only tested this on the Original, Dlight, and RealFace models with CPU training, so I cannot say whether it holds for other models or for GPU training.
Are any of my observations correct?
Can someone with a GPU confirm them?