
Is Learning Rate inversely proportional to Batch size?

Posted: Thu Mar 25, 2021 5:35 pm
by ugramund

I have been training on my CPU for more than a month now with different image sets on 3 different models:

Original - fastest EGs/sec batch size = 12
Dlight (128 in/out) - fastest EGs/sec batch size = 9
RealFace (64 in/128 out) - fastest EGs/sec batch size = 5

What I have seen (not once but many times) is that if I take half of the fastest EGs/sec batch size, for example in my case:
Original: half of fastest EGs/sec batch = 6
Dlight: half of fastest EGs/sec batch = 4 (as 4.5 is not possible)
RealFace: half of fastest EGs/sec batch = 2 (as 2.5 is not possible)

then it can easily handle a learning rate of 0.0001 & the previews become almost perfectly sharp.
Also, I have always trained with a learning rate of 6.5e-5 at my fastest EGs/sec batch size, & it
never came out blurry in the preview window & was always sharp.

My observations:
Fastest EGs/sec batch = works well with learning rates of 6.5e-5 & below
Half of fastest EGs/sec batch = works well with learning rates of 0.0001 & below


Most of my experiments were with the RealFace model, & this is what I got with CPU training:

Half of fastest EGs/sec batch, size 2 = works well with a 0.0001 learning rate
Batch size 3 = works well with an 8.5e-5 learning rate
Fastest EGs/sec batch, size 5 = works well with learning rates of 6.5e-5 & below

So by this calculation, it seems like Learning Rate is inversely proportional to Batch size.
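As a rough check of that claim, here is a minimal Python sketch (my own, not Faceswap code; the scale_lr() helper & the strictly-inverse rule are assumptions of mine) that multiplies each working learning rate by its batch size:

[code]
# Rough sanity check of the "learning rate x batch size = constant" idea,
# using my RealFace numbers. Just my own sketch, not anything from Faceswap;
# scale_lr() is a hypothetical helper for the rule I think I am seeing.

observations = [(2, 1.0e-4), (3, 8.5e-5), (5, 6.5e-5)]  # (batch size, LR)

for batch, lr in observations:
    print(f"batch {batch}: lr * batch = {lr * batch:.2e}")
# batch 2: lr * batch = 2.00e-04
# batch 3: lr * batch = 2.55e-04
# batch 5: lr * batch = 3.25e-04

def scale_lr(base_lr, base_batch, new_batch):
    """Strictly inverse scaling rule (an assumption, not a Faceswap feature)."""
    return base_lr * base_batch / new_batch

print(scale_lr(1.0e-4, 2, 5))  # 4e-05, below the 6.5e-5 that actually worked
[/code]

The products are not an exact constant, so the relationship looks only roughly inverse: the learning rate clearly falls as the batch grows, but a strict 1/batch rule would predict a lower rate at batch size 5 than the 6.5e-5 that actually worked for me.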

And whenever I tried a learning rate of 7e-5 or higher with my fastest EGs/sec batch,
the preview window got blurrier & blurrier, & the loss value just swung around in the upper region.

During all this training time, the Eye & Mouth multipliers were set to 1.

This ratio was the same for both the Original & Dlight models, where I was able to try slightly
higher batch sizes than with the RealFace model.


But I am only talking about the preview window, as I am still experimenting with different settings
& have not yet made an actual swap video.
Also, I have only tested this on the Original, Dlight & RealFace models with CPU training, so I cannot
say whether it holds for other models or for GPU training.

Are any of my observations true?
Can someone with a GPU confirm them?


Re: Is Learning Rate inversely proportional to Batch size?

Posted: Wed Mar 31, 2021 2:15 pm
by algeron

I believe - someone correct me if I'm wrong - this is because batches adjust the weights based on the average score of the EGs in the batch, so outliers will cancel each other out.
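A minimal NumPy sketch of that averaging effect (my own illustration, not anything from Faceswap; the gradient values are made up): averaging per-example gradients over a bigger batch shrinks the spread of the update, so outliers get diluted.

[code]
import numpy as np

# Fake per-example gradient "scores": mostly near 1.0, with heavy noise
# standing in for outlier EGs. Purely illustrative numbers.
rng = np.random.default_rng(0)
grads = rng.normal(loc=1.0, scale=5.0, size=100_000)

for batch in (2, 5, 12):
    n = (len(grads) // batch) * batch          # trim so it reshapes evenly
    batch_means = grads[:n].reshape(-1, batch).mean(axis=1)
    print(f"batch {batch}: spread of averaged update = {batch_means.std():.2f}")
# The spread falls roughly as 1/sqrt(batch), so a bigger batch gives a
# smoother averaged update & individual outliers cancel out more.
[/code]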