Is Learning Rate inversely proportional to Batch size ?

Want to understand the training process better? Got tips for which model to use and when? This is the place for you


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

Is Learning Rate inversely proportional to Batch size ?

Post by ugramund »

I have been training on my CPU for more than a month now with different image sets on 3 different models :-

Original - fastest EGs/sec batch size = 12
Dlight (128 in/out) - fastest EGs/sec batch size = 9
RealFace (64 in/128 out) - fastest EGs/sec batch size = 5

What I have seen (not once but many times) is that if I take half of the fastest EGs/sec batch, for eg. in my case :-
Original :- half of fastest EGs/sec batch = 6
Dlight :- half of fastest EGs/sec batch = 4 (as 4.5 is not possible)
RealFace :- half of fastest EGs/sec batch = 2 (as 2.5 is not possible)

It can easily handle a learning rate of 0.0001 & the previews become almost perfectly sharp.
Also, I have always trained with a learning rate of 6.5e-5 with my fastest EGs/sec batch & the preview
never came out blurry & was always sharp.

My Observations:-
Fastest EGs/sec batch = works well with 6.5e-5 & below Learning rates
Half of fastest EGs/sec batch = works well with 0.0001 & below Learning rates


Most of my experiments were with the RealFace model & this is what I got with my CPU training:-

Half of fastest EGs/sec batch size 2 = works well with 0.0001 Learning rate
batch size 3 = works well with 8.5e-5 Learning rate
Fastest EGs/sec batch size 5 = works well with 6.5e-5 & below Learning rates

So by this calculation, it seems like Learning Rate is inversely proportional to Batch size.
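
If that is really the case, then learning rate × batch size should come out roughly constant. Here is a quick check with my RealFace numbers above (just a rough plain-Python sketch, nothing from the Faceswap code):

# Quick check: if the learning rate is inversely proportional to batch size,
# then lr * batch_size should be roughly constant across my runs.
realface_runs = [
    (2, 1.0e-4),   # half of fastest EGs/sec batch
    (3, 8.5e-5),
    (5, 6.5e-5),   # fastest EGs/sec batch
]

for batch_size, lr in realface_runs:
    print(f"batch {batch_size}: lr * batch = {lr * batch_size:.2e}")

# batch 2: lr * batch = 2.00e-04
# batch 3: lr * batch = 2.55e-04
# batch 5: lr * batch = 3.25e-04

So the product is not exactly constant for me, more of a rough trend in that direction.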

And whenever I tried a learning rate of 7e-5 or above with my fastest EGs/sec batch,
the preview window got blurrier & blurrier & the loss value just swung around in the upper region.

During all this training time, the Eye & Mouth multipliers were set to 1.

This ratio was the same for both the Original & Dlight models, where I was able to try slightly
higher batch sizes than with the RealFace model.


But I am only talking about the preview window, as I am still experimenting with different settings
& have not yet made an actual swap video.
Also, I have only tested this on the Original, Dlight & RealFace models with CPU training, so I cannot
say whether it holds for other models or for GPU training.

Are any of my observations true?
Can someone with a GPU confirm it?

In love with the RealFace model.


Re: Is Learning Rate inversely proportional to Batch size ?

Post by algeron »

I believe - someone correct me if I'm wrong - this is because batches adjust the weights based on the average loss across the EGs in the batch, so outliers tend to cancel each other out.
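
Roughly, the weight update for a batch comes from the average of the per-EG gradients, so one odd face gets diluted by the rest. A toy numpy sketch of that averaging (not Faceswap's actual optimiser code, just the idea):

import numpy as np

# Toy example: one weight, with per-example "gradients" for a batch of 4 EGs.
per_example_grads = np.array([0.10, 0.12, -0.50, 0.11])  # one outlier at -0.50

batch_grad = per_example_grads.mean()  # the batch uses the average gradient,
                                       # so the outlier is diluted by the other EGs
lr = 6.5e-5
weight = 1.0
weight -= lr * batch_grad              # a single optimiser step

print(batch_grad)  # about -0.0425 instead of the -0.50 it would be on its own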
