I have been training on my CPU for more than a month now with different image sets on three different models:
Original - fastest EGs/sec at batch size = 12
Dlight (128 in/out) - fastest EGs/sec at batch size = 9
RealFace (64 in / 128 out) - fastest EGs/sec at batch size = 5
What I have seen (not once but many times) is that if I take half of the fastest EGs/sec batch size, e.g. in my case:
Original - half of fastest EGs/sec batch = 6
Dlight - half of fastest EGs/sec batch = 4 (rounded down, since 4.5 is not possible)
RealFace - half of fastest EGs/sec batch = 2 (rounded down, since 2.5 is not possible)
then the model can easily handle a learning rate of 0.0001, and the previews become almost perfectly sharp.
Also, I have always trained with a learning rate of 6.5e-5 at my fastest EGs/sec batch size, and the preview was never blurry; it always came out sharp.
My Observations:
Fastest EGs/sec batch size - works well with learning rates of 6.5e-5 and below
Half of fastest EGs/sec batch size - works well with learning rates of 0.0001 and below
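To make this rule of thumb concrete, here is a minimal Python sketch of it. The max_stable_lr helper is purely hypothetical - it just encodes my observed thresholds as code, it is not anything from faceswap itself:

```python
# Hypothetical helper encoding my observed rule of thumb
# (thresholds come from my CPU runs, nothing official):

def max_stable_lr(batch_size, fastest_batch):
    """Highest learning rate that stayed sharp in my previews."""
    half_batch = fastest_batch // 2  # integer halving: 9 -> 4, 5 -> 2
    if batch_size <= half_batch:
        return 1e-4    # half of the fastest batch handled 0.0001
    return 6.5e-5      # the fastest batch handled 6.5e-5 and below

for name, fastest in [("Original", 12), ("Dlight", 9), ("RealFace", 5)]:
    print(f"{name}: fastest batch {fastest} -> LR <= {max_stable_lr(fastest, fastest)}")
    print(f"{name}: half batch {fastest // 2} -> LR <= {max_stable_lr(fastest // 2, fastest)}")
```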
Most of my experiments were with the RealFace model, and this is what I got with CPU training:
Half of fastest EGs/sec batch (size 2) - works well with a 0.0001 learning rate
Batch size 3 - works well with an 8.5e-5 learning rate
Fastest EGs/sec batch (size 5) - works well with learning rates of 6.5e-5 and below
So from these numbers, it seems like the learning rate is inversely proportional to the batch size: the larger the batch, the lower the learning rate it can tolerate.
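As a rough sanity check of that inverse relationship: if LR really scaled as 1/batch, then lr * batch would be constant across runs. Running the arithmetic on my own RealFace data points:

```python
# If LR is inversely proportional to batch size, lr * batch should be
# roughly constant across runs.
realface_runs = [(2, 1.0e-4), (3, 8.5e-5), (5, 6.5e-5)]  # (batch, highest stable LR)

for batch, lr in realface_runs:
    print(f"batch {batch}: lr * batch = {lr * batch:.2e}")

# batch 2: lr * batch = 2.00e-04
# batch 3: lr * batch = 2.55e-04
# batch 5: lr * batch = 3.25e-04
```

The products are in the same ballpark but not exactly constant, so "a larger batch needs a lower learning rate" may be the safer way to state it than a strict inverse proportion.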
Whenever I tried a learning rate of 7e-5 or above with my fastest EGs/sec batch size, the preview window got blurrier and blurrier, and the loss values just swung around in the upper region.
During all this training, the Eye and Mouth multipliers were set to 1.
This same relationship held for both the Original and Dlight models, where I was able to try slightly higher batch sizes than with the RealFace model.
But I am only talking about the preview window here, as I am still experimenting with different settings and have not yet made an actual swap video.
Also, I have only tested this on the Original, Dlight, and RealFace models with CPU training, so I cannot say whether it holds for other models or for GPU training.
Are any of my observations correct?
Can someone with a GPU confirm them?