LR optimizer first impressions (yay new stuff)


User avatar
Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

LR optimizer first impressions (yay new stuff)

Post by Ryzen1988 »

So, how blindly can you trust this graph, I wonder? Not much in life is simple enough to be captured in a 2D graph, but it is probably still a super-cool tool to guide you.

I have already drawn some interesting first conclusions. The pics show the LR optimizer for AdaBelief with and without mixed precision.
As you can see, with mixed precision the learning rate should be much lower, and the curve rises much sooner and more steeply. No wonder AdaBelief with mixed precision was always thought to be tough to get right.

clipvlearning_rate_finder_2023-08-27_17.29.11.png
learning_rate_finder_2023-08-27_17.38.01.png
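For context, what a learning-rate finder like this typically does under the hood is an LR range test: ramp the learning rate up exponentially, take one training step per value, record the loss, and suggest a rate somewhat below the point where the loss bottoms out. Below is a minimal, framework-free sketch on a toy quadratic loss; the function names and the divide-by-10 safety heuristic are illustrative assumptions, not Faceswap's actual implementation.

```python
def lr_range_test(loss_and_grad, w0, lr_min=1e-7, lr_max=10.0, steps=100):
    """Toy LR range test: the learning rate grows exponentially from
    lr_min to lr_max, with one SGD step taken at each value."""
    factor = (lr_max / lr_min) ** (1.0 / (steps - 1))
    w, lr, history = w0, lr_min, []
    for _ in range(steps):
        loss, grad = loss_and_grad(w)
        history.append((lr, loss))   # loss observed at this rate
        w -= lr * grad               # one SGD step
        lr *= factor                 # exponential (log-linear) ramp
    return history

def pick_lr(history, divisor=10.0):
    """Heuristic: the rate at the lowest loss, divided by a safety factor."""
    best_lr, _ = min(history, key=lambda p: p[1])
    return best_lr / divisor

# Toy convex loss f(w) = (w - 3)^2 with gradient 2(w - 3): the loss
# shrinks while the rate is stable, then blows up once it is too high.
quad = lambda w: ((w - 3.0) ** 2, 2.0 * (w - 3.0))
suggested = pick_lr(lr_range_test(quad, w0=0.0))
```

The curves in the screenshots are the `history` list plotted on a log-x axis; the sharp rise at the end is the divergence region, which is exactly where mixed precision appears to kick in earlier.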

Re: LR optimizer first impressions (yay new stuff)

Post by Ryzen1988 »

As far as I can see, when the LR optimizer is used, the value sticks in the model file, right?
Would it be possible to make it so that you can retest models during training?
For example, I have noticed the curve looks really different when the encoder is frozen in the early stages of training compared to when it is unfrozen and in the training loop.
Batch size also seems to influence the learning rate, so it would be really useful to be able to retest the LR curve when new parameters are applied.

User avatar
torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: LR optimizer first impressions (yay new stuff)

Post by torzdf »

Ryzen1988 wrote: Mon Aug 28, 2023 8:50 am

Would it be possible to make it so that you can retest models during training?

No, for the reason stated in the training documentation:

https://forum.faceswap.dev/viewtopic.php?t=146#settings wrote:

For new models, this Learning Rate will be discovered prior to commencing training. For resuming saved models, the Learning Rate discovered when first creating the model will be used. It is not possible to use the Learning Rate Finder to discover a new optimal Learning Rate when resuming saved models, as the loss update is too small between each iteration, however this option still needs to be enabled if you wish to use the Learning Rate that was found during the initial discovery phase.

Ryzen1988 wrote: Mon Aug 28, 2023 8:50 am

Batchsize also seem to influence the LR rate, so would seem really useful to be able to retest the LR curve when new parameters are applied.

Yes, batch size has a huge impact on Learning Rate. This is discussed further here:
viewtopic.php?p=7432#p7432
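As a rough illustration of that relationship (a common heuristic, not necessarily what Faceswap applies internally): the linear scaling rule multiplies the learning rate by the same factor as the batch size, with a square-root variant sometimes preferred for Adam-family optimizers. A sketch:

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int,
             rule: str = "linear") -> float:
    """Rescale a learning rate found at one batch size for another.

    'linear' is the linear scaling rule; 'sqrt' is the more
    conservative square-root variant. Both are heuristics, not
    guarantees -- a fresh range test is still the safer check.
    """
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule}")

# LR discovered at batch size 32, rescaled for batch size 24
print(scale_lr(1e-4, 32, 24, "linear"))   # 7.5e-05
```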

My word is final


Re: LR optimizer first impressions (yay new stuff)

Post by Ryzen1988 »

Am I that late to notice this function, or did you already update the guide?

I was wondering about the lack of a changelog to read, but from your reaction I guess the guide also integrates all the new functions as they arrive, so I will check that out first.

I understand that the LR updates are too small to properly retest.
Here is what I did for testing; please comment on whether this makes sense:
Load the model with the encoder frozen, batch size 32 (initial training stage), get the LR curve, and delete the model.
Load the model with the encoder unfrozen, batch size 24 (main training stage), get the LR curve, and delete the model.
Load the model with the encoder unfrozen and a small batch size (final training without warp), get the LR curve, and delete the model.
Then start training step one with its LR value, and you already have the other two values for the later training stages.
:geek:
Maybe it's a bit overdone, or I'm overcomplicating it, but the reported values do change a lot across the three modes an average model passes through.
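That three-stage probing routine could be sketched as a small config sweep. Here `find_lr` is a hypothetical stand-in for whatever finder entry point you script against, and the config fields are illustrative, not Faceswap's actual options:

```python
from dataclasses import dataclass

@dataclass
class StageConfig:
    """One training stage to probe with a fresh model."""
    name: str
    batch_size: int
    freeze_encoder: bool
    warp: bool

# The three stages described above: initial (frozen encoder),
# main (unfrozen), and final (small batch, no warp).
STAGES = [
    StageConfig("initial", batch_size=32, freeze_encoder=True,  warp=True),
    StageConfig("main",    batch_size=24, freeze_encoder=False, warp=True),
    StageConfig("final",   batch_size=8,  freeze_encoder=False, warp=False),
]

def probe_stages(find_lr):
    """Run the finder once per stage config; return {stage name: LR}."""
    return {cfg.name: find_lr(cfg) for cfg in STAGES}
```

One caveat with this approach (raised in the reply below this post): each probe starts from a different random initialization, which itself shifts the discovered rate.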


Re: LR optimizer first impressions (yay new stuff)

Post by torzdf »

The main issue with this approach is that the random initialization has a huge impact on the Learning Rate. There is a link in the training guide that gives some information about what the lr_finder does and some things to think about.

I did not consider the use-case of wanting to test different batch sizes (I tend to train at the batch size I can fit, so this value is always small), so there is no easy way to test that kind of thing with the current implementation.

When I add significant new features, I always update the guides either before, or very soon after, release.

We don't have a changelog because Faceswap is rolling release. The easiest way to see what is added and when is to look at the commit history: https://github.com/deepfakes/faceswap/commits/master


User avatar
MaxHunter
Posts: 194
Joined: Thu May 26, 2022 6:02 am
Has thanked: 177 times
Been thanked: 13 times

Re: LR optimizer first impressions (yay new stuff)

Post by MaxHunter »

@Ryzen1988 I've asked for the same, but I found the easiest and quickest way is to do a "system information" check and it'll give you a brief description of what's been changed. If it's significant I go to the GitHub. 🙂


Re: LR optimizer first impressions (yay new stuff)

Post by Ryzen1988 »

So I have noticed with the optimizer that the first 20% and the last 20% are never where the useful part is.
It takes a while before the best rate starts ticking up.
Would it be hard to make the LR steps wider until the curve starts dropping? You would have dense, small steps in the center of the curve, but as soon as the best LR stops moving (plus some margin) it could accelerate again in bigger steps. As it stands, a good third of the LR optimizer run feels like waiting for nothing; in the last part you know it will not move again.
Often I just close it manually, enter the rate, and go train, which feels more logical than letting it finish.

Alternatively, having a way to set the min and max values for the optimizer would also save time,
especially when, like me, you are testing a whole lot of configurations before actually launching the training :geek: :ugeek:
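The early-exit idea suggested here could look something like the following sketch (not part of Faceswap): walk the ramp, but stop once the best loss has not improved for a set number of steps. The `patience` mechanism is borrowed from early stopping; all names are illustrative.

```python
def run_with_early_exit(step_fn, lrs, patience=20):
    """Walk the LR ramp, stopping once the best loss hasn't improved
    for `patience` consecutive steps -- the tail of the curve where
    you already know it will not move again.

    step_fn(lr) -> loss for one training step at that rate.
    Returns the learning rate that produced the lowest loss.
    """
    best_loss = float("inf")
    best_lr = None
    stale = 0
    for lr in lrs:
        loss = step_fn(lr)
        if loss < best_loss:
            best_loss, best_lr, stale = loss, lr, 0
        else:
            stale += 1
            if stale >= patience:
                break                # tail of the curve: give up early
    return best_lr
```

A real finder would still need a few stale steps to confirm the minimum is genuine rather than noise, which is one reason the stock implementation may simply run the full ramp.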

Look at the beauty of this curve. I wonder if it will reflect the potential of the network :mrgreen: :geek:

Attachments
learning_rate_finder_2023-08-31_18.05.06.png
Last edited by Ryzen1988 on Thu Aug 31, 2023 4:15 pm, edited 1 time in total.

Re: LR optimizer first impressions (yay new stuff)

Post by torzdf »

What you are asking for is theoretically possible (min and max values), but I didn't implement it for a couple of reasons: it would be yet more config options added to the already myriad options, and I don't trust users not to input insane values and then complain that it doesn't work.

As for the step size, it rises logarithmically through each step. I'm unlikely to change that part, as I deliberately implemented it as per the original implementation.
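That log-spaced ramp can be written in a few lines (equivalent to `numpy.geomspace`); the function name here is illustrative:

```python
def lr_schedule(lr_min: float, lr_max: float, steps: int):
    """Log-spaced learning rates from lr_min to lr_max, inclusive.

    Each step multiplies the rate by a constant factor, so the values
    are evenly spaced on a log axis -- small absolute steps at the low
    end of the ramp, large absolute steps at the high end.
    """
    factor = (lr_max / lr_min) ** (1.0 / (steps - 1))
    return [lr_min * factor ** i for i in range(steps)]

rates = lr_schedule(1e-7, 1.0, steps=8)
# One decade per step: 1e-7, 1e-6, ..., 1e-1, 1e0
```

Note this is why the early part of the run feels slow: equal log-spacing means the ramp spends as many steps crossing 1e-7 to 1e-6 as it does crossing 1e-2 to 1e-1.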



Re: LR optimizer first impressions (yay new stuff)

Post by Ryzen1988 »

Another interesting curve :geek: :ugeek:

Attachments
learning_rate_finder_2023-09-04_18.35.15.png

Re: LR optimizer first impressions (yay new stuff)

Post by Ryzen1988 »

learning_rate_finder_2023-09-25_14.34.57.png

That's what you would call a really hyperactive geek that really wants to learn :ugeek: :geek:


Re: LR optimizer first impressions (yay new stuff)

Post by torzdf »

Yeah, it has definitely put the points in the wrong place there!

