A perfect 0.0 loss was one of the questions I asked about a year ago, back when
I was known as @ugramund here: viewtopic.php?p=5140
I think a perfect 0.0 loss is not possible while swapping faces, but it
is possible when you do something called Image Recreation.
There is a small trick involved in achieving this perfect 0.0 loss, but
let me first explain step by step what I am doing.
I am building a pre-trained model to be used in all of my future projects.
For this, I first have to decide which model to go with, and for that
I run a test I personally called Image Perfection, though later I read
somewhere that in deep learning it is called Image Recreation.
In this process, we check how well a model can recreate an image that it sees.
We are not talking about swapping faces here; we are just talking about redrawing the image the model sees.
So to do this in Faceswap, I select an image and make 25 copies of it, because that is the
minimum number of images Faceswap needs in a folder before it will start training.
The same folder is used as the path for both inputs A & B.
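If anyone wants to set this up quickly, a small script like the one below does the duplication. This is just my own sketch (the function name and paths are made up; point `src` at your actual image):

```python
import shutil
import tempfile
from pathlib import Path

def make_recreation_set(src: Path, out: Path, copies: int = 25) -> list[Path]:
    """Duplicate one image `copies` times; Faceswap will not start
    training on a folder with fewer than 25 images."""
    out.mkdir(parents=True, exist_ok=True)
    made = []
    for i in range(copies):
        dst = out / f"{src.stem}_{i:02d}{src.suffix}"
        shutil.copy(src, dst)
        made.append(dst)
    return made

# Demo with a placeholder file; use your real face image in practice.
tmp = Path(tempfile.mkdtemp())
src = tmp / "face.png"
src.write_bytes(b"placeholder")
copies = make_recreation_set(src, tmp / "recreation_test")
print(len(copies))  # 25
```

Then point both the input A and input B paths at the `recreation_test` folder.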
Using this test, I was able to decide which model to go with so as to get the most
realistic image quickly on my RTX 3080 Ti GPU at an output resolution of 256px or above.
This is just a glimpse from my ongoing SECRET project.
Don't think that the 1.9 million iterations @ batch size 10 that you see in
this video were done just to achieve Image Recreation on this single image. I have an HQ data set of
22k images, out of which a small pre-training has been performed on 10K images so far.
The 1.9 million iterations @ batch size 10 are the result of that small training.
The data set I have for pre-training is in 3 parts:
1) 10K images from the Nvidia FFHQ data set
2) 10K images from the CelebA-HQ data set
3) My own HQ collection of 2k+ images
If anyone doesn't know about the Nvidia FFHQ data set, you should definitely read about
what an awesome data set it is:
https://github.com/NVlabs/ffhq-dataset
Perfect 0.0 Loss
So after deciding on my model, I trained it 20+ hrs a day for the last
week on some 10k random images from my 22k data set. Initially, when I ran some random
Image Recreation tests in between to see how good the model was becoming during training,
it would take about 3-4 hrs to reach this kind of recreation on a single image.
Then a stage came yesterday after which you can throw any HQ seen/unseen image at my
model and it will recreate that image to 90-92% of this perfection level in just 1 hr. But the
very fine details like lip lines or small dots/marks on the face, which make up maybe 8-10% of the
facial details, may still require an extra 1-2 hrs depending on how complex the face is.
The face you see in this video was perfected in some 1 hr 20 minutes, and my model had
never trained on this image before.
My model was always trained with the SSIM loss function. Even when I achieved this level of
Image Recreation on other images, so good that on my 32-inch 2K monitor
I could not tell the difference even from close up, the loss hovered around
0.00250, and the lowest I could take it was maybe 0.00210. After that I always moved on to
training on new images, because I figured that if the recreation was already good enough that
most people could not tell any difference, there was no point in training further just to
reach a loss value of 0.0.
Now comes the trick part. When I completed the Image Recreation on this particular image, the
loss values were hovering around 0.00245. Just for fun, I thought, why not change
the loss function at this stage to see how other loss functions behave? When I applied
the very first loss function, "ffl", and started training, it instantly reported a perfect loss of 0.0.
So, for this particular image, at this level of Image Recreation:
SSIM shows a loss value of 0.00245
ffl shows a loss value of 0.00000
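As a rough illustration of why this can happen, here is my own toy sketch (NOT Faceswap's actual SSIM or ffl implementations; the formulas are simplified stand-ins): different loss functions simply report numbers on different scales for the same image pair, so a value that displays as 0.00000 under one loss does not mean another loss would agree.

```python
import numpy as np

def ssim_loss(x, y, c1=0.01**2, c2=0.03**2):
    # Single-window SSIM over the whole image (real SSIM uses local windows).
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))
    return 1.0 - ssim

def ffl_loss(x, y, alpha=1.0):
    # Simplified focal-frequency-style loss: squared spectrum error,
    # re-weighted so the worst frequencies dominate (weights in [0, 1]).
    d = np.abs(np.fft.fft2(x, norm="ortho") - np.fft.fft2(y, norm="ortho")) ** 2
    w = (d ** alpha) / (d.max() + 1e-12)
    return float((w * d).mean())

rng = np.random.default_rng(0)
target = rng.random((64, 64))                            # stand-in "image"
recreated = target + rng.normal(0, 0.02, target.shape)   # near-perfect recreation

print(f"SSIM loss: {ssim_loss(target, recreated):.5f}")
print(f"ffl  loss: {ffl_loss(target, recreated):.5f}")
```

On this pair, the frequency-weighted loss comes out more than an order of magnitude below the SSIM loss, which is the general point: a near-zero reading under one loss function is not evidence of a perfect image under another.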
Pre-Trained Models = Real-World Professional Models
Let's get straight to the point.
I have seen some people on this forum suggesting that we should not use pre-trained
models because they can lead to identity bleeding.
Just tell me: when an engineering graduate comes for a job, do you call him
an engineer or a pre-trained engineer? Of course you call him just an engineer,
yet without that pre-training he would not be able to do his job, or would be very
sloppy at it. The same goes for actual AI work in the real world, where big industries
use DL/ML models pre-trained on mammoth-sized data sets.
Do you think the medical industry, which uses AI to predict diseases,
the social media industry, which uses AI for image recognition on millions of images,
or Netflix, which predicts viewer patterns with AI, build their models from scratch every day?
No, they have models pre-trained on huge data sets using thousands of professional-grade GPUs
for months; only then are the models brought into the real world for work, where they
continue to learn and work rather than being rebuilt from scratch periodically.
There are basically 3 types of data sets used with a model before it goes out for real-world work:
1)Training data set
2)Validation data set
3)Test data set
You can Google them and read many articles on the topic. I am providing just 2 simple links:
https://en.wikipedia.org/wiki/Training, ... _data_sets
https://towardsdatascience.com/train-va ... cb40cba9e7
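To make the three-way split concrete, here is a minimal sketch (my own example, with made-up file names and a common 80/10/10 ratio, not anything prescribed by Faceswap):

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle once, then slice into train / validation / test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)   # deterministic shuffle for reproducibility
    n = len(items)
    n_train, n_val = int(n * train), int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# e.g. a 22k image collection like the one described above
images = [f"img_{i:05d}.png" for i in range(22_000)]
tr, va, te = split_dataset(images)
print(len(tr), len(va), len(te))  # 17600 2200 2200
```

The model learns from the training set, the validation set is used to tune and compare models during development, and the test set is touched only once at the end to estimate real-world performance.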
Even the DeepFaceLab forums have pre-trained models.
Why go far? I myself have reused the same pre-trained models across
different projects, getting great swaps without any identity bleeding.
I can say this for the RealFace, Dfaker & Disney models, as I have used them,
but cannot say the same for other models I have not tested.
But I know people won't believe it without seeing it, so just wait a few months until
I complete my training on this 22k data set; then I will post some swap videos here
along with download links to my pre-trained/working model.