"New" training method taking too much VRAM?

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
User avatar
jedeitor
Posts: 7
Joined: Tue Mar 16, 2021 10:58 am
Has thanked: 1 time

"New" training method taking too much VRAM?

Post by jedeitor »

Hi, when I started with faceswap, it required two facesets (A & B) and the alignments file for A. After a few tries, that method changed: the alignment info is now stored in the images themselves, and you no longer need the alignments file to train your models.

The problem is, I have a model I created with the first training method, and it allows me to use a batch size of 8, while the rest of my models, created with the new method, only allow me a batch size of 4. So now it takes twice as long to get the same results.

Is this normal? I am training all models with Dlight with the default options.

User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: "New" training method taking too much VRAM?

Post by torzdf »

No, it's not normal. The migration of the alignment data into the image metadata has no bearing on the amount of VRAM a model takes.

My word is final

User avatar
jedeitor
Posts: 7
Joined: Tue Mar 16, 2021 10:58 am
Has thanked: 1 time

Re: "New" training method taking too much VRAM?

Post by jedeitor »

Weird. Could the extraction method be what makes all my models different from the first one? I mean, can extraction affect how much VRAM training will need? I don't remember the settings I used for extraction, but I am using the same settings to train all the models, and that first one still allows me a batch size of 8 while the others don't. :?:

User avatar
bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times
Contact:

Re: "New" training method taking too much VRAM?

Post by bryanlyon »

No, that's not possible. What you did in extract cannot affect VRAM usage in any way. Only changes to the model itself, such as input/output resolution, encoder/decoder depth, and batch size, could change how much VRAM is being used.
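As a rough illustration of why those are the knobs that matter: the activations a GPU must hold scale with batch size, resolution, and channel count. A back-of-envelope sketch (the numbers here are purely illustrative, not Dlight's real layer sizes):

```python
def activation_bytes(batch, res, channels, dtype_bytes=4):
    """Rough per-layer activation footprint: batch x H x W x C values,
    each stored as a float of dtype_bytes (4 bytes for float32)."""
    return batch * res * res * channels * dtype_bytes

# Halving the batch size halves the activation memory at the same
# resolution and depth -- which is why batch 8 vs batch 4 is the
# difference between fitting in VRAM or not.
a8 = activation_bytes(batch=8, res=128, channels=64)
a4 = activation_bytes(batch=4, res=128, channels=64)
assert a8 == 2 * a4
```

Extraction settings never enter this calculation; they only decide which pixels end up in the training images, not how many of them the model processes per step.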

User avatar
jedeitor
Posts: 7
Joined: Tue Mar 16, 2021 10:58 am
Has thanked: 1 time

Re: "New" training method taking too much VRAM?

Post by jedeitor »

Ok, so I'm doing some tests.

If I take the model that works with a higher batch size than the others and change just the name of the model dir, forcing it to generate a new one, the software runs into the same VRAM problem as the others.

So there has to be something different in that first model dir that allows twice the speed for the same task.

Maybe there was a training setting around March/May 2021 that isn't available now? That would explain it.

User avatar
jedeitor
Posts: 7
Joined: Tue Mar 16, 2021 10:58 am
Has thanked: 1 time

Re: "New" training method taking too much VRAM?

Post by jedeitor »

Would there be any way to get the software version that was available in March this year? Just for testing this issue.

User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: "New" training method taking too much VRAM?

Post by torzdf »

jedeitor wrote: Tue Aug 24, 2021 9:17 pm

Ok, so I'm doing some tests.

If I take the model that works with a higher batch size than the others and change just the name of the model dir, forcing it to generate a new one, the software runs into the same VRAM problem as the others.

Are you using Conv Aware Init? This is known to take up more VRAM on model initialization. Try starting a new model with a very low batch size for 1 iteration, then reloading the model, to see if that resolves things for you.

jedeitor wrote: Sat Aug 28, 2021 11:30 am

Would there be any way to get the software version that was available at march this year? Just for testing this issue.

Sure, you can just check out an earlier commit (Google will tell you how to do this).
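A sketch of the git mechanics, run here on a throwaway repo with back-dated commits so it is runnable anywhere (in a real faceswap clone you would only need the `rev-list` and `checkout` lines; the date "2021-03-31" is the cut-off from this thread):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# Two empty commits with back-dated timestamps, standing in for history
GIT_COMMITTER_DATE="2021-03-15T12:00:00" git commit -q --allow-empty \
    -m "march state" --date="2021-03-15T12:00:00"
GIT_COMMITTER_DATE="2021-08-01T12:00:00" git commit -q --allow-empty \
    -m "current state" --date="2021-08-01T12:00:00"

# Find the newest commit on or before 31 March 2021, then check it out
old=$(git rev-list -n 1 --before="2021-03-31" HEAD)
git checkout -q "$old"
git log -1 --format=%s   # prints "march state"
```

`git checkout <hash>` leaves the repo in a detached-HEAD state; `git checkout master` (or whatever the default branch is called) gets you back to the current code afterwards.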

My word is final

User avatar
jedeitor
Posts: 7
Joined: Tue Mar 16, 2021 10:58 am
Has thanked: 1 time

Re: "New" training method taking too much VRAM?

Post by jedeitor »

torzdf wrote: Sun Aug 29, 2021 9:49 am

Are you using Conv Aware Init? This is known to take up more VRAM on model initialization. Try starting a new model with a very low batch size for 1 iteration, then reloading the model, to see if that resolves things for you.

Ok, just tried this. I didn't have Conv Aware checked, so that wasn't the problem. Anyway, I enabled it and started a new model with a batch size of 1 and just one GPU, then stopped the training after initialization, enabled all GPUs and changed the batch size to 8. VRAM error again. It works if I lower the batch to 4, but that doesn't explain why my first model still works at 8 with the same settings.

torzdf wrote: Sun Aug 29, 2021 9:49 am

Sure, you can just check out an earlier commit (Google will tell you how to do this).

I will definitely try this. Being able to cut the processing time in half is worth the investigation. Thanks!

Locked