dfl-H128 taking forever to train

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

dfl-H128 taking forever to train

Post by p96xl »

Getting through 100 iterations takes 6 minutes 45 seconds on my Vega FE. At a batch size of 32, 100,000 iterations would take almost 5 days. None of the other trainers take anywhere near this long: with Original, 100,000 iterations would take about 16.666 hours with the same setup and batch size of 32. Is H128 supposed to take this long, or is there a way to speed it up?
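
(Quick sanity check of those numbers, as a Python sketch; it assumes the time per iteration stays roughly constant over the run, and the 6:45-per-100-iterations figure is the one reported above.)

# Extrapolate total training time from a measured iteration rate.
# Assumption: time per iteration stays roughly constant over the run.

SECONDS_PER_100_ITERS = 6 * 60 + 45   # the 6:45 reported above
TARGET_ITERS = 100_000

total_seconds = SECONDS_PER_100_ITERS * (TARGET_ITERS / 100)
print(f"{total_seconds / 3600:.1f} hours ({total_seconds / 86400:.1f} days)")
# -> 112.5 hours (4.7 days), i.e. "almost 5 days"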


Re: dfl-H128 taking forever to train

Post by torzdf » Thu Sep 26, 2019 11:49 pm

The following is a simplified explanation. There is more to it than this (the structure of the model, etc.), but to keep things simple, I will talk purely in terms of resolution.

H128 has double the resolution of the Original model. You might think that means it would take twice as long to train, but that would be incorrect. A 64px image has 4,096 pixels; a 128px image has 16,384 pixels. That is 4 times as many.

So, everything else being identical, you would expect around 66.7 hours for a double-resolution model (about 2.8 days).

However, everything isn't identical: the model has to be scaled up appropriately, which adds more time to the training.

There is no quick fix for this. Bigger models take longer. Ultimately, an Nvidia card will train faster than an AMD card, but beyond that, it's about patience.
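
(Here is that scaling argument as a rough Python sketch; the 16.666-hour Original figure comes from the first post, and the pixel-count ratio is only a lower bound, since scaling up the model adds overhead on top.)

# Doubling resolution quadruples the pixel count, so training cost
# scales by at least 4x; real models add further overhead on top.

original_px = 64 ** 2            # 4,096 pixels
h128_px = 128 ** 2               # 16,384 pixels
scale = h128_px / original_px    # 4.0

original_hours = 16.666          # Original trainer figure from the first post
estimate = original_hours * scale
print(f"{scale:.0f}x the pixels -> {estimate:.1f} hours "
      f"({estimate / 24:.1f} days) as a lower bound for H128")
# -> 4x the pixels -> 66.7 hours (2.8 days)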

My word is final


Re: dfl-H128 taking forever to train

Post by p96xl »

torzdf wrote: Thu Sep 26, 2019 11:49 pm

There is no quick fix for this. Bigger models take longer. Ultimately, an Nvidia card will train faster than an AMD card, but beyond that, it's about patience.

Aren't there any optimizations for AMD GPUs?


Re: dfl-H128 taking forever to train

Post by torzdf »

There are no quick fixes. Machine Learning takes a long time.

Generally, you will see that your GPU is being utilized close to 100% (certainly true for Nvidia; I couldn't say for AMD), so there is nothing else to be squeezed out of it.

The only optimizations that exist squeeze bigger models into the same VRAM; however, these slow training down rather than speeding it up.

My word is final
