dfl-H128 taking forever to train

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

dfl-H128 taking forever to train

Post by p96xl »

Getting through 100 iterations takes 6 minutes 45 seconds on my Vega FE. At a batch size of 32, 100,000 iterations would take almost 5 days. None of the other trainers take anywhere near this long: with Original, 100,000 iterations would take about 16.666 hours with the same setup and batch size of 32. Is H128 supposed to take this long, or is there a way to speed it up?
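
(Quick sanity check of those numbers, as a Python sketch; it assumes the time per iteration stays roughly constant over the run, and the 6:45-per-100-iterations figure is the one reported above.)

# Extrapolate total training time from a measured iteration rate.
# Assumption: time per iteration stays roughly constant over the run.

SECONDS_PER_100_ITERS = 6 * 60 + 45   # the 6:45 reported above
TARGET_ITERS = 100_000

total_seconds = SECONDS_PER_100_ITERS * (TARGET_ITERS / 100)
print(f"{total_seconds / 3600:.1f} hours ({total_seconds / 86400:.1f} days)")
# -> 112.5 hours (4.7 days), i.e. "almost 5 days"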


Re: dfl-H128 taking forever to train

Post by torzdf » Thu Sep 26, 2019 11:49 pm

The following is a simplified explanation. There is more to it than this (the structure of the model, etc.), but to keep things simple, I will talk purely in terms of resolution.

H128 has double the resolution of the Original model. You might think that means it would take twice as long to train, but that would be incorrect. A 64px image has 4,096 pixels; a 128px image has 16,384 pixels. That is 4 times as many.

So, everything else being identical, you would expect around 66.7 hours for a double-resolution model (about 2.8 days).

However, everything isn't identical: the model has to be scaled up appropriately, which adds more time to the training.

There is no quick fix for this. Bigger models take longer. Ultimately, an Nvidia card will train faster than an AMD card, but beyond that, it's about patience.
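
(Here is that scaling argument as a rough Python sketch; the 16.666-hour Original figure comes from the first post, and the pixel-count ratio is only a lower bound, since scaling up the model adds overhead on top.)

# Doubling resolution quadruples the pixel count, so training cost
# scales by at least 4x; real models add further overhead on top.

original_px = 64 ** 2            # 4,096 pixels
h128_px = 128 ** 2               # 16,384 pixels
scale = h128_px / original_px    # 4.0

original_hours = 16.666          # Original trainer figure from the first post
estimate = original_hours * scale
print(f"{scale:.0f}x the pixels -> {estimate:.1f} hours "
      f"({estimate / 24:.1f} days) as a lower bound for H128")
# -> 4x the pixels -> 66.7 hours (2.8 days)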

My word is final


Re: dfl-H128 taking forever to train

Post by p96xl »

torzdf wrote: Thu Sep 26, 2019 11:49 pm

There is no quick fix for this. Bigger models take longer. Ultimately, an Nvidia card will train faster than an AMD card, but beyond that, it's about patience.

Aren't there any optimizations for AMD GPUs?


Re: dfl-H128 taking forever to train

Post by torzdf »

There are no quick fixes. Machine Learning takes a long time.

Generally, you will see that your GPU is being utilized close to 100% (certainly true for Nvidia; I couldn't say for AMD), so there is nothing else to be squeezed out of it.

The only optimizations that exist squeeze bigger models into the same VRAM; however, these slow training down rather than speeding it up.

My word is final
