GPU Training Speed

tochan · Post by **tochan** » Sun Sep 22, 2019 8:41 am

Hi to all,
Last week my "old" GPU died... more precisely, the pump on my EVGA 1080 Hybrid.
To bridge the warranty repair period, I bought a 2080 RTX MSI Aero (blower cooler design).

Now, my idea is a comparison list (using the same trainer is important) with some system info, to help others with their GPU "upgrade" plans (new or second-hand).

So, here are the results for these two cards.

My Sys info:
CPU AMD 1800x, 64GB RAM, SSD storage, Windows 10,
Trainer Dfl-H128
Batchsize 64
Warp to Landmarks
Optimizer Savings
Save interval 100

GPU:
1080 GTX EVGA Hybrid (not overclocked) 8GB. Iterations 17.9 (Faceswap software 09/10/19 18.2)
2080 RTX MSI Aero (not overclocked) 8GB. Iterations 25.1 (Faceswap software 09/22/19 25.1)

Hope you like the idea and share some info from your own machine...

tochan · Post by **tochan** » Sun Sep 22, 2019 11:23 am

For training speed information (Trainer Dfl-128):
1080 GTX EVGA Hybrid (AIO water cooler): 18.2 (12k iterations)
2080 RTX MSI Aero (blower cooler): 25.2 (15k iterations)

Post by **torzdf** » Sun Sep 22, 2019 12:32 pm

I have been wanting something like this for a while, so thanks for making a start!

Ideally we could have a google sheet, which someone could maintain.

I'll see if I can pull out some stats to add.

kilroythethird · Post by **kilroythethird** » Sun Sep 22, 2019 2:41 pm

Considering how fast faceswap develops, maybe we should use a fixed version for this (a tag on GH or some fixed commit)?

A copy-and-paste snippet that checks out a given commit, downloads a test faceset, runs for a fixed iteration count at a given batch size (?), and then reverts to current master should do.
Someone could even write a simple batch/sh/py(?) benchmark script.
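
A rough sketch of what such a script might look like in Python. It only builds the sequence of commands (check out a fixed commit, run a training command, go back to master) and can optionally run them. The names `build_benchmark_steps` and `run_steps`, the placeholder commit `abc1234` and the training command are all hypothetical; the actual faceswap training arguments are left to the caller on purpose, and downloading a test faceset is not covered.

```python
import subprocess
from typing import List, Sequence


def build_benchmark_steps(commit: str,
                          train_cmd: Sequence[str],
                          branch: str = "master") -> List[List[str]]:
    """Return the commands for one benchmark run, in order.

    commit:    tag or commit hash to benchmark against (placeholder, pick your own)
    train_cmd: the training command to time, supplied by the caller
    branch:    branch to return to afterwards
    """
    if not commit:
        raise ValueError("commit must not be empty")
    if not train_cmd:
        raise ValueError("train_cmd must not be empty")
    return [
        ["git", "checkout", commit],   # pin the code to a fixed version
        list(train_cmd),               # run the fixed-length training job
        ["git", "checkout", branch],   # go back to the current code
    ]


def run_steps(steps: List[List[str]]) -> None:
    """Run the pinned training command, always returning to the branch afterwards."""
    checkout, train, restore = steps
    subprocess.run(checkout, check=True)
    try:
        subprocess.run(train, check=True)
    finally:
        # Restore the working branch even if training fails or is interrupted.
        subprocess.run(restore, check=True)


if __name__ == "__main__":
    # Hypothetical values: replace with a real commit and your own training command.
    steps = build_benchmark_steps("abc1234", ["python", "my_training_command.py"])
    for step in steps:
        print(" ".join(step))
```

Running the file as-is only prints the three commands; calling `run_steps(steps)` would execute them in the current repository.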

Post by **torzdf** » Sun Sep 22, 2019 2:43 pm

I thought of that, but I figured that most people are not going to want to roll back to train a model just to post stats.

Ultimately, I expect these numbers to bump a bit with the forthcoming augmentation optimizations, but I suspect (and hope) they will settle down for a while after that.

AndrewB · Post by **AndrewB** » Wed Nov 13, 2019 2:17 pm

Where can I see these stats? I only see EG/s on the Analysis tab, and it depends on the batch size. About 10-15 EG/s for bs=8 (RTX 2070 Super).

tochan · Post by **tochan** » Sun Nov 17, 2019 8:54 pm

A little update after my 1080 returned from repair.

At the moment, I am training a Dlight model with these 2 cards (now it works).

General
Dlight model
Allow Growth "on"
Optimizer Savings
Batch Size 39
GPUs 2

Train Plugin Global
Coverage "87.5"
Mask Type "components"
Subpixel Upscaling "on"
Loss Function "mae"
Penalized Mask Loss "on"

Dlight
Features "Best"
Details "Good"
Output "256"

Results:
17.4 EGs/sec at batch size 39... batch size 40 crashes.

tochan · Post by **tochan** » Fri Nov 22, 2019 8:50 am

Hi again,

A little info from my side, with some "new" parts in my system.

Switched the AMD 1800x to a 3800x.

Changed the GPUs from 2080 + 1080 to 2x 2080 in SLI. Training works fine with the SLI option enabled in the OS and the NVLink bridge installed, but training speed is the same with or without NVLink.

Here is a training speed history from a Dfl-H128 model.

18.5 EGs/sec = 1800x + 1080 GTX, batch size 64
25.2 EGs/sec = 1800x + 2080 RTX, batch size 64
31.4 EGs/sec = 1800x + 2080 RTX + 1080 GTX, batch size 112
39.9 EGs/sec = 3800x + 2x 2080 RTX, batch size 117

The two 2080s are cheaper than one 2080 Ti. Power consumption info: 300-670 W peak (status monitor from the RM750i).

Linhaohoward · Post by **Linhaohoward** » Sat Feb 01, 2020 1:55 pm

My Sys info:
CPU AMD 3900x, 32GB 3200 RAM, m.2 SSD storage, Windows 10
No Overclocking done
Trainer Dfl-SAE, 160 input size, DF architecture, 512 autoencoder, 45 encoder dims, 25 decoder dims, Vgg-obstructed
Batchsize 20
Optimizer Savings
Save interval 1000

GPU:
MultiGPUs 2x Zotac RTX 2070 Super NVlink

EG/s 7.9-8.0 @ Batchsize 20, 7.0 @ Batchsize 22, OOM Crash @ Batchsize 24
Didn't bother to try 23

Not sure if my info helps at all since it's a custom input size and encoder dims

cosmico · Post by **cosmico** » Sat Feb 01, 2020 5:14 pm

Specs
-AMD Ryzen 5 2400G
-AMD Radeon RX 580
-8GB DDR4-2666 SDRAM + another 8GB I added for VR gaming
nothing overclocked

Initial notes: I've tried almost all the trainers except Dlight, which I've used only minimally. I can technically run Villain, but the EG/s rate is so low on just about any setting that it's not worth it, so in practice I can't run it. I prefer Original because it seems to give me the most EG/s. Almost every one of my projects involves somewhere between 1000-5000 "replacing" faces and 500-2000 "to-be-replaced" faces. My coverage is 87.5%, but the stats below were more or less the same when it was at 75%. Save intervals are anywhere between 100 and 200. Other than that, just about everything else is on default, as the guide suggests. I'm including how long each run trained for, because I tend to get an extra 10% EG/s if I let it run for at least a couple of hours compared to just 20 minutes.

Training Info
Trainer used: Original
10.7 EG/s - batch size 10, ran for 11 hours
24.7 EG/s - batch size 30, ran for 2 hours
26 EG/s - batch size 46, ran for 5 hours
26.5 EG/s - batch size 62, ran for 5 hours
29.5 EG/s - batch size 64, ran for 7 hours
20.2 EG/s - batch size 66, ran for 47 minutes
19.2 EG/s - batch size 128, ran for 30 minutes

Final thoughts: a batch size of 64 seems to be the magic number for me; I've noticed a massive drop-off in EG/s going just 2 above or below it. I also can't seem to get higher than 30 EG/s. The one time I managed to do this was on a project that had like 200 input B faces and 100 input A faces, just to see how fast it could go. But yeah, I'm pretty much stuck at 2.4 million EG a day.
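
For anyone wanting to convert between EG/s and a daily total, here is a small Python sketch. It assumes the rate is sustained for a full 24 hours with no pauses for saving or previews, which is an idealisation; the function names are just illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def examples_per_day(eg_per_sec: float) -> float:
    """Total examples processed in 24 hours at a constant EG/s rate."""
    return eg_per_sec * SECONDS_PER_DAY


def rate_for_daily_total(examples: float) -> float:
    """Constant EG/s rate needed to reach a given daily total."""
    return examples / SECONDS_PER_DAY


if __name__ == "__main__":
    # Best rate reported above (batch size 64).
    print(f"{examples_per_day(29.5):,.0f} EG/day at 29.5 EG/s")
    # Rate implied by the 2.4 million EG/day figure.
    print(f"{rate_for_daily_total(2_400_000):.1f} EG/s for 2.4 million EG/day")
```

At the 29.5 EG/s peak this gives about 2.55 million EG per day, while 2.4 million per day corresponds to an average of about 27.8 EG/s.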

paruru715 · Post by **paruru715** » Sun Sep 20, 2020 6:05 am

tochan wrote: ↑Sun Sep 22, 2019 11:23 am
For training speed information (Trainer Dfl-128):
1080 GTX EVGA Hybrid (AIO water cooler): 18.2 (12k iterations)
2080 RTX MSI Aero (blower cooler): 25.2 (15k iterations)

May I ask what units these are in? 12k iterations in an hour?

I am using a GTX 1060 3GB to train on DFL-128 Lowmem and got around 93k iterations after 15 hours.
I'm looking to upgrade my GPU to explore further, and wondering which GPU will give me the best value now that the RTX 30 series is coming out. Hopefully I can get away with an RTX 3070 if the RTX 3080's training rate is not far ahead of the 3070's.

paruru715 · Post by **paruru715** » Fri Sep 24, 2021 7:03 am

An update:

I got my RTX 3070 last year but had to wait until now for the auto-installer to work with RTX 30 series cards. I could never get the unofficial manual installation method to work.
Also note that I upgraded my CPU, motherboard and RAM as well.

Anyway, here is the speed comparison coming from GTX 1060 3GB to my new RTX 3070 8GB.
Same algorithm, DFL128.

GTX 1060 3GB - 93k iterations after 15 hours with 54.4 EGs/sec
RTX 3070 8GB - 100k iterations after 7 hours with 83.8 EGs/sec

So, roughly 2.3x faster by iterations per hour.
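
As a quick check of the comparison, the ratio can be worked out from the figures above in two ways: by iterations per hour and by EGs/sec. This is only a sketch with illustrative function names. The batch sizes of the two runs are not stated here, and EGs/sec depends on batch size, so the two ratios need not agree.

```python
def iterations_per_hour(iterations: float, hours: float) -> float:
    """Average iteration rate over a training run."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return iterations / hours


def speedup(new: float, old: float) -> float:
    """How many times faster the new figure is than the old one."""
    if old <= 0:
        raise ValueError("old must be positive")
    return new / old


if __name__ == "__main__":
    gtx_1060 = iterations_per_hour(93_000, 15)   # 6,200 iterations/hour
    rtx_3070 = iterations_per_hour(100_000, 7)   # about 14,286 iterations/hour
    print(f"By iterations/hour: {speedup(rtx_3070, gtx_1060):.2f}x")
    print(f"By EGs/sec:         {speedup(83.8, 54.4):.2f}x")
```

With the numbers posted, this works out to about 2.30x by iterations per hour and about 1.54x by EGs/sec.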
