
GPU Training Speed

Posted: Sun Sep 22, 2019 8:41 am
by tochan
Hi to all,
last week my "old" GPU died... more precisely, the pump on my EVGA 1080 Hybrid.
While it is away for the warranty repair, I bought a 2080 RTX MSI Aero (blower cooling design).

Now, my idea is a comparison list (same trainer is important) with some system info, to help others with their GPU "upgrade" plans (new or second card).

So, here are the results of these two cards.

My Sys info:
CPU AMD 1800x, 64GB RAM, SSD storage, Windows 10,
Trainer Dfl-H128
Batchsize 64
Warp to Landmarks
Optimizer Savings
Save interval 100

GPU:
1080 GTX EVGA Hybrid (not overclocked), 8GB: 17.9 EGs/sec (Faceswap build of 09/10/19: 18.2)
2080 RTX MSI Aero (not overclocked), 8GB: 25.1 EGs/sec (Faceswap build of 09/22/19: 25.1)

Hope you like the idea and will share some info from your machine...

Re: Hardware best practices

Posted: Sun Sep 22, 2019 11:23 am
by tochan
For training speed information (Trainer Dfl-H128):
1080 GTX EVGA Hybrid (AIO water cooler): 18.2 EGs/sec (12k iterations)
2080 RTX MSI Aero (blower cooler): 25.2 EGs/sec (15k iterations)

Re: GPU Training Speed

Posted: Sun Sep 22, 2019 12:32 pm
by torzdf
I have been wanting something like this for a while, so thanks for making a start!

Ideally we could have a Google Sheet which someone could maintain.

I'll see if I can pull out some stats to add.

Re: GPU Training Speed

Posted: Sun Sep 22, 2019 2:41 pm
by kilroythethird
Considering how fast faceswap develops, we should maybe use a fixed version for this (a tag on GH, or some fixed commit)?

A copy-and-pasteable snippet to check out a given commit, download a test faceset, run for a fixed iteration count at a given batch size (?), then revert to current master should do.
Someone could even write a simple batch/sh/py(?) benchmark script.
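The checkout-run-revert flow could be sketched roughly like this. Everything specific in it is a placeholder, not an agreed value: the pinned commit, the faceset paths, and the `faceswap.py train` arguments would all need to be settled and checked against the actual faceswap CLI first.

```python
import subprocess
import time

# Placeholders -- a real benchmark would pin these values for everyone.
BENCH_COMMIT = "<fixed-tag-or-commit>"
ITERATIONS = 1000
BATCH_SIZE = 64


def egs_per_sec(iterations, batch_size, elapsed_sec):
    """EGs/sec as reported in this thread: faces processed per second,
    assuming EGs/sec = iterations/sec * batch size."""
    return iterations * batch_size / elapsed_sec


def run_benchmark():
    # Pin the repo to the agreed benchmark version...
    subprocess.run(["git", "checkout", BENCH_COMMIT], check=True)
    try:
        start = time.monotonic()
        # Hypothetical invocation: the train arguments (faceset folders,
        # model dir, batch size / iteration settings) are assumptions.
        subprocess.run(["python", "faceswap.py", "train",
                        "-A", "<faceset_A>", "-B", "<faceset_B>",
                        "-m", "<model_dir>"], check=True)
        elapsed = time.monotonic() - start
        print(f"{egs_per_sec(ITERATIONS, BATCH_SIZE, elapsed):.1f} EGs/sec")
    finally:
        # ...and revert to current master afterwards, as suggested above.
        subprocess.run(["git", "checkout", "master"], check=True)
```

The try/finally makes sure the repo goes back to master even if the training run crashes mid-benchmark.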

Re: GPU Training Speed

Posted: Sun Sep 22, 2019 2:43 pm
by torzdf
I thought of that, but I figured that most people are not going to want to roll back to train a model just to post stats.

Ultimately, I expect these numbers to bump a bit with the forthcoming augmentation optimizations, but I suspect (and hope) they will settle down for a while after that.

Re: GPU Training Speed

Posted: Wed Nov 13, 2019 2:17 pm
by AndrewB
Where can I see these stats? I only see EG/s on the Analysis tab, and it depends on the batch size. About 10-15 EG/s for bs=8 (RTX 2070 Super).
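Since the EG/s readings in this thread only compare directly at the same batch size, one way to normalize them is to divide out the batch. This assumes EG/s means faces per second, i.e. iterations/sec times batch size - an assumption for illustration, not an official definition:

```python
def iterations_per_sec(egs, batch_size):
    """Normalize an EG/s reading to iterations/sec, assuming
    EG/s = faces per second = iterations/sec * batch size."""
    return egs / batch_size

# e.g. 12.5 EG/s at bs=8 works out to about 1.56 iterations/sec
```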

Re: GPU Training Speed

Posted: Sun Nov 17, 2019 8:54 pm
by tochan
A little update after my 1080 returned from repair.

At the moment, I am training a Dlight model with these 2 cards (now it works).

general
Dlight model,
allow Growth "on"
Optimizer Savings
Batch Size 39
Gpus 2

Train Plugin Global
Coverage "87.5"
Mask Type "components"
Subpixel Upscaling "on"
Loss Function "mae"
Penalized Mask Loss "on"

Dlight
Features "Best"
Details "Good"
Output "256"

Results:
17.4 EGs/sec at batchsize 39... 40 crashes ;)

Re: GPU Training Speed

Posted: Fri Nov 22, 2019 8:50 am
by tochan
Hi again,

a little info from my side with some "new" parts in my system.

Switched the AMD 1800x for a 3800x.

Changed the GPUs from 2080+1080 to 2x 2080 in SLI. Training works fine with the SLI option on in the OS and the NVLink bridge installed, but there is no training speed benefit with or without NVLink.

Here is a training speed history from a Dfl-H128 model.

18.5 EGs/sec = 1800x + 1080 GTX, batchsize 64
25.2 EGs/sec = 1800x + 2080 RTX, batchsize 64
31.4 EGs/sec = 1800x + 2080 RTX + 1080 GTX, batchsize 112
39.9 EGs/sec = 3800x + 2x 2080 RTX, batchsize 117

The two 2080s are cheaper than one 2080 Ti. Info on power consumption: 300-670 Watt peak (status monitor of the RM750i).

Re: GPU Training Speed

Posted: Sat Feb 01, 2020 1:55 pm
by Linhaohoward
My Sys info:
CPU AMD 3900x, 32GB 3200 RAM, m.2 SSD storage, Windows 10
No Overclocking done
Trainer Dfl-SAE, 160 input size, Df architecture, 512 autoencoder, 45 encoder dims, 25 decoder dims, Vgg-obstructed,
Batchsize 20
Optimizer Savings
Save interval 1000

GPU:
MultiGPUs 2x Zotac RTX 2070 Super NVlink

EG/s 7.9-8.0 @ Batchsize 20, 7.0 @ Batchsize 22, OOM Crash @ Batchsize 24
Didn't bother to try 23

Not sure if my info helps at all, since it's a custom input size and encoder dims.

Re: GPU Training Speed

Posted: Sat Feb 01, 2020 5:14 pm
by nnifj
Specs
-AMD Ryzen 5 2400G
-AMD Radeon RX 580
-8GB DDR4-2666 SDRAM + another 8GB I added for VR gaming
nothing overclocked

Initial notes: I've tried almost all the trainers except Dlight, which I've only used minimally. I can technically run Villain, but the EG/s rate is so low on just about any setting that it's totally not worth it - so practically I can't run it. I prefer Original because it seems to give me the most EG/s. Almost every single one of my projects involves somewhere between 1000-5000 "replacing" faces, and 500-2000 "to-be-replaced" faces. My coverage is 87.5%, but the stats below were more or less the same when it was at 75%. Save intervals were anywhere between 100-200. Other than that, just about everything else is on default, as the guide suggests. I'm going to include the times it was trained for, because I tend to get an extra 10% EG/s if I let it run for at least a couple of hours as compared to just 20 minutes.

Training Info
Trainer used: Original
10.7 EG/s -batchsize 10, ran for 11 hours
24.7 EG/s -batchsize of 30, ran for 2 hours
26 EG/s -batchsize of 46, ran for 5 hours
26.5 EG/s -batchsize of 62, ran for 5 hours
29.5 EG/s -batchsize of 64, ran for 7 hours
20.2 EG/s -batchsize of 66, ran for 47 minutes.
19.2 EG/s -batchsize of 128, ran for 30 minutes

Final thoughts: a batchsize of 64 seems to be the magic number for me; I've noticed a massive dropoff in EG/s just going 2 above or below it. I also can't seem to get higher than 30 EG/s. The one time I managed to do this was on a project that had only about 200 input B faces and 100 input A faces, just to see how fast it could go. But yeah, I'm pretty much stuck at 2.4 million EG a day.
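The "2.4 million EG a day" figure checks out against the per-second rates above (a quick back-of-the-envelope sketch; the 27.8 EG/s sustained figure is an assumed average, not a number measured in the post):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def egs_per_day(egs_per_sec):
    """Daily training throughput from a sustained EG/s rate."""
    return egs_per_sec * SECONDS_PER_DAY

# Peak vs. sustained:
# 29.5 EG/s (best run above)           -> 2,548,800 EG/day, ~2.55 million
# 27.8 EG/s (assumed sustained average) -> ~2.40 million EG/day
```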