GPU Training Speed

Talk about Hardware used for Deep Learning
tochan
Posts: 18
Joined: Sun Sep 22, 2019 8:17 am
Been thanked: 4 times

GPU Training Speed

Post by tochan »

Hi to all,
Last week my "old" GPU died, or more precisely the pump on my EVGA 1080 Hybrid did.
To cover the warranty repair period, I bought a 2080 RTX MSI Aero (blower cooler design).

Now, my idea is a comparison list (using the same trainer is important) with some system info, to help others with their GPU "upgrade" plans (new or second-hand).

So, here are the results for these two cards.

My Sys info:
CPU AMD 1800x, 64GB RAM, SSD storage, Windows 10,
Trainer Dfl-H128
Batchsize 64
Warp to Landmarks
Optimizer Savings
Save interval 100

GPU:
1080 GTX EVGA Hybrid (not overclocked) 8GB. Iterations 17.9 (Faceswap software 09/10/19 18.2)
2080 RTX MSI Aero (not overclocked) 8GB. Iterations 25.1 (Faceswap software 09/22/19 25.1)

Hope you like the idea and will post some info from your machine...

tochan
Posts: 18
Joined: Sun Sep 22, 2019 8:17 am
Been thanked: 4 times

Re: Hardware best practices

Post by tochan »

For training speed information (Trainer Dfl-128):
1080 GTX EVGA Hybrid (AIO watercooler) 18.2 (12k iterations)
2080 RTX MSI Aero (blower cooler) 25.2 (15k iterations)

torzdf
Posts: 656
Joined: Fri Jul 12, 2019 12:53 am
Answers: 96
Has thanked: 17 times
Been thanked: 132 times

Re: GPU Training Speed

Post by torzdf »

I have been wanting something like this for a while, so thanks for making a start!

Ideally we could have a google sheet, which someone could maintain.

I'll see if I can pull out some stats to add.
My word is final
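
A shared sheet like the one suggested above could start from a plain CSV file. The following is only a rough sketch: the column names and the `BenchmarkRow` class are made up for illustration and are not an agreed format, and the two sample rows are the numbers from the first post (posted there as "Iterations", and reported later in the thread as EGs/sec).

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class BenchmarkRow:
    """One reported result. Field names are illustrative assumptions."""
    user: str
    cpu: str
    gpu: str
    trainer: str
    batch_size: int
    reported_speed: float  # value as posted by the user


def to_csv(rows):
    """Serialise rows to CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(BenchmarkRow)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Sample rows taken from the first post in this thread.
rows = [
    BenchmarkRow("tochan", "AMD 1800x", "1080 GTX EVGA Hybrid", "Dfl-H128", 64, 17.9),
    BenchmarkRow("tochan", "AMD 1800x", "2080 RTX MSI Aero", "Dfl-H128", 64, 25.1),
]
print(to_csv(rows))
```

Keeping trainer and batch size as separate columns makes it possible to filter for like-for-like comparisons later.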

kilroythethird
Posts: 20
Joined: Fri Jul 12, 2019 11:35 pm
Answers: 2
Has thanked: 2 times
Been thanked: 9 times

Re: GPU Training Speed

Post by kilroythethird »

Considering how fast faceswap develops, maybe we should use a fixed version for this (a tag on GH or some fixed commit)?

A copy-and-pasteable snippet that checks out a given commit, downloads a test faceset, runs for a fixed iteration count at a given BS (?), and then reverts to current master should do.
Someone could even write a simple batch/sh/py(?) benchmark script.
that amd guy
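
The checkout / run / revert idea above could be sketched roughly as follows. This is a minimal sketch under assumptions, not an existing faceswap tool: `benchmark_steps` and `run_benchmark` are hypothetical helpers, the commit and the training command are supplied by the caller (no faceswap flags are assumed here), and the test faceset download is left out because the thread names no source for one.

```python
import subprocess


def benchmark_steps(commit, train_command, return_branch="master"):
    """Return the commands to run, in order, as argument lists.

    commit:        tag or commit hash to benchmark (chosen by the caller)
    train_command: full training command as a list of arguments
                   (hypothetical; fill in your own)
    return_branch: branch to switch back to afterwards
    """
    return [
        ["git", "checkout", commit],
        list(train_command),
        ["git", "checkout", return_branch],
    ]


def run_benchmark(commit, train_command, return_branch="master", repo_dir="."):
    """Check out the fixed version, run training, then always switch back."""
    checkout, train, restore = benchmark_steps(commit, train_command, return_branch)
    subprocess.run(checkout, cwd=repo_dir, check=True)
    try:
        subprocess.run(train, cwd=repo_dir, check=True)
    finally:
        # Restore the working branch even if training fails or is interrupted.
        subprocess.run(restore, cwd=repo_dir, check=True)
```

The `try`/`finally` is there so that a crash (for example an out-of-memory error at a too-large batch size) does not leave the checkout stuck on the old commit.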

torzdf
Posts: 656
Joined: Fri Jul 12, 2019 12:53 am
Answers: 96
Has thanked: 17 times
Been thanked: 132 times

Re: GPU Training Speed

Post by torzdf »

I thought of that, but I figured that most people are not going to want to roll back and train a model just to post stats.

Ultimately, I expect these numbers to bump a bit with the forthcoming augmentation optimizations, but I suspect (and hope) they will settle down for a while after that.
My word is final

AndrewB
Posts: 8
Joined: Tue Nov 12, 2019 10:16 am
Has thanked: 1 time

Re: GPU Training Speed

Post by AndrewB »

Where can I see these stats? I only see EG/s on the Analysis tab, and it depends on the batch size. I get about 10-15 EG/s at bs=8 (RTX 2070 Super).

tochan
Posts: 18
Joined: Sun Sep 22, 2019 8:17 am
Been thanked: 4 times

Re: GPU Training Speed

Post by tochan »

A little update after my 1080 returned from repair.

At the moment, I am training a Dlight model with these 2 cards (now it works).

General
Dlight model
Allow Growth "on"
Optimizer Savings
Batch Size 39
GPUs 2

Train Plugin Global
Coverage "87.5"
Mask Type "components"
Subpixel Upscaling "on"
Loss Function "mae"
Penalized Mask Loss "on"

Dlight
Features "Best"
Details "Good"
Output "256"

Results:
17.4 EGs/sec at batch size 39... at 40 it crashes ;)

tochan
Posts: 18
Joined: Sun Sep 22, 2019 8:17 am
Been thanked: 4 times

Re: GPU Training Speed

Post by tochan »

Hi again,

A little info from my side, with some "new" parts in my system.

Switched the AMD 1800x to a 3800x.

Changed the GPUs from 2080 + 1080 to two 2080s in SLI. Training works fine with the SLI option turned on in the OS and an NVLink bridge installed, but there is no difference in training speed with or without NVLink.

Here is a training speed history for a Dfl-H128 model.

18.5 EGs/sec = 1800x + 1080 GTX, Batchsize 64
25.2 EGs/sec = 1800x + 2080 RTX, Batchsize 64
31.4 EGs/sec = 1800x + 2080 RTX + 1080 GTX, Batchsize 112
39.9 EGs/sec = 3800x + 2x 2080 RTX, Batchsize 117

The two 2080s are cheaper than one 2080 Ti. Info on power consumption: 300-670 W peak (status monitor from the RM750i).
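
To make the history above easier to compare, here is a small sketch that expresses each configuration relative to the first one. The numbers are copied from the post; the `speedups` helper is only illustrative. Note that batch size (and, in the last row, the CPU) changes between rows, so the ratios are not a pure GPU-to-GPU comparison.

```python
# (configuration, batch size, EGs/sec) as reported in the post above
history = [
    ("1800x + 1080 GTX", 64, 18.5),
    ("1800x + 2080 RTX", 64, 25.2),
    ("1800x + 2080 RTX + 1080 GTX", 112, 31.4),
    ("3800x + 2x 2080 RTX", 117, 39.9),
]


def speedups(rows):
    """Return each row with its EGs/sec divided by the first row's EGs/sec."""
    baseline = rows[0][2]
    return [(name, bs, egs, egs / baseline) for name, bs, egs in rows]


for name, bs, egs, ratio in speedups(history):
    print(f"{name:<30} bs={bs:<4} {egs:>5.1f} EGs/sec  x{ratio:.2f}")
```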

Linhaohoward
Posts: 21
Joined: Sat Dec 21, 2019 1:23 pm
Has thanked: 3 times

Re: GPU Training Speed

Post by Linhaohoward »

My Sys info:
CPU AMD 3900x, 32GB 3200 RAM, m.2 SSD storage, Windows 10
No Overclocking done
Trainer Dfl-SAE, 160 input size, Df architecture, 512 autoencoder, 45 encoder dims, 25 decoder dims, Vgg-obstructed
Batchsize 20
Optimizer Savings
Save interval 1000

GPU:
MultiGPUs 2x Zotac RTX 2070 Super NVlink

EG/s 7.9-8.0 @ Batchsize 20, 7.0 @ Batchsize 22, OOM Crash @ Batchsize 24
Didn't bother to try 23

Not sure if my info helps at all since it's a custom input size and encoder dims

nnifj
Posts: 17
Joined: Sat Jan 18, 2020 6:32 pm
Has thanked: 3 times
Been thanked: 1 time

Re: GPU Training Speed

Post by nnifj »

Specs
-AMD Ryzen 5 2400G
-AMD Radeon RX 580
-8GB DDR4-2666 SDRAM + another 8GB I added for VR gaming
-nothing overclocked

Initial notes: I've tried almost all the trainers except Dlight, which I've only used minimally. I can technically run Villain, but the EG/s rate is so low on just about any setting that it's not worth it, so in practice I can't run it. I prefer Original because it seems to give me the most EG/s. Almost every one of my projects involves somewhere between 1000-5000 "replacing" faces and 500-2000 "to-be-replaced" faces. My coverage is 87.5%, but the stats below were more or less the same when it was at 75%. Save intervals are anywhere between 100 and 200. Other than that, just about everything else is on default, as the guide suggests. I'm going to include how long each run was trained for, because I tend to get an extra 10% EG/s if I let it run for at least a couple of hours as compared to just 20 minutes.

Training info
Trainer used: Original
10.7 EG/s - batchsize of 10, ran for 11 hours
24.7 EG/s - batchsize of 30, ran for 2 hours
26 EG/s - batchsize of 46, ran for 5 hours
26.5 EG/s - batchsize of 62, ran for 5 hours
29.5 EG/s - batchsize of 64, ran for 7 hours
20.2 EG/s - batchsize of 66, ran for 47 minutes
19.2 EG/s - batchsize of 128, ran for 30 minutes

Final thoughts: a batchsize of 64 seems to be the magic number for me. I've noticed a massive dropoff in EG/s when going just 2 above or below it. I also can't seem to get higher than 30 EG/s. The one time I managed it was on a project that had around 200 input B faces and 100 input A faces, just to see how fast it could go. But yeah, I'm pretty much stuck at 2.4 million EG a day.
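
For anyone who wants to redo the arithmetic on the sweep above, a small sketch follows. The values are copied from the post, and the helper names are just illustrative. Since the runs lasted anywhere from 30 minutes to 11 hours, and the poster notes that longer runs gain roughly 10% EG/s, the comparison between batch sizes is only rough.

```python
# batch size -> EG/s, as reported in the post above (trainer: Original, RX 580)
sweep = {10: 10.7, 30: 24.7, 46: 26.0, 62: 26.5, 64: 29.5, 66: 20.2, 128: 19.2}

SECONDS_PER_DAY = 24 * 60 * 60


def best_batch_size(results):
    """Return the batch size with the highest reported EG/s."""
    return max(results, key=results.get)


def examples_per_day(egs_per_sec):
    """Convert a sustained EG/s rate into examples per 24 hours."""
    return egs_per_sec * SECONDS_PER_DAY


best = best_batch_size(sweep)
print(f"best batch size: {best} at {sweep[best]} EG/s")
print(f"peak rate sustained for a day: {examples_per_day(sweep[best]):,.0f} EG")
print(f"rate implied by 2.4 million EG/day: {2_400_000 / SECONDS_PER_DAY:.1f} EG/s")
```

The quoted 2.4 million EG a day works out to a sustained rate of roughly 27.8 EG/s, a little under the 29.5 EG/s peak in the table.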
