Some benchmarking numbers

Post by Replicon »

I experimented with a variety of GPUs, and thought I'd share my numbers.

I basically just ran 10K iterations of "original" with batch size 16. I figured it's representative enough for entry level stuff, but I'm sure things change once you get into bigger/fancier models.

Anyway, at least for these settings, the performance differences between the GPUs look small compared to the differences in what they cost. Sure, a P100 performs better, but a T4 is so much cheaper that the price gap outweighs the P100's speed advantage.
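
As a rough sketch of that trade-off: the cost of one benchmark run is just the hourly price times the run time. The prices and run times below are made-up placeholders for illustration, not the measured results, and the names `cost_per_run`, `cheaper_gpu` and `faster_gpu` are hypothetical.

```python
def cost_per_run(hourly_price_usd: float, run_seconds: float) -> float:
    """Cost in USD of one benchmark run billed at an hourly rate."""
    return hourly_price_usd * run_seconds / 3600.0


# Placeholder figures for illustration only (NOT the measured results).
gpus = {
    "cheaper_gpu": {"hourly_price_usd": 0.10, "run_seconds": 1500.0},
    "faster_gpu": {"hourly_price_usd": 0.40, "run_seconds": 1000.0},
}

costs = {name: cost_per_run(**spec) for name, spec in gpus.items()}
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: ${cost:.4f} per run")
print("cheapest per run:", cheapest)
```

With these placeholder values the faster card finishes sooner but still costs more per run, which is the pattern described above. Plugging in real prices and timings would show whether it holds for a given setup.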

Also, looks like having multiple GPUs doesn't really help at all... which isn't too surprising, since iterations are sequential, and likely aren't very distributable.

Anyway, just for funzies, here are the numbers.

I'd be really curious about other people's experiences with other configurations... and whether A100 is all that. :)

Attachment: Screenshot from 2021-04-16 21-00-44.png (48.59 KiB)

Post by sp13 »

Interesting. I was wondering how the P100 would perform compared to the T4.
So far my experiments have been with a T4 on n1-standard-2. It doesn't seem to hit any CPU or system RAM limits, and it could save like 10 cents an hour ;)

Post by sp13 »

I'm curious about the poor V100 numbers you were seeing.

I haven't run any specific benchmarks, but yesterday I was training a realface model on an n1-standard-2 + T4. I was getting around 78 EG/s (batchsize = 32, mixed precision enabled) with CPU utilization about 69%.

Today I wanted to redo the model with different coverage, but I couldn't get a T4. So I am currently running on an n1-standard-2 + V100, same settings (except coverage %), same data, and getting about 126 EG/s. CPU is at about 96% as the 2 little cores try to keep up.

UPDATE: I changed the V100 to an n1-standard-4 and got CPU usage down below 80% and training up to 195 EG/s. So the extra cores are worth it in this case.

The V100 is faster, but the speedup is not worth the much greater cost.

(BTW, this model will train at 11 EG/s on my GTX 1060 with batchsize = 3)
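
For anyone who wants the ratios spelled out, here is a small sketch that works them out from the EG/s figures quoted in this post. Note that the T4 and V100 runs used different coverage settings, so treat the comparison as approximate. The dictionary keys are just labels made up for this example.

```python
# Throughput figures quoted in the post above, in EG/s.
rates = {
    "T4 + n1-standard-2": 78.0,
    "V100 + n1-standard-2": 126.0,
    "V100 + n1-standard-4": 195.0,
}

baseline = rates["T4 + n1-standard-2"]
speedups = {name: rate / baseline for name, rate in rates.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x the T4 rate")

# Gain from doubling the vCPU count on the V100 instance.
cpu_gain = rates["V100 + n1-standard-4"] / rates["V100 + n1-standard-2"]
print(f"n1-standard-4 vs n1-standard-2 on V100: {cpu_gain:.2f}x")
```

The V100 on n1-standard-4 comes out at 2.5 times the T4 rate, and going from 2 to 4 vCPUs alone is worth roughly 1.5x on the V100.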

Post by Replicon »

Thanks for the tip, will have to try with an n1-standard-2. I use preemptible hardware, so savings would add up to 2 cents per hour, but it's free to change my script's config. :)

Post by sp13 »

Yeah, I always use preempt when I can. The particular day that I ran the V100 I couldn't get a T4 even at non-preempt price. I haven't had a problem since.

I stopped my realface experiments because they didn't play nice with my little 6 GB GTX 1060. But villain with mixed precision is very nice on a T4. I get a training rate 2.5 times higher than my 1060 due to larger batch size and tensor cores. I will test it on a V100 if I end up with some extra credits.

I also did a little experiment just to see how bad conversion (not training) is on CPU only. Not surprisingly, it's bad. :D Using a T4 was 25 times faster than an n2d-highcpu-8 for about twice the price.
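
To put that in cost terms, a minimal sketch: if one machine is some multiple faster and some multiple pricier per hour, the cost per unit of work is the price ratio divided by the speedup. The function name `relative_cost_per_unit` is made up for this example, and it assumes billing is purely by time.

```python
def relative_cost_per_unit(speedup: float, price_ratio: float) -> float:
    """Cost per unit of work on machine A relative to machine B.

    speedup:     how many times faster A is than B
    price_ratio: A's hourly price divided by B's hourly price
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return price_ratio / speedup


# Figures from the post: T4 about 25x faster at about 2x the price.
ratio = relative_cost_per_unit(speedup=25.0, price_ratio=2.0)
print(f"T4 cost per unit of conversion work: {ratio:.2f}x the CPU-only cost")
print(f"i.e. about {1 / ratio:.1f}x cheaper")
```

So on those rough figures, the T4 does the same conversion job for about a twelfth of what the CPU-only instance would cost.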

Oh, also, if you don't get an external IP address, it will still let you tunnel in for SSH, and you can save like 2 tenths of a cent per hour. ;)

Post by Replicon »

Haha Money$$$$$

I should experiment with Villain, it seems like it's the state of the art right now. I wonder if you get better bang for your buck running it overnight vs. running "original" as I've been doing. In the comparison thread, even lightweight is looking pretty good haha. Maybe I can stage a T4 race between the two. It's probably a fairer comparison to keep "time spent running" the same, rather than "number of iterations", since time is the real commodity.

Post by sp13 »

I did have some spare credits so I've been playing with a V100 some more. Using something similar to Villain I get about 2.5 times the performance of the T4, which is exactly what my Realface experiments showed.

The V100 is a hungry beast. On an n1-standard-4, faceswap averaged about 333% CPU, so it wasn't maxing out, but some threads looked like they could use help. I'm currently trying a custom config with 6 cores and 12 GB RAM and get about an extra 5% performance.

Still not worth it at 5 to 6 times the cost of a T4 but it is fun.
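
A quick sketch of the arithmetic behind that, using only the figures from this post (2.5x the speed, 5 to 6 times the cost, 333% CPU on a 4-vCPU n1-standard-4). The variable names are made up for this example, and it assumes cost scales with run time only.

```python
speedup = 2.5              # V100 vs T4 training rate, from the post
price_ratios = (5.0, 6.0)  # V100 cost as a multiple of the T4, from the post

# Cost per trained example on the V100 relative to the T4.
per_example = [price_ratio / speedup for price_ratio in price_ratios]
for price_ratio, rel in zip(price_ratios, per_example):
    print(f"at {price_ratio:.0f}x the price: {rel:.1f}x the cost per example")

# Break-even: the V100 matches the T4 on cost per example only if its
# price multiple is no higher than its speedup.
break_even_price_ratio = speedup
print(f"break-even price multiple: {break_even_price_ratio:.1f}x")

# 333% total CPU on a 4-vCPU n1-standard-4, averaged per vCPU.
avg_per_vcpu = 333.0 / 4
print(f"average load per vCPU: {avg_per_vcpu:.2f}%")
```

In other words, the V100 works out to roughly 2 to 2.4 times the cost per example, and it would need to cost no more than about 2.5 times a T4 to break even.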
