Hardware Performance Data Request.

Talk about Hardware used for Deep Learning


Locked
abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Post by abigflea »

Thought we could pool a bit of information about what hardware people are using and how it performs relative to other setups.
The easiest way to do that is for everyone to run this little benchmark routine.

This is how it will work.

  1. Update Faceswap: Help menu --> Update Faceswap. Restart Faceswap if needed.

  2. Download this tiny set of random faces as your data set.

  3. Extract it to your C:\ drive.

  4. Start up Faceswap. An FSW file is included that you can load if you wish.

  5. Set up a training session like this.

Use the same faces/alignments for Input A and Input B

[Image: Screenshot 2021-03-13 080704.png]

It's a small set, already aligned. Nothing else is needed.

  6. Pick a model such as Original, DFL-SAE, or Dfaker. Make sure the model settings are at their defaults.
    Run it at Batch Size = 2.
    Let it run for 1200-2000 iterations or so.
    Then record the EGs/sec.

  7. Using that same model, find the highest batch size that will run without crashing or giving an OOM (out of memory) error.
    Let that run for 1200-2000 iterations.
    Then record the EGs/sec and the batch size.

  8. Post your results below, like I have.

  9. Delete the contents of C:\nonsense\NonsenseModel\

  10. Now you can try it with a different model.
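
The hunt for the highest working batch size can be done by hand, but the logic is a plain bisection. The sketch below is only an illustration: `runs_ok` is a hypothetical callback (not part of Faceswap) that you would have to write yourself, for example by starting a short training run and checking that it survives. It also assumes that once a batch size fails, every larger one fails too, which is probably only roughly true in practice.

```python
from typing import Callable


def find_max_batch_size(runs_ok: Callable[[int], bool], low: int = 2, high: int = 256) -> int:
    """Return the largest batch size in [low, high] for which runs_ok is True.

    Assumes failures are monotonic: if a batch size fails (e.g. OOM),
    all larger batch sizes fail as well.
    """
    if not runs_ok(low):
        raise ValueError(f"batch size {low} does not run at all")
    best = low
    lo, hi = low + 1, high
    while lo <= hi:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            best = mid      # mid works, so look for something larger
            lo = mid + 1
        else:
            hi = mid - 1    # mid fails, so the answer is below it
    return best


if __name__ == "__main__":
    # Stand-in for a real training attempt: pretend anything above 21 hits OOM.
    print(find_max_batch_size(lambda bs: bs <= 21))
```

With a search range of 2 to 256 this needs at most about eight trial runs, which matters when each trial means waiting to see whether training falls over.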

If I get enough results I'll make a nice spreadsheet for everyone to see with the data.
Thank you for helping with my little endeavor.
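
If you want to collect results into a spreadsheet yourself, lines written in the "BS=2 EGs/sec = 43.1 (mixed precision on)" style are easy to turn into CSV rows. This is a rough sketch that assumes that line format; the function names and column names are made up for illustration.

```python
import csv
import io
import re

# Matches e.g. "BS=2 EGs/sec = 43.1 (mixed precision off)" or "BS = 140: EGs/sec = 132.3"
RESULT_RE = re.compile(
    r"BS\s*=\s*(?P<bs>\d+)\s*:?\s*EGs/\s*sec\s*=\s*(?P<egs>\d+(?:\.\d+)?)"
    r"(?:\s*\(mixed precision (?P<mp>on|off)\))?",
    re.IGNORECASE,
)


def parse_result(line: str):
    """Return (batch_size, egs_per_sec, mixed_precision), or None for non-result lines.

    mixed_precision is "on", "off", or "" when the line does not state it explicitly.
    """
    match = RESULT_RE.search(line)
    if match is None:
        return None
    mp = (match.group("mp") or "").lower()
    return int(match.group("bs")), float(match.group("egs")), mp


def to_csv(gpu: str, model: str, lines) -> str:
    """Build CSV text with a header row from the result lines of one GPU/model pair."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gpu", "model", "batch_size", "egs_per_sec", "mixed_precision"])
    for line in lines:
        parsed = parse_result(line)
        if parsed is not None:
            writer.writerow([gpu, model, *parsed])
    return out.getvalue()
```

Anything that does not look like a result line is skipped, so you can paste a whole post in and keep only the numbers.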

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX 2060 : 2x RTX 2080 Super : Ghetto 1060

Re: Hardware Performance Data Request.

Post by abigflea »

GPU: RTX 2070 8GB
Mainboard: Ryzen 2600 - X470 Chipset
OS: Windows 10

Model: Original
BS=2 EGs/sec = 43.1 (mixed precision off)
BS=2 EGs/sec = 52 (mixed precision on)
BS=140 EGs/sec = 208 (mixed precision on)

Model: Villain
BS=2 EGs/sec = 22.2 (Mixed precision on)
BS=13 EGs/sec = 46.9 (Mixed precision on)

Model: DFL-SAE
BS=2 EGs/sec = 24.2 (Mixed precision on)
BS=21 EGs/sec = 52.4 (Mixed precision on)
Above BS=21 there was a lot of slowdown and OOM.

Model: Dlight
BS=2 EGs/sec = 27.7 (Mixed precision on)
BS=29 EGs/sec = 71.0 (Mixed precision on)

Re: Hardware Performance Data Request.

Post by abigflea »

GPU: RTX 3090 FE 24GB
Mainboard: Ryzen 2600 - X470 Chipset
OS: Windows 10

Model: Original
BS=2 EGs/sec = 62.3 (mixed precision on)
BS=256 EGs/sec = 199 (mixed precision on)

Model: Villain
BS=2 EGs/sec = 24.7 (Mixed precision on)
BS=20 EGs/sec = 95.7 (Mixed precision on)
BS=65 EGs/sec = 99.8 (Mixed precision on)

Model: DFL-SAE
BS=2 EGs/sec = 28.4 (Mixed precision on)
BS=78 EGs/sec = 109.1 (Mixed precision on)

Model: Dlight
BS=2 EGs/sec = 32.1 (Mixed precision on)
BS=100 EGs/sec = 120.5 (Mixed precision on)

deephomage
Posts: 33
Joined: Fri Jul 12, 2019 6:09 pm
Answers: 1
Has thanked: 2 times
Been thanked: 8 times

Re: Hardware Performance Data Request.

Post by deephomage »

GPU: 2x RTX 2080 Ti, 22 GB, NVLink
Mainboard: Ryzen 3900 - X570 Chipset
OS: Windows 10

Model: Original
BS=2 EGs/sec = 77.8 (mixed precision on)
BS=256 EGs/sec = 307.2 (mixed precision on)

Model: Villain
BS=2 EGs/sec = 34.8 (Mixed precision on)
BS=20 EGs/sec = 71.4 (Mixed precision on)

I also tried BS=22 and BS=24, which improved EGs/sec, but Villain training became unstable. BS=50 gave an OOM error.

Re: Hardware Performance Data Request.

Post by abigflea »

GPU: RTX 3060 12GB
Mainboard: Ryzen 2600 - X470 Chipset
OS: Windows 10

Model: Original
BS=2 EGs/sec = 47.2 (mixed precision on)
BS=256 EGs/sec = 222.8 (mixed precision on)

Model: Villain
BS=2 EGs/sec = 13.9 (Mixed precision off)
BS=2 EGs/sec = 12.8 (Mixed precision on)
BS=14 EGs/sec = 29.7 (Mixed precision off)
BS=21 EGs/sec = 44.3 (Mixed precision on)

Model: DFL-SAE
BS=2 EGs/sec = 22.1 (Mixed precision on)
BS=32 EGs/sec = 46.7 (Mixed precision on)

Model: Dlight
BS=2 EGs/sec = 23.3 (Mixed precision on)
BS=51 EGs/sec = 56.3 (Mixed precision off)
BS=51 EGs/sec = 59.2 (Mixed precision on)
BS=56 EGs/sec = 57.7 (Mixed precision on)

Col58
Posts: 1
Joined: Sun Mar 07, 2021 8:27 pm
Has thanked: 3 times
Been thanked: 1 time

Re: Hardware Performance Data Request.

Post by Col58 »

GPU: GTX 1070 Ti 8GB
Mainboard: i3-8100 / Z370 Chipset / 32GB RAM
OS: Arch Linux
Python: 3.9.2 / TensorFlow: 2.4.0 / CUDA: 11.2 / cuDNN: 8.1
Mixed Precision: Off
Model and Images are loaded from tmpfs

Model: Original
BS=2 EGs/sec = 29.6
BS=64 EGs/sec = 198.1
BS=140 EGs/sec = 200.6

My CPU was hitting 100% and bottlenecking, so I tweaked the code to cache the parsed face images and tried again:
BS=2 EGs/sec = 29.6
BS=64 EGs/sec = 259.1
BS=140 EGs/sec = 256.0
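
The actual code change is not shown here, so the following is only a guess at the general idea: keep each decoded image in memory after the first read so that later iterations skip the decode. `CachedLoader` and the `decode` callable are hypothetical names for illustration, not Faceswap's API.

```python
from typing import Callable, Dict


class CachedLoader:
    """Wrap an expensive decode function and remember its result per path."""

    def __init__(self, decode: Callable[[str], object]) -> None:
        self._decode = decode
        self._cache: Dict[str, object] = {}
        self.misses = 0  # how many times the real decode had to run

    def __call__(self, path: str) -> object:
        if path not in self._cache:
            self._cache[path] = self._decode(path)
            self.misses += 1
        # The same object is returned every time, so callers should copy it
        # before modifying it in place (e.g. for augmentation).
        return self._cache[path]
```

The cache is unbounded, which is fine for a tiny benchmark set held in RAM but would need a size limit for a large training set.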

For the heavier models, the performance boost from the cache tweak is negligible:
Model: DFaker (128px out)
BS=2 EGs/sec = 19.8
BS=64 EGs/sec = 109.8
BS=64 EGs/sec = 115.9 (with image cache)

Model: DFaker (256px out)
BS=2 EGs/sec = 9.1
BS=10 EGs/sec = 20.5

Model: Villain
BS=2 EGs/sec = 13.2
BS=10 EGs/sec = 23.3

sp13
Posts: 15
Joined: Sat Apr 10, 2021 12:20 am
Has thanked: 3 times
Been thanked: 4 times

Re: Hardware Performance Data Request.

Post by sp13 »

GPU: GTX 1060 6 GB
Mainboard: AMD FX8350 + 32 GB RAM
OS: Debian 10
NVIDIA 418.181.07 driver + TensorFlow 2.2

Original
BS = 2: EGs/sec = 20.5
BS = 140: EGs/sec = 132.3
BS = 256: EGs/sec = 137.5
Looks CPU-limited at large batch sizes.

Villain
BS = 2: EGs/sec = 7.9
BS = 6: EGs/sec = 11.3

DFL H128
BS = 2: EGs/sec = 14.2
BS = 35: EGs/sec = 38.9
Starts with BS up to 40 but hits OOM after a few minutes.

DFL SAE
BS = 2: EGs/sec = 8.5
BS = 11: EGs/sec = 14.7

Edit: Added the two DFL entries

Last edited by sp13 on Sat Apr 10, 2021 12:26 am, edited 1 time in total.

Re: Hardware Performance Data Request.

Post by sp13 »

Google Colab
There are different GPUs available, but you have no control over which one you get. There is no GUI, so I calculated EGs/sec manually as (iterations * batchsize * 2) / seconds. (The factor of 2 is there because both the A and B sides get a batch size's worth of faces. This is the way the GUI calculates it now, but I don't think it was always so.)
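
For anyone else running without the GUI, the manual calculation can be written as a small helper. The function name is made up, and the numbers in the example call are invented purely to show the arithmetic; they are not measurements.

```python
def egs_per_sec(iterations: int, batch_size: int, seconds: float) -> float:
    """EGs/sec = (iterations * batch_size * 2) / seconds.

    The factor of 2 counts the faces from both the A and B sides.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return iterations * batch_size * 2 / seconds


if __name__ == "__main__":
    # Invented example: 1500 iterations at batch size 2 taking 200 seconds.
    print(egs_per_sec(1500, 2, 200.0))  # 30.0
```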

GPU: Tesla T4 (16 GB)
Mainboard: Intel Xeon CPU @ 2.20 GHz, 2 cores provided
OS: Ubuntu 18.04
NVIDIA 460.32.03 driver & TensorFlow 2.4.1 (pre-installed version)
The T4 should support mixed precision, but I simply didn't think of it, so it was turned off.

Original
BS = 2: EGs/sec = 30.3
BS = 140: EGs/sec = 56.9
BS = 256: EGs/sec = 60
It ran with a batchsize of 256 but iterated so slowly that I didn't let it run long. I then tried 140 since that was a number used in other posts. Looks like the CPU is really killing performance here.

Might run some other models in the future but I don't want to use up my usage limits on just benchmarks.

googyi
Posts: 1
Joined: Fri Mar 25, 2022 8:34 pm

Re: Hardware Performance Data Request.

Post by googyi »

Not based on your training set
GPU: 3070 Ti
CPU: i9 9900K
MB: MSI Z370 SLI PLUS
OS: Windows 10

Villain
BS = 2: EGs/sec = 22.7 (Mixed precision)
BS = 11: EGs/sec = 53.9 (Mixed precision)
(unstable - BS = 12: EGs/sec = 55.8 (Mixed precision))

Last edited by googyi on Fri Mar 25, 2022 8:57 pm, edited 1 time in total.