Hardware Settings

Talk about Hardware used for Deep Learning


MattB
Posts: 22
Joined: Fri Aug 19, 2022 4:54 pm
Been thanked: 5 times

Hardware Settings

Post by MattB »

I finally invested the time in putting together a workstation to train models, but I'm struggling to get the hardware to perform. It's a totally new install, so I suspect I'm missing something; I'll include the details below. My model has 12k and 14k extracted faces, and I'm running Phaze-A with the DNY 512 preset, mirrored distribution, and a batch size of 20. The CPU seldom passes 12% utilization, and although the GPU RAM is maxed out, the GPU cores seldom exceed 45% utilization. I'm getting a whopping 2,500 iterations an hour. I know it's a non-trivial model, but I expected much better. On a much smaller box (an i9-9900 with an RTX A4500) I get almost twice that performance with a batch size of only 8 (VRAM limitations). By contrast, depending on the model, Blender and other 3D benchmark tools pound the CPU and both GPUs.
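
To put rough numbers on that comparison (back-of-envelope only, and assuming "almost twice" means the A4500 box is doing roughly 4,500 iterations an hour), iterations per hour aren't directly comparable across batch sizes, but faces per second are:

    # Illustrative arithmetic only -- the 4,500 figure is an assumption,
    # not a measured number.
    dual_3090_faces_per_sec = 2_500 * 20 / 3600   # ~13.9 faces/s at batch size 20
    a4500_faces_per_sec     = 4_500 * 8 / 3600    # ~10.0 faces/s at batch size 8

So if those numbers hold, the dual-3090 box is actually pushing more faces per second; it just doesn't look that way from the iteration count alone.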

I'm not terribly familiar with NVIDIA's tools or whether they can help, and I don't want to start installing a bunch of bloatware. Do any of you smart folks have guidance?

AMD Ryzen Threadripper PRO 3955WX
Two GeForce RTX 3090 24GB GPUs
ASUS Pro WS WRX80E motherboard
128GB DDR4 3200MHz RAM

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 623 times

Re: Hardware Settings

Post by torzdf »

TBH, I find it hard to measure GPU performance using the Windows system monitor, as its default graphs are geared towards 3D workloads, whilst AI tends to use the CUDA cores or Tensor Cores (and I'm not sure the latter are even reflected in the system monitor).
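
If you want a reading of what the CUDA driver itself sees, the NVML counters that ship with the driver can be polled from a script. A minimal sketch, assuming the nvidia-ml-py/pynvml bindings are installed (they are a separate pip install, not part of faceswap); the same numbers are also available from a terminal via nvidia-smi:

    import time
    import pynvml

    # Poll per-GPU core utilisation and memory use every 5 seconds, 10 samples.
    pynvml.nvmlInit()
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    for _ in range(10):
        for idx, handle in enumerate(handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent busy
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
            print(f"GPU{idx}: {util.gpu}% core, {mem.used / 2**30:.1f} GiB used")
        time.sleep(5)
    pynvml.nvmlShutdown()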

Is the batch size of 20 maxing out what your GPUs can hold? I ask because there is a certain amount of overhead associated with multi-GPU setups (syncing between the devices) which may impact the performance of each device. Ultimately, a batch size of 20 on a single GPU is likely to be quicker than a batch size of 20 split across multiple GPUs, so the multi-GPU benefit comes when you are running larger batches than could fit on a single card.
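
To illustrate where that overhead comes from, here is a toy TensorFlow sketch of what a mirrored (data-parallel) setup does with the batch, assuming the "mirrored" option behaves like stock tf.distribute.MirroredStrategy (this is not faceswap's actual training loop):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()      # one replica per visible GPU
    print("Replicas in sync:", strategy.num_replicas_in_sync)

    GLOBAL_BATCH = 20                                # the batch size set in the GUI
    # Each step, the global batch is split across the replicas (10 faces per
    # 3090 here).  Every card computes gradients on its own slice, then the
    # gradients are all-reduced (synced) before the weights update.  That
    # per-step sync is the overhead a single card never pays.
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(8)])  # stand-in model
        model.compile(optimizer="adam", loss="mse")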

My word is final

MattB
Posts: 22
Joined: Fri Aug 19, 2022 4:54 pm
Been thanked: 5 times

Re: Hardware Settings

Post by MattB »

Yes, the batch size of 20 maxes out the VRAM. Really, I only invested in the dual GPUs for the purpose of running larger batches. I'm wondering if I'd have been better off investing in a single workstation card. Sigh. I was able to change some settings in the NVIDIA control panel that improved performance, but I'm still disappointed that the little A4500 is comparable to two 3090s. C'est la vie....

Given I'm already a few thousand dollars into this, I'm going to look for an NVLink bridge and see if that helps. If not, I'll probably run parallel models with a smaller batch size.
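
For the parallel-models fallback, the idea would be to pin each training instance to its own card before TensorFlow initialises. A minimal sketch using NVIDIA's standard CUDA_VISIBLE_DEVICES mechanism (the launcher script itself is hypothetical):

    import os

    # Must be set before TensorFlow/faceswap touches the GPUs.  This instance
    # sees only card 0; a second instance launched with "1" sees only card 1.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"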

The lesson here is that brute force is not always a fix. Oh, and "a fool and his money are soon parted." :lol:

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 177 times
Been thanked: 13 times

Re: Hardware Settings

Post by MaxHunter »

If you use an NVLink bridge, it should be a little quicker. Let me know how it goes, because with 3090 prices dropping dramatically it might be worth picking up another 3090, linking them, and selling my 3080 Ti.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 623 times

Re: Hardware Settings

Post by torzdf »

MattB wrote: Wed Feb 22, 2023 2:19 pm

Yes, the batch size of 20 maxes out the VRAM. Really, I only invested in the dual GPUs for the purpose of running larger batches. I'm wondering if I'd have been better off investing in a single workstation card. Sigh. I was able to change some settings in the NVIDIA control panel that improved performance, but I'm still disappointed that the little A4500 is comparable to two 3090s. C'est la vie....

Given I'm already a few thousand dollars into this, I'm going to look for an NVLink bridge and see if that helps. If not, I'll probably run parallel models with a smaller batch size.

The lesson here is that brute force is not always a fix. Oh, and "a fool and his money are soon parted." :lol:

Sadly, I have no experience of running dual GPUs for training. I have never had matching GPUs, so I can't advise any further. The theory is that multiple GPUs mean larger batch sizes, which means a training speed-up, but I do know that the scaling is not linear and that you get diminishing returns with each GPU you add.

Ideally we would look at ways to split the model across multiple GPUs (i.e. model parallelism) as well as splitting the batch (data parallelism), but sadly, due to the aforementioned lack of matching GPUs in my setup, this is not something I can investigate.
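
For anyone curious about the difference, here is a toy sketch of naive model parallelism, with half of a network pinned to each card (purely illustrative, nothing like this exists in faceswap today; data parallelism would instead copy the whole model to each GPU and split the batch):

    import tensorflow as tf

    # Layers created under a device scope keep their weights on that device, so
    # a model too big for one card's VRAM can still be trained across two cards.
    inputs = tf.keras.Input(shape=(256,))
    with tf.device("/GPU:0"):
        x = tf.keras.layers.Dense(512, activation="relu")(inputs)  # first half on card 0
    with tf.device("/GPU:1"):
        outputs = tf.keras.layers.Dense(256)(x)                    # second half on card 1
    model = tf.keras.Model(inputs, outputs)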

My word is final

MattB
Posts: 22
Joined: Fri Aug 19, 2022 4:54 pm
Been thanked: 5 times

Re: Hardware Settings

Post by MattB »

Received and installed the NVLink bridge between the 3090s. Oddly, after wading through a baffling array of options, prices, etc., I just had to walk down to Best Buy and grab an NVIDIA-branded bridge off the shelf for US$80. It works. My speeds aren't double, but they're close. It's hard to measure, but even things like extraction, which previously used one GPU, are noticeably quicker. So, a bit of a win. Now if I can just figure out why my electric bill is so high lately... :?:
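
For anyone else setting this up, the driver's own tools can confirm the bridge is actually in use; a small sketch that just shells out to nvidia-smi (the nvlink and topo sub-commands are standard driver queries):

    import subprocess

    # Per-GPU link state and speed for each NVLink lane.
    subprocess.run(["nvidia-smi", "nvlink", "--status"], check=True)
    # Topology matrix: an "NV#" entry between the two cards means peer-to-peer
    # traffic can go over NVLink rather than PCIe.
    subprocess.run(["nvidia-smi", "topo", "-m"], check=True)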

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 623 times

Re: Hardware Settings

Post by torzdf »

Glad you got it sorted out

Now if I can just figure out why my electric bill is so high lately

Ha! If you work out a solution for this, please share :wink:

My word is final
