Distribution and Multiple GPUs


Distribution and Multiple GPUs

Post by MaxHunter »

So, I bought a used 3090 to complement my 3080 Ti (so I can do other things while messing around with machine learning). The idea was to use the 3090 for training and switch the 3080 Ti off so I can use it for gaming or whatever.

Faceswap reports 36 GB of memory, and I can go from a batch size of 1 to a batch size of 3 using the default strategy, but when I switch to mirrored it actually seems to slow down. Is that some kind of placebo effect, or is it really slowing down? And if I leave it on default, is it still using both GPUs? It seems like it is, and I thought it was only supposed to use one.

Also, I have my 3080 Ti plugged into PCIe slot 1 and the 3090 in slot 2 (for cooling purposes). Will that affect how Faceswap uses the GPUs?


Re: Distribution and Multiple GPUs

Post by torzdf »

I don't have multiple GPUs to test with, but if I recall correctly, the default strategy should only use one GPU.
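
For what it's worth, here is a minimal sketch of the difference in plain TensorFlow/Keras. This is not Faceswap's own code, and I'm assuming the "mirrored" option corresponds to tf.distribute.MirroredStrategy:

import tensorflow as tf

# Default strategy: the model lives on a single device (normally GPU:0),
# even if more GPUs are visible to TensorFlow.
model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Mirrored strategy: variables are replicated on every visible GPU and each
# batch is split between them, so every step pays a synchronisation cost.
# Over PCIe (no NVLink) that cost can outweigh the compute you gain.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                                 tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")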

As to the other questions, I will need to leave them to someone with a multi-GPU setup.



Re: Distribution and Multiple GPUs

Post by bryanlyon »

TensorFlow has recently started combining the memory of all GPUs into one pool.

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.

See: https://www.tensorflow.org/guide/gpu

This is done because GPUs may be connected with NVLink, which provides a high-bandwidth link between the GPUs and allows very low latency when accessing the memory of other GPUs connected that way. However, if you don't have NVLink, the access goes over the PCIe bus, which is much slower and less efficient.

This is why the "exclude gpu" option exists in Faceswap: you can disable the 3080 Ti and prevent it from sharing its VRAM with the 3090. (That's not the original reason the option exists, but the same restriction mechanism is also used to tell TensorFlow not to share the VRAM.)

It's important to note that without setting a distributed mode, the only thing that is shared is the VRAM; the second GPU will not participate in the calculations at all.
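
As a rough illustration of what that exclusion amounts to, here is a sketch in plain TensorFlow (not Faceswap's actual code, and which index maps to which card depends on your system):

import os

# Option 1: hide the second card from CUDA entirely. Set this before
# TensorFlow is imported / initialised so it never sees the device.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # keep only one card visible

import tensorflow as tf

# Option 2: let CUDA see both cards, but tell TensorFlow to use only one.
gpus = tf.config.list_physical_devices("GPU")
if len(gpus) > 1:
    tf.config.set_visible_devices(gpus[0], "GPU")

# Either way, TensorFlow will not map (or pool) the excluded card's VRAM,
# leaving it free for other applications.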


Re: Distribution and Multiple GPUs

Post by MaxHunter »

Thanks for your reply, Bryan.

Another question, though:
When I tried to shut off the 3080 Ti to play a game, the game (Assassin's Creed) slowed down to 18 fps. It was as if Faceswap wasn't letting go of the 3080 Ti. This brings up a couple of questions:

  • Is this due to the TensorFlow update you mentioned, causing Faceswap to hold onto the memory?

Or

  • Is this due to the 3080 Ti being in the first PCIe slot?

(Or is this just a fluke?)


Re: Distribution and Multiple GPUs

Post by bryanlyon »

It's probably due to CPU usage and PCIe bandwidth. Most motherboards provide full lanes for each slot, but some don't, and even with full PCIe lanes available, many motherboards have "crosstalk" that limits full speed when multiple slots are being used heavily.

CPU limitations also exist. For example, on a Ryzen chip the PCIe lanes come off a chiplet inside the package, and that chiplet communicates with the other chiplets through Infinity Fabric. If you're doing something that uses PCIe heavily, it's also using the Infinity Fabric heavily, which can become a bottleneck in some circumstances. (Intel chips have similar limitations on PCIe throughput.)

Unfortunately, it's very hard to segment a consumer system into separate "nodes" doing different tasks. VMs can help, but they're probably overkill (and unlikely to be perfect, as they'll likely just slow down the training in order to speed up the game). However, if you want to try, you can check whether your motherboard supports configurable NUMA nodes. If you put the two GPUs on separate NUMA nodes, along with the cores each one uses, you may be able to mitigate the slowdowns.
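
If you want to experiment without a full NUMA setup, one rough option is to pin the training process to a fixed set of cores and leave the rest for the game. A sketch, assuming Linux and that you have checked (e.g. with lscpu) which core IDs sit on which node:

import os

# Restrict this process (and anything it spawns afterwards) to cores 0-7,
# leaving the remaining cores free for other workloads. The core IDs that
# belong to each NUMA node / chiplet vary per CPU, so pick them to match
# your own topology.
os.sched_setaffinity(0, set(range(8)))

The same thing can also be done from outside the process with tools like taskset or numactl.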


Re: Distribution and Multiple GPUs

Post by MaxHunter »

Wow! Super informative! Thanks!!

It just so happens I rebuilt my system over Christmas with a new motherboard (a Taichi Z690) and a 13900K (I blew out my old PCIe lanes installing the 3090 🤦🙄). So it's very possible the Intel chip is behaving like the Ryzen example you described, given how many cores it has.

I'll look into the BIOS for the NUMA settings. This is my first time using an ASRock board, so I'm still learning their BIOS. Thanks again!
