Distribution and Multiple GPUs


Distribution and Multiple GPUs

Post by MaxHunter »

So, I bought a used 3090 to complement my 3080 Ti (so I can do other things while messing around with machine learning). The idea was to use the 3090 for training and switch the 3080 Ti off so I can use it for gaming or whatever.

Faceswap reports 36 GB of memory, and I can go from a batch size of 1 to a batch size of 3 using the default strategy, but when I switch to mirrored it actually seems to slow down. Is that some kind of placebo effect, or is it really slowing down? And if I leave it on default, is it still using both GPUs? It seems like it is, and I thought it was only supposed to use one.

Also, I have my 3080 Ti plugged into PCIe slot 1 and the 3090 in slot 2 (for cooling purposes). Will that affect how Faceswap uses the GPUs?


Re: Distribution and Multiple GPUs

Post by torzdf »

I don't have multiple GPUs to test with, but if I recall correctly, the default strategy should only use one GPU.
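
For what it's worth, here is a minimal sketch of the difference in plain TensorFlow/Keras. This is not Faceswap's own code, and I'm assuming the "mirrored" option corresponds to tf.distribute.MirroredStrategy:

import tensorflow as tf

# Default strategy: the model lives on a single device (normally GPU:0),
# even if more GPUs are visible to TensorFlow.
model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Mirrored strategy: variables are replicated on every visible GPU and each
# batch is split between them, so every step pays a synchronisation cost.
# Over PCIe (no NVLink) that cost can outweigh the compute you gain.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                                 tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")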

As to the other questions, I will need to leave them to someone with a multi-GPU setup.



Re: Distribution and Multiple GPUs

Post by bryanlyon »

TensorFlow has recently started combining the memory of all GPUs into one pool.

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.

See: https://www.tensorflow.org/guide/gpu

This is done because GPUs may be connected with NVLink, which provides a high-bandwidth link between the GPUs and allows very low latency when accessing the memory of other GPUs connected that way. However, if you don't have NVLink, the access goes over the PCIe bus, which is much slower and less efficient.

This is why the "exclude gpu" option exists in Faceswap: you can disable the 3080 Ti and prevent it from sharing its VRAM with the 3090. (That's not the original reason the option exists, but the same restriction mechanism is also used to tell TensorFlow not to share the VRAM.)

It's important to note that without setting a distributed mode, the only thing that is shared is the VRAM; the second GPU will not participate in the calculations at all.
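
As a rough illustration of what that exclusion amounts to, here is a sketch in plain TensorFlow (not Faceswap's actual code, and which index maps to which card depends on your system):

import os

# Option 1: hide the second card from CUDA entirely. Set this before
# TensorFlow is imported / initialised so it never sees the device.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # keep only one card visible

import tensorflow as tf

# Option 2: let CUDA see both cards, but tell TensorFlow to use only one.
gpus = tf.config.list_physical_devices("GPU")
if len(gpus) > 1:
    tf.config.set_visible_devices(gpus[0], "GPU")

# Either way, TensorFlow will not map (or pool) the excluded card's VRAM,
# leaving it free for other applications.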


Re: Distribution and Multiple GPUs

Post by MaxHunter »

Thanks for your reply, Bryan.

Another question, though:
When I tried to shut off the 3080 Ti to play a game, the game (Assassin's Creed) slowed down to 18 fps. It was as if Faceswap wasn't letting go of the 3080 Ti. This brings up a couple of questions:

  • Is this due to the TensorFlow update you mentioned, causing Faceswap to hold onto the memory?

Or

  • Is this due to the 3080 Ti being in the first PCIe slot?

(Or is this just a fluke?)


Re: Distribution and Multiple GPUs

Post by bryanlyon »

It's probably due to CPU usage and PCIe bandwidth. Most motherboards provide full lanes for each slot, but some don't, and even with full PCIe lanes available, many motherboards have "crosstalk" that limits full speed when multiple slots are being used heavily.

CPU limitations also exist. For example, on a Ryzen chip the PCIe lanes come off a chiplet inside the package, and that chiplet communicates with the other chiplets through Infinity Fabric. If you're doing something that uses PCIe heavily, it's also using the Infinity Fabric heavily, which can become a bottleneck in some circumstances. (Intel chips have similar limitations on PCIe throughput.)

Unfortunately, it's very hard to segment a consumer system into separate "nodes" doing different tasks. VMs can help, but they're probably overkill (and unlikely to be perfect, as they'll likely just slow down the training in order to speed up the game). However, if you want to try, you can check whether your motherboard supports configurable NUMA nodes. If you put the two GPUs on separate NUMA nodes, along with the cores each one uses, you may be able to mitigate the slowdowns.
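
If you want to experiment without a full NUMA setup, one rough option is to pin the training process to a fixed set of cores and leave the rest for the game. A sketch, assuming Linux and that you have checked (e.g. with lscpu) which core IDs sit on which node:

import os

# Restrict this process (and anything it spawns afterwards) to cores 0-7,
# leaving the remaining cores free for other workloads. The core IDs that
# belong to each NUMA node / chiplet vary per CPU, so pick them to match
# your own topology.
os.sched_setaffinity(0, set(range(8)))

The same thing can also be done from outside the process with tools like taskset or numactl.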


Re: Distribution and Multiple GPUs

Post by MaxHunter »

Wow! Super informative! Thanks!!

It just so happens I rebuilt my system over Christmas with a new motherboard (a Taichi Z690) and a 13900K (I blew out my old PCIe lanes installing the 3090 🤦🙄). So it's very possible the Intel chip is behaving like the Ryzen example you described, given how many cores it has.

I'll look into the BIOS for the NUMA settings. This is my first time using an ASRock board, so I'm still learning their BIOS. Thanks again!
