Hardware best practices

Talk about Hardware used for Deep Learning



Re: Hardware best practices

Post by torzdf »

bryanlyon wrote: Mon Nov 13, 2023 11:26 pm

The trouble with hardware recommendations since the 20xx series has been that you always have to add "if you can get it for a good price" to the end of ANY recommendation. Will a 4060 Ti with 16GB of VRAM work? Definitely. Will it beat a $100 1080? Yes. Is it worthwhile to buy it for FaceSwap? That's ENTIRELY up to you.

16GB of VRAM is good enough for any of our models, most of which were designed to work with 8GB 2060 Tis or 1070s. 16GB will be plenty for most FaceSwap tasks.

Is the 4060 Ti fast enough? Unquestionably. I started training with 2x 970s in SLI. They'd take days to do what a 4060 Ti could do in hours.

Is it the right price? That's up to you. For some people, spending ANY money on FaceSwap is hard to justify, as it's just "for the memes". Others spend thousands on A6000s and consider them a bargain.

So I repeat: Will a 4060 Ti with 16GB of VRAM work? Definitely. Will it beat a $100 1080? Yes. Is it worthwhile to buy it for FaceSwap? That's ENTIRELY up to you.

I would also add to the above that if you want to do this professionally (as in full HD and above), then 16GB is unlikely to be enough. You'd really be looking at at least double that, but it is hard to make a recommendation here, as "professionally" is a broad term and the prices start to become somewhat insane.


My word is final


Re: Hardware best practices

Post by trippod »

Could two 3060 12GB cards in SLI be more interesting?


Re: Hardware best practices

Post by torzdf »

I have not linked 2 GPUs for training with NVLink (you'd need NVLink, not SLI).

However, without NVLink, 2 GPUs would just let you run bigger batch sizes; they would not allow you to load larger models, as a copy of the model needs to be loaded onto each GPU.

You'd need to research NVLinking 2 GPUs to see whether they can be presented as 1 large GPU (that is, with the VRAM combined). I suspect they cannot, and you will hit the same barrier as with 2 GPUs without NVLink.
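
Roughly, in TensorFlow terms, the multi-GPU case looks like the sketch below (a toy example only, not Faceswap's actual code; the model, layer sizes and batch numbers are made up for illustration). Each replica gets a full copy of the model, and the batch you feed in is split across the GPUs:

Code:

import tensorflow as tf

# Data-parallel training: every GPU (replica) holds a FULL copy of the model,
# so per-GPU VRAM still limits model size. Only the batch is split across cards.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Toy model purely for illustration -- not a Faceswap model.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The batch passed to fit() is the GLOBAL batch; each GPU sees
# global_batch / num_replicas samples per step.
per_gpu_batch = 8   # assumed per-card budget
global_batch = per_gpu_batch * strategy.num_replicas_in_sync

images = tf.random.uniform((256, 64, 64, 3))
labels = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(images, labels, batch_size=global_batch, epochs=1)

As far as I'm aware, NVLinked consumer cards still show up as two separate devices here, so the per-card VRAM limit does not go away.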

My word is final


Re: Hardware best practices

Post by bryanlyon »

The only 30xx card with the NVLink fingers is the 3090. All others cannot do any sort of "linking". That said, you can still use multiple GPUs in Faceswap. You get diminishing returns, as each additional GPU slows the collective down further, and, as torzdf says, only the batch size can be increased with additional cards, not the size of the model directly.
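
As a generic TensorFlow sketch (not Faceswap's own code), falling back to the fastest single card just means hiding the other GPUs from the process before training starts:

Code:

import tensorflow as tf

# List the physical GPUs TensorFlow can see.
gpus = tf.config.list_physical_devices("GPU")
print("Detected GPUs:", [gpu.name for gpu in gpus])

# Keep only the first (assumed fastest) card visible to this process.
# This must run before any GPU has been initialised.
if gpus:
    tf.config.set_visible_devices(gpus[0], "GPU")
    print("Training will use:", tf.config.list_logical_devices("GPU"))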


Re: Hardware best practices

Post by luisjcaso »

Hi all, I have been testing some deepfakes on an Asus F15 laptop with an RTX 4060 with 8GB of VRAM, and I'm reaching its limits due to the low VRAM. I was thinking of using an eGPU with the laptop, but I'm worried about the Thunderbolt 4 bandwidth.

Does anyone know of any use case with an external GPU for Faceswap, and whether 32Gbps is enough bandwidth?

Thanks.
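
For a rough sense of scale: 32Gbps is about 4GB/s, and during training only a modest batch of pre-processed face crops has to cross that link each step. A back-of-the-envelope sketch in plain Python (the batch and image sizes below are assumptions, not Faceswap measurements):

Code:

# Back-of-the-envelope: time to push one training batch over a ~32 Gbit/s
# Thunderbolt 4 link. All sizes are ASSUMPTIONS for illustration only.

link_gbit_per_s = 32                             # Thunderbolt 4 nominal bandwidth
link_bytes_per_s = link_gbit_per_s / 8 * 1e9     # ~4 GB/s

batch_size = 16          # assumed batch of face crops
image_side = 256         # assumed training resolution (px)
channels = 3
bytes_per_float = 4      # float32 input tensors

batch_bytes = batch_size * image_side * image_side * channels * bytes_per_float
transfer_s = batch_bytes / link_bytes_per_s

print(f"Batch size on the wire: {batch_bytes / 1e6:.1f} MB")   # ~12.6 MB
print(f"Transfer time per step: {transfer_s * 1e3:.2f} ms")    # ~3 ms

Training itself mostly stays on the GPU once a batch has been copied over, so a few milliseconds of transfer per step is usually small next to the step time. Extraction and convert move whole video frames back and forth, so those stages are more likely to feel the narrower link.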


Re: Hardware best practices

Post by EthnTempest »

luisjcaso wrote: Thu Jun 27, 2024 6:12 pm

Hi all, I have been testing some deepfakes on an Asus F15 laptop with an RTX 4060 with 8GB of VRAM, and I'm reaching its limits due to the low VRAM. I was thinking of using an eGPU with the laptop, but I'm worried about the Thunderbolt 4 bandwidth.

Does anyone know of any use case with an external GPU for Faceswap, and whether 32Gbps is enough bandwidth?

Thanks.

I actually had a similar experience. I was using the internal GPU on my laptop, which wasn’t powerful enough for my needs, so I decided to connect an external GPU. At the time, I was working on a game simulation. Afterward, I attempted to simulate a large number of iterations while saving the previous results. My internal GPU couldn’t handle the workload, so I connected an external one through Thunderbolt. Overall, it was a good solution, but if the intensity of the game had been higher, I would have needed to explore other options.


Re: Hardware best practices

Post by santiagogjof »

Hi everyone,

I recently began training my first model and wanted to share my hardware setup experience. Since I'm still involved in crypto mining, I repurposed some GPUs for my face-swap project. I used a 3080 Ti and a 3070 Ti alongside a Ryzen 9 5950X CPU and 32GB of RAM.
(Currently running Villain, default settings, batch size 20 on a 3080 Ti; 1M iterations; 600 512px pics, prioritizing quality.)
Initially, I assumed adding more GPUs would allow me to increase the batch size and speed up the training process. However, after testing and doing some research, including reading TensorFlow’s distributed training page, I found that this wasn't necessarily true. The algorithm sets the maximum batch size based on the GPU with the least VRAM, so while adding more GPUs initially allowed for larger batches, the training speed actually decreased A LOT.

In my case, the 3070 Ti’s lower VRAM capped my batch size at 14. After experimenting, I realized that using a dual-GPU setup wasn’t ideal for my needs. Switching to a single 3080 Ti turned out to be more efficient, as it processed iterations three times faster than using both GPUs together.
(LOL, my plan was to include all of them: the 3080 Ti, the 3070 Ti, and 2x 3060 Ti, but based on the tests it won't make things super fast.)
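
Putting rough numbers on that comparison (plain Python; this assumes the batch of 14 was per card, i.e. a global batch of 28, and the iteration rate is an arbitrary baseline used only for the ratio):

Code:

# Rough throughput comparison based on the figures reported above.
# The single-card iteration rate is an ARBITRARY baseline; only the
# ratio between the two setups matters.

single_batch = 20                        # 3080 Ti alone
single_its_per_s = 3.0                   # arbitrary baseline rate

dual_batch = 14 * 2                      # assumed per-card batch of 14 on two cards
dual_its_per_s = single_its_per_s / 3    # single card was ~3x faster per iteration

print("Single 3080 Ti :", single_batch * single_its_per_s, "images/s")   # 60.0
print("3080 Ti+3070 Ti:", dual_batch * dual_its_per_s, "images/s")       # 28.0

With these assumptions, the dual-GPU setup pushes through roughly half as many faces per second even with its larger global batch, which matches the decision to train on the 3080 Ti alone.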

I also have two 3060 Ti GPUs and plan to experiment with the mirrored strategy for my next project. Although they have less VRAM, it is the same amount on both cards, so I'll likely just need to lower the batch size to 10. I'm hopeful the process will be faster overall. Ideally, I'd prefer to run two 3080 Ti GPUs in parallel for maximum performance, but since my mining farm is far from home, I'll have to work with the 3060 Ti GPUs for now.

Once I run tests with the 3060 Ti setup, I’ll document the process and share the results. I still have about 800,000 iterations to complete.

Cheers!

Screenshot: https://ibb.co/F0kLm9x
