Using one GPU has more VRAM than running two?

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

rekauqrast133
Posts: 2
Joined: Mon Aug 02, 2021 5:50 pm

Using one GPU has more VRAM than running two?

Post by rekauqrast133 »

When I run my training with the 1070 and the 1070 Ti, which both have 8 GB of VRAM, I get an error saying I don't have the VRAM to run the training, no matter what batch size I use. I went down to a batch size of 2 and it still said I didn't have enough memory. I tried it with and without the distributed option enabled, and it didn't work either way. However, when I ran the training on just a single card at a batch size of 6, it worked.

I am using the RealFace trainer at 128 input/output, which is why I understand the memory requirement is so high. What I don't understand is how two cards end up with less usable memory than a single card, even at batch sizes so low that nothing will run at all.

Both cards are fine solo, and they worked together before on a different model, but I can't seem to run this one with multi-GPU.

The 1070 Ti is connected to the x16 PCIe slot and the regular 1070 is connected to the x4 slot. I get that this will slow the cards down, but I don't get how it would use more memory, if that is even the issue.
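If anyone wants to check the same thing: watching per-GPU memory while training starts makes it obvious which card fills up first. A rough sketch of the kind of check I mean (my own helper, not part of Faceswap; assumes nvidia-smi is on the PATH):

# Polls nvidia-smi once a second and prints per-GPU memory usage,
# so you can see which card runs out when training launches.
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader",
]

while True:
    # One line per GPU, e.g. "0, GeForce GTX 1070 Ti, 512 MiB, 8192 MiB"
    print(subprocess.check_output(QUERY, text=True).strip())
    time.sleep(1)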

Any help would be greatly appreciated.

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: Using one GPU has more VRAM than running two?

Post by bryanlyon » Mon Aug 02, 2021 10:26 pm

Your error is probably not actually an out-of-memory (OOM) condition, even if it says it is. We don't recommend using mismatched cards in distributed mode, as they can cause issues like this.

However, it is possible that it is a true OOM. Using two cards doesn't mean you get double the available VRAM. One card still acts as the "main" card and handles the backpropagation. That means it has to copy the gradients from the second card, and it's possible that it simply can't hold the full model plus the gradients from both cards all at once.
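As a conceptual illustration (this is general TensorFlow, not Faceswap's exact code), the pattern looks like this; it is the copy of every replica's gradients onto /gpu:0 that costs the extra memory:

import tensorflow as tf

# Illustration only: a mirrored strategy that reduces (sums) all
# per-replica gradients on a single "main" device, /gpu:0.
strategy = tf.distribute.MirroredStrategy(
    devices=["/gpu:0", "/gpu:1"],
    cross_device_ops=tf.distribute.ReductionToOneDevice(
        reduce_to_device="/gpu:0"),
)

with strategy.scope():
    # Variables are replicated to both cards, but the combined
    # gradient tensors land on /gpu:0 during every update step.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")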

Unfortunately, if it is a true OOM, the only way to solve it is to reduce the memory usage on the main card until it is low enough to hold everything it needs.
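One general TensorFlow knob worth knowing about (again, an illustration rather than a Faceswap setting) is memory growth, which stops TensorFlow from pre-allocating the whole card up front and leaves the main card a little more headroom:

import tensorflow as tf

# Must run before any GPU is initialised: allocate VRAM on demand
# instead of reserving the entire card at startup.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)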

rekauqrast133
Posts: 2
Joined: Mon Aug 02, 2021 5:50 pm

Re: Using one GPU has more VRAM than running two?

Post by rekauqrast133 »

I see, so it most likely isn't OOM and is more likely an issue in the connectivity pipeline, since even dropping the batch size to 2, which would free a lot of memory, still gives the error. I appreciate the response.
