Advice to maximize results for low VRAM cards



Replicon
Posts: 50
Joined: Mon Mar 22, 2021 4:24 pm
Been thanked: 2 times

Advice to maximize results for low VRAM cards

Post by Replicon »

Hi all! This is my first post.

I got curious and decided to play around with deepfakes and see what I can do.

My system is 6-7 years old, and my video card is (pasting from my System76 order confirmation): "2 GB nVidia GeForce GTX 750 Ti with 640 CUDA Cores".

From my basic experimentation, I can do the following (a sketch of the corresponding commands follows the list):

  • Extract:

    • Can do: MTCNN, FAN aligner, Hist normalization, re-feed 8

    • Crashes (OOM): S3FD, additional maskers (e.g. VGG-Obstructed)

  • Train:

    • Can do: Lightweight, Original (with lowmem enabled), batch sizes up to 16

    • Crashes (OOM): Original without lowmem, any other trainer I tried (haven't tried most though, tbh)
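
For reference, here's a minimal sketch of the extract call those settings correspond to, driven from Python. The long-form flag names are my assumption from faceswap's CLI help, so confirm them with python faceswap.py extract -h before relying on this:

import subprocess

# Low-VRAM extract settings from the list above (~2 GB card).
# Flag names assumed from `python faceswap.py extract -h`; adjust to
# match your faceswap version.
def extract_low_vram(input_dir, output_dir):
    subprocess.run(
        [
            "python", "faceswap.py", "extract",
            "-i", input_dir,
            "-o", output_dir,
            "--detector", "mtcnn",        # S3FD OOMs at 2 GB
            "--aligner", "fan",
            "--normalization", "hist",
            "--re-feed", "8",
        ],
        check=True,
    )

extract_low_vram("src_frames", "src_faces")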

To test, I grabbed a couple of YouTube videos with good, stable faces, and played with it. I got some not-too-terrible results by editing the originals down to representative 30-second clips, extracting them to O(725) images each, and training for 100K iterations. Not sure what the rules are around posting results of experiments from swapping YouTubers, so I'll err on the side of caution and not post them at this point.

I'm hoping someone can give me some advice about how to make the most of my limited resources. Specifically:

  • Are there other models that work reasonably well with my lower video memory setup?

  • I don't really see a difference so far between Original (lowmem) and Lightweight. Are they basically the same? In what scenarios does one outperform the other?

  • Out of curiosity: is "Original with lowmem" going to have the same results as "Original (not lowmem)", just slower? Or is non-lowmem Original going to provide BETTER results for the same data and the same settings?

  • What's generally better with these lighter-weight models: more iterations at a lower batch size, or fewer iterations at a higher batch size? I notice BS=1 churns through iterations much faster than BS=16, so if I have, say, 12 hours to do some training, which route would you take? Or does this really depend on the data? If so... in what way? I'm trying to avoid wasting time as much as possible. :)

Thanks everyone! I'm really excited to play around with it more. Last time I trained a NN, I was in university taking an AI class, and the term "machine learning" wasn't mainstream yet. We built a NN from scratch to recognize a small set of handwritten characters (5, 6, 7, 8, 9, if I remember right). We've come a long way haha.

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: Advice to maximize results for low VRAM cards

Post by bryanlyon »

Lowmem on Original won't provide the same results as Original without it.

Lightweight is even more memory-constrained than Original with lowmem.

Generally you'll get the fastest results with a higher BS. While BS=1 may go through iterations faster, it's actually slower than a higher BS, which pushes more frames through at a time. That's why the Analysis tab shows EGs/sec: that's how many images are being shown to the model per second, rather than the iteration count.
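
To make that concrete, here's a toy calculation. The timings are invented for illustration (they're not from this thread or any benchmark):

# Throughput is EGs/sec = batch size x iteration rate, not iterations alone.
def egs_per_sec(batch_size, iterations_per_sec):
    return batch_size * iterations_per_sec

print(egs_per_sec(1, 6.0))   # BS=1  at 6.0 it/s ->  6.0 EGs/sec
print(egs_per_sec(16, 1.7))  # BS=16 at 1.7 it/s -> 27.2 EGs/sec

So even though BS=16 iterates more slowly, the model sees several times more faces in the same wall-clock time.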

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Advice to maximize results for low VRAM cards

Post by torzdf »

Honestly, dude. I'm amazed that you can run any of the models on a 2GB card, so kudos to you :)

What OS are you running?

Lightweight and LowMem (Original) are fairly similar. Lightweight should be the lightest model we have, although it is balanced slightly differently from the LowMem Original model (Lightweight has a bigger decoder than Original LowMem; Original LowMem has a bigger encoder).

As to what the difference would be? I couldn't tell you that, unfortunately. You'd need to experiment.
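
If "balanced differently" is hard to picture, here's a toy illustration. The layer sizes are made up and are not the real faceswap architectures; the point is only that the same parameter budget can be split encoder-heavy or decoder-heavy:

def dense_params(layer_sizes):
    # weights + biases for a chain of fully-connected layers
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))

# Toy "Original LowMem"-like split: more capacity in the encoder.
enc_heavy = dense_params([1024, 512, 128]) + dense_params([128, 256, 1024])
# Toy "Lightweight"-like split: more capacity in the decoder.
dec_heavy = dense_params([1024, 256, 128]) + dense_params([128, 512, 1024])
print(enc_heavy, dec_heavy)  # the totals come out identical here, balanced differently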


Replicon
Posts: 50
Joined: Mon Mar 22, 2021 4:24 pm
Been thanked: 2 times

Re: Advice to maximize results for low VRAM cards

Post by Replicon »

Thanks, folks! Haha yay, barely making the cut with a solid D+ :)

I'm running Ubuntu 20.04. Nothing special. It's from System76, and they do write their own "system76-drivers" package, but I bet that matters most on newly released hardware, since they spend their time getting the latest hardware to work... Once a couple of major releases have gone by, the generic repos are probably up to date enough not to make a difference in that regard.

I'll play with Lightweight until I get a stronger understanding of getting/massaging decent source data, mask checking, and all that... then, once I feel I won't be wasting time/money, I'll give the cloud stuff a try.

... or maybe, if Lightweight is so light that the cloud stuff would make quick work of it (100K+ iterations in less than an hour?), it wouldn't be a waste of money to just provision GCE instances with good GPUs for the experimentation. Right now, with BS=16, it takes me roughly 16-17 hours to do 100K iterations on the clips I used for that first experiment.
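
For anyone curious, the back-of-envelope math on those numbers (taking ~16.5 hours as the midpoint):

# Rough throughput implied by the figures above (~16.5 h for 100K iterations).
iterations = 100_000
hours = 16.5
iters_per_sec = iterations / (hours * 3600)  # ~1.68 it/s
egs_per_sec = 16 * iters_per_sec             # ~26.9 EGs/sec at BS=16
print(f"{iters_per_sec:.2f} it/s, {egs_per_sec:.1f} EGs/sec")

So "100K+ iterations in under an hour" would need roughly a 17x speed-up over the 750 Ti, which doesn't seem crazy for a modern cloud GPU on a model this small.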

Do the fancier models provide better results with lower-quality data (obstructions, low-resolution images, etc.), or is it more that you still need source data as good as you'd need with Lightweight, but you get a better/less blurry rendering in fewer iterations because of the wider firehose?

torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Advice to maximize results for low VRAM cards

Post by torzdf »

Replicon wrote: Tue Mar 23, 2021 4:06 pm

Do the fancier models provide better results with lower-quality data (obstructions, low-resolution images, etc.), or is it more that you still need source data as good as you'd need with Lightweight, but you get a better/less blurry rendering in fewer iterations because of the wider firehose?

The latter. No model will make up for poor data.

