Advice to maximize results for low VRAM cards

Discussions about research, Faceswapping and things that don't fit in the other categories here.


Replicon
Posts: 35
Joined: Mon Mar 22, 2021 4:24 pm
Been thanked: 1 time

Advice to maximize results for low VRAM cards

Post by Replicon »

Hi all! This is my first post.

I got curious and decided to play around with deepfakes and see what I can do.

My system is 6-7 years old, and my video card is (pasting from my system76 order confirmation): "2 GB nVidia GeForce GTX 750 Ti with 640 CUDA Cores"

From my basic experimentation, I can do the following:

  • Extract:

  • Can do: MTCNN, FAN aligner, Hist normalization, Re-Feed 8

  • Crashes (OOM): S3FD, additional maskers (e.g. VGG-Obstructed)

  • Train:

    • Can do: Lightweight, Original (with lowmem enabled), batch sizes up to 16

    • Crashes (OOM): Original without lowmem, any other trainer I've tried (though admittedly I haven't tried most)

To test, I grabbed a couple of YouTube videos with good, stable faces and played with them. I got some not-too-terrible results by editing the originals down to representative 30-second clips, extracting them to roughly 725 images each, and training for 100K iterations. Not sure what the rules are around posting results of experiments from swapping YouTubers, so I'll err on the side of caution and not post them at this point.

I'm hoping someone can give me some advice about how to make the most of my limited resources. Specifically:

  • Are there other models that work reasonably well with my lower video memory setup?

  • I don't really see a difference so far between Original (lowmem) and Lightweight. Are they basically the same? In what scenarios does one outperform the other?

  • Out of curiosity: Is "Original with lowmem" going to have the same results as "Original (not lowmem)", just slower? Or will non-lowmem Original provide BETTER results for the same data and the same settings?

  • What's generally better with these lighter-weight models: more iterations at a lower batch size, or fewer iterations at a higher batch size? I notice BS=1 churns through iterations much faster than BS=16, so if I have, say, 12 hours to do some training, which route would you take? Or does this really depend on the data? If so, in what way? I'm trying to avoid wasting time as much as possible. :)

Thanks everyone! I'm really excited to play around with it more. Last time I trained a NN, I was in university taking an AI class, and the term "machine learning" wasn't mainstream yet. We built a NN from scratch to recognize a small set of handwritten characters (5, 6, 7, 8, 9, if I remember right). We've come a long way haha.


bryanlyon
Site Admin
Posts: 633
Joined: Fri Jul 12, 2019 12:49 am
Answers: 41
Location: San Francisco
Has thanked: 3 times
Been thanked: 161 times

Re: Advice to maximize results for low VRAM cards

Post by bryanlyon »

Lowmem on Original won't provide the same results as Original without it.

Lightweight is even more memory constrained than Original with lowmem.

Generally you'll get the fastest results with a higher BS. While BS=1 may go through iterations faster, it's actually slower than a higher BS, which can go through more frames at a time. That's why the Analysis tab shows EG/sec: that's how many images the model is shown per second, rather than the iteration count.
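The EG/sec point can be put in numbers. A tiny sketch below; the per-iteration timings are invented for illustration (measure your own in the Analysis tab), but the arithmetic is the whole idea:

```python
# Sketch: why EG/sec (examples per second) beats iterations/sec as a speed
# metric. The per-iteration timings here are made-up illustrative numbers,
# not measurements from any real card.

def eg_per_sec(batch_size, seconds_per_iteration):
    """Images shown to the model per second of training."""
    return batch_size / seconds_per_iteration

# BS=1 finishes each iteration quickly...
bs1 = eg_per_sec(1, 0.10)    # 10 iterations/sec -> 10 images/sec
# ...but BS=16, though each iteration takes longer, shows far more faces/sec.
bs16 = eg_per_sec(16, 0.60)  # ~1.7 iterations/sec -> ~26.7 images/sec

print(f"BS=1:  {bs1:.1f} EG/sec")
print(f"BS=16: {bs16:.1f} EG/sec")
```

So even though BS=1 racks up iteration counts faster, the higher batch size feeds the model more examples per second, which is what actually drives learning progress.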


torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Advice to maximize results for low VRAM cards

Post by torzdf »

Honestly, dude. I'm amazed that you can run any of the models on a 2GB card, so kudos to you :)

What OS are you running?

Lightweight and LowMem (Original) are fairly similar. Lightweight should be the lightest model we have, although it is balanced slightly differently to the LowMem Original model (Lightweight has a bigger decoder than Original LowMem; Original LowMem has a bigger encoder).

As to what the difference would be? I couldn't tell you that, unfortunately. You'd need to experiment.

My word is final


Replicon

Re: Advice to maximize results for low VRAM cards

Post by Replicon »

Thanks folks! Haha yay, barely making the cut with a solid D+ :)

I'm running Ubuntu 20.04, nothing special. It's from System76, and they do write their own "system76-drivers" package, but I bet that's mostly relevant on newly released hardware, since they spend time getting the latest hardware to work. Once you've had a couple of major version releases, the generic repos are probably up to date enough that it doesn't make a difference in that regard.

I'll play with Lightweight until I get a stronger understanding of getting/massaging decent source data, mask checking and all that... then, once I feel I won't be wasting time/money, I'll give the cloud stuff a try.

... or maybe, if Lightweight is so light that the cloud stuff would make quick work of it (100K+ iterations in under an hour?), it wouldn't be a waste of money to just provision GCE instances with good GPUs for the experimentation. Right now, with BS=16, it takes me roughly 16-17 hours to do 100K iterations on the clips I used for that first experiment.
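Those figures reduce to a throughput number that makes cloud comparisons easier. A quick back-of-the-envelope check, using only the numbers quoted above:

```python
# Back-of-the-envelope throughput for the run described above:
# 100K iterations at BS=16 in roughly 16.5 hours on the GTX 750 Ti.

iterations = 100_000
batch_size = 16
hours = 16.5

images_seen = iterations * batch_size      # 1,600,000 face examples total
throughput = images_seen / (hours * 3600)  # examples shown per second

print(f"{throughput:.1f} EG/sec")
```

That works out to roughly 27 EG/sec; a cloud GPU quote can then be compared on the same examples-per-second basis instead of raw iteration counts.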

Do the fancier models produce better results from less-good data (obstructions, lower-quality images, etc.), or is it more that you still need source data as good as you'd need with Lightweight, but you get a better/less blurry rendering in fewer iterations because of the wider firehose?


torzdf

Re: Advice to maximize results for low VRAM cards

Post by torzdf »

Replicon wrote: Tue Mar 23, 2021 4:06 pm

Do the fancier models produce better results from less-good data (obstructions, lower-quality images, etc.), or is it more that you still need source data as good as you'd need with Lightweight, but you get a better/less blurry rendering in fewer iterations because of the wider firehose?

The latter. No model will make up for poor data.

My word is final

