Scaled Memory - Apple Silicon vs Nvidia

Talk about Hardware used for Deep Learning


sw_frog
Posts: 1
Joined: Tue Mar 11, 2025 11:07 am

Scaled Memory - Apple Silicon vs Nvidia

Post by sw_frog »

Hi Friends,

I have been successfully training deepfake models for a good few months now, and I am taking some time to research and plan the next iteration of my process.

I was reading that the Apple M processors have unified memory, which would be a godsend for high-resolution and large-batch training.

However, there isn't a lot of information on the subject, as Nvidia dominates the deep learning space.

I am training on 4090s, which gives me roughly 20 GB of headroom for data on both my machines. While they are really fast compared to everything else I've tested, I wonder if using, say, an M4 Ultra with 256 GB of memory would unlock a much higher resolution. I know it would be a lot slower, but would it work that way?

Would 256 GB of unified memory give me room to train, say, a 1024px model at a batch size of 4 or even 8, even if much slower? Time isn't the problem; I am chasing quality.

Any personal experience or feedback would be great.

Cheers,
A

bryanlyon
Site Admin
Posts: 805
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco

Re: Scaled Memory - Apple Silicon vs Nvidia

Post by bryanlyon »

A basic rule of thumb is that doubling the resolution quadruples the amount of memory required: if 256x256 is taking 20 GB, then 512 would take 80 GB and 1024 would take 320 GB. So, no, 256 GB of RAM wouldn't be able to handle 1024x1024 training out of the box.

However, another key point is that resolution is not the same as fidelity. Increasing the resolution alone would not massively increase the quality of the output; you'd have a blurry output, like an undertrained model, even when fully trained.

The last problem (and the biggest) is data. You'd need much higher-resolution data, sure, but you'd also need a lot MORE data, since you'd be training a much larger number of pixels.
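To make the rule of thumb above concrete, here's a quick back-of-envelope calculation (the function name and the quadratic-scaling assumption are mine; real memory use also depends on model size, batch size, and optimizer state):

```python
def estimated_vram_gb(base_res: int, base_vram_gb: float, target_res: int) -> float:
    """Rough estimate: activation memory scales with pixel count,
    i.e. with the square of the edge length."""
    scale = target_res / base_res
    return base_vram_gb * scale ** 2

# Starting from 256x256 at ~20 GB:
for res in (256, 512, 1024):
    print(f"{res}x{res}: ~{estimated_vram_gb(256, 20.0, res):.0f} GB")
```

Running this reproduces the numbers above: 20, 80, and 320 GB, which is why 256 GB of unified memory still falls short at 1024x1024.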

FaceSwap is architected for an output resolution between 64 and 256. It'd require different model techniques to go much higher. If this is something you're really interested in, I'd suggest emailing us at contact@faceswap.dev to ask about some custom development. It wouldn't be cheap, but it'd be absolutely necessary to push the resolution you want.

The other (and somewhat simpler) option is to tie an upscaling system in after your output. This will let you push your resolution higher. Topaz has a very good upscaling solution for videos, and it has a model that excels at faces. There are open-source solutions with lesser results that could still boost the resolution of faceswap's output. Upscaling works best with outputs that are high enough quality to show the details you want (for example, 64x64 isn't quite high enough to get Clint Eastwood's signature smirk, while 256x256 is the minimum for decent wrinkles); the upscaler can then just add more detail in the gaps.
