Scaled Memory - Apple Silicon vs Nvidia

Talk about Hardware used for Deep Learning


sw_frog
Posts: 1
Joined: Tue Mar 11, 2025 11:07 am

Scaled Memory - Apple Silicon vs Nvidia

Post by sw_frog »

Hi Friends,

I have been successfully training deepfake models for a good few months now, and I am taking some time to research and plan the next iteration of my process.

I was reading that the Apple M processors have unified memory, which would be a godsend for high-resolution and large-batch training.

However, there isn't a lot of information on the subject, as Nvidia dominates the deep learning space.

I am training on 4090s, which gives me roughly 20 GB of headroom for data on both my machines. While they are really fast compared to everything else I've tested, I wonder if using, say, an M4 Ultra with 256 GB of memory would unlock a much higher resolution. I know it would be a lot slower, but would it work that way?

Would 256 GB of unified memory give me room to train, say, a 1024px model at a batch size of 4 or even 8, even if much slower? Time isn't the problem; I am chasing quality.

Any personal experience or feedback would be great.

Cheers,
A

bryanlyon
Site Admin
Posts: 805
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco

Re: Scaled Memory - Apple Silicon vs Nvidia

Post by bryanlyon »

A basic rule of thumb is that doubling the resolution quadruples the amount of memory required: if 256x256 is taking 20 GB, then 512 would take 80 GB and 1024 would take 320 GB. So, no, 256 GB of RAM wouldn't be able to handle 1024x1024 training out of the box.

However, another key point is that resolution is not the same as fidelity. Increasing the resolution alone would not massively increase the quality of the output; you'd have a blurry output, like an undertrained model, even when fully trained.

The last problem (and the biggest) is data. You'd need much higher-resolution data, sure, but you'd also need a lot MORE data, since you'd be training a much larger number of pixels.
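To make the rule of thumb above concrete, here's a quick back-of-envelope calculation (the function name and the quadratic-scaling assumption are mine; real memory use also depends on model size, batch size, and optimizer state):

```python
def estimated_vram_gb(base_res: int, base_vram_gb: float, target_res: int) -> float:
    """Rough estimate: activation memory scales with pixel count,
    i.e. with the square of the edge length."""
    scale = target_res / base_res
    return base_vram_gb * scale ** 2

# Starting from 256x256 at ~20 GB:
for res in (256, 512, 1024):
    print(f"{res}x{res}: ~{estimated_vram_gb(256, 20.0, res):.0f} GB")
```

Running this reproduces the numbers above: 20, 80, and 320 GB, which is why 256 GB of unified memory still falls short at 1024x1024.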

FaceSwap is architected for an output resolution between 64 and 256. It'd require different model techniques to go much higher. If this is something you're really interested in, I'd suggest emailing us at contact@faceswap.dev to ask about some custom development. It wouldn't be cheap, but it'd be absolutely necessary to push the resolution you want.

The other (and somewhat simpler) option is to tie an upscaling system in after your output. This will let you push your resolution higher. Topaz has a very good upscaling solution for videos, and it has a model that excels at faces. There are open-source solutions with lesser results that could still boost the resolution of faceswap's output. Upscaling works best with outputs that are high enough quality to show the details you want (for example, 64x64 isn't quite high enough to get Clint Eastwood's signature smirk, while 256x256 is the minimum for decent wrinkles); the upscaler can then just add more detail in the gaps.
