A question for someone who is from The fruity cargo cult

Discussions about research, Faceswapping and things that don't fit in the other categories here.


Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

A question for someone who is from The fruity cargo cult

Post by Ryzen1988 »

Jobs Mob..... I'm not really a fan to be honest, but I will reserve my many opinions because I have an honest question. :ugeek: :D

VRAM is the biggest limiting factor in most AI workloads; I bought an RTX A6000 GPU specifically for that reason.
Unfortunately it is priced as if there were an Apple sticker on it, and it still tops out at 48 GB.

So the new super duper, always incredibly overpriced MMMMMega MMMMMMax Apple stuff has the advantage of a unified memory architecture: it can go up to 192 GB of DRAM, and with its four memory controllers it can deliver a decent amount of bandwidth.
So has anyone tried to train a very, very big model on one? Or load a large language model for training? Such a large memory pool sounds ideal, and with A6000 GPUs costing upwards of €5,000 and the bigger cards built around the H100 going for around €20,000, price-wise there is not really a difference.
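For a rough sense of what would actually fit in that 192 GB pool, here is a back-of-the-envelope sketch (just my assumption of plain fp32 training with Adam, so roughly 16 bytes per parameter for weights, gradients and optimizer state, ignoring activations entirely):

Code: Select all

# Rough estimate of how many trainable parameters fit in a given memory pool.
# Assumes naive fp32 training with Adam: 4 B weights + 4 B gradients + 8 B
# optimizer state = ~16 bytes per parameter. Activations and framework
# overhead are ignored, so real headroom is noticeably smaller.
BYTES_PER_PARAM = 4 + 4 + 8

def max_params_billions(pool_gb: float) -> float:
    """Approximate parameter count (in billions) that fits in pool_gb."""
    return pool_gb * 1e9 / BYTES_PER_PARAM / 1e9

for pool in (48, 80, 192):  # A6000, H100, 192 GB unified memory
    print(f"{pool:>3} GB -> ~{max_params_billions(pool):.0f}B params")

On that rule of thumb the 192 GB pool caps out around a ~12B-parameter model for naive fp32 training before activations are counted, against roughly 3B for the 48 GB A6000.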

My assumption, of course, is that when you load it up for training and the overpriced Apple gets stressed into doing real compute, it burns, slows down to sludge, and reveals the true nature of the fruity cargo cult: that it's all show and no real compute.
But hey, I could be completely wrong and it holds up, and then the unified memory architecture seems really nice.

So, has anyone got one of those Maxy Apple devices and tried this out?

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: A question for someone who is from The fruity cargo cult

Post by bryanlyon »

Apple Silicon is impressive, and while I'm not a fan of their ecosystem, I must admit to being intrigued by their M2 laptops. That said, by the benchmarks it's about 6-10x slower than a decent Nvidia GPU.
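If anyone wants to sanity-check that gap on their own hardware, a minimal timing sketch in PyTorch looks something like this (the matrix size and iteration count are arbitrary, and it assumes a PyTorch build with the mps backend on the Mac and cuda on the Nvidia box):

Code: Select all

import time
import torch

def bench(device: str, size: int = 4096, iters: int = 50) -> float:
    """Time repeated matmuls on one device and return elapsed seconds."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    for _ in range(5):            # warm-up, not measured
        (a @ b).sum().item()
    start = time.time()
    for _ in range(iters):
        (a @ b).sum().item()      # .item() forces the async work to finish
    return time.time() - start

for dev in ("cuda", "mps", "cpu"):
    if dev == "cuda" and not torch.cuda.is_available():
        continue
    if dev == "mps" and not torch.backends.mps.is_available():
        continue
    print(dev, f"{bench(dev):.2f} s")

It's not a training benchmark, but it gives a quick feel for the raw throughput difference.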

I wouldn't buy it if you want a system to train on all the time; it just isn't the economical option compared to the speed you get out of it. But if you want a laptop that can do AI tasks for more than an hour away from an outlet, it's the only choice in the game.

Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

Re: A question for someone who is from The fruity cargo cult

Post by Ryzen1988 »

I really like the thought of having 192 GB of unified memory.
You could only get that with 3x H100 80 GB linked with NVLink for coherent memory, and that's a shitload of money, if it's even available at all.
To be honest, I don't know whether NVLink really presents it as one unified memory pool, or whether coding tricks still have to be implemented.
But if you have the fruity construction and the iterations are measured in its/min, you also never get the model done :lol:
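To illustrate what I mean by coding tricks, my understanding (an assumption, I haven't tested this on NVLinked cards) is that in PyTorch you still place the pieces of the model on specific GPUs yourself and hop the activations between them, roughly like this:

Code: Select all

import torch
import torch.nn as nn

# Hand-rolled split of a model across two GPUs: each half lives on its own
# card and the activations are copied between devices explicitly. NVLink
# speeds up that copy, but the split itself is still code you have to write
# (or delegate to a framework), not one big transparent memory pool.
class TwoGpuNet(nn.Module):
    def __init__(self, width: int = 8192):
        super().__init__()
        self.first = nn.Linear(width, width).to("cuda:0")
        self.second = nn.Linear(width, width).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.first(x.to("cuda:0")))
        return self.second(x.to("cuda:1"))  # explicit hop to the second card

if torch.cuda.device_count() >= 2:
    out = TwoGpuNet()(torch.randn(16, 8192))
    print(out.device)  # cuda:1 -- still two devices, not one pool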

To be fair, the M1 was a big thing, though I do think they are running up against the same laws of nature regarding silicon limits.
The improvements in the M2 and M3 seem more or less iterative.

It makes me wonder: is Apple that fast because the chips are that good,
or just because they have the money to simply always be a node ahead?
If you look at a GPU comparison across a node difference, it is also shocking how much improvement has taken place :geek:
But still, they have managed to become customer numero uno at TSMC, so that's to their credit.

Maybe the future MI300 super duper APU, everything on a single socket with 192 GB of HBM, will be the ideal mix.
Just one socket for everything, though probably with cryogenic cooling needed for decent frequency ranges :lol:
Unfortunately it will probably cost its weight in gold as well.
:geek:

foo321
Posts: 7
Joined: Mon Oct 16, 2023 6:35 pm
Has thanked: 1 time

Re: A question for someone who is from The fruity cargo cult

Post by foo321 »

I have an Apple Mac Studio M2 Max 32 GB with the 30-core GPU... I also have a PC with a 4090.
The Mac is WAY slower.
Basically all the underlying software out there that can take advantage of a GPU assumes Nvidia hardware.
There is probably tons of potential in the Apple hardware due to the shared memory, but none of the software I have played around with takes advantage of it yet.
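The Nvidia assumption usually shows up as a hard-coded cuda device string; for the frameworks that do support Apple silicon, a small fallback like this (assuming a recent PyTorch build with the mps backend) is about all it takes to at least run on the Mac's GPU:

Code: Select all

import torch

# Pick the best available backend instead of assuming Nvidia/CUDA.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple-silicon GPU via Metal
else:
    device = torch.device("cpu")

model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)
print(device, model(x).shape)

Whether the kernels behind it are actually fast on mps is another story, which matches what I'm seeing with the 4090 versus the M2 Max.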

Last edited by foo321 on Sat Nov 04, 2023 12:58 am, edited 1 time in total.