There appears to be a complex relationship between model structure and how mixed precision (MP) behaves.
Some models run significantly faster or with higher batch sizes, but others don't really see any improvement at all.
And in the worst case, I even built a model that ran at a batch size of 4 in full precision (FP) but OOMed at that same batch size on MP.
Is it possible to predict how model structure will affect the MP/FP ratio of speed or VRAM use, without actually building and running the model?
I'm at a loss to explain why MP would run slower or use more VRAM than FP in some configurations... has anyone else experienced this? I'm running a 3060, so it should fully support MP afaik.
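
To make the question concrete, here is the kind of back-of-envelope estimate I have in mind. It's only a rough sketch, and it rests on assumptions I haven't verified for my setup: that MP keeps FP32 master weights, gradients, and optimizer state and adds a half-precision copy of the weights for compute, and that activations are stored in half precision. The function name and the example numbers are made up for illustration. It ignores framework overhead, workspace buffers, and fragmentation, and says nothing about speed.

```python
def estimate_training_bytes(n_params, acts_per_sample, batch_size,
                            mixed_precision, opt_states_per_param=2):
    """Very rough training-memory estimate in bytes (hypothetical helper)."""
    # Assumption: FP32 weights and FP32 gradients exist in both modes.
    weights = 4 * n_params
    grads = 4 * n_params
    # Assumption: Adam-style optimizer, two FP32 states per parameter.
    optimizer = 4 * opt_states_per_param * n_params
    if mixed_precision:
        # Assumption: extra half-precision copy of the weights for compute.
        weights += 2 * n_params
        bytes_per_activation = 2
    else:
        bytes_per_activation = 4
    activations = bytes_per_activation * acts_per_sample * batch_size
    return weights + grads + optimizer + activations


def mp_over_fp(n_params, acts_per_sample, batch_size):
    """Ratio of estimated MP memory to estimated FP memory."""
    mp = estimate_training_bytes(n_params, acts_per_sample, batch_size, True)
    fp = estimate_training_bytes(n_params, acts_per_sample, batch_size, False)
    return mp / fp


# Parameter-heavy model: many weights, few activations per sample.
print(mp_over_fp(n_params=100_000_000, acts_per_sample=1_000_000, batch_size=4))
# Activation-heavy model: few weights, many activations per sample.
print(mp_over_fp(n_params=1_000_000, acts_per_sample=100_000_000, batch_size=4))
```

Under those assumptions the ratio comes out above 1 for the parameter-heavy case and below 1 for the activation-heavy case, so at least on paper MP could cost more VRAM when weights dominate. I don't know whether that's what is actually happening in my models.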