Both the documentation and the paper ("50x fewer parameters. ... 510x smaller than AlexNet") describe SqueezeNet as lightweight compared to AlexNet. I do find that at the same batch size, SqueezeNet is faster than AlexNet. I'd expect SqueezeNet to also consume less VRAM than AlexNet, or at worst the same amount.
Yet I consistently get an out-of-memory (OOM) error at the largest batch size that works with AlexNet when the only change I make is swapping AlexNet for SqueezeNet.
I first observed this on Windows, and then reproduced it on Xubuntu to rule out any Windows memory shenanigans.
Is this expected behavior? Is anyone else experiencing this?
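
One thing I have not ruled out is activation memory, which the parameter count says nothing about. Below is a rough back-of-the-envelope sketch in plain Python, not a measurement. It assumes 224x224 inputs, float32 activations, and the first-layer shapes used by torchvision's `alexnet` and `squeezenet1_0` definitions (64 filters, 11x11, stride 4, padding 2 versus 96 filters, 7x7, stride 2, no padding). Those shapes are an assumption on my part and may not match the implementation in use here. The helper names are made up for illustration.

```python
def conv_out_size(in_size, kernel, stride, padding=0):
    """Spatial output size of a convolution (standard floor formula)."""
    return (in_size + 2 * padding - kernel) // stride + 1


def first_conv_activation_bytes(in_size, out_channels, kernel, stride,
                                padding=0, bytes_per_value=4):
    """Bytes needed to hold one sample's output of a single conv layer."""
    side = conv_out_size(in_size, kernel, stride, padding)
    return out_channels * side * side * bytes_per_value


# Assumed first-layer shapes (taken from torchvision's model definitions).
ALEXNET_FIRST_CONV = dict(out_channels=64, kernel=11, stride=4, padding=2)
SQUEEZENET_FIRST_CONV = dict(out_channels=96, kernel=7, stride=2, padding=0)

INPUT_SIZE = 224  # assumed input resolution

alex_bytes = first_conv_activation_bytes(INPUT_SIZE, **ALEXNET_FIRST_CONV)
squeeze_bytes = first_conv_activation_bytes(INPUT_SIZE, **SQUEEZENET_FIRST_CONV)

print(f"AlexNet first conv output:    {alex_bytes} bytes per sample")
print(f"SqueezeNet first conv output: {squeeze_bytes} bytes per sample")
print(f"ratio: {squeeze_bytes / alex_bytes:.2f}x")
```

Under those assumptions the first SqueezeNet layer alone produces a noticeably larger per-sample output than AlexNet's, and that cost scales with batch size. This only covers one layer and ignores everything the framework allocates on top, so it is a hint about where to look rather than an explanation.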