LPIPS Alex vs Squeeze Surprising Behavior

Post by **couleurs** » Wed Jan 18, 2023 12:10 am

Both the documentation and the paper ("50x fewer parameters. ... 510x smaller than AlexNet") describe Squeeze as lightweight compared to Alex. I do find that at the same batch size, Squeeze is faster than Alex. I'd expect Squeeze to also consume less VRAM than Alex, or at worst the same.

Yet I consistently get OOM at the maximum batch size that works with Alex when the only change I make is switching Alex to Squeeze.

Started observing this on Windows, and confirmed on Xubuntu to rule out any Windows memory shenanigans.

Is this expected behavior? Is anyone else experiencing this?

Post by **torzdf** » Wed Jan 18, 2023 12:29 pm

I believe that is expected. Whilst Squeeze has fewer params than Alex, it has significantly more activations, convolutions and layers generally. It also has larger output feature maps, so it taking more VRAM is probably to be expected.
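To illustrate the params-versus-activations point, here is a minimal sketch in plain Python. The two layer shapes are assumptions, loosely modelled on the first convolution of each network as commonly published, and are not read from the faceswap implementation. The `Conv2D` helper and the 224px input size are hypothetical and exist only for this example. It counts weights and output activations for a single layer, and says nothing about total VRAM use during training.

```python
from dataclasses import dataclass


@dataclass
class Conv2D:
    """Hypothetical helper describing one square 2D convolution layer."""
    in_channels: int
    out_channels: int
    kernel: int
    stride: int
    padding: int

    def params(self) -> int:
        # Weights (k * k * c_in * c_out) plus one bias per output channel
        return self.kernel * self.kernel * self.in_channels * self.out_channels + self.out_channels

    def output_size(self, size: int) -> int:
        # Standard convolution output size formula (floor division)
        return (size + 2 * self.padding - self.kernel) // self.stride + 1

    def activations(self, size: int, batch: int = 1) -> int:
        # Number of output values that must be held in memory for this layer
        out = self.output_size(size)
        return batch * out * out * self.out_channels


# Assumed shapes: a large-kernel, large-stride layer vs a small-kernel, small-stride layer
alex_like = Conv2D(in_channels=3, out_channels=64, kernel=11, stride=4, padding=2)
squeeze_like = Conv2D(in_channels=3, out_channels=64, kernel=3, stride=2, padding=0)

for name, layer in (("alex-like", alex_like), ("squeeze-like", squeeze_like)):
    print(f"{name}: params={layer.params()}, "
          f"activations per 224px image={layer.activations(224)}")
```

Under these assumed shapes the squeeze-like layer has roughly 13x fewer parameters but produces roughly 4x more activations per image. Activation memory also grows with batch size, whereas parameter memory does not, which would be consistent with hitting OOM at a batch size that Alex can handle.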

Related: https://github.com/forresti/SqueezeNet/issues/19

Post by **couleurs** » Thu Jan 19, 2023 7:51 pm

I see - so is there ever any advantage to using Squeeze over Alex, given that it uses more VRAM and runs as fast or slower? It sounds like it's a smaller model in terms of stored size, which is not much of a concern for a training use-case.

I've noticed:
VGG16: uses way more VRAM but gives significantly nicer results, esp. in terms of less moire pattern
Squeeze: uses more VRAM than Alex, gives about the same or worse results, and has a slightly less coarse moire pattern than Alex

I would propose to slightly amend the documentation for Squeeze to clarify this:

Same as lpips_alex, but using the SqueezeNet backbone. A more lightweight version of AlexNet that uses more VRAM than AlexNet.

or something similar? I can open a PR on GitHub if that is preferred.

Post by **bryanlyon** » Thu Jan 19, 2023 8:16 pm

This is funny because SqueezeNet was designed for low-memory environments. It's used on FPGAs and microcontrollers, for example. However, it seems likely that this is due to how TensorFlow works. Accumulators are needed after nearly every operation, but TF creates new ones for each operation instead of re-using them once they have been used. Unfortunately, TensorFlow is rather opaque about memory usage, so tracking down issues can be a painful endeavour.

It's also possible that it's an implementation issue. We don't always get access to the best version of a given model. It's possible that SqueezeNet is generally smaller, but that some inefficiencies in how it was prepared have made it take more VRAM than necessary (for example, if it was trained in PyTorch and brought to TensorFlow through ONNX).

Post by **torzdf** » Fri Jan 20, 2023 12:15 am

FWIW, I ported both SqueezeNet and AlexNet myself and then manually ported the weights from PyTorch, as the models did not exist in Keras and are super-simple:
https://github.com/deepfakes/faceswap/b ... el/nets.py

couleurs wrote: "I can open a PR on GitHub if that is preferred."

No need to PR, I've made a note to update.

Faceswap Forum
