LPIPS Alex vs Squeeze Surprising Behavior

Want to understand the training process better? Got tips for which model to use and when? This is the place for you.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for discussing tips and understanding the process involved with Training a Faceswap model.

If you have found a bug or are having issues with the Training process not working, then you should post in the Training Support forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
User avatar
couleurs
Posts: 9
Joined: Fri Jan 13, 2023 3:09 am
Has thanked: 10 times
Been thanked: 6 times

LPIPS Alex vs Squeeze Surprising Behavior

Post by couleurs »

Both the documentation and the paper - "50x fewer parameters. ... 510x smaller than AlexNet" - describe Squeeze as lightweight compared to Alex. I do find that, at the same batch size, Squeeze is faster than Alex. I'd therefore expect Squeeze to also consume less VRAM than Alex, or at worst the same.

Yet I consistently get an OOM at the maximum batch size that works with Alex if the only change I make is swapping Alex for Squeeze.

Started observing this on Windows, and confirmed on Xubuntu to rule out any Windows memory shenanigans.

Is this expected behavior? Is anyone else experiencing this?

User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: LPIPS Alex vs Squeeze Surprising Behavior

Post by torzdf »

I believe that to be correct. Whilst Squeeze has fewer parameters than Alex, it has significantly more activations, convolutions and layers in general. It also has larger output feature maps, so it taking more VRAM is probably to be expected.

Related: https://github.com/forresti/SqueezeNet/issues/19
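As a rough illustration of why low parameter count does not imply low activation memory, here is a minimal sketch comparing the first convolution of each network. The layer shapes are taken from the common torchvision definitions (AlexNet: 11x11 stride-4 conv; SqueezeNet v1.1: 3x3 stride-2 conv), so treat the exact numbers as assumptions rather than the Faceswap implementation:

```python
def conv_out(size, k, stride, pad=0):
    """Spatial output size of a square convolution."""
    return (size + 2 * pad - k) // stride + 1

def conv_stats(in_size, c_in, c_out, k, stride, pad=0):
    """Return (parameter count, output activation element count)."""
    out = conv_out(in_size, k, stride, pad)
    params = k * k * c_in * c_out + c_out      # weights + biases
    activations = out * out * c_out            # output feature map elements
    return params, activations

# AlexNet conv1: 11x11 kernel, stride 4, pad 2 on a 224x224 RGB image
alex_p, alex_a = conv_stats(224, 3, 64, k=11, stride=4, pad=2)
# SqueezeNet v1.1 conv1: 3x3 kernel, stride 2 on the same image
sq_p, sq_a = conv_stats(224, 3, 64, k=3, stride=2)

print(alex_p, alex_a)  # 23296 193600
print(sq_p, sq_a)      # 1792 788544
```

SqueezeNet's first layer has roughly 13x fewer parameters, but because it downsamples far less aggressively its output feature map is about 4x larger, and it is the feature maps (activations) that dominate VRAM during training.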

My word is final

User avatar
couleurs
Posts: 9
Joined: Fri Jan 13, 2023 3:09 am
Has thanked: 10 times
Been thanked: 6 times

Re: LPIPS Alex vs Squeeze Surprising Behavior

Post by couleurs »

I see - so is there ever any advantage to using Squeeze over Alex, given that it uses more VRAM and runs as fast or slower? It sounds like it's only smaller in terms of stored size, which is not a particular concern for a training use case.

I've noticed:

VGG16: uses far more VRAM but gives significantly nicer results, especially in terms of less moiré pattern.
Squeeze: uses more VRAM than Alex, gives about the same or worse results, and has a slightly less coarse moiré pattern than Alex.

I would propose to slightly amend the documentation for Squeeze to clarify this:

Same as lpips_alex, but using the SqueezeNet backbone. A more lightweight version of AlexNet that uses more VRAM than AlexNet.

or something similar? I can open a PR on GitHub if that is preferred.

User avatar
bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times
Contact:

Re: LPIPS Alex vs Squeeze Surprising Behavior

Post by bryanlyon »

This is funny, because SqueezeNet was designed for low-memory environments; it's used on FPGAs and microcontrollers, for example. However, it seems this is likely down to how Tensorflow works: accumulators are created after nearly every operation, but TF allocates new ones for each operation instead of re-using those already allocated. Unfortunately, Tensorflow is rather opaque about memory usage, so tracking down issues can be a painful endeavour.

It's also possible that it's an implementation issue. We don't always get access to the best version of a given model. It's possible that SqueezeNet is generally smaller, but that inefficiencies in how it was prepared have made it take more VRAM than necessary (for example, if it was trained in PyTorch and brought to Tensorflow through ONNX).

User avatar
torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: LPIPS Alex vs Squeeze Surprising Behavior

Post by torzdf »

FWIW, I ported both SqueezeNet and AlexNet myself and then manually ported the weights from PyTorch, as the models did not exist in Keras and are super simple:
https://github.com/deepfakes/faceswap/b ... el/nets.py
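For anyone curious what such a manual weight port involves, the usual gotcha is the kernel layout: PyTorch stores convolution weights as (out_ch, in_ch, kH, kW) while Keras/TF expects (kH, kW, in_ch, out_ch). A hedged sketch of the transpose step (the shapes here are illustrative, not the actual nets.py code):

```python
import numpy as np

# PyTorch conv weights: (out_ch, in_ch, kH, kW).
# Keras/TF conv weights: (kH, kW, in_ch, out_ch).
# A manual port therefore transposes every conv kernel.
torch_kernel = np.arange(64 * 3 * 3 * 3, dtype=np.float32).reshape(64, 3, 3, 3)
keras_kernel = np.transpose(torch_kernel, (2, 3, 1, 0))

print(keras_kernel.shape)  # (3, 3, 3, 64)
```

Fully-connected layers need a similar treatment (a plain transpose, plus reordering if they follow a flatten), but for simple all-convolutional backbones like these, the kernel transpose is most of the work.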

couleurs wrote: I can open a PR on GitHub if that is preferred.

No need to PR, I've made a note to update

My word is final

Locked