Yah new stuff, a vision transformer CLipV

Post by **torzdf** » Mon Sep 04, 2023 9:57 pm

Ryzen1988 wrote: ↑Mon Sep 04, 2023 4:58 pm
They also seem to have a "State-of-the-art face alignment and face parsing model"

That's already in hand, but it's going to take a while. Most likely there will be an iterative FAN update first whilst I build a training set.

Post by **Ryzen1988** » Tue Sep 05, 2023 7:32 pm

Very cool to hear that. Should you ever need high-quality, high-res face sets for development or anything else, I'd be more than happy to contribute.
Other than that, I'm looking forward to these improvements.

Post by **Ryzen1988** » Thu Sep 21, 2023 5:17 am

Having trained a couple of different CLIPv setups up to around 200,000 iterations, it has become clear that the clipv_vit-l-14 models really stand head and shoulders above the others. Almost every setup using one of the two ViT-L models gives crazy fast and crazy good results.
It only takes a short spurt of training with the encoder frozen, and when you unfreeze it the magic starts happening fast.
I'll post some results over the weekend, but this has been a real hero (and it was the last model I picked to try out).
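
For anyone unsure what "frozen, then unfrozen" means in practice, here is a minimal sketch of the idea in plain Python. It is not Faceswap's code: the `FreezeSchedule` class, the part names and the 5,000-iteration warm-up are all made up for illustration, and the right warm-up length will depend on your own run.

```python
from dataclasses import dataclass


@dataclass
class FreezeSchedule:
    """Keep the encoder frozen for a warm-up period, then release it."""
    unfreeze_at: int  # hypothetical iteration count; tune for your own run

    def encoder_trainable(self, iteration: int) -> bool:
        # Frozen (not trainable) until the warm-up iterations have elapsed.
        return iteration >= self.unfreeze_at


def trainable_parts(schedule: FreezeSchedule, iteration: int) -> list[str]:
    """List which parts of the model receive weight updates at this iteration."""
    parts = ["fc", "decoders"]  # everything behind the encoder always trains
    if schedule.encoder_trainable(iteration):
        parts.insert(0, "encoder")
    return parts


if __name__ == "__main__":
    schedule = FreezeSchedule(unfreeze_at=5_000)  # illustrative value only
    for it in (0, 4_999, 5_000, 200_000):
        print(it, trainable_parts(schedule, it))
```

The point of the warm-up is that the freshly initialised layers behind the pretrained encoder get a chance to settle before their gradients start changing the encoder weights.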

It is crazy heavy on VRAM. A setup with the normal ViT-L encoder and mixed precision just fits on my 4090 with an output res of 384 and a batch size of 6.
Its big brother, ViT-L-336, with an output of 512px fills up the 48GB of VRAM at batch size 4 as if it were nothing.
But the results are very interesting: I'm running a split / half-shared FC model and it really looks crazy good.
I'll keep you guys updated, but if you can fit it, these are really worth checking out.

Post by **MaxHunter** » Fri Sep 22, 2023 3:28 pm

So, would you recommend the 336 for a 512 output?

Though I've thought about it, I ultimately haven't tried these yet because I felt they wouldn't be very good for 512 output.

Post by **Ryzen1988** » Mon Sep 25, 2023 6:46 pm

The CLIPv L encoder is certainly very suitable for 448-512px output.
The FaRL B I would personally use for 256-384px.
The CLIPv L 336px is probably good for 512 and above, but the encoder itself is already massive, so VRAM is a big issue and whatever model comes behind it needs to be skinny.
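
A rough way to see why the 336px variant is so much heavier than the regular ViT-L/14: a vision transformer works on one token per image patch plus a class token, and self-attention compares every token with every other. The sketch below assumes a patch size of 14 for both, a 224px input for the regular model (the standard CLIP ViT-L/14 size) and 336px for the larger one. The helper names are invented, and real VRAM use also depends on batch size, precision and everything behind the encoder, so treat the ratio as a feel for the scaling, not a prediction.

```python
def vit_tokens(image_size: int, patch_size: int = 14) -> int:
    """Number of tokens a ViT processes: one per patch plus a class token."""
    if image_size % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1


def relative_attention_cost(size_a: int, size_b: int, patch_size: int = 14) -> float:
    """Ratio of attention-matrix sizes (tokens squared) going from size_a to size_b."""
    return (vit_tokens(size_b, patch_size) / vit_tokens(size_a, patch_size)) ** 2


if __name__ == "__main__":
    print(vit_tokens(224))  # 257 tokens (16 x 16 patches + 1)
    print(vit_tokens(336))  # 577 tokens (24 x 24 patches + 1)
    print(round(relative_attention_cost(224, 336), 2))
```

So the bigger input more than doubles the token count, and the attention matrices grow by roughly five times, before counting any decoder for a 512px output.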

Post by **tokafondo** » Fri Jan 19, 2024 12:59 pm

I don't really know what I'm doing, so I started playing with the settings to get this model to run in 12GB with bs=8.

These are my settings; corrections or comments are more than welcome.

clip_v_global_preset.json.txt: (525 Bytes) Downloaded 1077 times

clip_v_global_loss_preset.json.txt: (476 Bytes) Downloaded 1265 times

clip_v_model_phaze_a_preset.json.txt: (1.37 KiB) Downloaded 1170 times

Edit: I replaced the Phaze-A preset because I came up with a better configuration.
