Yay, new stuff: a vision transformer, CLIPv



Post by torzdf »

Ryzen1988 wrote: Mon Sep 04, 2023 4:58 pm

They also seem to have a "State-of-the-art face alignment and face parsing model"

That's already in hand, but it's going to take a while. Most likely there will be an iterative FAN update first whilst I build a training set.

Last edited by torzdf on Mon Sep 04, 2023 9:58 pm, edited 1 time in total.



Post by Ryzen1988 »

Very cool to hear that. Should you ever need high-quality, high-res face sets for development or anything else, I'd be more than happy to contribute.
Other than that, I'm looking forward to these improvements. :D


Post by Ryzen1988 »

Having trained a couple of different CLIPv setups up to around 200,000 iterations, it has become clear that the clipv_vit-l-14 models really stand head and shoulders above the others. Almost every setup using one of the two ViT-L models gives very fast and very good results.
It only takes a short spurt of training with the encoder frozen, and once you unfreeze it the magic starts happening fast.
I will post some results at the weekend, but this has been a real hero (and the last model I picked to try out :shock: :lol: )
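
For anyone wanting to picture the freeze-then-unfreeze approach, here is a minimal sketch of the schedule as plain Python. It does not call any Faceswap API, and the warm-up length is a hypothetical number, since the post only says "a short spurt".

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    start: int               # first iteration of the phase (inclusive)
    end: int                 # last iteration of the phase (exclusive)
    encoder_trainable: bool  # False means the encoder weights are frozen


def build_phases(total_iterations: int, freeze_iterations: int) -> list[Phase]:
    """Split a training run into a frozen warm-up and a full-training phase."""
    if not 0 <= freeze_iterations <= total_iterations:
        raise ValueError("freeze_iterations must lie within the run")
    return [
        Phase("frozen warm-up", 0, freeze_iterations, False),
        Phase("full training", freeze_iterations, total_iterations, True),
    ]


def encoder_trainable_at(iteration: int, phases: list[Phase]) -> bool:
    """Report whether the encoder should be trainable at a given iteration."""
    for phase in phases:
        if phase.start <= iteration < phase.end:
            return phase.encoder_trainable
    raise ValueError("iteration is outside the planned run")


# Hypothetical warm-up length; 200,000 matches the run length mentioned above.
phases = build_phases(total_iterations=200_000, freeze_iterations=5_000)
for phase in phases:
    print(phase.name, phase.start, phase.end, phase.encoder_trainable)
```

The idea is that the pretrained encoder stays untouched while the freshly initialised layers behind it settle, and only then is everything trained together.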

It is very heavy on VRAM: a setup with the normal ViT-L encoder and mixed precision just fits on my 4090 with an output resolution of 384 and a batch size of 6.
Its big brother, ViT-L-336, with a 512px output fills up 48GB of VRAM at batch size 4 as if it were nothing.
The results are very interesting, though. I am running a split / half-shared FC model and it looks really good.
I'll keep you guys updated, but if you can fit it, these are really worth checking out.
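
A rough way to see why the 336 variant is so much heavier is to count the tokens the encoder has to process. The sketch below assumes the standard CLIP input sizes (224px for the normal ViT-L/14, 336px for the 336 variant) and a 14px patch size; whether Faceswap feeds the encoders at exactly these sizes is an assumption. It ignores everything behind the encoder.

```python
def vit_token_count(image_size: int, patch_size: int = 14, class_token: bool = True) -> int:
    """Number of tokens a ViT sees for a square input image."""
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if class_token else 0)


tokens_224 = vit_token_count(224)  # 16 x 16 patches + 1 class token = 257
tokens_336 = vit_token_count(336)  # 24 x 24 patches + 1 class token = 577

# Most activations grow linearly with the token count,
# while the attention matrices grow with its square.
linear_ratio = tokens_336 / tokens_224
attention_ratio = linear_ratio ** 2

print(f"tokens: {tokens_224} vs {tokens_336}")
print(f"linear cost ratio: {linear_ratio:.2f}")
print(f"attention cost ratio: {attention_ratio:.2f}")
```

So under these assumptions the 336 encoder handles a bit more than twice as many tokens per image, which goes some way to explaining the jump in VRAM use.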

Last edited by Ryzen1988 on Thu Sep 21, 2023 5:40 am, edited 2 times in total.

Post by MaxHunter »

So, would you recommend the 336 for a 512 output?

Though I've thought about it, I ultimately haven't tried these yet because I felt they wouldn't be very good for 512 output.


Post by Ryzen1988 »

The CLIPv L encoder is certainly very suitable for 448-512px output.
The FaRL B I would personally use for 256-384px.
The CLIPv L 336px is probably good for 512px and above, but the encoder itself is already massive, so VRAM is a big issue and whatever model comes after it needs to be skinny.
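
The ranges above can be written down as a small lookup, sketched here in plain Python. The encoder labels are descriptive names for this example, not necessarily the exact identifiers used in the Faceswap settings, and the ranges are one user's suggestions rather than hard limits.

```python
# (label, minimum output px, maximum output px or None for "and above")
RECOMMENDATIONS = [
    ("FaRL B", 256, 384),
    ("CLIPv ViT-L", 448, 512),
    ("CLIPv ViT-L 336px", 512, None),
]


def suggest_encoders(output_px: int) -> list[str]:
    """Return the encoders suggested in the post for a given output size."""
    return [
        label
        for label, low, high in RECOMMENDATIONS
        if output_px >= low and (high is None or output_px <= high)
    ]


for size in (256, 384, 400, 512, 640):
    print(size, suggest_encoders(size))
```

Sizes that fall between the stated ranges, such as 400px, return an empty list because the post gives no suggestion for them.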


Post by tokafondo »

I can't say I know what I'm doing, but I started playing with the settings so I could make this model run in 12GB with bs=8.

These are my settings; corrections or comments are more than welcome.

Attachments:
clip_v_global_preset.json.txt (525 Bytes)
clip_v_global_loss_preset.json.txt (476 Bytes)
clip_v_model_phaze_a_preset.json.txt (1.37 KiB)

Edit: I replaced the phaze-a preset because I came up with a better configuration.

Last edited by tokafondo on Fri Jan 19, 2024 2:04 pm, edited 1 time in total.