That's already in hand, but it's going to take a while. Most likely there will be an iterative FAN update first whilst I build a training set.
Yah new sutff, a vision transformer CLipV
- Ryzen1988
- Posts: 52
- Joined: Thu Aug 11, 2022 8:31 am
- Location: Netherlands
- Has thanked: 7 times
- Been thanked: 24 times
Re: Yah new sutff, a vision transformer CLipV
Very cool to hear that. Should you ever need high-quality, high-res face sets for development or anything else, I'd be more than happy to contribute.
Other than that I'm looking forward to these improvements.
- Ryzen1988
Re: Yah new sutff, a vision transformer CLipV
After training a couple of different CLIP setups up to around 200,000 iterations, it has become clear that the clipv_vit-l-14 models really stand head and shoulders above the others. Almost every setup using one of the two vit-L models gives crazy fast and crazy good results.
It only takes a short spurt of training with the encoder frozen, and once you unfreeze it the magic starts happening fast.
Will post some results at the weekend, but this has been a real hero (and the last model I picked to try out).
It is crazy heavy on VRAM: a setup with the normal vit-L encoder and mixed precision only just fits on my 4090 with an output res of 384 and a batch size of 6.
The vit-L-336 big brother with an output of 512px fills up 48GB of VRAM at batch size 4 as if it were nothing.
But the results are very interesting. I am doing a split / half-shared FC model and it really looks crazy good.
I'll keep you guys updated, but if you can fit it, these are really worth checking out.
Re: Yah new sutff, a vision transformer CLipV
So, would you recommend the 336 for a 512 output?
Though I've thought about it, I ultimately haven't tried these yet because I felt they wouldn't be very good for 512 output.
- Ryzen1988
Re: Yah new sutff, a vision transformer CLipV
The clipv L encoder is certainly very suitable for 448-512px output.
The farl B I would personally use for 256-384px.
The clipv L 336px is probably good for 512 and above, but the encoder itself is already massive, so VRAM is a big issue and whatever model comes behind it needs to be skinny.
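
To put rough numbers on why the 336px variant is so much heavier than the standard vit-L, here is a small back-of-the-envelope sketch. It assumes standard ViT patching (14px patches plus one class token) and only counts tokens; it does not measure real VRAM use, and the function name is made up for illustration, not something from Faceswap.

```python
def vit_token_count(image_size: int, patch_size: int = 14) -> int:
    """Tokens a ViT processes: one per image patch plus one class token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


tokens_224 = vit_token_count(224)  # vit-L/14 at its usual 224px input
tokens_336 = vit_token_count(336)  # vit-L/14 at 336px input

# Attention score matrices grow with the square of the token count;
# most other activations grow roughly linearly with it.
linear_ratio = tokens_336 / tokens_224
attention_ratio = linear_ratio ** 2

print(f"224px: {tokens_224} tokens, 336px: {tokens_336} tokens")
print(f"linear cost ratio ~{linear_ratio:.2f}x, attention ratio ~{attention_ratio:.2f}x")
```

So the 336px encoder handles a bit more than twice as many tokens per image, and its attention maps are roughly five times larger, before anything behind the encoder is counted. That fits with the batch sizes reported above, though actual memory use also depends on precision, optimizer state and the rest of the model.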