Auto alignment to "fill in the gaps" based on manually cleaned frames?



deathb4honor
Posts: 4
Joined: Sat Mar 27, 2021 10:14 pm

Post by deathb4honor »

Hello,

Is it possible to "train" the tool, or make it infer from adjacent frames, to improve masking and alignment?

I have a minute-long video, but there are consistent obstructions around the face, and due to the person's eye makeup or creases the auto-aligned eyes are always a bit off.

Cleaning hundreds of frames for the purpose of training, while tedious, is something I can stomach. However, aligning literally thousands of frames for a video that is only a minute long is simply way, too, much!
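
To make the "fill in the gaps" idea concrete, here is a rough sketch of what I mean: take the frames that were fixed by hand as keyframes and linearly interpolate the landmark positions for every frame in between. This is only an illustration, not something the tool provides as far as I know. The function name `interpolate_landmarks`, the dict-of-arrays input, and the 68-point layout are all assumptions made up for this example.

```python
import numpy as np


def interpolate_landmarks(keyframes, num_frames):
    """Linearly interpolate landmarks between manually corrected keyframes.

    keyframes: dict mapping frame index -> (P, 2) array of landmark (x, y)
               positions (hypothetical format, P = 68 is assumed typical).
    num_frames: total number of frames in the clip.
    Returns an array of shape (num_frames, P, 2).
    """
    indices = sorted(keyframes)
    if not indices:
        raise ValueError("need at least one keyframe")

    # Stack keyframes into (K, P, 2), then flatten each to one row of coordinates.
    stacked = np.stack([np.asarray(keyframes[i], dtype=float) for i in indices])
    flat = stacked.reshape(len(indices), -1)

    frames = np.arange(num_frames)
    out = np.empty((num_frames, flat.shape[1]))
    for col in range(flat.shape[1]):
        # np.interp holds the first/last keyframe value outside the keyframe range.
        out[:, col] = np.interp(frames, indices, flat[:, col])

    return out.reshape(num_frames, *stacked.shape[1:])


if __name__ == "__main__":
    keys = {0: np.zeros((68, 2)), 10: np.full((68, 2), 10.0)}
    filled = interpolate_landmarks(keys, 11)
    print(filled.shape)
```

Linear interpolation only works when the head moves smoothly between keyframes, so fast turns would still need more hand-corrected frames close together.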


torzdf
Posts: 1495
Joined: Fri Jul 12, 2019 12:53 am
Answers: 127
Has thanked: 51 times
Been thanked: 287 times

Re: Auto alignment to "fill in the gaps" based on manually cleaned frames?

Post by torzdf »

If you need to manually align thousands of frames for a single video, then something doesn't sound right. Do you have example images you can share?



deathb4honor
Posts: 4
Joined: Sat Mar 27, 2021 10:14 pm

Re: Auto alignment to "fill in the gaps" based on manually cleaned frames?

Post by deathb4honor »

I was able to train away most of the issues using VGG-Obstructed and Learn Mask after about 600k iterations, though there are still some double-mouth issues. I used the Original model, but I think I will try Villain next.

It trained away the most obvious obstructions, but it can't handle details such as bangs, facial fluids (sweat, tears, or something more imaginative), or facial creases very well, so head turns inevitably leave the face feeling very flat, even at 1.02M+ iterations.

However, I do want to comment that after using the tool a few times, it seems to be capable of processing videos only on a frame-by-frame basis. That is, it treats frames as individual photos, with little to no perceptible ability to take adjacent frames into context and track information across them, as video software usually does. This could be limiting.
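
As a small illustration of what "taking adjacent frames into context" could look like, here is a sketch that smooths per-frame landmark detections with a centered moving average over time, so that a single frame's jitter is pulled toward its neighbours. This is an assumption about one possible approach, not how the tool works internally. The name `smooth_landmarks`, the `(T, P, 2)` array layout, and the default window of 5 frames are hypothetical.

```python
import numpy as np


def smooth_landmarks(landmarks, window=5):
    """Temporal moving-average smoothing of per-frame landmarks.

    landmarks: array of shape (T, P, 2), i.e. T frames, P points, (x, y).
               This layout is an assumption for the example.
    window: odd number of frames to average over, centered on each frame.
    Returns an array with the same shape as the input.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    half = window // 2
    # Repeat the first/last frame so the output keeps the original length.
    padded = np.pad(landmarks, ((half, half), (0, 0), (0, 0)), mode="edge")
    kernel = np.ones(window) / window

    # Average each coordinate's track independently along the time axis.
    return np.apply_along_axis(
        lambda track: np.convolve(track, kernel, mode="valid"), 0, padded
    )


if __name__ == "__main__":
    noisy = np.zeros((7, 1, 2))
    noisy[3, 0, 0] = 5.0  # one frame where a point jumps
    print(smooth_landmarks(noisy, window=5)[:, 0, 0])
```

A plain average like this also blurs real fast motion, so the window size trades jitter reduction against lag on quick head turns.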

