Latest with "Learn Mask"

Post by **ianstephens** » Sun Jul 17, 2022 9:53 pm

Documentation dictates that not much is gained by enabling mask learning.

However, I can see some updates recently focusing on this feature.

We currently use Bisenet-FP along with FS weights as masking settings.

Sometimes, for example when a subject's nose is larger or longer than the swap's (especially in profile images), we can lose the construction of an accurate swap because the masking is based on the original (swapped) face. The nose is cut off and looks stunted in the final render.

What's the latest with using "learn mask" - should we be using this now? Is it worth it? Would it make a difference?

Post by **torzdf** » Mon Jul 18, 2022 1:36 am

The recent pushes around learn mask were due to a regression introduced with the new Phaze-A preset, so they were to fix those bugs rather than any feature adds...

Re: Bisenet-face mask. An unintended consequence of full face masking was that when swapping someone with a lower hairline onto someone with a higher hairline, you would often get a "shadowy" area where the lower hairline would be. I pushed an update to the preview tool/masking options which allows directional erosion of the mask (i.e. erode from top, left, right, bottom separately), and that seems to solve that issue quite well.

Theoretically this could also help with profile masking, but I doubt it would be perfect, and it would require splitting the profile shots out from the rest of the video and applying different mask settings to those shots. Tedious.
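
To make the directional erosion mentioned above concrete, here is a minimal sketch of the idea. It is not faceswap's implementation: the function name `erode_from`, the binary list-of-rows mask and the rule that anything outside the frame counts as outside the mask are all assumptions made for illustration.

```python
def erode_from(mask, top=0, bottom=0, left=0, right=0):
    """Erode a binary mask (list of rows of 0/1) by a separate amount per side.

    A pixel survives only if every pixel up to `top` rows above it, `bottom`
    rows below it, `left` columns to its left and `right` columns to its
    right is also inside the mask. Anything outside the frame is treated as
    outside the mask (an assumption of this sketch).
    """
    height, width = len(mask), len(mask[0])

    def inside(y, x):
        return 0 <= y < height and 0 <= x < width and mask[y][x] == 1

    out = []
    for y in range(height):
        row = []
        for x in range(width):
            keep = (
                all(inside(y - i, x) for i in range(top + 1))
                and all(inside(y + i, x) for i in range(bottom + 1))
                and all(inside(y, x - i) for i in range(left + 1))
                and all(inside(y, x + i) for i in range(right + 1))
            )
            row.append(1 if keep else 0)
        out.append(row)
    return out


# A 3x3 block of mask pixels inside a 5x5 frame.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# Pull only the top edge (the hairline) down by one pixel.
lowered_hairline = erode_from(mask, top=1)
```

Eroding only from the top removes the first row of the block and leaves the sides and bottom alone, which is the behaviour you would want for the hairline case.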

I also came up with another idea (this was around the hairline issue, but I imagine it would be more useful for profile shots) in a discussion with user @HoloByteus on Discord. I will post the discussion verbatim here. If you do decide to go that way, I would be keen to hear how it works out.

torzdf wrote:
do the swap, no mask applied.

Generate a bisenet-FP from the swapped video into a copy of the original video's alignments file.

Run the swap from the original video again, but this time using your newly generated bisenet-fp mask created specifically for the swap

I have no idea if it would work at all well, but that is the only reasonable way I can see of getting an adaptive mask for B's hairline

of course, as bisenet-fp was not trained on AI data, I cannot guarantee that it would even identify the swapped 'hair' as hair at all, but it may be worth a test

HoloByteus wrote:
never even noticed there's a NONE option for training mask. I follow surprisingly... That is a good idea, will have to give that a try.
I've been wanting a way to merge masks... if that was a tools option that would save a whole training session even though you probably won't need to take this to full fruition just to get a mask created... will see

torzdf wrote:
It has long been on my list to be able to have a "custom" mask option, where you can use masks from different maskers. It's never been able to bump its way high enough up my priority list though.

The main downside [with regenerating masks from swapped frames] that I see (beyond not knowing how it will perform on AI generated data), is that you lose any clean-ups you've done and will need to do it again

However, if it does work, then you probably wouldn't bother cleaning up the masks from the original video. You would just wait until you generated the 'swapped' masks to do the cleanup work

The good news is that the faces are already aligned (thanks to the original alignments file), so it will just take the swapped patches and feed it through the masker
will definitely be interested in results. I expect it to work, just probably not perfectly.
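
A toy illustration of why a mask regenerated from the swapped frames might help, as discussed above. This is only a sketch on a single row of "pixels": `mask_of` is a hypothetical stand-in for a real masker such as bisenet-fp, and `composite` is a hard (unblended) paste, neither of which is faceswap code.

```python
def mask_of(row, background="."):
    """Hypothetical masker: every non-background pixel counts as face."""
    return [0 if px == background else 1 for px in row]


def composite(target, swap, mask):
    """Keep the swap where the mask is 1 and the target where it is 0."""
    return [s if m else t for t, s, m in zip(target, swap, mask)]


# One row across a profile shot: "A"/"B" are nose pixels, "." is background.
target_row = list("AAA...")  # original face, shorter nose
swap_row = list("BBBBB.")    # swapped face, longer nose

# Mask taken from the original face: the longer nose is cut off.
cut_off = "".join(composite(target_row, swap_row, mask_of(target_row)))

# Mask regenerated from the swapped frame: the whole nose survives.
adaptive = "".join(composite(target_row, swap_row, mask_of(swap_row)))
```

Here `cut_off` comes out as `BBB...` while `adaptive` comes out as `BBBBB.`. Whether a real masker identifies the swapped face this cleanly is exactly the open question raised in the discussion.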

Some time later...

HoloByteus wrote:
Played around with the previously mentioned means of bringing the swap mask into a final swap. The technique is simple: train and convert your initial swap using the bisenet-fp-head mask. What's cool is that you can have both a bisenet-fp-face and a bisenet-fp-head mask; they're considered separate, which is useful. Convert using bisenet-fp-head with 0 to negative erosion and a mask blend of 1 to create as defined a facial boundary as possible. Then rename the original target video, replace it with your swapped video and reapply a bisenet-fp-face mask. Finally, restore the original target video and re-run the convert using the new swap-based bisenet-fp-face mask.

Initially I tried using no mask, which would work as long as you have no obstructions. With obstructions, you're better off using bisenet-fp-head, or they'd be a blurred mess that bisenet-fp-face would not be able to mask out on the second application. In addition to dealing with a major difference in hairline, it can also be a means of swapping a wider face onto a thinner face. It's not perfect though: without hair on both sides of the face you're limited to the boundaries of the bisenet-fp-head mask. As long as there's hair on both sides of the face in all frames, you can do a wider face to a thinner one while maintaining the swap's width.

I still needed to use some erosion on the final swap due to hairline shadow being masked in by bisenet-fp-face. Noticed a few problems with erosion: values greater than 15 can be problematic, and one value doth not fit all angles. What works great for a straight-on face won't work for an angle, leading to eroding in target data or not eroding enough. You have to find the best middle ground and just blend in hard with a 9 setting. One other issue with erosion is frames where the face is off frame. Apparently there's no mask off frame, so eroding down will result in target data being swapped in. Just something to be aware of and yet another reason to avoid off-frame faces.

torzdf wrote:
Yeah, I didn't consider obstructions when suggesting this, so, yeah, you would need originally masked input to mask out that blurry mess.
However, I'm glad that, in principle, the idea appears to work. I knew it would, but did not know if it would work at all well.
re: mask values for different angles etc... Yeah, that is a challenge which I'm not sure I see a solve to. Ultimately the most robust (albeit time consuming) method would be to split the output video into different poses, and mask separately.
The best approach would almost definitely be to output the untouched mask as a separate layer and comp in a 3rd party application.
The only theoretical solution I can think of is that the mask erosion/blurring can be somehow related to pose data. We do collect the pose data, but implementing this in a user-friendly and usable way would be a huge challenge to say the least.
Ultimately 3rd party tools are likely to be able to do a better job with any output masks than I could implement in faceswap

Post by **martinf** » Fri Sep 30, 2022 3:49 pm

Another possible aid to the forehead boundary issue would be to introduce a second adjustment channel just for the top of the mask. The entire mask could be set to average color or histogram, but allow the top 1/3 of the swapped face to also be mapped to the underlying luminance channel of the A image. You would not want this adjustment to get into the detailed areas of the face however. This second layer could be controlled separately for its transparency perhaps. This could allow the border at the forehead and below the hairline to be luminance adjusted to smooth out the border. Getting the luminance information from the A face, then applying it over the top of the color-adjusted B face could really smooth that border out.

I use this technique in Photoshop when compositing interior photographs taken with different K light sources.
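
A rough sketch of how the luminance idea above might look in code. Everything here is an assumption for illustration: the function names, the Rec. 601 luma weights, the linear fade over the top third and the plain-Python pixel lists (a real implementation would more likely use NumPy or OpenCV arrays).

```python
def luminance(rgb):
    """Rec. 601 luma of an (r, g, b) pixel with channels in 0..1."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def match_top_luminance(face_a, face_b, opacity=1.0, fraction=1 / 3):
    """Pull the luminance of the top part of face B towards face A.

    face_a and face_b are equally sized lists of rows of (r, g, b) pixels.
    The effect is strongest on the first row and fades linearly to zero at
    `fraction` of the height, so the detailed areas lower down are untouched.
    """
    height = len(face_b)
    band = max(1, round(height * fraction))
    out = []
    for y, (row_a, row_b) in enumerate(zip(face_a, face_b)):
        weight = opacity * max(0.0, 1.0 - y / band)
        new_row = []
        for px_a, px_b in zip(row_a, row_b):
            luma_b = luminance(px_b)
            # Gain that would give B's pixel the luminance of A's pixel.
            gain = luminance(px_a) / luma_b if luma_b > 0 else 1.0
            # Blend between "no change" (1.0) and the full gain.
            scale = 1.0 + weight * (gain - 1.0)
            new_row.append(tuple(min(1.0, c * scale) for c in px_b))
        out.append(new_row)
    return out


# Six rows, one pixel wide: A is brighter than the colour-adjusted B.
face_a = [[(0.5, 0.5, 0.5)] for _ in range(6)]
face_b = [[(0.25, 0.25, 0.25)] for _ in range(6)]
adjusted = match_top_luminance(face_a, face_b)
```

In this example the top row of B is lifted fully to A's brightness, the second row half way, and everything from the third row down is left as it was. The `opacity` parameter plays the role of the separately controlled transparency suggested in the post.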

Post by **martinf** » Fri Sep 30, 2022 4:19 pm

And, yet another procedural mask enhancement idea...

For landmarks based masks, think of the nose, mouth and eyes as the main body of a comet and the forehead and cheeks as the coma. Create a safe zone in which the mask remains tightly bounded at any point that the landmarks approach the boundary of the face while, at the same time, allowing for erosion to take place AWAY from the important landmarks.

Think of a video of someone looking both ways before crossing a street. At one point their profile is to the left and the important features are maintained in a tight boundary while the rightmost portion is subject to user defined erosion. As the face looks the other way, the erosion is slowly reduced on the cheek so that the face (now looking straight at the camera) still has all of the important landmark features preserved while the left and right cheeks are now equally eroded. Then the face looks completely to the right... the erosion of right side of the mask is eliminated, preserving the profile, and shifted now to the left cheek where a smoother transition is achieved.

If one was to just look at the mask as it is animated, it would look a bit like a comet with the coma facing away from the critically detailed landmarks.
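
The "comet" behaviour could be driven by pose data along these lines. This is only a sketch: the function `side_erosion`, the yaw sign convention and the linear falloff are assumptions, not anything faceswap currently does.

```python
def side_erosion(yaw_degrees, erosion, max_yaw=90.0):
    """Split a user-defined erosion between the left and right mask edges.

    yaw_degrees: 0 is facing the camera; negative means the face is turned
    towards the left of the frame, positive towards the right (a convention
    assumed for this sketch).
    Returns (left, right) erosion amounts.
    """
    # Clamp to [-1, 1]: -1 is a full left profile, +1 a full right profile.
    turn = max(-1.0, min(1.0, yaw_degrees / max_yaw))
    # The side the face points towards holds the profile, so protect it.
    left = erosion * (1.0 - max(0.0, -turn))
    right = erosion * (1.0 - max(0.0, turn))
    return left, right


# Looking left, straight ahead, then half way to the right.
looking_left = side_erosion(-90, 10)
straight_on = side_erosion(0, 10)
turning_right = side_erosion(45, 10)
```

With an erosion setting of 10, a full left profile gives `(0.0, 10.0)`, a straight-on face gives `(10.0, 10.0)` and a 45 degree turn to the right gives `(10.0, 5.0)`, so the eroded "coma" always trails away from the side holding the profile.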

I'll stop with the ideas for now...

Post by **torzdf** » Mon Oct 03, 2022 12:06 pm

These are all good ideas. The main challenge (for me) is finding the time to research and implement these kinds of things.
