My way of creating good training set

Want to know about Faceswap's Face Extraction process? Got tips or ideas, or just want to learn how it all works? Then this is the place for you


jode
Posts: 24
Joined: Thu Apr 20, 2023 1:16 pm
Has thanked: 1 time
Been thanked: 5 times

My way of creating good training set

Post by jode »

Let's say we have 5 videos that we want to use to make a good training set for face B. But how do you choose a versatile set of images covering every possible face angle without too many images of the same type? I use a method that goes something like this:

At the beginning I of course extract all the videos and delete the obvious false detections by hand. Let's say about 10000 images are left per video. Then I apply these steps to each video, one at a time:

First I use the sort tool to group the images by blur into 10 bins. Then I discard the blurriest bins, containing about 20% of the images, which leaves roughly the 8000 sharpest images.

Next I move all remaining files into the same directory and use the sort tool to sort them by face-cnn and group them by face at the same time, using a threshold of 0.1 to get many different bins. Let's say it creates 50 bins this time. If my target is 4000 images per video, that works out to 80 images per bin, preferably a little more rather than less. Then I check every directory: if it holds fewer than 160 images I move them all to the keep directory; with 160-240 images I keep every second one, with 240-320 every third, and so on. It's not very precise. And because the images are sorted by alignment it's important to take every second, every third, etc. (NOT the first half, the first third...).
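The keep-every-Nth thinning described above could be sketched like this (a minimal sketch: the 80-per-bin target is just the example number from this post, and `thin_bin` is a hypothetical helper, not part of Faceswap):

```python
import shutil
from pathlib import Path

def thin_bin(bin_dir: Path, keep_dir: Path, target: int = 80) -> int:
    """Copy every Nth file from a face-cnn-sorted bin so roughly `target` remain.
    Under 160 files keeps everything, 160-239 keeps every 2nd, 240-319 every 3rd..."""
    files = sorted(p for p in bin_dir.iterdir() if p.is_file())
    step = max(1, len(files) // target)   # floor division gives the keep interval
    kept = files[::step]                  # every Nth file, spread across the sort order
    keep_dir.mkdir(parents=True, exist_ok=True)
    for src in kept:
        shutil.copy2(src, keep_dir / src.name)
    return len(kept)
```

Taking `files[::step]` rather than the first `target` files is the important part: because the bin is sorted by alignment, a slice from the front would only cover one part of the pose range.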

When this has been done for all 5 videos, I copy all the images into the same directory and repeat the sorting, this time targeting the number of images I'm actually going to use in training.

Right now I'm writing a little script to automate the file processing, since doing this manually is quite time consuming.

The same process, minus the blur step, works for the original face A.

Any thoughts?

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: My way of creating good training set

Post by torzdf »

Thanks for sharing. A couple of points/questions...

jode wrote: Sun Oct 01, 2023 6:40 am

First I use the sort tool to group the images by blur into 10 bins. Then I discard the blurriest bins, containing about 20% of the images, which leaves roughly the 8000 sharpest images.

Do you find this effective? I have found sorting by blur hit-and-miss at best (faces that look fine to me end up in blurry bins; faces that look blurry to me end up in non-blurry bins). Unfortunately I haven't found a better algorithm for detecting blurry images... I also use the mantra "if the final swap sees it, the model needs to see it", so if there are blurry faces in the final swap, I will want some blurry faces going into the model.

I don't personally use face-cnn, so can't comment on effectiveness. I'm strictly a 'sort by face' guy, and adjust threshold according to dataset.

One thing I do tend to do (sometimes from the beginning, but definitely for the final fit train) when working on particularly targeted videos:

  • Generate a full "A" extraction set for the final convert video
  • sort + group this by yaw (bins=180) and make a note of the lowest and highest yaw values
  • Put faces back in the same folder and repeat the same process for pitch
  • Delete this 'final' face set (just used them for getting pitch + yaw values)
  • I then sort/group my actual training set (A and B) and remove any faces which fall outside of my final yaw/pitch values (or, within a couple of degrees, either way)

I can then be fairly confident that I'm not wasting training cycles on angles I will never use.
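A sketch of that range-filtering idea, assuming the yaw/pitch value of each face has already been read off from the sorted bins (the dict layout here is purely illustrative; in practice the faceswap sort tool does the binning):

```python
def pose_range(values, margin=2.0):
    """Lowest/highest angle seen in the final convert video, padded a couple of degrees."""
    return min(values) - margin, max(values) + margin

def filter_by_pose(faces, yaw_range, pitch_range):
    """Keep only training faces whose (yaw, pitch) falls inside the final video's range."""
    (ylo, yhi), (plo, phi) = yaw_range, pitch_range
    return [f for f in faces
            if ylo <= f["yaw"] <= yhi and plo <= f["pitch"] <= phi]
```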

My word is final

jode
Posts: 24
Joined: Thu Apr 20, 2023 1:16 pm
Has thanked: 1 time
Been thanked: 5 times

Re: My way of creating good training set

Post by jode »

torzdf wrote: Fri Oct 06, 2023 11:06 am

Thanks for sharing. A couple of points/questions...

jode wrote: Sun Oct 01, 2023 6:40 am

First I use the sort tool to group the images by blur into 10 bins. Then I discard the blurriest bins, containing about 20% of the images, which leaves roughly the 8000 sharpest images.

Do you find this effective? I have found sorting by blur hit-and-miss at best (faces that look fine to me end up in blurry bins; faces that look blurry to me end up in non-blurry bins). Unfortunately I haven't found a better algorithm for detecting blurry images... I also use the mantra "if the final swap sees it, the model needs to see it", so if there are blurry faces in the final swap, I will want some blurry faces going into the model.

I did some testing and you may be right that grouping by blur is unnecessary, because grouping by face tends to remove the poor blurry images anyway. So that's one step saved. But do you think there should be blurry images in the B-side training set? I haven't needed them yet, but that's maybe because I'm so new to this. On the A side I keep them, of course.

I don't personally use face-cnn, so can't comment on effectiveness. I'm strictly a 'sort by face' guy, and adjust threshold according to dataset.

My idea with sorting by face-cnn is to get as many different facial expressions into my training set as possible. Usually most of the faces look straight forward, with only a few profile faces, and faces looking up or down number just a couple. In training, the straight-looking faces then learn very fast while the other angles are still blurry. This way I create a versatile training set with as equal an amount of all kinds of faces as possible. It's the best method I've tried so far that doesn't require too much manual face picking.

One thing I do tend to do (sometimes from the beginning, but definitely for the final fit train) when working on particularly targeted videos:

  • Generate a full "A" extraction set for the final convert video
  • sort + group this by yaw (bins=180) and make a note of the lowest and highest yaw values
  • Put faces back in the same folder and repeat the same process for pitch
  • Delete this 'final' face set (just used them for getting pitch + yaw values)
  • I then sort/group my actual training set (A and B) and remove any faces which fall outside of my final yaw/pitch values (or, within a couple of degrees, either way)

I can then be fairly confident that I'm not wasting training cycles training angles I will never use

Did I get this right: do you mean that if there are two otherwise identical faces, one at 0 degrees and one at, say, 180 degrees, they count as different faces in training even though they look the same in the training set? I have never paid attention to pitch and yaw values, but on the other hand I have never done videos with faces tilted very much (over 90 degrees or so).

Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

Re: My way of creating good training set

Post by Ryzen1988 »

My latest dataset creation process looked a lot like the one described above, with a slight twist.
I first sorted everything into 5 yaw groups, then split each of those into 3 pitch groups.
So you basically have looking up, straight and down, across 5 groups from left to right.

Then I sorted each pitch group by blur and deleted the blurriest 20 to 30%: a little more for angles that have a lot of faces, a little less for the less abundant angles.
Also keep in mind that the distribution should roughly match the target faces.
Copy all remaining faces back into the main folder and the dataset is complete.

It's a bit of extra work, but it speeds up getting good results, and if it's a dataset for multiple uses it's certainly worth the effort.

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: My way of creating good training set

Post by torzdf »

jode wrote: Mon Oct 09, 2023 3:44 am

But do you think there should be blurry images in the B-side training set? I haven't needed them yet, but that's maybe because I'm so new to this. On the A side I keep them, of course.

Depends on your final swap. If some faces are blurry for A in the final swap, then B should see blurry faces too.

jode wrote: Mon Oct 09, 2023 3:44 am

Did I get this right: do you mean that if there are two otherwise identical faces, one at 0 degrees and one at, say, 180 degrees, they count as different faces in training even though they look the same in the training set? I have never paid attention to pitch and yaw values, but on the other hand I have never done videos with faces tilted very much (over 90 degrees or so).

Ultimately, pitch and yaw are just 2 values that are very easy to filter by. By themselves they only tell a very small part of the story (lighting, expression, obstructions etc. are also hugely important, but are harder to filter). However, if the furthest people look in your final video is -45 to +45 degrees in any direction, then it is pointless having the model learn from faces which go much beyond those angles.



jode
Posts: 24
Joined: Thu Apr 20, 2023 1:16 pm
Has thanked: 1 time
Been thanked: 5 times

Re: My way of creating good training set

Post by jode »

First of all, I had misunderstood what pitch and yaw mean :oops: That was exactly the type of grouping I was looking for earlier. It makes writing my trainsetmaker script much easier:

So the process will be something like this for every B face video:

After extraction I group the images by face to clean out the bad faces. Then I move all the images back together, group them first by pitch, and then group each of the new directories by yaw. Now I have 25 directories; for example, directory 13 holds faces looking straight forward and directory 05 faces looking up and to the left.

01-02-03-04-05
06-07-08-09-10
11-12-13-14-15
16-17-18-19-20
21-22-23-24-25

When every video is done, I copy all faces with the same direction together into those 25 bins. Then I sort every bin by face-cnn so the images are ordered by alignment. Now I have a face collection from which I can pick the faces I need for training.

And what's the script for? Once the face A images are extracted and grouped into the same 25 bins, it compares the face A and face B bins to work out which faces are needed for training, and how many of them. If face A bin 01 has 100 images and face B bin 01 has 300, it takes every third B face image for the training set, and so on. And because the faces are sorted by face-cnn, every third image still covers all the face looks available.

And if I want to add more images to the B collection, I just extract a new video, make the 25 bins and copy them into the collection. Of course, when adding new images, all the collection bins must be sorted by face-cnn again to put the new images in the right order with the old ones.
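That bin-matching step might look something like this in outline (directory layout and the `match_bins` name are illustrative; the bins are assumed to be pre-sorted by face-cnn as described):

```python
import shutil
from pathlib import Path

def match_bins(a_root: Path, b_root: Path, out_root: Path) -> None:
    """For each A-side pose bin, copy every Nth image from the matching B-side bin
    so the B count roughly matches the A count."""
    for a_bin in sorted(p for p in a_root.iterdir() if p.is_dir()):
        b_bin = b_root / a_bin.name
        if not b_bin.is_dir():
            continue                          # no B faces for this pose
        n_a = sum(1 for _ in a_bin.iterdir())
        b_files = sorted(b_bin.iterdir())
        if n_a == 0 or not b_files:
            continue
        step = max(1, len(b_files) // n_a)    # e.g. 300 B vs 100 A -> every 3rd
        dest = out_root / a_bin.name
        dest.mkdir(parents=True, exist_ok=True)
        for src in b_files[::step]:
            shutil.copy2(src, dest / src.name)
```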

That's the short version of it, hope you understand what I meant :lol:

MaxHunter
Posts: 194
Joined: Thu May 26, 2022 6:02 am
Has thanked: 177 times
Been thanked: 13 times

Re: My way of creating good training set

Post by MaxHunter »

You know, if one were to come up with an algorithm for automatically choosing training faces... Ahem @torzdf cough cough 😁🤪

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: My way of creating good training set

Post by torzdf »

jode wrote: Sat Oct 14, 2023 2:13 pm

So the process will be something like this for every B face video:

You certainly spend a lot more time auditing datasets than I do! But it is good to have a system that works for you.

MaxHunter wrote: Sun Oct 15, 2023 12:44 am

You know, if one were to come up with an algorithm for automatically choosing training faces... Ahem @torzdf cough cough 😁🤪

Ha. Wouldn't that be great :) Unfortunately use cases vary greatly from project to project, so it would be difficult to implement, even ignoring how we would measure things like 'lighting' and 'obstructions' etc.

That said, I did start work on a simple dataset analysis tool a while back, but abandoned it due to lack of interest. The pose distribution worked out quite well:

yaw-pirch.png (attachment)

I couldn't find a good way to represent skin tone in a useful manner before abandoning it:

col1.jpg (attachment)
col2.jpg (attachment)


MaxHunter
Posts: 194
Joined: Thu May 26, 2022 6:02 am
Has thanked: 177 times
Been thanked: 13 times

Re: My way of creating good training set

Post by MaxHunter »

You know, you could implement an experimental feature to choose the right number of poses, just like you have here. Any remaining photos from a batch of faces get filtered out into a separate folder for review. A slider controls the number of faces the user needs/wants, and the program (with maybe a sensitivity slider) then determines which faces to use within those parameters.

The remaining photos get dumped into the filter folder for the user to pick from at their leisure.

Then you could work on improving the selector by adding skin tone later on.

jode
Posts: 24
Joined: Thu Apr 20, 2023 1:16 pm
Has thanked: 1 time
Been thanked: 5 times

Re: My way of creating good training set *UPDATE*

Post by jode »

I have now improved my system and extended the grid to 49 folders (7x7). This is necessary because folders 1-14 and 36-49 usually hold so few images; more folders are needed in the middle, where most of the images end up.

I have made a script that groups all the extracted images into those 49 directories. Then it checks how many images each folder has and sorts them by image count into 'keep', 'mid' and 'big' folders: 'keep' gets all directories with fewer than 500 images, 'mid' all folders with 500-5000 images, and 'big' all folders with over 5000 images.

After that it starts reducing the image count of the 'big' folders. Let's say folders 25 and 26 ended up in the 'big' directory. All the 'big' directories are grouped by face, and after that all the face folders are sorted by face-cnn (as separate tasks, because it's faster that way). Then it keeps between every image and every 15th image, depending on the current folder's image count. When all of folder 25's face directories are done, those images are copied to the mid\25 directory, and then the same is done for folder 26, and so on. After reduction, the 'big' directories have about 2000-6000 images left.

Once all the 'big' directories have been reduced and moved into the 'mid' directory, it starts reducing the 'mid' directories. First all the 'mid' directories are sorted by face-cnn, then it again keeps between every image and every 15th image, depending on the current folder's image count. All reduced 'mid' directories are then moved to the 'keep' directory, which now contains everything we want to keep. Usually that's some of the folders 15-35, each with 1-500 files, and after a quick check that every folder is OK, the folders are ready to move to wherever we want to save all the different B-face image folders.
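The size-based routing and thinning above could be sketched like this (the 500/5000 thresholds and the every-15th cap are the numbers from this post; the helper names are hypothetical):

```python
def classify(count: int) -> str:
    """Route a pose bin to 'keep', 'mid' or 'big' by its image count."""
    if count < 500:
        return "keep"
    if count <= 5000:
        return "mid"
    return "big"

def thin_step(count: int, target: int = 500, max_step: int = 15) -> int:
    """Keep every Nth image: N grows with folder size, capped at every 15th."""
    return max(1, min(max_step, count // target))
```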

This whole process is automatic; all I have to do is a quick cleanup of the extracted faces before running the script, removing any pictures I don't want sorted, for example hand-obstructed faces, blurry faces and so on...

I also found a good way to sort out blurry images. I found a little Python script that uses OpenCV to convert images to grayscale and then uses the Laplacian to determine whether an image is blurry, with an adjustable threshold. The only problem was that some images were flagged as blurry because of a blurry background, even when the actual face was sharp. So I made another script that uses ImageMagick to crop the middle part of every image, where the face is, and save it to a different folder. The blur script then sorts those cropped images into blurry and not blurry, and finally the script checks which files ended up in the not-blurry folder and copies the corresponding original files to keep.

I use the blur detection script only on B-side faces, before running the sorting script. The sorting script also works with A-side faces, but there I leave the blurry faces in, because the script reduces their count anyway and leaves some of them, which are probably necessary.

The third script compares the A-side and B-side image folders. If multiple videos are used for the B-side set, all the B-side folders (1-49) must be sorted by face-cnn before running this compare script. The script checks which folders are used on the A side and then takes the necessary number of B-side images from those folders. And because all the directories are sorted by face-cnn, it picks up every face expression available in each directory.

I hope you understand, because my English is far from perfect and this is a little hard to explain. But this is the short version of my system, and so far it has been working well for basic faceswaps. Professionals have their own ways :D

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: My way of creating good training set

Post by torzdf »

I love over analysing datasets, so good work, and keep it up :)

Feel free to share your script here so others can try it out.

I can also look at your blur code, as the blur detection we have in Faceswap already masks out the background, so if it works well, I can look to implement it as a sort option in faceswap.


jode
Posts: 24
Joined: Thu Apr 20, 2023 1:16 pm
Has thanked: 1 time
Been thanked: 5 times

Re: My way of creating good training set

Post by jode »

torzdf wrote: Fri Nov 03, 2023 1:14 pm

I love over analysing datasets, so good work, and keep it up :)

Feel free to share your script here so others can try it out.

I can also look at your blur code, as the blur detection we have in Faceswap already masks out the background, so if it works well, I can look to implement it as a sort option in faceswap.

Here is the Python script I found with Google.

My own scripts are coded in AutoIt ( :oops: ) and they just run lots of faceswap grouping, sorting and file-copy operations in batch. They depend on the right directories etc., so some code cleanup is needed to make them work generally. I'll do some more testing, and if I still think this is good, I can PM you the working principle of my script.
