intelligently (and automatically) thinning the herd


Post by korrupt78 »

Suppose you have too many images (extracted from many videos), around 100,000, and you want a smart automated process to reduce that to 10,000, which the FAQ gives as the maximum recommended number for training.

Would it make sense to use the identity information in the alignments file, along with one of the sorting algorithms provided by Faceswap, to:

  1. first order the images
  2. then delete the images with the smallest distance (difference) to their neighbors until you hit your goal (100k -> 10k)

so that you end up with a set with maximum variety?

Lastly, is there a document with definitions for each of the many sorting algorithms?

(none, blur, blur-fft, distance, face, face-cnn, face-cnn-dissim, yaw, pitch, roll, hist, hist-dissim, color-black, color-gray, color-luma, color-green, color-orange, size)


Post by torzdf »

In reply to korrupt78 (Sun Mar 16, 2025 3:14 am):

Honestly, I never really worry about having too many images. Data augmentation artificially increases the count anyway. I just worry about variety and quality, but the steps you've outlined should work fine.

Unfortunately there is no documentation on the sorting algos, but most should be fairly self-explanatory. For the others, you could try sorting a data set and see for yourself what it does.
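
For anyone wanting to try the two steps from the question, here is a rough sketch rather than anything Faceswap ships. It assumes you have already pulled one identity vector per face into a NumPy array, in the order produced by whichever sort you chose (reading them out of the alignments file is not shown here). The function name `thin_sorted_faces` is made up for illustration, and Euclidean distance is an assumption; swap in whatever measure suits your vectors.

```python
import heapq

import numpy as np


def thin_sorted_faces(embeddings: np.ndarray, target: int) -> list[int]:
    """Return the indices of the faces to keep, in their sorted order.

    embeddings: (N, D) array, one row per face, already sorted (assumed input).
    target: how many faces to keep.
    """
    n = len(embeddings)
    if target < 1:
        raise ValueError("target must be at least 1")
    if target >= n:
        return list(range(n))

    nxt = list(range(1, n + 1))  # nxt[i] = surviving right neighbor (n = none)
    alive = [True] * n

    def gap(i: int, j: int) -> float:
        # Euclidean distance between two faces (an assumed choice of metric)
        return float(np.linalg.norm(embeddings[i] - embeddings[j]))

    # Min-heap of (distance, left, right) for every adjacent pair
    heap = [(gap(i, i + 1), i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)

    remaining = n
    while remaining > target:
        _, left, right = heapq.heappop(heap)
        # Skip stale entries whose pair is no longer adjacent
        if not (alive[left] and alive[right] and nxt[left] == right):
            continue
        # Drop the right-hand face of the closest pair (arbitrary choice)
        alive[right] = False
        remaining -= 1
        after = nxt[right]
        nxt[left] = after
        if after < n:
            # The left face has a new neighbor, so queue the new gap
            heapq.heappush(heap, (gap(left, after), left, after))

    return [i for i in range(n) if alive[i]]


# Toy example: six "faces" on a line, keep four
points = np.array([[0.0], [0.1], [1.0], [1.05], [2.0], [5.0]])
print(thin_sorted_faces(points, 4))  # -> [0, 2, 4, 5]
```

The returned indices tell you which files to keep; everything else can be moved out of the training folder. Since only neighbors in the sorted order are compared, the result depends heavily on which sort you ran first.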
