intelligently (and automatically) thinning the herd

Posted: Sun Mar 16, 2025 3:14 am
by korrupt78

Let's say you have too many images (extracted from many videos), like 100,000, and you want to use a smart, automated process to reduce that to 10,000, which is the maximum number recommended for training in the FAQ.

Would it make sense to use the identity information in the alignments file, along with one of the sorting algorithms provided by Faceswap, to end up with a set with maximum variety by:

  1. first ordering the images
  2. then deleting the images with the smallest distance (difference) to their neighbor until you've hit your goal (100k -> 10k)?
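The two steps above can be sketched as a greedy pruning pass. This is a minimal sketch, assuming the faces are already in sorted order (e.g. from an identity-based sort) and that you have one feature vector per face; `prune_sorted` is a hypothetical helper, not part of Faceswap, and reading identity vectors out of the alignments file is deliberately not shown because that format is not described here.

```python
import heapq
import math
from typing import List, Sequence


def prune_sorted(names: Sequence[str], vectors: Sequence[Sequence[float]], target: int) -> List[str]:
    """Thin an already-sorted list of faces down to `target` items (target >= 1).

    Hypothetical helper: repeatedly finds the adjacent pair with the smallest
    distance, drops the right-hand item of that pair, and scores the new gap
    this opens up. Survivors are returned in their original (sorted) order.
    """
    n = len(names)
    if target >= n:
        return list(names)
    nxt = list(range(1, n + 1))  # index of the next surviving item
    nxt[-1] = -1                 # -1 marks the end of the list
    alive = [True] * n
    # Min-heap of (distance, left index, right index) for adjacent pairs.
    heap = [(math.dist(vectors[i], vectors[i + 1]), i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)
    remaining = n
    while remaining > target and heap:
        _, left, right = heapq.heappop(heap)
        # Skip stale entries: one side of this pair was already deleted.
        if not (alive[left] and alive[right]):
            continue
        alive[right] = False  # delete the near-duplicate
        remaining -= 1
        after = nxt[right]
        nxt[left] = after     # relink around the deleted item
        if after != -1:
            heapq.heappush(heap, (math.dist(vectors[left], vectors[after]), left, after))
    return [name for name, keep in zip(names, alive) if keep]
```

The heap keeps this at roughly O(n log n) distance evaluations, which is workable for 100k faces. The main weakness of the approach is that only neighbors in the 1-D sort order are compared, so two near-duplicates that the sort happens to place far apart will both survive.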

Lastly, is there a document with definitions for each of the many sorting algorithms?

(none,blur,blur-fft,distance,face,face-cnn,face-cnn-dissim,yaw,pitch,roll,hist,hist-dissim,color-black,color-gray,color-luma,color-green,color-orange,size)
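As an illustration of what a blur-style sort key can compute, one widely used sharpness measure is the variance of the Laplacian: sharp edges produce large positive and negative responses, while blurred regions stay near zero. This is a minimal pure-Python sketch, assuming a 2D list of grayscale values; `laplacian_variance` is a hypothetical helper, and Faceswap's own `blur` method may differ in details such as masking, normalization, or kernel.

```python
from typing import List, Sequence


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Hypothetical sharpness score: variance of a 3x3 Laplacian over the interior.

    `gray` is a 2D grid of grayscale values. Higher means sharper; a flat
    (or heavily blurred) image scores close to 0.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses: List[float] = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbor Laplacian kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

Sorting with `sorted(images, key=laplacian_variance, reverse=True)` would put the sharpest faces first.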


Re: intelligently (and automatically) thinning the herd

Posted: Mon May 19, 2025 4:07 pm
by torzdf
korrupt78 wrote: Sun Mar 16, 2025 3:14 am

Honestly, I never really worry too much about having too many images. Data augmentation artificially increases the count anyway. I mainly worry about variety and quality, but the steps you've outlined should work fine.
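If variety is the real goal, a different technique, greedy farthest-point sampling (not one of Faceswap's sort methods), picks the subset directly from per-face feature vectors instead of pruning a 1-D sort order. A minimal sketch, assuming you already have one feature vector per face (for example identity embeddings); `farthest_point_sample` is a hypothetical helper:

```python
import math
from typing import List, Sequence


def farthest_point_sample(vectors: Sequence[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Hypothetical helper: greedy max-min selection of k indices.

    Each step keeps the vector that is farthest from everything kept so far,
    so the selection spreads out over the feature space instead of clumping.
    """
    n = len(vectors)
    if k >= n:
        return list(range(n))
    chosen = [start]
    # min_dist[i]: distance from vector i to its nearest chosen vector.
    min_dist = [math.dist(vectors[start], v) for v in vectors]
    min_dist[start] = -1.0  # -1 marks "already chosen" so it is never re-picked
    while len(chosen) < k:
        pick = max(range(n), key=min_dist.__getitem__)
        chosen.append(pick)
        min_dist[pick] = -1.0
        for i, v in enumerate(vectors):
            d = math.dist(vectors[pick], v)
            if d < min_dist[i]:  # never true for chosen items (d >= 0 > -1)
                min_dist[i] = d
    return chosen
```

This compares every face against the whole kept set, so it catches near-duplicates that a sort would separate, but it costs O(n·k) distance evaluations: for 100k -> 10k that is on the order of a billion, so in practice the inner loop would need to be vectorized (e.g. with NumPy).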

Unfortunately there is no documentation on the sorting algorithms, but most should be fairly self-explanatory. For the others, try sorting a data set and see what each one does for yourself.