intelligently (and automatically) thinning the herd

Post by **korrupt78** » Sun Mar 16, 2025 3:14 am

Let's say you have too many images (extracted from many videos), say 100,000, and you want to use a smart automated process to reduce that to 10,000, which is the maximum number the FAQ recommends for training.

Would it make sense to use the identity information in the alignments file — along with one of the sorting algorithms provided by Faceswap — to:

1. first order the images
2. then start deleting the images with the smallest distance (difference) to their neighbor until you've hit your goal (100k -> 10k)

in order to end up with a set with maximum variety?
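
The two steps above could be sketched roughly like this. It is only an illustration under some assumptions: each face is assumed to already have a numeric vector (for example an identity embedding) loaded into a list in sorted order, and reading those vectors from the alignments file is not shown. The name `thin_sorted` is made up for this example and is not part of Faceswap.

```python
import heapq
import math


def thin_sorted(vectors, target):
    """Greedily thin an already-sorted list of vectors down to `target` items.

    Repeatedly finds the closest pair of neighbours and drops the later one,
    so the survivors are spread out as much as this simple greedy rule allows.
    Returns the indices (into `vectors`) of the items to keep.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    n = len(vectors)
    if n <= target:
        return list(range(n))

    nxt = list(range(1, n + 1))  # nxt[i] == n means "no neighbour after i"
    alive = [True] * n

    # Heap of (distance, left index, right index) for every adjacent pair.
    heap = [(math.dist(vectors[i], vectors[i + 1]), i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)

    remaining = n
    while remaining > target and heap:
        _, left, right = heapq.heappop(heap)
        # Skip stale entries: the pair is no longer adjacent.
        if not alive[left] or nxt[left] != right:
            continue
        # Drop the right-hand item and link `left` to the next survivor.
        alive[right] = False
        remaining -= 1
        new_right = nxt[right]
        nxt[left] = new_right
        if new_right < n:
            gap = math.dist(vectors[left], vectors[new_right])
            heapq.heappush(heap, (gap, left, new_right))

    return [i for i in range(n) if alive[i]]


# Tiny example with one-dimensional "embeddings".
sample = [(0.0,), (0.1,), (1.0,), (1.05,), (2.0,)]
kept = thin_sorted(sample, 3)
print(kept)
```

The kept indices would then be mapped back to filenames, and everything else moved to a separate folder rather than deleted outright, so the result can be checked by eye first.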

Lastly, is there a document with definitions for each of the many sorting algorithms?

(none, blur, blur-fft, distance, face, face-cnn, face-cnn-dissim, yaw, pitch, roll, hist, hist-dissim, color-black, color-gray, color-luma, color-green, color-orange, size)

Post by **torzdf** » Mon May 19, 2025 4:07 pm

Honestly, I never really worry too much about having too many images. Data augmentation artificially increases this count anyway. I just worry about variety and quality, but the steps you've outlined should work fine.

Unfortunately there is no documentation on the sorting algos, but most should be fairly self-explanatory. For the others, you could try sorting a data set and see for yourself what each one does.
