The resolution of the source video matters, of course. But where the subject's face sits within the frame, and how large it is, determines the actual resolution of the extracted face image before it is saved at the specified output size. Clips with high-quality close-ups yield the largest extracted face images without scaling up and losing detail.
I'm curious to see if the experienced swappers would see any value in a tool that would basically run the extraction process (minus the file output) and give a summary of the sizes of detected faces.
They could be basic stats in bins like:
0 to 64 px
64 to 128 px
128 to 256 px
256 to 512 px
512 to 1024 px
This would give us the percentage of detected face images falling in each bin, showing which input sizes would tend to sacrifice quality when inflated to a larger input size. It could basically tell more novice users that they either need better source video, or should choose training models whose input resolutions better match the extracted faces they're working with.
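To make the idea concrete, here's a minimal sketch of what that summary could look like. It assumes we already have the pixel size of each detected face (say, the larger of the bounding-box width/height from the extractor); the bin edges mirror common model input sizes, and the sample sizes are made-up illustration data.

```python
# Sketch of the proposed face-size summary. Assumes `face_sizes` is a list
# of per-face pixel dimensions gathered during extraction (hypothetical --
# the real extractor would supply these from its detected bounding boxes).
from bisect import bisect_left

BIN_EDGES = [64, 128, 256, 512, 1024]  # px, matching common model input sizes
LABELS = ["0-64", "64-128", "128-256", "256-512", "512-1024", ">1024"]

def summarize(face_sizes):
    """Return the percentage of detected faces in each size bin."""
    counts = [0] * (len(BIN_EDGES) + 1)
    for size in face_sizes:
        counts[bisect_left(BIN_EDGES, size)] += 1
    total = len(face_sizes) or 1  # avoid division by zero on empty input
    return {label: 100.0 * c / total for label, c in zip(LABELS, counts)}

# Example with four made-up detected faces (max dimension in px):
print(summarize([90, 90, 300, 700]))
# 50% land in 64-128 px, 25% each in 256-512 and 512-1024
```

A run like this would immediately show, say, that most faces fall under 128 px, so training at a 256 px input would mean upscaling nearly everything.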
Alternatively, rather than building a separate tool, the extraction process could simply print this summary to the command line when it completes.
Any value to doing this?