My VR experiences and tips

Post by **jode** » Thu Apr 20, 2023 1:59 pm

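I've been working with VR videos, and I've noticed that the detectors find more faces when I use extracted frames as the source instead of the video file. That's why I always use ffmpeg to split each side-by-side video into separate left-eye and right-eye images. I think 2880x1440 (1440x1440 per eye) is the lowest resolution that is reasonable to use; at that size the detectors still find faces pretty well. Create the origleft and origright folders first, because ffmpeg won't create them.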

ffmpeg -i orig.mp4 -vf "crop=iw/2:ih:0:0" -q:v 2 -s 1440x1440 origleft/origleft%06d.jpg
ffmpeg -i orig.mp4 -vf "crop=iw/2:ih:ow:0" -q:v 2 -s 1440x1440 origright/origright%06d.jpg
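If you split a lot of videos, the two commands can be wrapped in a small script. This is a minimal Python sketch, not part of faceswap: the helper names `eye_commands` and `run` are hypothetical, it assumes a side-by-side video with the left eye on the left half (as the commands above do), and it resizes with ffmpeg's `scale` filter instead of `-s`, which should give the same 1440x1440 frames. It also creates the output folders before calling ffmpeg.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def eye_commands(src: str, size: int = 1440, quality: int = 2) -> List[List[str]]:
    """Build one ffmpeg command per eye for a side-by-side VR video.

    Assumption: left half of each frame = left eye, right half = right eye.
    """
    commands = []
    for eye, x in (("left", "0"), ("right", "ow")):  # crop x offset: 0 or one output width
        name = f"orig{eye}"
        commands.append([
            "ffmpeg", "-i", src,
            "-vf", f"crop=iw/2:ih:{x}:0,scale={size}:{size}",  # take one half, resize it
            "-q:v", str(quality),                              # JPEG quality, lower is better
            f"{name}/{name}%06d.jpg",                          # numbered frames per eye
        ])
    return commands


def run(commands: List[List[str]]) -> None:
    """Create each output folder (ffmpeg won't) and run the commands in turn."""
    for cmd in commands:
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raise if ffmpeg exits with an error
```

Calling `run(eye_commands("orig.mp4"))` prints each command before running it, and `check=True` stops at the first ffmpeg failure instead of silently leaving one eye unextracted.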

Because adding missing faces with the manual tool is such a big job, I use all three detectors to find faces:

1. Extract with S3Fd, with the rotate images option set to "0, 90, 270, 360" and Re Align on, but without any extra maskers, since they only slow extraction down.
2. Clean up the detected faces manually (in my opinion that's faster than using the Sort tool), then use the Alignments tool to remove the bad faces from the alignments file.
3. Extract again with Cv2-Dnn, with the same settings and the Skip existing faces option on, to a different output folder. Clean up as before, copy the good faces to the first folder, and use the Alignments tool to remove the bad faces.
4. Repeat step 3 with Mtcnn. After that last cleanup, extraction is done.
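The passes above boil down to a cascade: each later detector only looks at frames that still have no good faces after cleanup. Here is a minimal Python sketch of that logic, not faceswap's code. The detectors and the `review` callback are hypothetical stand-ins for running an extraction and for the manual cleanup plus Alignments tool step, and skipping frames that already have kept faces stands in for the Skip existing faces option.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]          # hypothetical face box: x, y, w, h
Detector = Callable[[str], List[Box]]    # frame filename -> faces found
Review = Callable[[str, Dict[str, List[Box]]], Dict[str, List[Box]]]


def cascade_extract(frames: Sequence[str],
                    detectors: Sequence[Tuple[str, Detector]],
                    review: Review) -> Dict[str, List[Box]]:
    """Run detectors in order; later passes only see frames with no kept faces."""
    kept: Dict[str, List[Box]] = {}
    for name, detect in detectors:
        # Stand-in for "Skip existing faces": only frames without kept faces are checked.
        todo = [f for f in frames if f not in kept]
        found: Dict[str, List[Box]] = {}
        for f in todo:
            faces = detect(f)
            if faces:
                found[f] = faces
        # Stand-in for manual cleanup + Alignments tool: review returns only good faces.
        good = {f: faces for f, faces in review(name, found).items() if faces}
        kept.update(good)
    return kept
```

A frame whose faces were all rejected in review never enters `kept`, so the next detector gets another try at it, which is why removing the bad faces before the next pass matters.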

I use the detectors in that order because S3Fd finds most of the faces and is pretty fast. Cv2-Dnn is the fastest but not as good at finding faces. Mtcnn is the slowest, so it goes last, when there are fewer frames left to check. Even so, every detector finds faces that the others miss, and every face found saves time in the manual tool.
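The ordering argument can be made concrete with a toy cost model: each detector only runs on the frames still lacking faces, so a slow detector costs less the later it runs. This Python sketch rests on a loud assumption: each detector finds faces in a fixed fraction of the frames it is given, independently of the earlier passes. It counts detection time only, not the cleanup between passes, and the per-frame times and hit rates below are made up, not measurements of the real detectors.

```python
from typing import Dict, Sequence


def cascade_cost(order: Sequence[str], n_frames: int,
                 sec_per_frame: Dict[str, float],
                 hit_rate: Dict[str, float]) -> float:
    """Estimated detection time (seconds) for running detectors in `order`.

    Toy model (assumption): detector d finds faces in a fraction hit_rate[d]
    of the frames it is given, independently of earlier passes, and later
    passes skip frames that already have faces.
    """
    remaining = float(n_frames)  # frames still without faces
    total = 0.0
    for d in order:
        total += remaining * sec_per_frame[d]  # this pass checks every remaining frame
        remaining *= 1.0 - hit_rate[d]         # frames it found faces in drop out
    return total


# Made-up example figures, NOT measurements of the real detectors.
SEC_PER_FRAME = {"s3fd": 1.0, "cv2-dnn": 0.2, "mtcnn": 2.0}
HIT_RATE = {"s3fd": 0.8, "cv2-dnn": 0.5, "mtcnn": 0.5}

author_order = cascade_cost(["s3fd", "cv2-dnn", "mtcnn"], 1000, SEC_PER_FRAME, HIT_RATE)
mtcnn_first = cascade_cost(["mtcnn", "cv2-dnn", "s3fd"], 1000, SEC_PER_FRAME, HIT_RATE)
print(f"{author_order:.0f} s vs {mtcnn_first:.0f} s")  # prints "1240 s vs 2350 s"
```

In this model the number of frames still missing faces at the end is the same for any order (it is the product of the miss rates), so only the time changes; real detectors miss overlapping sets of faces, so treat it as an illustration of the reasoning rather than a prediction.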

When I started practicing faceswaps I thought training would be the most time-consuming part, but it's not. Aligning faces with the manual tool is, and that's why I do everything I can to make it easier. I've had pretty good results with just this extraction method, then fixing only the worst misaligned frames that the manual tool flags.