Hello all, I've been toying with faceswap for a few weeks now. Many thanks to the dev team. I own a couple of SW integration shops and appreciate the hard work (and IQ) that goes into this sort of project. That said, I'm basically experimenting with different methods and the performance of each (both in speed and quality). I have four sets of source and target faces I'm using to test. The targets are mostly the same: mid-quality photos downloaded in bulk from Shutterstock. The sources range from 15K low-quality faces ripped from random videos to 5K high-quality faces from some poor couple's wedding pictures (that's what you get for posting stuff on the interwebs!). I have two dedicated PCs running processes 24/7, but only check them once or twice daily.
Rig 1 - Intel i9 (9th gen) with NVIDIA RTX 2070 8GB
Rig 2 - Intel (12th gen) with NVIDIA RTX 2070 12GB
Both have 32GB RAM and Gen 4 NVMe SSDs (it's good to own IT companies!)
My problem is that I always hit NaN errors that shut down the tests before I reach 1M iterations. I try to roll back, sometimes by 200K iterations, and still the NaNs show up in the same rough area, between 500K and 600K iterations.
That's a long lead-up to my actual question: what are the best methods to avoid NaNs? I've tried reducing the learning rate, batch size, etc. based on some posts I've read, but nothing seems to totally solve the issue. I'm not so worried about processing speed, since I can leave these rigs cranking for a week if I want to. What I need is reliability, so I can compare quality across different models, applications, etc.
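For what it's worth, my understanding from reading around is that these NaNs usually come from exploding gradients, and the standard mitigation beyond lowering the learning rate is clipping the global gradient norm before each optimizer step (I believe faceswap's model settings expose something along these lines, but check the GUI; the names below are just my own illustration, not faceswap's actual code):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradient arrays so their combined L2 norm never
    exceeds max_norm. This caps the size of any single update step,
    which is the usual defence against gradients exploding into NaNs.
    Toy NumPy sketch only -- real frameworks do this for you."""
    total_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Example: a gradient of norm 5.0 gets scaled down to norm 1.0
clipped, norm = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)
```

The nice property of clipping the *global* norm (rather than each tensor separately) is that the update direction is preserved; only its magnitude is capped.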
Any advice is appreciated. I mean other than "find a better use of my time."
FYI, since I've never had a batch complete 1M iterations, I'm splitting the work: Rig 1 is running Dfaker models and Rig 2 is running Villain (because of GPU constraints).
Cheers!