Erm... Loss A: nan, Loss B: nan - what's going on?
Have been training for over 18 hours - it started out like this and it's still the same... what could cause this?
Damn! I think I figured it out...
I was using distributed training across a 3090 and a 2080....
Restarted it with the 3090 excluded and it's working.
FYI for anyone who runs into the same thing.
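Faceswap has its own setting for excluding a GPU, which is the easiest way to do this. Since faceswap runs on TensorFlow, here is also a minimal TensorFlow-level sketch of the same idea, purely as an illustration (not faceswap's own code); the GPU index below is an assumption, so check nvidia-smi to see which index maps to which card on your machine:

```python
# Minimal sketch (not faceswap's own code) of pinning TensorFlow training to a
# single GPU. The index below is an assumption; check `nvidia-smi` to see
# which index maps to which card on your machine.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("Detected GPUs:", gpus)

if len(gpus) > 1:
    # Keep only one card visible (here index 1, assumed to be the 2080),
    # so TensorFlow never builds a graph spanning both generations.
    tf.config.set_visible_devices(gpus[1], "GPU")

print("Visible GPUs:", tf.config.get_visible_devices("GPU"))
```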
Actually, it's a somewhat common occurrence (see the FAQ: app.php/faqpage#f3r6), though using two different generations of GPU may make it more common.
I'm assuming that once we get to a point where 30xx cards are fully supported, using the power of 2 GPUs would actually be beneficial?
You can already use multiple GPUs; that's what the distributed feature is for. Mixing generations (30xx and 20xx) will probably never be worthwhile, however.
Could you explain why?
Because different GPU generations work differently under the covers, so mixing them causes multiple graphs to be created. First, the slowest GPU sets the pace, so the faster GPU is waiting on the slower one all the time. Second, if those graphs happen to diverge at any point, you get a failure like the one you saw. There is really no way to avoid that (other than running on a single GPU generation).
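For anyone curious what that looks like in practice: faceswap is built on TensorFlow, and its distributed option behaves like TensorFlow's synchronous multi-GPU training. Below is a minimal sketch, assuming tf.distribute.MirroredStrategy and a dummy Keras model; this is an illustration of the mechanism, not faceswap's actual training loop.

```python
# Minimal sketch of synchronous multi-GPU training with MirroredStrategy.
# Every visible GPU gets a replica of the model; each step, the batch is split
# across replicas and the gradients are all-reduced, so the step only finishes
# when the slowest card has finished its share.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicates the model on every visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(16,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Dummy data just to make the example self-contained.
x = np.random.rand(256, 16).astype("float32")
y = np.random.rand(256, 1).astype("float32")

# The faster GPU idles until the slower one catches up every step, and any
# divergence between the per-GPU graphs can surface as NaN losses.
model.fit(x, y, batch_size=64, epochs=1)
```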