I know there are very few "hard" rules in face swapping, but are there any soft rules for learning rates and epsilons?
Generally I've been training with Adam at 3e-5 and an EE of -4 (as suggested by Icarus in his Phase A post), on the DNY512 model, with edits to fc depth (1), fc dimensions (4), Dec filter min (128), and output kernel (3), at a stable batch size of 3 (4 seems to OOM after 6-8 hrs). I'm noticing some weird blocky discoloration anomalies at about 200k iterations that I'm hoping will go away. I'm wondering if raising the learning rate would help with these (since, as I understand it, the learning rate is related to detail), but I'm not sure I get the math or how it works well enough to implement a change.
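To make sure I'm reading the knobs right, this is how I picture those two numbers plugging into Adam (a minimal sketch assuming PyTorch, and assuming "EE -4" means epsilon = 1e-4; the placeholder model and dummy loss are just for illustration, not the actual trainer):

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual DNY512 network

opt = torch.optim.Adam(
    model.parameters(),
    lr=3e-5,   # learning rate: scales every weight update
    eps=1e-4,  # epsilon: sits in the denominator of the update,
               # so a larger eps *shrinks* per-parameter steps
               # (stabilizing) rather than adding detail
)

# One training step; Adam's update is roughly
#   w <- w - lr * m_hat / (sqrt(v_hat) + eps)
loss = model(torch.randn(4, 8)).pow(2).mean()  # dummy loss
loss.backward()
opt.step()
opt.zero_grad()
```

If I have that right, an epsilon of 1e-4 is fairly large next to PyTorch's Adam default of 1e-8, which would make the updates more conservative, not sharper.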
What are your soft rules for learning rates? Any advice?