I think the problem you note is not really a problem. When a swap is generated, only the parts of the face that fall inside the frame are output, so anything falling outside (even half an eyeball) is fine. Ditto with training: anything outside the frame is ignored. So the program doesn't attempt to either train on or swap areas that aren't in-frame. As long as the alignments can recognize the part of the face that is in frame, everything should be good. That's not always the case, though, and if the alignment is hosed, the swap will be hosed too.
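Just to illustrate the idea (this is my own toy sketch, not the tool's actual code): whatever falls outside the frame is simply clipped away before the swapped face is pasted back, so those pixels never exist to cause a problem.

```python
def clip_box_to_frame(box, frame_shape):
    """Clamp an (x1, y1, x2, y2) face box to the frame bounds."""
    h, w = frame_shape[:2]
    x1, y1, x2, y2 = box
    return max(0, x1), max(0, y1), min(w, x2), min(h, y2)

def paste_face(frame, swapped_face, box):
    """Paste a swapped face crop (a numpy image) into the frame; anything
    that lands off-frame is simply dropped rather than causing an error."""
    x1, y1, x2, y2 = clip_box_to_frame(box, frame.shape)
    ox, oy = x1 - box[0], y1 - box[1]   # offset into the crop after clipping
    frame[y1:y2, x1:x2] = swapped_face[oy:oy + (y2 - y1), ox:ox + (x2 - x1)]
    return frame
```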
I can tell you from experience that raw eye variation is critical to proper tracking. Many new users (i.e. me several months ago) concentrate on just capturing facial expressions, and consider a face that is static except for the eye position to be unneeded. No. While it is true that nearly identical faces provide little training value, eyes can provide value just by being in a different position. For eyes to track well you need lots of different eye positions, and this can be a real problem when most of your images are "face on" with the eyes staring straight at the camera. So you usually need to specifically hunt down training images where the eyes are looking somewhere other than into the camera. In this regard, even bad (blurry, etc.) eye variation is better than too little eye variation. As far as I can tell, when pinged for output the trained model - having no exact match for a gaze it never saw in training - uses the closest thing available (someone please correct me if I'm wrong on this). So if you have too little variation during training, you get eyes looking in the wrong direction (i.e. different from the source). Or even worse, "lazy eye", where one eye may look okay but the other is off by a little or a lot.
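If you want to sanity-check your training set for gaze variety, here's the kind of quick-and-dirty script I have in mind (my own sketch, not part of faceswap or DFL; it assumes you can feed it the standard 68-point landmarks from the alignments, where points 36-41 and 42-47 outline the eyes, and it guesses the iris as the darkest blob inside the eye outline):

```python
import cv2
import numpy as np

LEFT_EYE = list(range(36, 42))   # landmark indices for one eye outline
RIGHT_EYE = list(range(42, 48))  # ... and the other

def iris_offset(gray, landmarks, eye_idx):
    """Horizontal iris position within one eye, 0.0 (one corner) to 1.0
    (the other), or None if the eye is too small/closed to measure."""
    pts = landmarks[eye_idx].astype(np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    if w < 8 or h < 4:                       # closed eye or tiny face: skip it
        return None
    eye = gray[y:y + h, x:x + w]
    if eye.size == 0:
        return None
    mask = np.zeros_like(eye)
    cv2.fillPoly(mask, [(pts - (x, y)).astype(np.int32)], 255)
    inside = eye[mask > 0]
    dark = (eye <= np.percentile(inside, 25)) & (mask > 0)  # iris ~ darkest blob
    ys, xs = np.nonzero(dark)
    if len(xs) == 0:
        return None
    return float(xs.mean()) / w

def gaze_bucket(gray, landmarks):
    """Average both eyes and bucket the gaze into left / centre / right."""
    offsets = [iris_offset(gray, landmarks, e) for e in (LEFT_EYE, RIGHT_EYE)]
    offsets = [o for o in offsets if o is not None]
    if not offsets:
        return "unknown"
    mean = sum(offsets) / len(offsets)
    return "left" if mean < 0.4 else "right" if mean > 0.6 else "centre"
```

Run that over your extracted faces and count the buckets. If nearly everything lands in "centre", that's the warning sign to go hunt for more off-camera gazes.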
What would be nice is for the training model to do a bit more work specifically on the eyes to reduce or eliminate this problem. The eyes cover a very small area but are high value because they are critical to social engagement. We instinctively track where a person's eyes are looking (gaze tracking), and we use that to add context about the image and the environment. If the model could just take an eye (the iris, actually, assuming the eye isn't closed) and move it horizontally under the eyelid opening during training, that would fill in these critical features completely. I think this is what the new "Prioritize Eyes" feature in DFL does, but I'm not really sure since there doesn't appear to be any info on it yet.
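My guess at what "Prioritize Eyes" might amount to (and I want to stress this is only a guess, not DFL's actual implementation) is simply up-weighting the per-pixel reconstruction loss inside the eye region, something along these lines:

```python
import cv2
import numpy as np

def eye_weight_map(image_shape, landmarks, eye_weight=10.0):
    """Per-pixel loss weights: 1.0 everywhere, eye_weight inside the eye polygons."""
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    for eye_idx in (range(36, 42), range(42, 48)):   # both eye outlines
        pts = landmarks[list(eye_idx)].astype(np.int32)
        cv2.fillPoly(mask, [pts], 255)
    return np.where(mask > 0, np.float32(eye_weight), np.float32(1.0))

def weighted_l2_loss(predicted, target, weights):
    """Mean squared error where eye pixels count eye_weight times as much."""
    diff = (predicted.astype(np.float32) - target.astype(np.float32)) ** 2
    if diff.ndim == 3:              # broadcast the weight map over colour channels
        weights = weights[..., None]
    return float(np.average(diff, weights=np.broadcast_to(weights, diff.shape)))
```

The effect would be that an eye pointing the wrong way costs the model far more than the same error anywhere else on the face, so it spends more of its capacity getting the eyes right.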