After 10+ days of training, I have come to the conclusion that a training run without a preview (including writing the preview to an image) is more stable, if not completely stable.
My second suspect is a combination of the loss functions and the model's training state (start/mid/end).