I just started training for the first time. It was running for about 15 minutes, and I had stepped away from the computer for a moment. When I came back I saw this.
[Attached image: collapse_and_recover.png]
I think this means that my model collapsed, but also that it recovered on its own. I'm pretty sure I would have terminated training if I had been present during those 200 iterations.
So, what happened? Did my model collapse and recover? Is this a bad omen?
This is usually a sign of an overclocked GPU. The other possibility is exploding gradients that recovered, which is far rarer. The fact that the spike was uniform for both A and B tells me the GPU is the more likely culprit.
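For what it's worth, the usual guard against the exploding-gradients case is gradient clipping, which caps spikes before the optimizer step instead of letting the loss blow up and recover. Here is a minimal, framework-free sketch of clipping by global norm (the function name and threshold are just illustrative; most frameworks provide a built-in for this):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm
    never exceeds max_norm. Gradients below the cap pass through
    unchanged; a spike gets uniformly scaled down."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return grads
    scale = max_norm / total
    return [g * scale for g in grads]

# A spike-sized gradient (norm 5) is scaled down to the cap;
# a normal-sized one (norm 0.5) is left alone.
clipped = clip_by_global_norm([3.0, 4.0], 1.0)
normal = clip_by_global_norm([0.3, 0.4], 1.0)
```

Note that clipping only masks the symptom when the real cause is unstable hardware; if an overclocked GPU is producing bad numbers, no amount of clipping on the software side will fix that.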
I noticed an interesting phenomenon. While I was training the model, some peaks appeared on the graph. It only happened a few times and had no impact on the model, but it is still weird. Has anyone seen or encountered the same situation before?