Thanks for the in-depth explanation. I know I'm a pain, but I honestly do research the questions beforehand. I didn't know about the effect that going beyond -7 would have on performance. That makes sense and answers why my performance slowed to a crawl when I updated it to -10.
As for the second question, and the reference to the Nvidia paper (which I fully read and only partially understood) in the link Icarus posted:
"... by three exponent values (multiply by 8) was sufficient to match the accuracy achieved with FP32 training by recovering the relevant values lost to 0. Shifting by 15 exponent values (multiplying by 32K) would recover all but 0.1% of values lost to 0 when converting to FP16 and still avoid overflow. In other words, FP16 dynamic range is sufficient for training, but gradients may have to be scaled to move them into the range to keep them from becoming zeros in FP16."
https://docs.nvidia.com/deeplearning/pe ... index.html
My confusion is with the "multiply by 8". I thought it was referring to the learning rate formula.
I think the answer to my confusion is wrapped up in what the learning rate represents (I know it represents how fast the machine learns, but what do the numbers represent?), how it's calculated, and what the "e" in the learning rate formula represents. I thought the "e" represented the epsilon, so I'm interpreting the Nvidia paper to read "8(-7)-5", where 8 is the multiplier referenced in the paper, -7 is the epsilon, and then you subtract 5.
I also thought the "e" could represent the epoch, so a learning rate of 8e-5 would be interpreted as: for every 8 epochs, subtract 5.
Or, does "e" represent the exponent? 8 to the -7 power, subtract 5.
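To make my confusion concrete, here's how I'd write out the readings I can actually turn into numbers in Python (these are just my own interpretations, not anything from the paper):

```python
# Reading 1: "e" is the epsilon (-7), so "8(-7)-5" means 8 * (-7) - 5
epsilon_reading = 8 * (-7) - 5    # -61

# Reading 2 ("every 8 epochs, subtract 5") is more of a procedure than
# a single number, so I can't write it as one expression.

# Reading 3: "e" is an exponent: 8 to the -7 power, then subtract 5
exponent_reading = 8 ** -7 - 5    # roughly -5

print(epsilon_reading, exponent_reading)
```

Neither of those comes out to anything that looks like a sane learning rate, which is part of why I suspect I'm misreading the notation entirely.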
I have spent hours of my day researching this, and every paper I've come across of course assumes I'm a data scientist (not a middle-aged failed writer) and therefore assumes I already know what the learning rate formula represents and how it's calculated.
This is the danger when idiots like me read scientific papers well above their heads.
If you can answer this question, I promise not to ask another question for one week's time.