
Implementation of TF32 and BF16

Posted: Fri Aug 11, 2023 5:04 pm
by Ryzen1988

Mixed precision is often very nice for the speed increase and VRAM savings, but fortunately it's not only about how low you can go :lol:
My experience with Stable Diffusion leads me to prefer BF16 over FP16.
Since we are getting CLIP ViT models now, and I believe the G and H variants upwards are also BF16 compatible:
Any plans to support or integrate BF16? Would this be difficult to code? And would it also benefit older CNN models, do you think?

Also, the TF32 format seems pretty cool, as it is more or less a drop-in replacement for FP32, keeping the same 8-bit exponent range as FP32 while using the same 10-bit mantissa as FP16.
But I don't see it mentioned much, even though it seems to be a very good speed/quality trade-off that requires little code.
Not that I could code it myself, but that's what I have read :lol:
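
For anyone skimming, the bit layouts being compared boil down to this (just my own illustrative snippet; the widths are the standard ones):

Code: Select all

# Rough bit-layout summary (sign / exponent / mantissa bits) of the formats discussed.
FORMATS = {
    "FP32": (1, 8, 23),  # full single precision
    "TF32": (1, 8, 10),  # FP32's exponent range with FP16's mantissa precision
    "FP16": (1, 5, 10),  # small exponent range, so more overflow/underflow risk
    "BF16": (1, 8, 7),   # FP32's exponent range, but less mantissa precision
}

for name, (sign, exp, mantissa) in FORMATS.items():
    print(f"{name}: {sign} sign bit, {exp} exponent bits, {mantissa} mantissa bits")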


Re: Implementation of TF32 and BF16

Posted: Fri Aug 11, 2023 6:09 pm
by torzdf

I may be wrong, but from memory, floats get automatically converted to TF32 inside Tensorflow where it is supported.

Edit:
Yes, from here:

https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_tensor_float_32_execution wrote:

TensorFloat-32 is enabled by default. TensorFloat-32 is only supported on NVIDIA GPUs starting with the Ampere generation, so older NVIDIA GPUs and other hardware will use the full float32 precision regardless of whether TensorFloat-32 is enabled or not. If you want to use the full float32 precision on all GPUs, you can disable TensorFloat-32 execution with this function. For example:
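
Going by those docs, something like this should be all that's needed to check the setting or force full FP32 back on (just a sketch of that API on a recent TF2 build, not the documentation's own snippet):

Code: Select all

import tensorflow as tf

# Sketch only: TF32 is on by default in recent TF2 releases on Ampere+ GPUs.
print(tf.config.experimental.tensor_float_32_execution_enabled())  # True where supported

# Switch matmuls/convolutions back to full float32 precision.
tf.config.experimental.enable_tensor_float_32_execution(False)
print(tf.config.experimental.tensor_float_32_execution_enabled())  # False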

I will look into BF16 when I have some time.

Edit:
A quick search shows that BF16 is currently only supported on the A100 and on TPUs, so it's unlikely to be a priority for me to implement this (if it isn't automatically handled by enabling mixed precision on those devices anyway)
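
For reference, if BF16 does turn out to be worth it, the Keras mixed-precision API already exposes a bfloat16 policy; a rough sketch (untested here, and assuming hardware that actually supports BF16):

Code: Select all

import tensorflow as tf

# Sketch only: switch the global Keras mixed-precision policy to bfloat16.
# Compute then happens in bfloat16 where supported, while variables stay in float32.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(10),
])

print(model.layers[0].compute_dtype)   # bfloat16
print(model.layers[0].variable_dtype)  # float32

As I understand it, one nice property is that, unlike mixed_float16, the bfloat16 policy shouldn't need loss scaling, since it keeps FP32's exponent range.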


Re: Implementation of TF32 and BF16

Posted: Fri Aug 11, 2023 7:48 pm
by Ryzen1988
torzdf wrote: Fri Aug 11, 2023 6:09 pm

I may be wrong, but from memory, floats get automatically converted to TF32 inside Tensorflow where it is supported.

A quick search shows that BF16 is currently only supported on the A100 and on TPUs, so it's unlikely to be a priority for me to implement this (if it isn't automatically handled by enabling mixed precision on those devices anyway)

If I am not mistaken (I double-checked for you to make sure), it's available on all GPUs from Ampere onwards, including the GeForce consumer SKUs.
AMD RDNA 3 seems to support it as well, from what I have read.
Intel was of course one of the first to support it on its CPUs, for Facebook inference.
And even Arm has supported it since Armv8-A.

The A100 was the first major SKU of the Ampere line and was released well before the rest of the 3000 series, so all the news articles focus specifically on that SKU.

About TF32 I couldn't get a clear picture: some places said it was automatic, other places said it was automatic up to version xx, and other places claimed just a small code addition was needed :lol:
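
If it helps settle the automatic-or-not question, something like this should show whether TF32 is actually switched on and whether the GPU is Ampere-class (compute capability 8.0 or higher). Just a sketch; I believe get_device_details has been available since around TF 2.4, but treat that as an assumption:

Code: Select all

import tensorflow as tf

# Sketch only: report whether this TF build has TF32 execution enabled,
# and the compute capability of each visible GPU (8.0+ means Ampere or newer).
print("TF32 enabled:", tf.config.experimental.tensor_float_32_execution_enabled())

for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    name = details.get("device_name", "unknown")
    capability = details.get("compute_capability")  # e.g. (8, 6) for a GeForce 30-series card
    print(f"{name}: compute capability {capability}")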