Implementation of TF32 and BF16

Discussions about research, Faceswapping and things that don't fit in the other categories here.


Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

Implementation of TF32 and BF16

Post by Ryzen1988 »

Mixed precision is oftentimes very nice for the speed increase and VRAM savings, but fortunately it's not only about how low you can go :lol:
My experience with Stable Diffusion leads me to prefer BF16 over FP16.
And since we are getting CLIP ViT models now, which I believe are also BF16 compatible from the G and H variants upwards:
Any plans to support or integrate BF16? Would it be a difficult thing to code, and do you think it would also benefit older CNN models?

Also, the TF32 format seems pretty cool, as it is more or less a drop-in replacement for FP32: it keeps the same 8-bit exponent range as FP32 while using the same 10-bit mantissa as FP16.
But I don't see it mentioned much, while it seems to be a very good speed/quality tradeoff with little code required.
Not that I could code it myself, but that's what I have read :lol:
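
For illustration, a minimal TensorFlow sketch of the range difference between the two 16-bit formats (standard tf dtypes, nothing faceswap-specific):

    import tensorflow as tf

    # FP16 has a 5-bit exponent (max finite value ~65504), so this overflows.
    x = tf.constant(70000.0)
    print(tf.cast(x, tf.float16).numpy())   # inf

    # BF16 keeps FP32's 8-bit exponent, so the value stays finite,
    # but with only ~7 mantissa bits it gets rounded (to 70144 here).
    print(tf.cast(x, tf.bfloat16).numpy())  # 70144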

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: Implementation of TF32 and BF16

Post by torzdf »

I may be wrong, but from memory, floats get automatically converted to TF32 inside TensorFlow where it is supported.

Edit:
Yes, from here:

https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_tensor_float_32_execution wrote:

TensorFloat-32 is enabled by default. TensorFloat-32 is only supported on NVIDIA GPUs starting with the Ampere generation, so older NVIDIA GPUs and other hardware will use the full float32 precision regardless of whether TensorFloat-32 is enabled or not. If you want to use the full float32 precision on all GPUs, you can disable TensorFloat-32 execution with this function. For example:
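
For reference, the toggle the docs describe is just these two calls (a minimal sketch using the functions from the page linked above):

    import tensorflow as tf

    # TF32 is on by default; it only has an effect on Ampere or newer NVIDIA GPUs.
    print(tf.config.experimental.tensor_float_32_execution_enabled())  # True

    # Force full FP32 everywhere, e.g. to rule TF32 out as a source of small differences.
    tf.config.experimental.enable_tensor_float_32_execution(False)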

I will look into BF16 when I have some time.

Edit:
A quick search shows that BF16 is currently only supported on the A100 and on TPUs, so it's unlikely to be a priority for me to implement this (if it isn't automatically handled by enabling mixed precision on those devices anyway).
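
If it does turn out to be worth doing, it would presumably just be a different Keras mixed-precision policy name, something like this (an untested sketch, not anything that currently exists in faceswap):

    import tensorflow as tf

    # Same mechanism as the usual 'mixed_float16' policy, but computing in bfloat16;
    # variables are kept in float32 either way.
    tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")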

Last edited by torzdf on Fri Aug 11, 2023 6:16 pm, edited 2 times in total.

My word is final

Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 28 times

Re: Implementation of TF32 and BF16

Post by Ryzen1988 »

torzdf wrote: Fri Aug 11, 2023 6:09 pm

I may be wrong, but from memory, floats get automatically converted to TF32 inside TensorFlow where it is supported.

A quick search shows that BF16 is currently only supported on the A100 and on TPUs, so it's unlikely to be a priority for me to implement this (if it isn't automatically handled by enabling mixed precision on those devices anyway)

If I am not mistaken (I double-checked to make sure), it's available on all GPUs from Ampere onwards, and also on the GeForce consumer SKUs.
AMD RDNA 3 seems to support it, from what I read.
Intel was of course one of the first to support it, on its CPUs, for Facebook inference.
And even Arm has supported it since Armv8-A.

The A100 was the first major SKU of the Ampere line and released well before the rest of the 3000 series, so all the news articles are specifically focused on that SKU.

About TF32 I couldn't get a clear picture: some places said it was automatic, other places said it was automatic up to version xx, and other places claimed just a small code addition was needed :lol:
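
If anyone wants to check what their own setup is doing, one rough way (a sketch based on the docs linked above, I haven't run it myself) is to run the same matmul with the flag on and off; on an Ampere or newer GPU the results should differ slightly, elsewhere they should be identical:

    import tensorflow as tf

    a = tf.random.normal((1024, 1024))
    b = tf.random.normal((1024, 1024))

    # Same multiplication with TF32 enabled and then disabled.
    tf.config.experimental.enable_tensor_float_32_execution(True)
    with_tf32 = tf.matmul(a, b)

    tf.config.experimental.enable_tensor_float_32_execution(False)
    full_fp32 = tf.matmul(a, b)

    # A non-zero difference means TF32 was actually used for the first matmul.
    print(tf.reduce_max(tf.abs(with_tf32 - full_fp32)).numpy())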

Last edited by Ryzen1988 on Fri Aug 11, 2023 7:53 pm, edited 1 time in total.