Training runs painfully slow on V100

Posted: Tue Aug 27, 2019 4:21 pm
by koko191

For some reason my training runs super slow on a V100 even though it runs like a mad lad on a Titan RTX. I'm using the same settings on both GPUs. When I check nvidia-smi, the script uses 12+ GB of VRAM on the Titan RTX, but always 305 MB on the V100. Am I doing something wrong? As far as I know, the V100 should be a lot faster than any other GPU.
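
The nvidia-smi memory check described above can be scripted so the two machines are easy to compare. This is only a rough sketch: the helper names and the 1024 MiB threshold are made up for illustration, and it assumes nvidia-smi is on the PATH and reports plain integers for memory. A training process that holds only a few hundred MB is probably not placing the model on the GPU at all, which is what the threshold tries to flag.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB) into a list of (used, total) int tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use in MiB (needs the NVIDIA driver tools)."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_memory_csv(output)


def looks_idle(used_mib, threshold_mib=1024):
    """Hypothetical heuristic: under ~1 GiB used suggests no model is on the GPU."""
    return used_mib < threshold_mib
```

Calling `query_gpu_memory()` while training is running and passing each `used` value to `looks_idle` gives a quick yes/no per GPU. The threshold is a guess and should be adjusted for small models.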


Re: Training runs painfully slow on V100

Posted: Tue Aug 27, 2019 7:21 pm
by koko191

Solved.

Both machines had CUDA 10.1 (along with other versions), but the Titan RTX machine had cuDNN 7.5.1 (compatible with CUDA 10.1) while the V100 machine only had cuDNN 7.1.4 (incompatible with CUDA 10.1). I was using TensorFlow 1.14.0. I downgraded to 1.12.3, which is compatible with CUDA 9.2. Both machines had CUDA 9.2, and it is compatible with cuDNN 7.1.4. After that, everything worked.
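
For anyone hitting the same thing, a quick way to see whether TensorFlow can actually use the GPU is to list the devices it detects. The snippet below is a minimal sketch, not a full diagnostic: `gpu_device_names` and `report` are hypothetical helper names, and it assumes a TensorFlow build where `tensorflow.python.client.device_lib` is importable (it is in the 1.x releases mentioned here). If the CUDA and cuDNN libraries do not match what the installed TensorFlow expects, the GPU list will most likely come back empty.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is exactly 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def report():
    """Print the TensorFlow version and the GPUs it can see, if any."""
    # Imported here so the helper above can be used without TensorFlow installed.
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    print("TensorFlow version:", tf.__version__)
    print("Built with CUDA:", tf.test.is_built_with_cuda())

    gpus = gpu_device_names(device_lib.list_local_devices())
    if gpus:
        print("GPUs visible to TensorFlow:", ", ".join(gpus))
    else:
        print("No GPU visible to TensorFlow; training will run on the CPU.")
```

Running `report()` on both machines should show the difference right away. The startup log printed by TensorFlow usually names the library it failed to load, which points at the mismatched CUDA or cuDNN version.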