AMD Vega 64 issues

Posted: Thu Nov 12, 2020 2:38 pm
by wentdot

Hey all,

So I am the rare AMD user - I have a Vega 64 and a Vega 64 FE.

I have a few questions:

  1. I know I can "exclude" a card, but choosing to do so has no effect - is there something else I can try? Faceswap is defaulting to my 64 when I would like it to use the FE. I am guessing I could physically swap the cards but that seems like it should be unnecessary.

  2. Does anyone have experience tuning the cards? I have a very good power supply (850W IIRC) but if I run on anything but the lowest performance it seems like the power spike from the card triggers a crash. I am using rocm-smi --setperflevel low. I know the system can do better than this but I think I need to set up a custom profile, which I'm struggling with.

  3. I found another thread on here where someone set up faceswap to use rocm. I have attempted to follow their example but it didn't work (tells me tensorflow is not installed). But if anyone has done this successfully/recently, I would love to hear about it.

  4. Please don't tell me to buy Nvidia. I will... eventually...

Thank you.


Re: AMD Vega 64 issues

Posted: Thu Nov 12, 2020 4:07 pm
by bryanlyon

Excluding a card is only used for Tensorflow, not PlaidML. Not sure how PlaidML does it.

For reducing the power used by your card (assuming you're on Linux, given your rocm mention), check out https://www.phoronix.com/scan.php?page= ... MD-GPU-TDP which unlocks your TDP. Just make sure to turn it DOWN, not up.

For the rocm issue: if it says tensorflow is not installed, it probably isn't. If it's saying that but it IS installed, that means something has broken out of conda into your system environment. This usually happens because you didn't install everything inside of the env. My suggestion there would be to start again from a fresh env. If that still doesn't work, post a reply on the rocm thread; more11o has expressed interest in whether anyone else has gotten it working.
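A quick way to tell "not installed" apart from "installed somewhere else" is to ask the interpreter itself. The following is only a minimal sketch using the Python standard library; the function name `tensorflow_env_report` is made up for illustration, and it assumes conda sets the `CONDA_DEFAULT_ENV` variable when an env is activated.

```python
import importlib.util
import os
import sys


def tensorflow_env_report():
    """Report which interpreter is running and whether it can see tensorflow."""
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec("tensorflow")
    return {
        # Path of the Python binary actually running this script.
        "python": sys.executable,
        # Name of the active conda env, if conda set it (None otherwise).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root of the interpreter's environment.
        "prefix": sys.prefix,
        # Where tensorflow would be imported from, or None if it is not visible.
        "tensorflow_location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in tensorflow_env_report().items():
        print(f"{key}: {value}")
```

Run it with the env activated. If `python` or `tensorflow_location` points outside the env's directory, or `tensorflow_location` is None, that would be consistent with a package having been installed outside the env.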


Re: AMD Vega 64 issues

Posted: Thu Nov 12, 2020 5:37 pm
by wentdot

Thanks for your reply. It is helpful to know the card exclusion works only with tensorflow.

Of course immediately after I posted that I fixed the "power" issue... turns out it was a temperature issue after all. Without changing any profile settings, I just turned the fan up before I began and now I'm running with the "compute" preset profile and no issues.
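For anyone wanting to script this, here is a rough sketch of the kind of rocm-smi calls involved. The exact values I used are not recorded here, so the fan level of 200 is an arbitrary example, and the helper names are hypothetical. Changing settings usually needs root, so the sketch only builds the commands and runs nothing but the read-only temperature query.

```python
import shutil
import subprocess


def rocm_smi_command(*args):
    """Build a rocm-smi argument list without running it."""
    return ["rocm-smi", *args]


# Read current temperatures.
SHOW_TEMP = rocm_smi_command("--showtemp")
# Set the fan level (0-255); 200 is an arbitrary example value.
SET_FAN = rocm_smi_command("--setfan", "200")
# List the power profiles the card offers (look for the compute one here).
SHOW_PROFILES = rocm_smi_command("--showprofile")
# The low performance level fallback mentioned in the first post.
SET_PERF_LOW = rocm_smi_command("--setperflevel", "low")


def run_if_available(command):
    """Run the command only if its binary is on PATH; return stdout or None."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Only the read-only query is executed here.
    print(run_if_available(SHOW_TEMP))
```

The profile number for "compute" can differ between cards, so check the `--showprofile` output before passing anything to `--setprofile`.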

I am guessing that you are right that not everything is installed in the environment. I will pursue it in the other thread, thanks.


Re: AMD Vega 64 issues

Posted: Thu Nov 12, 2020 6:00 pm
by torzdf
bryanlyon wrote: Thu Nov 12, 2020 4:07 pm

Excluding a card is only used for Tensorflow, not PlaidML. Not sure how PlaidML does it.

This is not correct. Exclude GPU should work with PlaidML too, but a) plaid is a bit of a nuisance and is entirely inconsistent about whether it wants to work properly, and b) I have neither an AMD card nor a multi-GPU setup to test it properly.

The best way (in theory) to exclude a GPU under plaidML is to run plaidml-setup from within your environment.
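To check what plaidml-setup actually saved, something like the sketch below may help. It assumes the tool writes its choice as JSON to a `.plaidml` file in your home directory under the key `PLAIDML_DEVICE_IDS`; that is my understanding of how it behaves, not something verified on an AMD multi-GPU box. The function name is hypothetical.

```python
import json
from pathlib import Path


def selected_plaidml_devices(config_path=None):
    """Return the device IDs saved by plaidml-setup, or [] if none are found."""
    # Assumed default location of the saved settings file.
    path = Path(config_path) if config_path else Path.home() / ".plaidml"
    if not path.is_file():
        return []
    with path.open(encoding="utf-8") as handle:
        settings = json.load(handle)
    # Assumed key name; a missing key is treated as "nothing selected".
    return settings.get("PLAIDML_DEVICE_IDS", [])


if __name__ == "__main__":
    print(selected_plaidml_devices())
```

If the list shows the plain 64 rather than the FE, re-running plaidml-setup and picking the other device would be the first thing to try.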

wentdot wrote: Thu Nov 12, 2020 2:38 pm
  1. I found another thread on here where someone set up faceswap to use rocm. I have attempted to follow their example but it didn't work (tells me tensorflow is not installed). But if anyone has done this successfully/recently, I would love to hear about it.

I believe the ROCm method was only working for tf 1.x. I'm not sure if the user got it working for tf 2.x yet, but he is often around in our Discord, so you could see if you can find him there.


Re: AMD Vega 64 issues

Posted: Thu Dec 03, 2020 5:30 pm
by akostadinov

Where is the "another thread"? Could you post a link?