AMD Vega 64 issues

Talk about Hardware used for Deep Learning


wentdot
Posts: 8
Joined: Thu Nov 12, 2020 2:30 pm
Has thanked: 3 times

AMD Vega 64 issues

Post by wentdot »

Hey all,

So I'm one of the rare AMD users: I have a Vega 64 and a Vega 64 FE.

I have a few questions:

  1. I know I can "exclude" a card, but doing so has no effect. Is there something else I can try? Faceswap defaults to my 64 when I would like it to use the FE. I guess I could physically swap the cards, but that seems like it should be unnecessary.

  2. Does anyone have experience tuning these cards? I have a very good power supply (850W IIRC), but if I run at anything above the lowest performance level, the power spike from the card seems to trigger a crash. I am currently using rocm-smi --setperflevel low. I know the system can do better than this, but I think I need to set up a custom profile, which I'm struggling with.

  3. I found another thread on here where someone set up faceswap to use ROCm. I tried to follow their example, but it didn't work (it tells me TensorFlow is not installed). If anyone has done this successfully/recently, I would love to hear about it.

  4. Please don't tell me to buy Nvidia. I will... eventually...

Thank you.

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: AMD Vega 64 issues

Post by bryanlyon »

Excluding a card only works with TensorFlow, not PlaidML. I'm not sure how PlaidML handles it.
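For the TensorFlow backend, one common way to hide a card from the framework is to mask it with a device-visibility environment variable before TensorFlow is imported: HIP_VISIBLE_DEVICES for ROCm builds, CUDA_VISIBLE_DEVICES for CUDA builds. The sketch below assumes that mechanism; it is not faceswap's own exclusion code, the helper names are hypothetical, and the index-to-card mapping (0 = Vega 64, 1 = FE) is an assumption that may differ on your system.

```python
import os
from typing import Iterable


def visible_devices(num_gpus: int, exclude: Iterable[int]) -> str:
    """Comma-separated indices of the GPUs that are NOT excluded."""
    excluded = set(exclude)
    return ",".join(str(i) for i in range(num_gpus) if i not in excluded)


def mask_gpus(num_gpus: int, exclude: Iterable[int], rocm: bool = True) -> str:
    """Set the visibility variable; this must run BEFORE `import tensorflow`."""
    var = "HIP_VISIBLE_DEVICES" if rocm else "CUDA_VISIBLE_DEVICES"
    value = visible_devices(num_gpus, exclude)
    os.environ[var] = value
    return value


if __name__ == "__main__":
    # Hypothetical layout: index 0 = Vega 64, index 1 = Vega 64 FE.
    print(mask_gpus(2, exclude=[0]))  # prints 1
    # import tensorflow as tf  # import only after the variable is set
```

Setting the variable in the shell before launching (for example `HIP_VISIBLE_DEVICES=1`) has the same effect, as long as it is in place before TensorFlow initialises its devices.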

To reduce the power your card draws (assuming you're on Linux, since you mentioned ROCm), check out https://www.phoronix.com/scan.php?page= ... MD-GPU-TDP, which unlocks your TDP. Just make sure to turn it DOWN, not up.
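On Linux the amdgpu driver also exposes the power limit through sysfs hwmon files (power1_cap, power1_cap_min, power1_cap_max, all in microwatts), so a cap can be adjusted within the range the driver reports. A minimal sketch, assuming the card is card0 and that you run it as root; older kernels may not provide power1_cap_min, and the helper names are hypothetical. In keeping with the advice above, it refuses to raise the cap.

```python
import glob
import os
from typing import Optional


def hwmon_dir(card: str = "card0", root: str = "/sys/class/drm") -> Optional[str]:
    """Locate the hwmon directory amdgpu exposes for a DRM card."""
    matches = sorted(glob.glob(os.path.join(root, card, "device", "hwmon", "hwmon*")))
    return matches[0] if matches else None


def clamp_cap_uw(requested_watts: float, cap_min_uw: int, cap_max_uw: int) -> int:
    """Convert watts to microwatts and clamp into the driver's allowed range."""
    requested_uw = int(requested_watts * 1_000_000)
    return max(cap_min_uw, min(requested_uw, cap_max_uw))


def _read_int(path: str, default: Optional[int] = None) -> int:
    """Read an integer sysfs value; fall back to `default` if the file is absent."""
    if default is not None and not os.path.exists(path):
        return default
    with open(path) as f:
        return int(f.read().strip())


def lower_power_cap(requested_watts: float, card: str = "card0",
                    root: str = "/sys/class/drm") -> int:
    """Lower power1_cap (needs root). Returns the value written, in microwatts."""
    hw = hwmon_dir(card, root)
    if hw is None:
        raise FileNotFoundError(f"no amdgpu hwmon directory for {card}")
    cap_path = os.path.join(hw, "power1_cap")
    current = _read_int(cap_path)
    cap_min = _read_int(os.path.join(hw, "power1_cap_min"), default=0)
    cap_max = _read_int(os.path.join(hw, "power1_cap_max"), default=current)
    value = clamp_cap_uw(requested_watts, cap_min, cap_max)
    if value > current:
        raise ValueError("this sketch only lowers the cap; turn it DOWN, not up")
    with open(cap_path, "w") as f:
        f.write(str(value))
    return value
```

For example, `lower_power_cap(180)` would request a 180 W cap; reading power1_cap back afterwards confirms what the driver accepted.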

As for ROCm: if it says TensorFlow is not installed, it probably isn't. If it says that but it IS installed, something has broken out of conda into your system environment. This usually happens when not everything was installed inside the env. My suggestion would be to start again from a fresh env. If that still doesn't work, post a reply on the ROCm thread; more11o has expressed interest in whether anyone else has gotten it working.
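To tell those two cases apart, you can ask Python, from inside the activated env, where it would import tensorflow from and whether that location is under the env's prefix. A minimal diagnostic sketch using only the standard library; check_module is a hypothetical helper, and "inside the env" is approximated as "under sys.prefix".

```python
import importlib.util
import os
import sys
from typing import Optional


def check_module(name: str) -> dict:
    """Report whether `name` is importable and whether it resolves inside this env."""
    try:
        spec = importlib.util.find_spec(name)  # locates without importing
    except (ImportError, ValueError):
        spec = None
    location: Optional[str] = None
    if spec is not None and spec.origin not in (None, "built-in", "frozen"):
        location = os.path.realpath(spec.origin)
    env_prefix = os.path.realpath(sys.prefix)
    inside_env = False
    if location is not None:
        try:
            inside_env = os.path.commonpath([env_prefix, location]) == env_prefix
        except ValueError:  # e.g. paths on different drives on Windows
            inside_env = False
    return {
        "module": name,
        "installed": spec is not None,
        "location": location,
        "python": sys.executable,
        "sys_prefix": env_prefix,
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "inside_env": inside_env,
    }


if __name__ == "__main__":
    for key, value in check_module("tensorflow").items():
        print(f"{key:>12}: {value}")
```

If `installed` is False, TensorFlow really is missing from this env. If it is True but `inside_env` is False, it is being picked up from outside conda (for example a `pip install --user` copy in your home directory), which is the kind of breakage described above.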

wentdot
Posts: 8
Joined: Thu Nov 12, 2020 2:30 pm
Has thanked: 3 times

Re: AMD Vega 64 issues

Post by wentdot »

Thanks for your reply. It is helpful to know that card exclusion only works with TensorFlow.

Of course, immediately after posting I fixed the "power" issue... it turns out it was a temperature issue after all. Without changing any profile settings, I just turned the fan up before starting, and now I'm running the "compute" preset profile with no issues.
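For keeping an eye on temperature while training, amdgpu reports the GPU temperature in millidegrees Celsius in temp1_input and accepts a manual fan duty cycle (0-255) via pwm1 once pwm1_enable is set to 1. A rough sketch under those assumptions; the paths assume a standard /sys/class/drm layout, writing requires root, and the helper names are hypothetical.

```python
import glob
import os
from typing import Dict


def read_gpu_temps(root: str = "/sys/class/drm") -> Dict[str, float]:
    """Map each DRM card to its temp1_input reading in degrees Celsius."""
    temps: Dict[str, float] = {}
    pattern = os.path.join(root, "card*", "device", "hwmon", "hwmon*", "temp1_input")
    for path in sorted(glob.glob(pattern)):
        card = path.split(os.sep)[-5]  # .../<card>/device/hwmon/hwmonN/temp1_input
        with open(path) as f:
            temps[card] = int(f.read().strip()) / 1000.0  # sysfs reports millidegrees
    return temps


def pwm_from_percent(percent: float) -> int:
    """Map a 0-100 % fan speed onto the 0-255 range used by pwm1."""
    percent = max(0.0, min(100.0, percent))
    return round(percent * 255 / 100)


def set_fan_percent(hwmon: str, percent: float) -> int:
    """Switch the fan to manual control and set its duty cycle (needs root)."""
    with open(os.path.join(hwmon, "pwm1_enable"), "w") as f:
        f.write("1")  # 1 = manual, 2 = automatic
    value = pwm_from_percent(percent)
    with open(os.path.join(hwmon, "pwm1"), "w") as f:
        f.write(str(value))
    return value


if __name__ == "__main__":
    for card, celsius in read_gpu_temps().items():
        print(f"{card}: {celsius:.1f} C")
```

Writing 2 back to pwm1_enable should hand fan control back to the driver once training is done.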

I'm guessing you're right that not everything is installed in the environment. I will pursue it in the other thread, thanks.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 623 times

Re: AMD Vega 64 issues

Post by torzdf »

bryanlyon wrote: Thu Nov 12, 2020 4:07 pm

Excluding a card only works with TensorFlow, not PlaidML. I'm not sure how PlaidML handles it.

This is not correct. Exclude GPU should work with PlaidML too, but a) PlaidML is a bit of a nuisance and is entirely inconsistent about whether it wants to work properly, and b) I have neither an AMD card nor multiple GPUs to test it properly.

The best way (in theory) to exclude a GPU under PlaidML is to run plaidml-setup from within your environment.
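To check which device plaidml-setup actually saved, you can read its settings file. This sketch assumes the file is JSON at ~/.plaidml (or at the path in PLAIDML_SETTINGS) with the chosen devices listed under PLAIDML_DEVICE_IDS; both the location and the key are assumptions about PlaidML's behaviour, and the helper names are hypothetical.

```python
import json
import os
from typing import List, Optional


def settings_path() -> str:
    # Assumption: PlaidML honours PLAIDML_SETTINGS and otherwise uses ~/.plaidml.
    return os.environ.get("PLAIDML_SETTINGS", os.path.expanduser("~/.plaidml"))


def parse_device_ids(text: str) -> List[str]:
    """Extract the saved device list from the settings JSON (assumed key)."""
    settings = json.loads(text) if text.strip() else {}
    return list(settings.get("PLAIDML_DEVICE_IDS", []))


def saved_device_ids(path: Optional[str] = None) -> List[str]:
    """Read the settings file written by plaidml-setup; [] if it is missing."""
    path = path or settings_path()
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return parse_device_ids(f.read())


if __name__ == "__main__":
    print(saved_device_ids() or "no device saved; run plaidml-setup in this env")
```

If the listed ID is not the card you want, rerunning plaidml-setup inside the faceswap env and picking the other device is the route described above.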

wentdot wrote: Thu Nov 12, 2020 2:38 pm
  1. I found another thread on here where someone set up faceswap to use rocm. I have attempted to follow their example but it didn't work (tells me tensorflow is not installed). But if anyone has done this successfully/recently, I would love to hear about it.

I believe the ROCm method only worked with TF 1.x. I'm not sure whether the user has got it working with TF 2.x yet, but he is often around on our Discord, so you could try to find him there.

My word is final

akostadinov
Posts: 2
Joined: Thu Dec 03, 2020 5:18 pm

Re: AMD Vega 64 issues

Post by akostadinov »

Where is the "other thread"? Could you post a link?
