GPU Usage Issues

extralush
Posts: 2
Joined: Tue Jan 07, 2020 2:22 pm

GPU Usage Issues

Post by extralush »

Hey guys-

New to FS so apologies in advance for what might be a rookie question:

I have 4x Titan Xp GPUs running on Windows. I'm seeing some pretty slow training results and was hoping for some guidance on how I can improve my speeds.

Currently training with Lightweight and seeing about 2 iterations per second. When training with Original I was seeing about 1 iteration per second.

With lightweight my CPU is running at 44% with one GPU at 36-44% and the other three GPUs at 16-18%.
With original my CPU is at 23% with the GPUs running the same.

I've adjusted the batch size in both models (as well as Villain and RealFace) with little to no effect.

Task Manager screenshot below
https://www.dropbox.com/s/ft5mvmfok8kkd ... t.png?dl=0

One thing that I am not sure is normal is this message I get right up front when I load the model:
Loading...
Setting Faceswap backend to NVIDIA
01/07/2020 09:20:52 INFO Log level set to: INFO
Using TensorFlow backend.

Could this be an issue with my nvidia drivers?

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: GPU Usage Issues

Post by bryanlyon »

First off, please see app.php/faqpage#f0r3 regarding Task Manager. It's not a reliable measure.

Second, while 1 iter/sec sounds slow, it is lacking very important context: what batch size (BS) are you trying to run at, and do you have your GPU count set correctly?
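
To illustrate why batch size matters when reading an iterations-per-second figure, here is a minimal sketch. It assumes throughput is simply iterations per second multiplied by batch size; the function name and the batch sizes are illustrative only, not taken from Faceswap.

```python
def examples_per_second(iterations_per_second, batch_size):
    """Training examples processed per second for a given batch size."""
    return iterations_per_second * batch_size


# The same 2 iter/sec reading means very different throughput
# depending on batch size (values here are illustrative).
for batch_size in (16, 64, 128):
    print(batch_size, examples_per_second(2, batch_size))
```

So an iter/sec number on its own cannot be compared between runs unless the batch size is also known.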

Lastly, Original and Lowmem are not going to be the best models for you; with your cards you would want to look at the higher-end models. Lightweight will benefit very little from multiple GPUs due to its size, as it's built for RAM savings, not for scaling.

GPU usage is probably a problem for you here. You might want to follow the directions in other threads about how to make sure you're running tensorflow-gpu and that you have all the right pieces installed.
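
One possible way to check whether TensorFlow can see the GPUs is sketched below. This is not the procedure from the other threads; it assumes you run it inside the same Python environment Faceswap uses, and `visible_gpus` is a hypothetical helper name. It relies on TensorFlow's `device_lib.list_local_devices()`.

```python
def visible_gpus():
    """Return the GPU device names TensorFlow reports.

    Returns None if TensorFlow cannot be imported in this environment.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow could not be imported in this environment")
elif not gpus:
    print("TensorFlow imported, but no GPU devices are visible")
else:
    print("GPUs visible to TensorFlow:", gpus)
```

If the list is empty on a machine with NVIDIA cards, that would suggest a CPU-only TensorFlow build or a driver/CUDA mismatch, though other causes are possible.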

If you're still having a problem, please post more information including the command line you're running (Hit "generate" in the GUI) and the system info (Help/Output System Information).

extralush
Posts: 2
Joined: Tue Jan 07, 2020 2:22 pm

Re: GPU Usage Issues

Post by extralush »

Thanks for the feedback bryanlyon.

I tried a range of batch sizes from 16 to 128. They all seem to have the same effect. I was wondering if tensorflow-gpu was supposed to be running, and figured that message was red for a reason.

I'll read through the other posts and re-evaluate whether I have everything installed properly. Thanks again for the direction!

torzdf
Posts: 2687
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 135 times
Been thanked: 628 times

Re: GPU Usage Issues

Post by torzdf »

That message is normal.....

Keras outputs the backend it is using to stderr, which the GUI defaults to red.
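
As a rough illustration of why a stderr message is not necessarily an error, the sketch below runs a small child process that writes an informational line to stderr and still exits cleanly. The child script is a stand-in written for this example, not Keras or Faceswap code.

```python
import subprocess
import sys

# Stand-in child script: writes an informational line to stderr,
# a normal line to stdout, then exits successfully.
child_code = (
    "import sys\n"
    "sys.stderr.write('Using TensorFlow backend.\\n')\n"
    "print('training would start here')\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

print("exit code:", result.returncode)
print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```

A front end that colours everything on stderr red would show the first message in red even though the exit code is 0.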

My word is final
