Search found 35 matches

by dheinz70
Sat Nov 21, 2020 3:27 am
Forum: Hardware
Topic: What Do you think of this MB
Replies: 4
Views: 217

Re: What Do you think of this MB

My feeling is that I'm seeing a Linux driver issue or a TensorFlow issue. I installed Windows on the new computer and tried to run the same test. Single-GPU performance was about 10-20% worse under Windows, and I could not get distributed to work at all; it kept throwing TensorFlow illegal memory errors. I als...
by dheinz70
Thu Nov 19, 2020 12:39 am
Forum: Hardware
Topic: What Do you think of this MB
Replies: 4
Views: 217

Re: What Do you think of this MB

Alright... the new setup is built: Ubuntu 20.10, Ryzen 3800X, 32 GB RAM, and the MPG X570 Plus MB. Interesting results on my first tests. Tests with one GPU (did both GPU0 and GPU1) give me 20-21 EGs/sec with a batch of 7. Tests with distributed, batch of 14 (2x7), give me 24 EGs/sec. Only a slight g...
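Those numbers imply poor multi-GPU scaling; here is a quick back-of-the-envelope check (a sketch in Python, using only the EGs/sec figures quoted above):

```python
# Multi-GPU scaling check: compare observed dual-GPU throughput
# against ideal linear scaling of the single-GPU rate.
single_egs = 20.5   # midpoint of the 20-21 EGs/sec measured on one GPU
dual_egs = 24.0     # distributed run, batch of 14 (2x7)

speedup = dual_egs / single_egs            # actual speedup from adding a GPU
efficiency = dual_egs / (2 * single_egs)   # fraction of a perfect 2x scaling

print(f"speedup {speedup:.2f}x, scaling efficiency {efficiency:.0%}")
# -> speedup 1.17x, scaling efficiency 59%
```

So the second GPU buys only about a 17% throughput gain here, well short of the 2x that perfect data-parallel scaling would give.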
by dheinz70
Sat Nov 07, 2020 12:29 am
Forum: Hardware
Topic: What Do you think of this MB
Replies: 4
Views: 217

What Do you think of this MB

https://www.newegg.com/msi-mpg-x570-gam ... 6813144262

Another question: when using both PCIe slots, do all MBs switch them to x8/x8, or do some keep both slots at x16?

by dheinz70
Wed Nov 04, 2020 10:15 pm
Forum: General Chat
Topic: Save As bug
Replies: 1
Views: 95

Save As bug

When I save a new project, it adds the extension twice.

"realfacetest.fsw .FSW" is what it named the file; I only entered "realfacetest" in the name window.
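This symptom usually comes from unconditionally appending the extension to whatever the save dialog returns. A defensive pattern (a hypothetical sketch, not Faceswap's actual code; the function name is mine) looks like:

```python
import os

def with_extension(filename: str, ext: str = ".fsw") -> str:
    """Append ext only if the name doesn't already end with it (case-insensitive)."""
    filename = filename.strip()  # also guards against a stray trailing space
    if os.path.splitext(filename)[1].lower() == ext.lower():
        return filename
    return filename + ext

print(with_extension("realfacetest"))      # realfacetest.fsw
print(with_extension("realfacetest.FSW"))  # realfacetest.FSW
```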

by dheinz70
Sat Oct 31, 2020 7:45 pm
Forum: Training
Topic: Log and graph weirdness
Replies: 7
Views: 311

Re: Log and graph weirdness

The two bugs I've seen:

Changing the smoothing from 0.9 causes the stats to crash.

It shows more iterations than the session has done. Hope that helps.

by dheinz70
Mon Oct 19, 2020 11:51 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

The fact that it drops down to x8/x8 tells me it is probably mostly hardware. Just thought it was weird that running two cards is almost exactly half as productive.

Screenshot from 2020-10-19 18-49-07.png
by dheinz70
Mon Oct 19, 2020 8:54 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

Another test with the same results: 2000 iterations on Original, single-GPU batch 150, distributed batch 300. Distributed is almost exactly half as efficient. Screenshot from 2020-10-19 15-49-51.png Watching nvtop, the stats never pegged and held there; they averaged about 70% of what the GPUs could hand...
by dheinz70
Sun Oct 18, 2020 6:34 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

2000 iterations of a batch of 2 each, on single and distributed. I doubt it was ever enough data to clog the pipes. My feeling is that there is some hardware limitation, but I suspect there is something else going on too.

Screenshot from 2020-10-18 13-28-38.png
by dheinz70
Sun Oct 18, 2020 5:41 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

I added Coolbits to my xorg.conf so I could control the fans on the cards. I cranked them all to 100% and the cards ran at 55C. Same slow results on distributed. The NVIDIA control panel lists 93C as the slowdown temp. A batch of 12 gave me 23 EG/s. Screenshot from 2020-10-18 12-41-24.png I'm goi...
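For reference, enabling manual fan control with Coolbits is done in the Device section of xorg.conf. The identifier below is illustrative (use whatever your config already has); the fan-control bit is bit 2, i.e. value 4:

```
Section "Device"
    Identifier "Device0"       # whatever your existing identifier is
    Driver     "nvidia"
    Option     "Coolbits" "4"  # value 4 enables manual fan speed control
EndSection
```

After restarting X, the fan speed can be set from nvidia-settings.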
by dheinz70
Sun Oct 18, 2020 3:00 am
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

Did a couple thousand iterations to test. Screenshot from 2020-10-17 21-54-05.png A single-GPU batch of 10, dual with a batch of 16. I'm getting better EGs/sec from the single GPU. It took 4 mins to start on distributed, 1.5 mins to start on a single GPU. I watched nvtop and I never saw the rx/tx get peg...
by dheinz70
Sun Oct 18, 2020 2:09 am
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

Yes, I do lower it to 80% of what one GPU can handle. It isn't just a startup issue. Depending on the model it can take 5-10 mins to start, and it just runs really slowly once it starts. In terms of EG/s I'm getting better performance from one GPU doing half as many at a time. I'm watching nvtop with the single 2...
by dheinz70
Sun Oct 18, 2020 1:44 am
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

PCIe 2.0 x8 should be somewhere near 4 GB/sec. Does Faceswap use that much?
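For reference, PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, which works out to 500 MB/s of payload per lane per direction. A quick sketch of the arithmetic:

```python
# Theoretical PCIe 2.0 payload bandwidth per direction (ignores packet overhead).
TRANSFER_RATE_GT = 5.0     # 5 GT/s per lane for PCIe 2.0
ENCODING_EFFICIENCY = 0.8  # 8b/10b: 8 payload bits per 10 line bits
BITS_PER_BYTE = 8
lanes = 8

gb_per_s = TRANSFER_RATE_GT * ENCODING_EFFICIENCY * lanes / BITS_PER_BYTE
print(f"PCIe 2.0 x{lanes}: {gb_per_s:.1f} GB/s per direction")
# -> PCIe 2.0 x8: 4.0 GB/s per direction
```

Gradient synchronization between two cards rarely needs anywhere near that sustained, which fits the observation below that nvtop never showed rx/tx pegged.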

by dheinz70
Sat Oct 17, 2020 10:42 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

The MSI site says 2x x16. Other sites show 1x x16 and 1x x8, which would explain the drop down to x8. Well, gonna take out one of the cards and see if a single one runs at x16.

-edit-
Yep, one card shows x16.

Screenshot from 2020-10-17 17-54-10.png

Time to start saving up for a Ryzen.....

by dheinz70
Sat Oct 17, 2020 9:30 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

Hmmm, you might be on to something. This is showing only PCIe x8 when they are both in use. The specs on my MB show 2x x16...

Screenshot from 2020-10-17 16-22-51.png

Now with just GPU1 doing the work....

Screenshot from 2020-10-17 16-27-45.png

Still x8

by dheinz70
Sat Oct 17, 2020 7:31 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

Still having the issue. Checked the specs, and my MB has 2 PCI Express 2.0 x16 slots. If I train on either GPU alone it screams; if I use distributed, the performance is awful. Training on one GPU is twice as fast as training on two. So... a batch of 8 on Villain on a single GPU is giving me 27.6 EGs/sec...
by dheinz70
Mon Oct 12, 2020 10:44 pm
Forum: Training
Topic: Training Speed on Multi-GPU
Replies: 1
Views: 186

Training Speed on Multi-GPU

Also, I noticed this the other day.

Distributed with a batch of 14, and only gpu1 with a batch of 7.

Shouldn't the distributed batch of 14 have roughly 2x the EG/s of the single GPU with a batch of 7?
Screenshot from 2020-10-12 17-39-26.png
by dheinz70
Mon Oct 12, 2020 2:08 am
Forum: Training
Topic: Log and graph weirdness
Replies: 7
Views: 311

Log and graph weirdness

The Analysis tab shows more iterations than the status bar.

Also, the graph crashes or doesn't respond if you change the smoothing and hit the refresh button.
Screenshot from 2020-10-11 20-20-01.png
by dheinz70
Tue Oct 06, 2020 10:33 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

Due to my snapping off the SATA connector on my HDD, I'm on a fresh install of Ubuntu 20.04 with the 450 driver as well.

My only problem is the DFL-SAE model. (Allow growth checked everywhere it is an option)

All other models seem to work fine.
by dheinz70
Tue Oct 06, 2020 9:39 pm
Forum: Training
Topic: Distributed with Dual 2060 supers
Replies: 43
Views: 1486

Re: Distributed with Dual 2060 supers

After further testing, it looks like all my problems come from the DFL-SAE model. It will only train on GPU1; training on GPU0 or distributed fails.

Training Villain with a batch of 16 right now distributed.

Thanks for all the help.