Search found 43 matches

by dheinz70
Sun Oct 18, 2020 1:44 am
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

PCIe 2.0 x8 should be somewhere near 4 GB/sec. Does Faceswap use that much?
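For reference, a rough back-of-the-envelope sketch of where that 4 GB/s figure comes from, using the published PCIe 2.0 numbers (5 GT/s per lane, 8b/10b encoding):

    # Rough PCIe 2.0 bandwidth estimate from the published spec numbers.
    GT_PER_SEC_PER_LANE = 5.0      # PCIe 2.0 raw rate: 5 gigatransfers/s (= 5 Gbit/s) per lane
    ENCODING_EFFICIENCY = 8 / 10   # 8b/10b encoding: 8 data bits per 10 bits on the wire
    LANES = 8                      # x8 link

    gb_per_sec = GT_PER_SEC_PER_LANE * ENCODING_EFFICIENCY / 8 * LANES  # /8: bits -> bytes
    print(f"PCIe 2.0 x{LANES} ~= {gb_per_sec:.1f} GB/s per direction")  # -> 4.0 GB/s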

by dheinz70
Sat Oct 17, 2020 10:42 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

The MSI site says 2x x16. Other sites show 1x x16 and 1x x8, which would explain the drop down to x8. Well, gonna take out one of the cards and see if a single card runs at x16.

-edit-
Yep, one card shows 16x.

Attachment: Screenshot from 2020-10-17 17-54-10.png

Time to start saving up for a Ryzen.....

by dheinz70
Sat Oct 17, 2020 9:30 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Hmmm, you might be on to something. This is showing only PCIe x8 when both cards are in use. The specs on my motherboard show 2x x16...

Attachment: Screenshot from 2020-10-17 16-22-51.png

Now with just GPU1 doing the work....

Attachment: Screenshot from 2020-10-17 16-27-45.png

Still 8x
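(For anyone hitting the same thing: a quick way to check the negotiated link width without screenshots is to query nvidia-smi. A minimal sketch, assuming nvidia-smi is on the PATH; note that some boards down-train the link at idle, so check while the GPUs are under load:)

    # Query the current PCIe generation and link width of each GPU via nvidia-smi.
    import subprocess

    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current",
         "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)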

by dheinz70
Sat Oct 17, 2020 7:31 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Still having the issue. Checked the specs, and my motherboard has 2 PCI Express 2.0 x16 slots. If I train on either GPU alone it screams. If I use distributed, the performance is awful. Training on 1 GPU is twice as fast as training on two. So... a batch of 8 on Villain on a single GPU is giving me 27.6 EGs/sec...
by dheinz70
Mon Oct 12, 2020 10:44 pm
Forum: Training Discussion
Topic: Training Speed on Multi-GPU
Replies: 1
Views: 1400

Training Speed on Multi-GPU

Also I noticed this the other day.

Distributed with a batch of 14, and only gpu1 with a batch of 7.

Shouldn't the distributed batch of 14 have roughly 2x the EG/s of the single gpu with a batch of 7?

Attachment: Screenshot from 2020-10-12 17-39-26.png
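(Rough expectation math, as a sanity check: EGs/sec is effectively batch size times iterations per second, so the distributed batch of 14 only reaches ~2x the single-GPU number if each GPU keeps the iteration rate it had on its own; cross-GPU sync overhead eats into that. The numbers below are purely hypothetical, for illustration only:)

    # Hypothetical figures, not measurements.
    single_batch, single_iters_per_sec = 7, 4.0        # one GPU on its own
    egs_single = single_batch * single_iters_per_sec   # 28 EGs/sec

    dist_batch, dist_iters_per_sec = 14, 4.0           # ideal: iteration rate unchanged
    egs_dist_ideal = dist_batch * dist_iters_per_sec   # 56 EGs/sec, the hoped-for 2x

    # In practice all-reduce/sync lowers dist_iters_per_sec, so the real figure lands below 2x.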
by dheinz70
Mon Oct 12, 2020 2:08 am
Forum: Training Support
Topic: Session Stats no longer appearing after a few hours of training
Replies: 15
Views: 11688

Log and graph weirdness

The Analysis tab shows more iterations than the status bar.

Also, the graph crashes or doesn't respond if you change smoothing and hit the refresh button.

Attachment: Screenshot from 2020-10-11 20-20-01.png
by dheinz70
Tue Oct 06, 2020 10:33 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Due to my snapping off the SATA connector on my HDD, I'm on a fresh install of Ubuntu 20.04, also with the 450 driver.

My only problem is the DFL-SAE model. (Allow growth checked everywhere it is an option)

All other models seem to work fine.
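(For context, as far as I understand it, Faceswap's "allow growth" option maps onto TensorFlow's per-GPU memory growth setting, which stops TF from reserving all VRAM up front. Roughly the equivalent in plain TF 2.x, as a sketch rather than Faceswap's actual code:)

    # Let the GPU memory pool grow on demand instead of grabbing it all at start-up.
    import tensorflow as tf

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)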

by dheinz70
Tue Oct 06, 2020 9:39 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

After further testing it looks like all my problems come from the DFL-SAE model. It will only train on GPU1. Training on GPU0 or Distributed fails.

Training Villain with a batch of 16 right now distributed.

Thanks for all the help.

by dheinz70
Sat Oct 03, 2020 4:21 am
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

I notice this A LOT in the verbose logs as it starts to train.

failed to allocate 3.24G (3477464576 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

Without MIXED PRECISION I see it a lot.

With it checked, just one or two instances. I'm beginning to think this is where the issue lies.
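(For anyone curious why mixed precision helps here: it computes most ops in float16 while keeping float32 master weights, roughly halving activation memory. In recent TF 2.x Keras terms it is roughly the call below; Faceswap exposes it through its own option rather than this call directly, as far as I know:)

    # Enable mixed precision globally: compute in float16, keep float32 variables.
    import tensorflow as tf
    from tensorflow.keras import mixed_precision

    mixed_precision.set_global_policy("mixed_float16")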

by dheinz70
Fri Oct 02, 2020 11:42 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Distributed Dlight works too.

Question: Is it normal for it to take 15-25 mins to start training on distributed? I've got a pretty beefy system, just wondering what's normal?

by dheinz70
Fri Oct 02, 2020 9:21 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Distributed, DFL-SAE batch 2, FAILS.

Distributed, Villain batch 4, works.

Could it be an issue with SAE and distributed?

by dheinz70
Fri Oct 02, 2020 8:09 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Redid alignments.

Distributed, Original, Batch 128 - worked!

Distributed, DFL-H128, batch 32 - worked!

I'll test more, but you might have figured it out. THANKS

by dheinz70
Fri Oct 02, 2020 6:44 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Selecting ONLY GPU0, I can get lightweight (batch 32), DF128 (10) and original working.

SAE with a batch of 1 fails.

Distributed - DFL-128 with batch of 8 running. This is weird.

I'll try the alignment thing next.

by dheinz70
Fri Oct 02, 2020 6:17 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Allow growth selected. Tried with batches of 2 and 1; still fails.

Attached is the system info.

by dheinz70
Fri Oct 02, 2020 4:27 pm
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

My DisplayPort-to-DVI adapter came this morning. I plugged monitor1 into gpu0, and Gnome came up using both screens as it did with my old single card. My guess is the Nvidia driver just doesn't want to have display0 plugged into gpu0 and display1 plugged into gpu1. It wants them both plugged into gp...
by dheinz70
Fri Oct 02, 2020 7:59 am
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

For any Linux users: I opened a ticket with Nvidia. Not a Faceswap issue; it is clearly an Nvidia driver problem.

https://forums.developer.nvidia.com/t/2 ... -04/156103

by dheinz70
Fri Oct 02, 2020 5:47 am
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Definitely starting to look like it is a problem with X on Linux and probably not a FaceSwap issue. It doesn't like anything to do with GPU0 no matter how I have the cards installed (1 or 2). Found some others having similar problems; any help appreciated.
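(One quick way to see which GPU X has grabbed for the display is to query nvidia-smi's display flags. A small sketch, assuming nvidia-smi is on the PATH:)

    # List each GPU and whether a display is initialised/active on it,
    # to confirm which card is actually driving X.
    import subprocess

    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,display_mode,display_active",
         "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)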

by dheinz70
Fri Oct 02, 2020 4:01 am
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

The verbose log is too big to attach. It keeps giving

2020-10-01 22:41:27.315577: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 5.06G (5437426176 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-10-

by dheinz70
Fri Oct 02, 2020 3:27 am
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

I've been doing some playing with this. It seems that any attempt to use GPU0 fails. It keeps spitting out OOM errors. When I just select GPU1 it works like it is supposed to. I'm using Ubuntu 20.04 and I think it might be related to how my 2nd monitor (connected to the GPU1) won't work p...
by dheinz70
Thu Oct 01, 2020 3:54 am
Forum: Training Support
Topic: Distributed with Dual 2060 supers
Replies: 46
Views: 16383

Re: Distributed with Dual 2060 supers

Followup if I may. Why does it take so long to start? It stays at this point for quite a long time...

09/30/2020 22:31:50 INFO batch_all_reduce: 102 all-reduces with algorithm = nccl, num_packs = 1
09/30/2020 22:31:51 INFO Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/j...
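(Those log lines appear to come from TensorFlow's MirroredStrategy building its NCCL all-reduce graph before the first iteration, which is a one-off setup cost. A minimal sketch of the kind of setup that produces them, not Faceswap's actual code:)

    # MirroredStrategy replicates the model across GPUs and combines gradients
    # with NCCL all-reduce; the reduce/broadcast graph is built before training starts.
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.NcclAllReduce()
    )
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
        model.compile(optimizer="adam", loss="mse")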