Faceswap Model/Training folder on NVMe SSD - Great performance but with massive writes

Talk about Hardware used for Deep Learning



Faceswap Model/Training folder on NVMe SSD - Great performance but with massive writes

Post by vichitra5587 »

Just last week I finished building my deep learning rig, featuring the beast RTX 3080 Ti along with two Samsung NVMe SSDs and one HDD.

Right now I am testing the two models I am most interested in: Realface, and Disney 256 under the Phaze A plugin.
What I have noticed is that with my model and training-images folders on the SSD, my GPU usage sits around 90% on Realface and 75% on the Disney model.
The Realface model is set to 128px input / 256px output.

I have to use batch size 1 while training the Realface model, as batch size 8 results in an OOM error after some time.
At batch size 1, I get around 30k iterations per hour on the Realface model with the model and training-images folders on the NVMe SSD.
But when I checked the Samsung Magician software, it reported a massive 1 TB written in just one day.
As far as I remember, I had only used faceswap for around 4-5 hours that day.

I think these massive writes have to do with the Save Interval, which was set to 250. Since I was getting around 30k iterations/hr at batch 1,
the model was being saved roughly every 30 seconds, which is how it racked up 1 TB of writes on my SSD in a single day.
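
A rough back-of-the-envelope sketch of that in Python (assuming the roughly 1 GB Realface model file size I mention further down, and ignoring backup copies and write amplification, which would push the real figure even higher):

iterations_per_hour = 30_000   # observed Realface speed at batch 1
save_interval = 250            # default save interval, in iterations
model_size_gb = 1.0            # Realface model files are a little over 1 GB

saves_per_hour = iterations_per_hour / save_interval   # 120 saves per hour
gb_written_per_hour = saves_per_hour * model_size_gb   # ~120 GB per hour

print(f"~{saves_per_hour:.0f} saves/hr, ~{gb_written_per_hour:.0f} GB written per hour")
print(f"~{gb_written_per_hour * 4:.0f}-{gb_written_per_hour * 5:.0f} GB over a 4-5 hour session")

That is already in the same ballpark as the 1 TB that Samsung Magician reported, before counting anything else that gets written alongside the model.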

Therefore, to save my SSD's life, I have shifted the model folder to the HDD and increased the Save Interval to 1000.
But this has cost me some speed, which has now dropped to around 20k iterations/hr at batch 1 on the Realface model.

The other difference I have seen is GPU usage, which has gone down from around 90% on the NVMe SSD to around 75%.
It does give me the benefit of lower temperatures and reduced power draw, which will be useful since I am planning to train
for 7-8 hrs/day in sessions of 1 hr.

Following are the observations I made using the Open Hardware Monitor software:

On NVMe SSD

GPU - RTX 3080 Ti
Realface model - 128px Input 256px Output - Batch 1
30k iterations/hr
GPU usage - around 90%
GPU watt usage - around 280W - 310W

GPU - RTX 3080 Ti
Disney 256 model - Defaults - Batch 1
I didn't run it for an hour, so I haven't recorded the iterations
GPU usage - around 75%
GPU watt usage - around 240W - 270W


On HDD

GPU - RTX 3080 Ti
Realface model - 128px Input 256px Output - Batch 1
20k iterations/hr
GPU usage - around 75%
GPU watt usage - around 240W - 270W

GPU - RTX 3080 Ti
Disney 256 model - Defaults - Batch 1
I didn't run it for an hour, so I haven't recorded the iterations
GPU usage - around 50%
GPU watt usage - around 190W - 220W

Note that I have only shifted the model folder to the HDD; the training-images folder still remains on the NVMe SSD.


I ran the same benchmarks again to gather some more data, such as GPU temperature:

On NVMe SSD

GPU - Asus TUF RTX 3080 Ti 12GB OC
Realface model - 128px Input 256px Output - Batch 1
30k iterations/hr
GPU usage - around 90%
GPU watt usage - around 280W - 310W
GPU temperature - 67°C-69°C

GPU - Asus TUF RTX 3080 Ti 12GB OC
Disney 256 model - Defaults - Batch 1
I didn't run it for an hour, so I haven't recorded the iterations
GPU usage - around 75%
GPU watt usage - around 240W - 270W
GPU temperature - 60°C-62°C


On HDD

GPU - Asus TUF RTX 3080 Ti 12GB OC
Realface model - 128px Input 256px Output - Batch 1
20k iterations/hr
GPU usage - around 75%
GPU watt usage - around 240W - 270W
GPU temperature - 62°C-64°C

GPU - Asus TUF RTX 3080 Ti 12GB OC
Disney 256 model - Defaults - Batch 1
I didn't run it for an hour, so I haven't recorded the iterations
GPU usage - around 40-50%
GPU watt usage - around 100W - 170W
GPU temperature - 42°C-46°C



Re: Faceswap Model/Training folder on NVMe SSD - Great performance but with massive writes

Post by Boogie »

How much disk space is required? If it's not too much, you could use a RAM disk.


Re: Faceswap Model/Training folder on NVMe SSD - Great performance but with massive writes

Post by vichitra5587 »

Sorry for the late reply, I was kind of busy.
I have already found a solution for this:
I just increased the Save Interval from the default 250 to 10000.
To do that, click on the Save Interval field and enter the desired value manually.

My RTX 3080 Ti achieves 10k iterations in around 15-20 minutes on the Disney 512 model with default settings.
So now, it writes to my SSD only once every 15-20 minutes.

The large writes in my case with the Realface 256 model were also due to its large file size, which is just above 1 GB.
So it was writing that 1 GB file to my SSD about twice a minute, continuously, for 4-5 hours.
But now I am using the Disney 512 model, which has a file size of less than 200 MB, and it is saved only
once every 15-20 minutes on my SSD.
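
For comparison, a quick sketch of the write rate before and after the change (the iteration rates are my observed numbers, and the file sizes are approximate):

def gb_written_per_hour(iterations_per_hour, save_interval, model_size_gb):
    # GB written to disk per hour of training, counting only model saves
    return iterations_per_hour / save_interval * model_size_gb

# Before: Realface, ~1 GB model, save interval 250, ~30k iterations/hr
before = gb_written_per_hour(30_000, 250, 1.0)

# After: Disney 512, <200 MB model, save interval 10000, ~10k iterations per ~17 min
after = gb_written_per_hour(10_000 * 60 / 17, 10_000, 0.2)

print(f"Before: ~{before:.0f} GB/hr   After: ~{after:.2f} GB/hr")

That works out to roughly 120 GB/hr before versus well under 1 GB/hr now.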



Re: Faceswap Model/Training folder on NVMe SSD - Great performance but with massive writes

Post by MaxHunter »

Aren't you worried that such writing will ruin your NVMe?


Re: Faceswap Model/Training folder on NVMe SSD - Great performance but with massive writes

Post by vichitra5587 »

MaxHunter wrote: Fri Oct 07, 2022 6:44 am

Aren't you worried that such writing will ruin your NVMe?

I am keeping my model folder on a 120GB Kingston HyperX Fury SSD that I had no use for and was just lying around.
It has an endurance rating of 354 TBW, which I don't think any other 120GB SSD has (I think this model has been discontinued).
I have not even written 1 TB of data to it so far, and this SSD is dedicated to faceswap only;
no other writes will go to it.

So, if you calculate with a write rate of less than 200 MB (the file size of the Disney model) once every 15-20 minutes,
it will take a very, very long time to reach 354 TB.

Even if I were to use faceswap on my Samsung NVMe now, it too has an endurance rating of 300 TBW.
So in either case, I think it will take 15 years or more at this write rate before the SSD's rated life is used up.
By that time, I will have completed all the experiments I need to do with Faceswap.
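
A rough sketch of that estimate (assuming less than 200 MB written roughly every 17 minutes, and taking the upper end of my planned 7-8 hours of training per day; the endurance figures are the rated TBW values above):

model_size_tb = 0.2 / 1000        # ~200 MB per save, expressed in TB
saves_per_hour = 60 / 17          # roughly one save every 17 minutes
training_hours_per_day = 8        # upper end of my planned 7-8 hrs/day

tb_written_per_day = model_size_tb * saves_per_hour * training_hours_per_day

for drive, rated_tbw in [("Kingston HyperX Fury 120GB", 354), ("Samsung NVMe", 300)]:
    years = rated_tbw / tb_written_per_day / 365
    print(f"{drive}: ~{years:.0f} years to reach {rated_tbw} TBW "
          f"at ~{tb_written_per_day * 1000:.1f} GB/day")

At this write rate the rated endurance works out to far more than 15 years, so it is nowhere near a concern.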

Also, I have the Samsung Magician software to check my NVMe's total writes.
I could easily monitor the write usage on the NVMe SSD and stop using faceswap
if it ever reached half of the endurance rating (if I were to use faceswap on my Samsung NVMe).

