Just last week I finished building my deep learning rig, featuring the beastly RTX 3080 Ti along with 2 Samsung NVMe SSDs & 1 HDD.
Right now I am testing the two models I am most interested in: Realface, & Disney 256 under the Phaze-A plugin.
What I have noticed is that with my Model & Training images folders on the SSD, my GPU usage is around 90% on the Realface model & 75% on the Disney model.
The Realface model is set to 128px input 256px output.
I have to use batch size 1 while training the Realface model, as batch 8 results in an OOM error after some time.
At batch 1, I get around 30k iterations in 1 hour on the Realface model with my Model & Training images folders on the NVMe SSD.
But when I checked the Samsung Magician software, it reported a massive 1 TB of writes in just one day.
As far as I remember, I had used faceswap for around 4-5 hrs that day.
I think these massive writes to the SSD have to do with the Save Interval, which was set to 250: at 30k iterations/hr on batch 1, that works out to about 120 model saves per hour,
which added up to a massive 1 TB of writes on my SSD in that one day.
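A quick back-of-the-envelope check of those numbers (a sketch, assuming 4.5 hrs of training as the midpoint of my 4-5 hr estimate; the per-save write size is inferred, not measured):

```python
# Sanity-check the 1 TB figure: saves per hour from the Save Interval,
# and the write size per save that the reported total implies.
ITERATIONS_PER_HOUR = 30_000
SAVE_INTERVAL = 250        # iterations between model saves
HOURS_TRAINED = 4.5        # assumed midpoint of the 4-5 hr estimate
TOTAL_WRITES_GB = 1000     # ~1 TB reported by Samsung Magician

saves_per_hour = ITERATIONS_PER_HOUR / SAVE_INTERVAL   # 120 saves/hr
total_saves = saves_per_hour * HOURS_TRAINED           # 540 saves
write_per_save_gb = TOTAL_WRITES_GB / total_saves      # ~1.85 GB per save

print(f"{saves_per_hour:.0f} saves/hr, {total_saves:.0f} saves total")
print(f"implied write per save: ~{write_per_save_gb:.2f} GB")
```

So each save would have to write roughly 1.85 GB, which is plausible for a model folder of this size once the model file, backup, and logs are counted together.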
Therefore, to save my SSD's life, I have shifted my Model folder to the HDD & increased the Save Interval to 1000.
But this has come at the cost of speed, which is now down to around 20k iterations/hr at batch 1 on the Realface model.
The other difference I have seen is in GPU usage, which has gone down to around 75% from the 90% I was getting on the NVMe SSD.
It does give me the benefit of lower temperatures & reduced watt usage, which will be useful since I am planning on training
for 7-8 hrs/day in sessions of 1 hr.
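For comparison, here is a rough estimate of what the Save Interval change alone would do to daily writes, had the model folder stayed on the SSD (a sketch, assuming 8 hrs/day at the top of my planned range and the ~1.85 GB-per-save figure implied by the earlier 1 TB estimate):

```python
# Hypothetical daily write volume from periodic model saves,
# old settings vs. new, using an assumed ~1.85 GB per save.
WRITE_PER_SAVE_GB = 1.85   # inferred from 1 TB / ~540 saves; not measured
HOURS_PER_DAY = 8          # upper end of the planned 7-8 hrs/day

def daily_writes_gb(iterations_per_hour, save_interval):
    """Estimated GB written per day by model saves at the given rate."""
    saves_per_day = iterations_per_hour / save_interval * HOURS_PER_DAY
    return saves_per_day * WRITE_PER_SAVE_GB

old = daily_writes_gb(30_000, 250)    # old: Save Interval 250 at 30k it/hr
new = daily_writes_gb(20_000, 1000)   # new: Save Interval 1000 at 20k it/hr
print(f"old: ~{old:.0f} GB/day, new: ~{new:.0f} GB/day")
```

Even ignoring the move to the HDD, raising the Save Interval from 250 to 1000 would cut the write volume to roughly a sixth of what it was.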
Following are the observations I made using the Open Hardware Monitor software:
On NVMe SSD
GPU - RTX 3080 ti
Realface model - 128px Input 256px Output - Batch 1
30k iterations/hr
GPU usage - around 90%
GPU watt usage - around 280w - 310w
GPU - RTX 3080 ti
Disney 256 model - Defaults - Batch 1
I didn't run it for a full hour, so I didn't record the iterations
GPU usage - around 75%
GPU watt usage - around 240w - 270w
On HDD
GPU - RTX 3080 ti
Realface model - 128px Input 256px Output - Batch 1
20k iterations/hr
GPU usage - around 75%
GPU watt usage - around 240w - 270w
GPU - RTX 3080 ti
Disney 256 model - Defaults - Batch 1
I didn't run it for a full hour, so I didn't record the iterations
GPU usage - around 50%
GPU watt usage - around 190w - 220w
Note that I have only shifted my Model folder to the HDD; the Training images folder still remains on the NVMe SSD.
I ran the same benchmarks again to gather some more data, including GPU temperature:
On NVMe SSD
GPU - Asus TUF RTX 3080 ti 12GB OC
Realface model - 128px Input 256px Output - Batch 1
30k iterations/hr
GPU usage - around 90%
GPU watt usage - around 280w - 310w
GPU temperature - 67°C-69°C
GPU - Asus TUF RTX 3080 ti 12GB OC
Disney 256 model - Defaults - Batch 1
I didn't run it for a full hour, so I didn't record the iterations
GPU usage - around 75%
GPU watt usage - around 240w - 270w
GPU temperature - 60°C-62°C
On HDD
GPU - Asus TUF RTX 3080 ti 12GB OC
Realface model - 128px Input 256px Output - Batch 1
20k iterations/hr
GPU usage - around 75%
GPU watt usage - around 240w - 270w
GPU temperature - 62°C-64°C
GPU - Asus TUF RTX 3080 ti 12GB OC
Disney 256 model - Defaults - Batch 1
I didn't run it for a full hour, so I didn't record the iterations
GPU usage - around 40-50%
GPU watt usage - around 100w - 170w
GPU temperature - 42°C-46°C