Out of Memory after 6 Hours of Steady Training

If training fails to start and you are not receiving an error message telling you what to do, tell us about it here.


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want tips, or want to better understand the Training process, look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

Hi! I have been training a model on DFL-SAE for the past month. Each time I press Train, it runs smoothly for about 4-6 hours before giving me an out-of-memory error. Here's the latest output as an example:

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
07/10/2022 21:19:47 INFO     Log level set to: INFO
07/10/2022 21:19:50 INFO     Model A Directory: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\data-dst-v2\training-faces2' (645 images)
07/10/2022 21:19:50 INFO     Model B Directory: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\data-src\training_faces' (4501 images)
07/10/2022 21:19:50 INFO     Training data directory: C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\model\latest
07/10/2022 21:19:50 INFO     ===================================================
07/10/2022 21:19:50 INFO       Starting
07/10/2022 21:19:50 INFO     ===================================================
07/10/2022 21:19:51 INFO     Loading data, this may take a while...
07/10/2022 21:19:51 INFO     Loading Model from Dfl_Sae plugin...
07/10/2022 21:19:51 INFO     Using configuration saved in state file
07/10/2022 21:19:51 INFO     Setting allow growth for GPU: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
07/10/2022 21:19:53 INFO     Loaded model from disk: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\model\latest\dfl_sae.h5'
07/10/2022 21:19:53 INFO     Loading Trainer from Original plugin...

07/10/2022 21:20:55 INFO     [Saved models] - Average loss since last save: face_a: 0.05369, face_b: 0.10284

07/10/2022 21:24:36 INFO     [Saved models] - Average loss since last save: face_a: 0.07755, face_b: 0.09259

07/10/2022 21:28:17 INFO     [Saved models] - Average loss since last save: face_a: 0.07797, face_b: 0.09622

07/10/2022 21:31:56 INFO     [Saved models] - Average loss since last save: face_a: 0.07773, face_b: 0.09112

07/10/2022 21:35:34 INFO     [Saved models] - Average loss since last save: face_a: 0.07760, face_b: 0.09373

07/10/2022 21:39:11 INFO     [Saved models] - Average loss since last save: face_a: 0.07949, face_b: 0.09314

07/10/2022 21:42:50 INFO     [Saved models] - Average loss since last save: face_a: 0.07961, face_b: 0.09702

07/10/2022 21:46:29 INFO     [Saved models] - Average loss since last save: face_a: 0.07759, face_b: 0.09640

07/10/2022 21:50:12 INFO     [Saved models] - Average loss since last save: face_a: 0.07960, face_b: 0.09500

07/10/2022 21:53:56 INFO     [Saved models] - Average loss since last save: face_a: 0.07620, face_b: 0.09647

07/10/2022 21:57:38 INFO     [Saved models] - Average loss since last save: face_a: 0.07826, face_b: 0.09882

07/10/2022 22:01:17 INFO     [Saved models] - Average loss since last save: face_a: 0.07747, face_b: 0.09434

07/10/2022 22:04:54 INFO     [Saved models] - Average loss since last save: face_a: 0.08103, face_b: 0.09872

07/10/2022 22:08:30 INFO     [Saved models] - Average loss since last save: face_a: 0.07650, face_b: 0.09557

07/10/2022 22:12:06 INFO     [Saved models] - Average loss since last save: face_a: 0.07962, face_b: 0.09446

07/10/2022 22:15:42 INFO     [Saved models] - Average loss since last save: face_a: 0.07678, face_b: 0.09453

07/10/2022 22:19:17 INFO     [Saved models] - Average loss since last save: face_a: 0.08229, face_b: 0.09658

07/10/2022 22:22:53 INFO     [Saved models] - Average loss since last save: face_a: 0.07912, face_b: 0.09463

07/10/2022 22:26:29 INFO     [Saved models] - Average loss since last save: face_a: 0.07907, face_b: 0.09408

07/10/2022 22:30:05 INFO     [Saved models] - Average loss since last save: face_a: 0.07969, face_b: 0.09567

07/10/2022 22:33:40 INFO     [Saved models] - Average loss since last save: face_a: 0.07814, face_b: 0.09592

07/10/2022 22:37:17 INFO     [Saved models] - Average loss since last save: face_a: 0.08148, face_b: 0.09338

07/10/2022 22:40:53 INFO     [Saved models] - Average loss since last save: face_a: 0.07844, face_b: 0.09644

07/10/2022 22:44:30 INFO     [Saved models] - Average loss since last save: face_a: 0.07917, face_b: 0.09601

07/10/2022 22:48:06 INFO     [Saved models] - Average loss since last save: face_a: 0.08062, face_b: 0.09612

07/10/2022 22:51:42 INFO     [Saved models] - Average loss since last save: face_a: 0.07801, face_b: 0.09760

07/10/2022 22:55:18 INFO     [Saved models] - Average loss since last save: face_a: 0.07940, face_b: 0.09125

07/10/2022 22:58:54 INFO     [Saved models] - Average loss since last save: face_a: 0.07934, face_b: 0.09323

07/10/2022 23:02:29 INFO     [Saved models] - Average loss since last save: face_a: 0.07820, face_b: 0.09602

07/10/2022 23:06:06 INFO     [Saved models] - Average loss since last save: face_a: 0.07865, face_b: 0.09869

07/10/2022 23:09:41 INFO     [Saved models] - Average loss since last save: face_a: 0.07935, face_b: 0.09586

07/10/2022 23:13:18 INFO     [Saved models] - Average loss since last save: face_a: 0.07851, face_b: 0.09326

07/10/2022 23:16:55 INFO     [Saved models] - Average loss since last save: face_a: 0.07774, face_b: 0.09633

07/10/2022 23:20:31 INFO     [Saved models] - Average loss since last save: face_a: 0.07908, face_b: 0.09649

07/10/2022 23:24:07 INFO     [Saved models] - Average loss since last save: face_a: 0.07925, face_b: 0.09691

07/10/2022 23:27:43 INFO     [Saved models] - Average loss since last save: face_a: 0.08005, face_b: 0.09347

07/10/2022 23:31:19 INFO     [Saved models] - Average loss since last save: face_a: 0.07891, face_b: 0.09387

07/10/2022 23:34:55 INFO     [Saved models] - Average loss since last save: face_a: 0.07824, face_b: 0.09808

07/10/2022 23:38:31 INFO     [Saved models] - Average loss since last save: face_a: 0.07809, face_b: 0.09391

07/10/2022 23:42:07 INFO     [Saved models] - Average loss since last save: face_a: 0.07788, face_b: 0.09536

07/10/2022 23:45:42 INFO     [Saved models] - Average loss since last save: face_a: 0.08059, face_b: 0.09563

07/10/2022 23:49:18 INFO     [Saved models] - Average loss since last save: face_a: 0.07899, face_b: 0.09372

07/10/2022 23:52:54 INFO     [Saved models] - Average loss since last save: face_a: 0.07613, face_b: 0.09840

07/10/2022 23:56:30 INFO     [Saved models] - Average loss since last save: face_a: 0.08088, face_b: 0.09789

07/10/2022 23:56:36 INFO     Saved snapshot (1700000 iterations)

07/11/2022 00:00:08 INFO     [Saved models] - Average loss since last save: face_a: 0.07852, face_b: 0.09435

07/11/2022 00:03:44 INFO     [Saved models] - Average loss since last save: face_a: 0.07968, face_b: 0.09321

07/11/2022 00:07:20 INFO     [Saved models] - Average loss since last save: face_a: 0.08029, face_b: 0.09513

07/11/2022 00:10:55 INFO     [Saved models] - Average loss since last save: face_a: 0.07831, face_b: 0.09470

07/11/2022 00:14:31 INFO     [Saved models] - Average loss since last save: face_a: 0.08060, face_b: 0.09495

07/11/2022 00:16:25 ERROR    Caught exception in thread: '_training_0'
07/11/2022 00:16:25 ERROR    You do not have enough GPU memory available to train the selected model at the selected settings. You can try a number of things:
07/11/2022 00:16:25 ERROR    1) Close any other application that is using your GPU (web browsers are particularly bad for this).
07/11/2022 00:16:25 ERROR    2) Lower the batchsize (the amount of images fed into the model each iteration).
07/11/2022 00:16:25 ERROR    3) Try enabling 'Mixed Precision' training.
07/11/2022 00:16:25 ERROR    4) Use a more lightweight model, or select the model's 'LowMem' option (in config) if it has one.
Process exited.

I would attach a crash log, but none is written to the faceswap folder.

When it gives me the error, I simply click the Train button again and it trains smoothly for another six hours or so before eventually crashing as before. Is there a way to make it run without crashing, or a way to have it automatically restart when it does run out of memory? Here is a summary of the model in case you need it. Also, I use this computer strictly for deepfakes; the only thing running while it trains is Faceswap, no browsers or other RAM hogs. Hopefully there is a solution.

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
07/11/2022 10:45:31 INFO     Log level set to: INFO
07/11/2022 10:45:34 INFO     Loading Model from Dfl_Sae plugin...
07/11/2022 10:45:34 INFO     Using configuration saved in state file
07/11/2022 10:45:34 INFO     Setting allow growth for GPU: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
07/11/2022 10:45:36 INFO     Loaded model from disk: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\model\latest\dfl_sae.h5'
Model: "encoder_df"
____________________________________________________________________________________________________
 Layer (type)                                Output Shape                            Param #
====================================================================================================
 input_1 (InputLayer)                        [(None, 144, 144, 3)]                   0

 conv_126_0_conv2d (Conv2D)                  (None, 72, 72, 126)                     9576

 conv_126_0_leakyrelu (LeakyReLU)            (None, 72, 72, 126)                     0

 conv_252_0_conv2d (Conv2D)                  (None, 36, 36, 252)                     794052

 conv_252_0_leakyrelu (LeakyReLU)            (None, 36, 36, 252)                     0

 conv_504_0_conv2d (Conv2D)                  (None, 18, 18, 504)                     3175704

 conv_504_0_leakyrelu (LeakyReLU)            (None, 18, 18, 504)                     0

 conv_1008_0_conv2d (Conv2D)                 (None, 9, 9, 1008)                      12701808

 conv_1008_0_leakyrelu (LeakyReLU)           (None, 9, 9, 1008)                      0

 flatten (Flatten)                           (None, 81648)                           0

 dense (Dense)                               (None, 512)                             41804288

 dense_1 (Dense)                             (None, 41472)                           21275136

 reshape (Reshape)                           (None, 9, 9, 512)                       0

 upscale_512_0_conv2d_conv2d (Conv2D)        (None, 9, 9, 2048)                      9439232

 upscale_512_0_conv2d_leakyrelu (LeakyReLU)  (None, 9, 9, 2048)                      0

 upscale_512_0_pixelshuffler (PixelShuffler)  (None, 18, 18, 512)                    0

====================================================================================================
Total params: 89,199,796
Trainable params: 89,199,796
Non-trainable params: 0
____________________________________________________________________________________________________
Model: "decoder_a"
____________________________________________________________________________________________________
 Layer (type)                    Output Shape          Param #     Connected to
====================================================================================================
 input_2 (InputLayer)            [(None, 18, 18, 512)  0           []
                                 ]

 upscale_504_0_conv2d_conv2d (Co  (None, 18, 18, 2016)  9291744    ['input_2[0][0]']
 nv2D)

 upscale_504_0_pixelshuffler (Pi  (None, 36, 36, 504)  0           ['upscale_504_0_conv2d_conv2d[0][
 xelShuffler)                                                      0]']

 leaky_re_lu (LeakyReLU)         (None, 36, 36, 504)   0           ['upscale_504_0_pixelshuffler[0][
                                                                   0]']

 residual_504_0_conv2d_0 (Conv2D  (None, 36, 36, 504)  2286648     ['leaky_re_lu[0][0]']
 )

 residual_504_0_leakyrelu_1 (Lea  (None, 36, 36, 504)  0           ['residual_504_0_conv2d_0[0][0]']
 kyReLU)

 residual_504_0_conv2d_1 (Conv2D  (None, 36, 36, 504)  2286648     ['residual_504_0_leakyrelu_1[0][0
 )                                                                 ]']

 add (Add)                       (None, 36, 36, 504)   0           ['residual_504_0_conv2d_1[0][0]',
                                                                    'leaky_re_lu[0][0]']

 residual_504_0_leakyrelu_3 (Lea  (None, 36, 36, 504)  0           ['add[0][0]']
 kyReLU)

 residual_504_1_conv2d_0 (Conv2D  (None, 36, 36, 504)  2286648     ['residual_504_0_leakyrelu_3[0][0
 )                                                                 ]']

 residual_504_1_leakyrelu_1 (Lea  (None, 36, 36, 504)  0           ['residual_504_1_conv2d_0[0][0]']
 kyReLU)

 residual_504_1_conv2d_1 (Conv2D  (None, 36, 36, 504)  2286648     ['residual_504_1_leakyrelu_1[0][0
 )                                                                 ]']

 add_1 (Add)                     (None, 36, 36, 504)   0           ['residual_504_1_conv2d_1[0][0]',
                                                                    'residual_504_0_leakyrelu_3[0][0
                                                                   ]']

 residual_504_1_leakyrelu_3 (Lea  (None, 36, 36, 504)  0           ['add_1[0][0]']
 kyReLU)

 upscale_252_0_conv2d_conv2d (Co  (None, 36, 36, 1008)  4573296    ['residual_504_1_leakyrelu_3[0][0
 nv2D)                                                             ]']

 upscale_252_0_pixelshuffler (Pi  (None, 72, 72, 252)  0           ['upscale_252_0_conv2d_conv2d[0][
 xelShuffler)                                                      0]']

 leaky_re_lu_1 (LeakyReLU)       (None, 72, 72, 252)   0           ['upscale_252_0_pixelshuffler[0][
                                                                   0]']

 residual_252_0_conv2d_0 (Conv2D  (None, 72, 72, 252)  571788      ['leaky_re_lu_1[0][0]']
 )

 residual_252_0_leakyrelu_1 (Lea  (None, 72, 72, 252)  0           ['residual_252_0_conv2d_0[0][0]']
 kyReLU)

 residual_252_0_conv2d_1 (Conv2D  (None, 72, 72, 252)  571788      ['residual_252_0_leakyrelu_1[0][0
 )                                                                 ]']

 add_2 (Add)                     (None, 72, 72, 252)   0           ['residual_252_0_conv2d_1[0][0]',
                                                                    'leaky_re_lu_1[0][0]']

 residual_252_0_leakyrelu_3 (Lea  (None, 72, 72, 252)  0           ['add_2[0][0]']
 kyReLU)

 residual_252_1_conv2d_0 (Conv2D  (None, 72, 72, 252)  571788      ['residual_252_0_leakyrelu_3[0][0
 )                                                                 ]']

 residual_252_1_leakyrelu_1 (Lea  (None, 72, 72, 252)  0           ['residual_252_1_conv2d_0[0][0]']
 kyReLU)

 residual_252_1_conv2d_1 (Conv2D  (None, 72, 72, 252)  571788      ['residual_252_1_leakyrelu_1[0][0
 )                                                                 ]']

 add_3 (Add)                     (None, 72, 72, 252)   0           ['residual_252_1_conv2d_1[0][0]',
                                                                    'residual_252_0_leakyrelu_3[0][0
                                                                   ]']

 residual_252_1_leakyrelu_3 (Lea  (None, 72, 72, 252)  0           ['add_3[0][0]']
 kyReLU)

 upscale_126_0_conv2d_conv2d (Co  (None, 72, 72, 504)  1143576     ['residual_252_1_leakyrelu_3[0][0
 nv2D)                                                             ]']

 upscale_126_0_pixelshuffler (Pi  (None, 144, 144, 126  0          ['upscale_126_0_conv2d_conv2d[0][
 xelShuffler)                    )                                 0]']

 leaky_re_lu_2 (LeakyReLU)       (None, 144, 144, 126  0           ['upscale_126_0_pixelshuffler[0][
                                 )                                 0]']

 residual_126_0_conv2d_0 (Conv2D  (None, 144, 144, 126  143010     ['leaky_re_lu_2[0][0]']
 )                               )

 residual_126_0_leakyrelu_1 (Lea  (None, 144, 144, 126  0          ['residual_126_0_conv2d_0[0][0]']
 kyReLU)                         )

 upscale_168_0_conv2d_conv2d (Co  (None, 18, 18, 672)  3097248     ['input_2[0][0]']
 nv2D)

 residual_126_0_conv2d_1 (Conv2D  (None, 144, 144, 126  143010     ['residual_126_0_leakyrelu_1[0][0
 )                               )                                 ]']

 upscale_168_0_conv2d_leakyrelu   (None, 18, 18, 672)  0           ['upscale_168_0_conv2d_conv2d[0][
 (LeakyReLU)                                                       0]']

 add_4 (Add)                     (None, 144, 144, 126  0           ['residual_126_0_conv2d_1[0][0]',
                                 )                                  'leaky_re_lu_2[0][0]']

 upscale_168_0_pixelshuffler (Pi  (None, 36, 36, 168)  0           ['upscale_168_0_conv2d_leakyrelu[
 xelShuffler)                                                      0][0]']

 residual_126_0_leakyrelu_3 (Lea  (None, 144, 144, 126  0          ['add_4[0][0]']
 kyReLU)                         )

 upscale_84_0_conv2d_conv2d (Con  (None, 36, 36, 336)  508368      ['upscale_168_0_pixelshuffler[0][
 v2D)                                                              0]']

 residual_126_1_conv2d_0 (Conv2D  (None, 144, 144, 126  143010     ['residual_126_0_leakyrelu_3[0][0
 )                               )                                 ]']

 upscale_84_0_conv2d_leakyrelu (  (None, 36, 36, 336)  0           ['upscale_84_0_conv2d_conv2d[0][0
 LeakyReLU)                                                        ]']

 residual_126_1_leakyrelu_1 (Lea  (None, 144, 144, 126  0          ['residual_126_1_conv2d_0[0][0]']
 kyReLU)                         )

 upscale_84_0_pixelshuffler (Pix  (None, 72, 72, 84)   0           ['upscale_84_0_conv2d_leakyrelu[0
 elShuffler)                                                       ][0]']

 residual_126_1_conv2d_1 (Conv2D  (None, 144, 144, 126  143010     ['residual_126_1_leakyrelu_1[0][0
 )                               )                                 ]']

 upscale_42_0_conv2d_conv2d (Con  (None, 72, 72, 168)  127176      ['upscale_84_0_pixelshuffler[0][0
 v2D)                                                              ]']

 add_5 (Add)                     (None, 144, 144, 126  0           ['residual_126_1_conv2d_1[0][0]',
                                 )                                  'residual_126_0_leakyrelu_3[0][0
                                                                   ]']

 upscale_42_0_conv2d_leakyrelu (  (None, 72, 72, 168)  0           ['upscale_42_0_conv2d_conv2d[0][0
 LeakyReLU)                                                        ]']

 residual_126_1_leakyrelu_3 (Lea  (None, 144, 144, 126  0          ['add_5[0][0]']
 kyReLU)                         )

 upscale_42_0_pixelshuffler (Pix  (None, 144, 144, 42)  0          ['upscale_42_0_conv2d_leakyrelu[0
 elShuffler)                                                       ][0]']

 face_out_32_a_conv2d (Conv2D)   (None, 36, 36, 3)     37803       ['residual_504_1_leakyrelu_3[0][0
                                                                   ]']

 face_out_64_a_conv2d (Conv2D)   (None, 72, 72, 3)     18903       ['residual_252_1_leakyrelu_3[0][0
                                                                   ]']

 face_out_128_a_conv2d (Conv2D)  (None, 144, 144, 3)   9453        ['residual_126_1_leakyrelu_3[0][0
                                                                   ]']

 mask_out_a_conv2d (Conv2D)      (None, 144, 144, 1)   1051        ['upscale_42_0_pixelshuffler[0][0
                                                                   ]']

 face_out_32_a (Activation)      (None, 36, 36, 3)     0           ['face_out_32_a_conv2d[0][0]']

 face_out_64_a (Activation)      (None, 72, 72, 3)     0           ['face_out_64_a_conv2d[0][0]']

 face_out_128_a (Activation)     (None, 144, 144, 3)   0           ['face_out_128_a_conv2d[0][0]']

 mask_out_a (Activation)         (None, 144, 144, 1)   0           ['mask_out_a_conv2d[0][0]']

====================================================================================================
Total params: 30,814,402
Trainable params: 30,814,402
Non-trainable params: 0
____________________________________________________________________________________________________
Model: "decoder_b"
____________________________________________________________________________________________________
 Layer (type)                    Output Shape          Param #     Connected to
====================================================================================================
 input_3 (InputLayer)            [(None, 18, 18, 512)  0           []
                                 ]

 upscale_504_1_conv2d_conv2d (Co  (None, 18, 18, 2016)  9291744    ['input_3[0][0]']
 nv2D)

 upscale_504_1_pixelshuffler (Pi  (None, 36, 36, 504)  0           ['upscale_504_1_conv2d_conv2d[0][
 xelShuffler)                                                      0]']

 leaky_re_lu_3 (LeakyReLU)       (None, 36, 36, 504)   0           ['upscale_504_1_pixelshuffler[0][
                                                                   0]']

 residual_504_2_conv2d_0 (Conv2D  (None, 36, 36, 504)  2286648     ['leaky_re_lu_3[0][0]']
 )

 residual_504_2_leakyrelu_1 (Lea  (None, 36, 36, 504)  0           ['residual_504_2_conv2d_0[0][0]']
 kyReLU)

 residual_504_2_conv2d_1 (Conv2D  (None, 36, 36, 504)  2286648     ['residual_504_2_leakyrelu_1[0][0
 )                                                                 ]']

 add_6 (Add)                     (None, 36, 36, 504)   0           ['residual_504_2_conv2d_1[0][0]',
                                                                    'leaky_re_lu_3[0][0]']

 residual_504_2_leakyrelu_3 (Lea  (None, 36, 36, 504)  0           ['add_6[0][0]']
 kyReLU)

 residual_504_3_conv2d_0 (Conv2D  (None, 36, 36, 504)  2286648     ['residual_504_2_leakyrelu_3[0][0
 )                                                                 ]']

 residual_504_3_leakyrelu_1 (Lea  (None, 36, 36, 504)  0           ['residual_504_3_conv2d_0[0][0]']
 kyReLU)

 residual_504_3_conv2d_1 (Conv2D  (None, 36, 36, 504)  2286648     ['residual_504_3_leakyrelu_1[0][0
 )                                                                 ]']

 add_7 (Add)                     (None, 36, 36, 504)   0           ['residual_504_3_conv2d_1[0][0]',
                                                                    'residual_504_2_leakyrelu_3[0][0
                                                                   ]']

 residual_504_3_leakyrelu_3 (Lea  (None, 36, 36, 504)  0           ['add_7[0][0]']
 kyReLU)

 upscale_252_1_conv2d_conv2d (Co  (None, 36, 36, 1008)  4573296    ['residual_504_3_leakyrelu_3[0][0
 nv2D)                                                             ]']

 upscale_252_1_pixelshuffler (Pi  (None, 72, 72, 252)  0           ['upscale_252_1_conv2d_conv2d[0][
 xelShuffler)                                                      0]']

 leaky_re_lu_4 (LeakyReLU)       (None, 72, 72, 252)   0           ['upscale_252_1_pixelshuffler[0][
                                                                   0]']

 residual_252_2_conv2d_0 (Conv2D  (None, 72, 72, 252)  571788      ['leaky_re_lu_4[0][0]']
 )

 residual_252_2_leakyrelu_1 (Lea  (None, 72, 72, 252)  0           ['residual_252_2_conv2d_0[0][0]']
 kyReLU)

 residual_252_2_conv2d_1 (Conv2D  (None, 72, 72, 252)  571788      ['residual_252_2_leakyrelu_1[0][0
 )                                                                 ]']

 add_8 (Add)                     (None, 72, 72, 252)   0           ['residual_252_2_conv2d_1[0][0]',
                                                                    'leaky_re_lu_4[0][0]']

 residual_252_2_leakyrelu_3 (Lea  (None, 72, 72, 252)  0           ['add_8[0][0]']
 kyReLU)

 residual_252_3_conv2d_0 (Conv2D  (None, 72, 72, 252)  571788      ['residual_252_2_leakyrelu_3[0][0
 )                                                                 ]']

 residual_252_3_leakyrelu_1 (Lea  (None, 72, 72, 252)  0           ['residual_252_3_conv2d_0[0][0]']
 kyReLU)

 residual_252_3_conv2d_1 (Conv2D  (None, 72, 72, 252)  571788      ['residual_252_3_leakyrelu_1[0][0
 )                                                                 ]']

 add_9 (Add)                     (None, 72, 72, 252)   0           ['residual_252_3_conv2d_1[0][0]',
                                                                    'residual_252_2_leakyrelu_3[0][0
                                                                   ]']

 residual_252_3_leakyrelu_3 (Lea  (None, 72, 72, 252)  0           ['add_9[0][0]']
 kyReLU)

 upscale_126_1_conv2d_conv2d (Co  (None, 72, 72, 504)  1143576     ['residual_252_3_leakyrelu_3[0][0
 nv2D)                                                             ]']

 upscale_126_1_pixelshuffler (Pi  (None, 144, 144, 126  0          ['upscale_126_1_conv2d_conv2d[0][
 xelShuffler)                    )                                 0]']

 leaky_re_lu_5 (LeakyReLU)       (None, 144, 144, 126  0           ['upscale_126_1_pixelshuffler[0][
                                 )                                 0]']

 residual_126_2_conv2d_0 (Conv2D  (None, 144, 144, 126  143010     ['leaky_re_lu_5[0][0]']
 )                               )

 residual_126_2_leakyrelu_1 (Lea  (None, 144, 144, 126  0          ['residual_126_2_conv2d_0[0][0]']
 kyReLU)                         )

 upscale_168_1_conv2d_conv2d (Co  (None, 18, 18, 672)  3097248     ['input_3[0][0]']
 nv2D)

 residual_126_2_conv2d_1 (Conv2D  (None, 144, 144, 126  143010     ['residual_126_2_leakyrelu_1[0][0
 )                               )                                 ]']

 upscale_168_1_conv2d_leakyrelu   (None, 18, 18, 672)  0           ['upscale_168_1_conv2d_conv2d[0][
 (LeakyReLU)                                                       0]']

 add_10 (Add)                    (None, 144, 144, 126  0           ['residual_126_2_conv2d_1[0][0]',
                                 )                                  'leaky_re_lu_5[0][0]']

 upscale_168_1_pixelshuffler (Pi  (None, 36, 36, 168)  0           ['upscale_168_1_conv2d_leakyrelu[
 xelShuffler)                                                      0][0]']

 residual_126_2_leakyrelu_3 (Lea  (None, 144, 144, 126  0          ['add_10[0][0]']
 kyReLU)                         )

 upscale_84_1_conv2d_conv2d (Con  (None, 36, 36, 336)  508368      ['upscale_168_1_pixelshuffler[0][
 v2D)                                                              0]']

 residual_126_3_conv2d_0 (Conv2D  (None, 144, 144, 126  143010     ['residual_126_2_leakyrelu_3[0][0
 )                               )                                 ]']

 upscale_84_1_conv2d_leakyrelu (  (None, 36, 36, 336)  0           ['upscale_84_1_conv2d_conv2d[0][0
 LeakyReLU)                                                        ]']

 residual_126_3_leakyrelu_1 (Lea  (None, 144, 144, 126  0          ['residual_126_3_conv2d_0[0][0]']
 kyReLU)                         )

 upscale_84_1_pixelshuffler (Pix  (None, 72, 72, 84)   0           ['upscale_84_1_conv2d_leakyrelu[0
 elShuffler)                                                       ][0]']

 residual_126_3_conv2d_1 (Conv2D  (None, 144, 144, 126  143010     ['residual_126_3_leakyrelu_1[0][0
 )                               )                                 ]']

 upscale_42_1_conv2d_conv2d (Con  (None, 72, 72, 168)  127176      ['upscale_84_1_pixelshuffler[0][0
 v2D)                                                              ]']

 add_11 (Add)                    (None, 144, 144, 126  0           ['residual_126_3_conv2d_1[0][0]',
                                 )                                  'residual_126_2_leakyrelu_3[0][0
                                                                   ]']

 upscale_42_1_conv2d_leakyrelu (  (None, 72, 72, 168)  0           ['upscale_42_1_conv2d_conv2d[0][0
 LeakyReLU)                                                        ]']

 residual_126_3_leakyrelu_3 (Lea  (None, 144, 144, 126  0          ['add_11[0][0]']
 kyReLU)                         )

 upscale_42_1_pixelshuffler (Pix  (None, 144, 144, 42)  0          ['upscale_42_1_conv2d_leakyrelu[0
 elShuffler)                                                       ][0]']

 face_out_32_b_conv2d (Conv2D)   (None, 36, 36, 3)     37803       ['residual_504_3_leakyrelu_3[0][0
                                                                   ]']

 face_out_64_b_conv2d (Conv2D)   (None, 72, 72, 3)     18903       ['residual_252_3_leakyrelu_3[0][0
                                                                   ]']

 face_out_128_b_conv2d (Conv2D)  (None, 144, 144, 3)   9453        ['residual_126_3_leakyrelu_3[0][0
                                                                   ]']

 mask_out_b_conv2d (Conv2D)      (None, 144, 144, 1)   1051        ['upscale_42_1_pixelshuffler[0][0
                                                                   ]']

 face_out_32_b (Activation)      (None, 36, 36, 3)     0           ['face_out_32_b_conv2d[0][0]']

 face_out_64_b (Activation)      (None, 72, 72, 3)     0           ['face_out_64_b_conv2d[0][0]']

 face_out_128_b (Activation)     (None, 144, 144, 3)   0           ['face_out_128_b_conv2d[0][0]']

 mask_out_b (Activation)         (None, 144, 144, 1)   0           ['mask_out_b_conv2d[0][0]']

====================================================================================================
Total params: 30,814,402
Trainable params: 30,814,402
Non-trainable params: 0
____________________________________________________________________________________________________
Model: "dfl_sae_df"
____________________________________________________________________________________________________
 Layer (type)                    Output Shape          Param #     Connected to
====================================================================================================
 face_in_a (InputLayer)          [(None, 144, 144, 3)  0           []
                                 ]

 face_in_b (InputLayer)          [(None, 144, 144, 3)  0           []
                                 ]

 encoder_df (Functional)         (None, 18, 18, 512)   89199796    ['face_in_a[0][0]',
                                                                    'face_in_b[0][0]']

 decoder_a (Functional)          [(None, 36, 36, 3),   30814402    ['encoder_df[0][0]']
                                  (None, 72, 72, 3),
                                  (None, 144, 144, 3)
                                 , (None, 144, 144, 1
                                 )]

 decoder_b (Functional)          [(None, 36, 36, 3),   30814402    ['encoder_df[1][0]']
                                  (None, 72, 72, 3),
                                  (None, 144, 144, 3)
                                 , (None, 144, 144, 1
                                 )]

====================================================================================================
Total params: 150,828,600
Trainable params: 150,828,600
Non-trainable params: 0
____________________________________________________________________________________________________
Process exited.
torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Out of Memory after 6 Hours of Steady Training

Post by torzdf »

This looks to be a regression in Tensorflow which appears to impact some users and not others.

In theory, this shouldn't be able to happen: TF allocates all of the VRAM it requires at the start, so it should not run out of VRAM several hours into a training session.

I have seen this happen once (under Linux), but it has not been repeatable. The best I can advise is to hope this issue goes away when we upgrade our Tensorflow dependency at some point in the future. Not ideal, I know.

If you are comfortable downgrading Tensorflow, then you could try downgrading to version 2.6 (which is still supported by Faceswap) to see if the issue persists.

My word is final

User avatar
WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Re: Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

Great. I think I'll give that a try! How can I do that? :?

User avatar
torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Out of Memory after 6 Hours of Steady Training

Post by torzdf »

If you have to ask how to do it, then it's probably not a good idea for you to do it.

However, I will detail the steps to downgrade Tensorflow here.

Please note, after this, you are on your own. If it goes wrong, then I am unlikely to help with troubleshooting the issue as it is not a great use of my time (if it does go wrong, you can always delete your faceswap folder and reinstall the app).

  • Start an Anaconda Prompt: Start > Anaconda Prompt
  • Execute the following commands in the window that pops up:

    Code: Select all

    conda activate faceswap
    pip install tensorflow-gpu==2.6.5

Close the Anaconda Prompt and launch faceswap as usual.
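If you want to confirm the downgrade actually took effect, you can query the active environment from Python before relaunching. This is just a generic sketch using the standard library (the package name to check is whatever you pinned, e.g. tensorflow-gpu):

```python
# Quick check of what is installed in the currently active environment.
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# e.g. installed_version("tensorflow-gpu") should report "2.6.5" after the downgrade
```

Run it from the same `conda activate faceswap` prompt so it inspects the right environment.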

My word is final

User avatar
WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Re: Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

Well, I tried it out, and faceswap didn't start. I know you said you wouldn't be troubleshooting this further, and that's alright. I will reinstall faceswap. I appreciate your help, thank you for your time. :)

I thought I might put the crash log here, just in case.

crash_report.2022.07.13.084739020906.log
(106.53 KiB) Downloaded 85 times
User avatar
torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Out of Memory after 6 Hours of Steady Training

Post by torzdf »

That's actually an unrelated bug...

Looking at your installed packages, it looks like you have downgraded successfully.

If you give me 10 mins, I can probably fix that issue.

My word is final

User avatar
torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Out of Memory after 6 Hours of Steady Training

Post by torzdf »

Ok, downgrading Tensorflow looks like it forced a downgrade of typing extensions.

You should be able to fix the issue in exactly the same way as you did before, but instead of the pip install tensorflow-gpu command, run:

Code: Select all

pip install "typing-extensions>=4.0.0"
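The quotes stop the shell from treating >= as a redirection operator; pip itself just sees one requirement string, which it splits into a package name and a version specifier. A toy illustration of that split (not pip's real parser, which handles far more cases):

```python
import re


def split_requirement(req: str):
    """Toy split of a pip requirement like 'typing-extensions>=4.0.0'
    into (name, specifier). Illustration only -- not pip's real parser."""
    match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$", req)
    if not match:
        raise ValueError(f"unparseable requirement: {req!r}")
    return match.group(1), match.group(2)
```

So quoted or not, pip receives the same requirement; the quoting only matters to the shell.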

My word is final

User avatar
WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Re: Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

Hmm... Well, I got farther this time! :) Faceswap was able to start after installing typing-extensions through pip, but when I went to start training, I got this error. It says I need to upgrade Tensorflow to proceed. :|

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
07/13/2022 09:46:39 INFO     Log level set to: INFO
07/13/2022 09:46:42 INFO     Model A Directory: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\data-dst-v2\training-faces2' (645 images)
07/13/2022 09:46:42 INFO     Model B Directory: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\data-src\training_faces' (4501 images)
07/13/2022 09:46:42 INFO     Training data directory: C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\model\latest
07/13/2022 09:46:42 INFO     ===================================================
07/13/2022 09:46:42 INFO       Starting
07/13/2022 09:46:42 INFO     ===================================================
07/13/2022 09:46:43 INFO     Loading data, this may take a while...
07/13/2022 09:46:43 INFO     Loading Model from Dfl_Sae plugin...
07/13/2022 09:46:43 CRITICAL Error caught! Exiting...
07/13/2022 09:46:43 ERROR    Caught exception in thread: '_training_0'
07/13/2022 09:46:48 ERROR    Got Exception on main handler:
Traceback (most recent call last):
  File "C:\Users\Administrator\faceswap\lib\cli\launcher.py", line 188, in execute_script
    process.process()
  File "C:\Users\Administrator\faceswap\scripts\train.py", line 204, in process
    self._end_thread(thread, err)
  File "C:\Users\Administrator\faceswap\scripts\train.py", line 244, in _end_thread
    thread.join()
  File "C:\Users\Administrator\faceswap\lib\multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "C:\Users\Administrator\faceswap\lib\multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\faceswap\scripts\train.py", line 266, in _training
    raise err
  File "C:\Users\Administrator\faceswap\scripts\train.py", line 254, in _training
    model = self._load_model()
  File "C:\Users\Administrator\faceswap\scripts\train.py", line 278, in _load_model
    model: "ModelBase" = PluginLoader.get_model(self._args.trainer)(
  File "C:\Users\Administrator\faceswap\plugins\plugin_loader.py", line 97, in get_model
    return PluginLoader._import("train.model", name, disable_logging)
  File "C:\Users\Administrator\faceswap\plugins\plugin_loader.py", line 163, in _import
    module = import_module(mod)
  File "C:\Users\Administrator\MiniConda3\envs\faceswap\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\Users\Administrator\faceswap\plugins\train\model\dfl_sae.py", line 11, in <module>
    from ._base import ModelBase, KerasModel
  File "C:\Users\Administrator\faceswap\plugins\train\model\_base\__init__.py", line 4, in <module>
    from .model import get_all_sub_models, KerasModel, ModelBase  # noqa
  File "C:\Users\Administrator\faceswap\plugins\train\model\_base\model.py", line 23, in <module>
    from .settings import Loss, Optimizer, Settings
  File "C:\Users\Administrator\faceswap\plugins\train\model\_base\settings.py", line 35, in <module>
    from lib.model.autoclip import AutoClipper  # pylint:disable=ungrouped-imports
  File "C:\Users\Administrator\faceswap\lib\model\autoclip.py", line 8, in <module>
    import tensorflow_probability as tfp
  File "C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\tensorflow_probability\__init__.py", line 20, in <module>
    from tensorflow_probability import substrates
  File "C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\tensorflow_probability\substrates\__init__.py", line 17, in <module>
    from tensorflow_probability.python.internal import all_util
  File "C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\tensorflow_probability\python\__init__.py", line 138, in <module>
    dir(globals()[pkg_name])  # Forces loading the package from its lazy loader.
  File "C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\tensorflow_probability\python\internal\lazy_loader.py", line 57, in __dir__
    module = self._load()
  File "C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\tensorflow_probability\python\internal\lazy_loader.py", line 37, in _load
    self._on_first_access()
  File "C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\tensorflow_probability\python\__init__.py", line 59, in _validate_tf_environment
    raise ImportError(
ImportError: This version of TensorFlow Probability requires TensorFlow version >= 2.8; Detected an installation of version 2.6.5. Please upgrade TensorFlow to proceed.
07/13/2022 09:46:48 CRITICAL An unexpected crash has occurred. Crash report written to 'C:\Users\Administrator\faceswap\crash_report.2022.07.13.094643988794.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.

Here's the crash log as well.

crash_report.2022.07.13.094643988794.log
(136.28 KiB) Downloaded 85 times
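The ImportError at the bottom of that traceback is tensorflow-probability running a version gate at import time. Conceptually it boils down to a tuple comparison, sketched here purely for illustration (this is not TFP's actual code):

```python
def version_tuple(version: str):
    """'2.6.5' -> (2, 6, 5). Pre-release suffixes are ignored in this sketch."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when `installed` satisfies a '>= minimum' requirement."""
    return version_tuple(installed) >= version_tuple(minimum)


# The installed TFP build requires TF >= 2.8, so TF 2.6.5 fails the gate:
# meets_minimum("2.6.5", "2.8") -> False
```

Downgrading TF alone leaves tensorflow-probability too new for it; the matching fix is to roll tensorflow-probability back to a release whose minimum TF is 2.6 or lower.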
User avatar
WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Re: Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

I think I might've solved it! I installed an earlier version of tensorflow-probability in the Anaconda Prompt, and now it seems to be training just fine!

Code: Select all

pip install tensorflow-probability==0.14.1

I'm going to let it run for the rest of the day to see if it will remain stable. If it stays that way, I will mark this issue as resolved. I did get this warning while it was starting up, but hopefully it isn't too important:

Code: Select all

C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\keras\utils\generic_utils.py:494: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
warnings.warn('Custom mask layers require a config and must override '

I am using a BiSeNet-FP head mask on the model, so maybe that's why the warning came up.

User avatar
torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Out of Memory after 6 Hours of Steady Training

Post by torzdf »

Glad you got there in the end :)

The type of mask used should make no difference.

Please do let me know if rolling back solves your issue, so I can keep track of the problem.

My word is final

User avatar
WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Re: Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

Will do! :D

User avatar
WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Re: Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

Well, I just checked up on the training, and unfortunately, it crashed as it did before. It ran for about as long as it usually does. Not sure where to go from here. :(

Code: Select all

Loading...
Setting Faceswap backend to NVIDIA
07/13/2022 10:25:15 INFO     Log level set to: INFO
07/13/2022 10:25:18 INFO     Model A Directory: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\data-dst-v2\training-faces2' (645 images)
07/13/2022 10:25:18 INFO     Model B Directory: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\data-src\training_faces' (4501 images)
07/13/2022 10:25:18 INFO     Training data directory: C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\model\latest
07/13/2022 10:25:18 INFO     ===================================================
07/13/2022 10:25:18 INFO       Starting
07/13/2022 10:25:18 INFO     ===================================================
07/13/2022 10:25:19 INFO     Loading data, this may take a while...
07/13/2022 10:25:19 INFO     Loading Model from Dfl_Sae plugin...
07/13/2022 10:25:20 INFO     Using configuration saved in state file
07/13/2022 10:25:20 INFO     Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: NVIDIA GeForce RTX 2060 SUPER, compute capability 7.5
07/13/2022 10:25:20 INFO     Enabling Mixed Precision Training.
07/13/2022 10:25:22 INFO     Loaded model from disk: 'C:\Projects\TMR\Deepfake\Smith\v0.4\Workspace\model\latest\dfl_sae.h5'
07/13/2022 10:25:22 INFO     Loading Trainer from Original plugin...
C:\Users\Administrator\MiniConda3\envs\faceswap\lib\site-packages\keras\utils\generic_utils.py:494: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
warnings.warn('Custom mask layers require a config and must override '

07/13/2022 10:26:13 INFO     [Saved models] - Average loss since last save: face_a: 0.14487, face_b: 0.09168



07/13/2022 10:29:31 INFO     [Saved models] - Average loss since last save: face_a: 0.07632, face_b: 0.09160



07/13/2022 10:32:51 INFO     [Saved models] - Average loss since last save: face_a: 0.07661, face_b: 0.09145



07/13/2022 10:36:12 INFO     [Saved models] - Average loss since last save: face_a: 0.07662, face_b: 0.09547



07/13/2022 10:39:30 INFO     [Saved models] - Average loss since last save: face_a: 0.07629, face_b: 0.09619



07/13/2022 10:42:47 INFO     [Saved models] - Average loss since last save: face_a: 0.08005, face_b: 0.09803



07/13/2022 10:46:04 INFO     [Saved models] - Average loss since last save: face_a: 0.08111, face_b: 0.09542



07/13/2022 10:49:21 INFO     [Saved models] - Average loss since last save: face_a: 0.07906, face_b: 0.09186



07/13/2022 10:52:38 INFO     [Saved models] - Average loss since last save: face_a: 0.08009, face_b: 0.09606



07/13/2022 10:55:55 INFO     [Saved models] - Average loss since last save: face_a: 0.07936, face_b: 0.09605



07/13/2022 10:59:12 INFO     [Saved models] - Average loss since last save: face_a: 0.07837, face_b: 0.09571



07/13/2022 11:02:28 INFO     [Saved models] - Average loss since last save: face_a: 0.07817, face_b: 0.09320



07/13/2022 11:05:45 INFO     [Saved models] - Average loss since last save: face_a: 0.08189, face_b: 0.09417



07/13/2022 11:09:02 INFO     [Saved models] - Average loss since last save: face_a: 0.08185, face_b: 0.09866



07/13/2022 11:12:19 INFO     [Saved models] - Average loss since last save: face_a: 0.07931, face_b: 0.09255



07/13/2022 11:15:39 INFO     [Saved models] - Average loss since last save: face_a: 0.07976, face_b: 0.09497



07/13/2022 11:18:57 INFO     [Saved models] - Average loss since last save: face_a: 0.07991, face_b: 0.09554



07/13/2022 11:22:15 INFO     [Saved models] - Average loss since last save: face_a: 0.07920, face_b: 0.09570



07/13/2022 11:25:34 INFO     [Saved models] - Average loss since last save: face_a: 0.08030, face_b: 0.09630



07/13/2022 11:28:53 INFO     [Saved models] - Average loss since last save: face_a: 0.07966, face_b: 0.09407



07/13/2022 11:32:11 INFO     [Saved models] - Average loss since last save: face_a: 0.08032, face_b: 0.09499



07/13/2022 11:35:29 INFO     [Saved models] - Average loss since last save: face_a: 0.07692, face_b: 0.09590



07/13/2022 11:38:49 INFO     [Saved models] - Average loss since last save: face_a: 0.07941, face_b: 0.09675



07/13/2022 11:42:08 INFO     [Saved models] - Average loss since last save: face_a: 0.08095, face_b: 0.09355



07/13/2022 11:45:27 INFO     [Saved models] - Average loss since last save: face_a: 0.08037, face_b: 0.09428



07/13/2022 11:48:46 INFO     [Saved models] - Average loss since last save: face_a: 0.07971, face_b: 0.09837



07/13/2022 11:52:06 INFO     [Saved models] - Average loss since last save: face_a: 0.07814, face_b: 0.09467



07/13/2022 11:55:26 INFO     [Saved models] - Average loss since last save: face_a: 0.08054, face_b: 0.09653



07/13/2022 11:58:45 INFO     [Saved models] - Average loss since last save: face_a: 0.08097, face_b: 0.09621



07/13/2022 12:02:06 INFO     [Saved models] - Average loss since last save: face_a: 0.07743, face_b: 0.09421



07/13/2022 12:05:27 INFO     [Saved models] - Average loss since last save: face_a: 0.08084, face_b: 0.09665



07/13/2022 12:08:50 INFO     [Saved models] - Average loss since last save: face_a: 0.07741, face_b: 0.09540



07/13/2022 12:12:10 INFO     [Saved models] - Average loss since last save: face_a: 0.08099, face_b: 0.09152



07/13/2022 12:15:30 INFO     [Saved models] - Average loss since last save: face_a: 0.07900, face_b: 0.09535



07/13/2022 12:16:56 INFO     Saved snapshot (1800000 iterations)

07/13/2022 12:18:53 INFO     [Saved models] - Average loss since last save: face_a: 0.07860, face_b: 0.09353



07/13/2022 12:22:14 INFO     [Saved models] - Average loss since last save: face_a: 0.07927, face_b: 0.09474



07/13/2022 12:25:35 INFO     [Saved models] - Average loss since last save: face_a: 0.07843, face_b: 0.09833



07/13/2022 12:28:55 INFO     [Saved models] - Average loss since last save: face_a: 0.08116, face_b: 0.09135



07/13/2022 12:32:16 INFO     [Saved models] - Average loss since last save: face_a: 0.07972, face_b: 0.09610



07/13/2022 12:35:38 INFO     [Saved models] - Average loss since last save: face_a: 0.07755, face_b: 0.09340



07/13/2022 12:39:00 INFO     [Saved models] - Average loss since last save: face_a: 0.08127, face_b: 0.09485



07/13/2022 12:42:21 INFO     [Saved models] - Average loss since last save: face_a: 0.07932, face_b: 0.09551



07/13/2022 12:45:43 INFO     [Saved models] - Average loss since last save: face_a: 0.07930, face_b: 0.09489



07/13/2022 12:49:05 INFO     [Saved models] - Average loss since last save: face_a: 0.07974, face_b: 0.09474



07/13/2022 12:52:26 INFO     [Saved models] - Average loss since last save: face_a: 0.07835, face_b: 0.09729



07/13/2022 12:55:46 INFO     [Saved models] - Average loss since last save: face_a: 0.07872, face_b: 0.09479



07/13/2022 12:59:07 INFO     [Saved models] - Average loss since last save: face_a: 0.08014, face_b: 0.09586



07/13/2022 13:02:28 INFO     [Saved models] - Average loss since last save: face_a: 0.08200, face_b: 0.09793



07/13/2022 13:05:49 INFO     [Saved models] - Average loss since last save: face_a: 0.07818, face_b: 0.09528



07/13/2022 13:09:10 INFO     [Saved models] - Average loss since last save: face_a: 0.08010, face_b: 0.09442



07/13/2022 13:12:30 INFO     [Saved models] - Average loss since last save: face_a: 0.07845, face_b: 0.09661



07/13/2022 13:15:52 INFO     [Saved models] - Average loss since last save: face_a: 0.07854, face_b: 0.09317



07/13/2022 13:19:13 INFO     [Saved models] - Average loss since last save: face_a: 0.07940, face_b: 0.09409



07/13/2022 13:22:34 INFO     [Saved models] - Average loss since last save: face_a: 0.07874, face_b: 0.09514



07/13/2022 13:25:54 INFO     [Saved models] - Average loss since last save: face_a: 0.08009, face_b: 0.09717



07/13/2022 13:29:14 INFO     [Saved models] - Average loss since last save: face_a: 0.08281, face_b: 0.09604



07/13/2022 13:32:34 INFO     [Saved models] - Average loss since last save: face_a: 0.07676, face_b: 0.09536



07/13/2022 13:35:54 INFO     [Saved models] - Average loss since last save: face_a: 0.07940, face_b: 0.09117



07/13/2022 13:39:14 INFO     [Saved models] - Average loss since last save: face_a: 0.07973, face_b: 0.09378



07/13/2022 13:42:34 INFO     [Saved models] - Average loss since last save: face_a: 0.08234, face_b: 0.09327



07/13/2022 13:45:55 INFO     [Saved models] - Average loss since last save: face_a: 0.07850, face_b: 0.09702



07/13/2022 13:49:17 INFO     [Saved models] - Average loss since last save: face_a: 0.07889, face_b: 0.09635



07/13/2022 13:52:38 INFO     [Saved models] - Average loss since last save: face_a: 0.08020, face_b: 0.09588



07/13/2022 13:56:00 INFO     [Saved models] - Average loss since last save: face_a: 0.07975, face_b: 0.09496



07/13/2022 13:59:22 INFO     [Saved models] - Average loss since last save: face_a: 0.07808, face_b: 0.09557



07/13/2022 14:02:43 INFO     [Saved models] - Average loss since last save: face_a: 0.07869, face_b: 0.09512



07/13/2022 14:06:04 INFO     [Saved models] - Average loss since last save: face_a: 0.07742, face_b: 0.09500



07/13/2022 14:09:26 INFO     [Saved models] - Average loss since last save: face_a: 0.08193, face_b: 0.09535



07/13/2022 14:12:47 INFO     [Saved models] - Average loss since last save: face_a: 0.07905, face_b: 0.09302



07/13/2022 14:16:09 INFO     [Saved models] - Average loss since last save: face_a: 0.07915, face_b: 0.09600



07/13/2022 14:19:30 INFO     [Saved models] - Average loss since last save: face_a: 0.07947, face_b: 0.09305



07/13/2022 14:22:52 INFO     [Saved models] - Average loss since last save: face_a: 0.07932, face_b: 0.09674



07/13/2022 14:26:14 INFO     [Saved models] - Average loss since last save: face_a: 0.07946, face_b: 0.09619



07/13/2022 14:29:37 INFO     [Saved models] - Average loss since last save: face_a: 0.08023, face_b: 0.09534



07/13/2022 14:32:58 INFO     [Saved models] - Average loss since last save: face_a: 0.07852, face_b: 0.09576



07/13/2022 14:36:20 INFO     [Saved models] - Average loss since last save: face_a: 0.07708, face_b: 0.09439



07/13/2022 14:39:42 INFO     [Saved models] - Average loss since last save: face_a: 0.08124, face_b: 0.09258



07/13/2022 14:43:04 INFO     [Saved models] - Average loss since last save: face_a: 0.07750, face_b: 0.09545



07/13/2022 14:46:26 INFO     [Saved models] - Average loss since last save: face_a: 0.08208, face_b: 0.09575



07/13/2022 14:49:48 INFO     [Saved models] - Average loss since last save: face_a: 0.08213, face_b: 0.09502



07/13/2022 14:53:10 INFO     [Saved models] - Average loss since last save: face_a: 0.07940, face_b: 0.09654



07/13/2022 14:56:33 INFO     [Saved models] - Average loss since last save: face_a: 0.08063, face_b: 0.09553



07/13/2022 14:59:55 INFO     [Saved models] - Average loss since last save: face_a: 0.07918, face_b: 0.09684



07/13/2022 15:03:18 INFO     [Saved models] - Average loss since last save: face_a: 0.08131, face_b: 0.09446



07/13/2022 15:06:41 INFO     [Saved models] - Average loss since last save: face_a: 0.07750, face_b: 0.09454



07/13/2022 15:10:03 INFO     [Saved models] - Average loss since last save: face_a: 0.07961, face_b: 0.09726



07/13/2022 15:13:26 INFO     [Saved models] - Average loss since last save: face_a: 0.07983, face_b: 0.09611



07/13/2022 15:16:48 INFO     [Saved models] - Average loss since last save: face_a: 0.07931, face_b: 0.09413



07/13/2022 15:20:10 INFO     [Saved models] - Average loss since last save: face_a: 0.08032, face_b: 0.09347



07/13/2022 15:23:30 INFO     [Saved models] - Average loss since last save: face_a: 0.07918, face_b: 0.09712



07/13/2022 15:26:51 INFO     [Saved models] - Average loss since last save: face_a: 0.07819, face_b: 0.09695



07/13/2022 15:30:12 INFO     [Saved models] - Average loss since last save: face_a: 0.07817, face_b: 0.09678



07/13/2022 15:33:33 INFO     [Saved models] - Average loss since last save: face_a: 0.07937, face_b: 0.09645



07/13/2022 15:36:54 INFO     [Saved models] - Average loss since last save: face_a: 0.08002, face_b: 0.09463



07/13/2022 15:40:16 INFO     [Saved models] - Average loss since last save: face_a: 0.07862, face_b: 0.09653



07/13/2022 15:43:37 INFO     [Saved models] - Average loss since last save: face_a: 0.07843, face_b: 0.09436



07/13/2022 15:46:59 INFO     [Saved models] - Average loss since last save: face_a: 0.07868, face_b: 0.09638



07/13/2022 15:50:22 INFO     [Saved models] - Average loss since last save: face_a: 0.07862, face_b: 0.09297



07/13/2022 15:53:44 INFO     [Saved models] - Average loss since last save: face_a: 0.07833, face_b: 0.09602



07/13/2022 15:57:07 INFO     [Saved models] - Average loss since last save: face_a: 0.07979, face_b: 0.09481



07/13/2022 16:00:29 INFO     [Saved models] - Average loss since last save: face_a: 0.07672, face_b: 0.09399



07/13/2022 16:03:52 INFO     [Saved models] - Average loss since last save: face_a: 0.08020, face_b: 0.09215



07/13/2022 16:07:14 INFO     [Saved models] - Average loss since last save: face_a: 0.07916, face_b: 0.09627



07/13/2022 16:10:37 INFO     [Saved models] - Average loss since last save: face_a: 0.07805, face_b: 0.09250



07/13/2022 16:14:00 INFO     [Saved models] - Average loss since last save: face_a: 0.08139, face_b: 0.09365



07/13/2022 16:17:23 INFO     [Saved models] - Average loss since last save: face_a: 0.07841, face_b: 0.09835



07/13/2022 16:20:45 INFO     [Saved models] - Average loss since last save: face_a: 0.07915, face_b: 0.09479



07/13/2022 16:24:08 INFO     [Saved models] - Average loss since last save: face_a: 0.07706, face_b: 0.09504



07/13/2022 16:27:32 INFO     [Saved models] - Average loss since last save: face_a: 0.08060, face_b: 0.09605



07/13/2022 16:30:55 INFO     [Saved models] - Average loss since last save: face_a: 0.07957, face_b: 0.09339



07/13/2022 16:34:18 INFO     [Saved models] - Average loss since last save: face_a: 0.08115, face_b: 0.09335



07/13/2022 16:37:42 INFO     [Saved models] - Average loss since last save: face_a: 0.08102, face_b: 0.09599



07/13/2022 16:41:06 INFO     [Saved models] - Average loss since last save: face_a: 0.07889, face_b: 0.09463



07/13/2022 16:44:30 INFO     [Saved models] - Average loss since last save: face_a: 0.08101, face_b: 0.09668



07/13/2022 16:47:54 INFO     [Saved models] - Average loss since last save: face_a: 0.07840, face_b: 0.09580



07/13/2022 17:16:33 ERROR    Caught exception in thread: '_training_0'
07/13/2022 17:16:33 ERROR    You do not have enough GPU memory available to train the selected model at the selected settings. You can try a number of things:
07/13/2022 17:16:33 ERROR    1) Close any other application that is using your GPU (web browsers are particularly bad for this).
07/13/2022 17:16:33 ERROR    2) Lower the batchsize (the amount of images fed into the model each iteration).
07/13/2022 17:16:33 ERROR    3) Try enabling 'Mixed Precision' training.
07/13/2022 17:16:33 ERROR    4) Use a more lightweight model, or select the model's 'LowMem' option (in config) if it has one.
Process exited.
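On suggestion 2 in that error message, a back-of-envelope calculation shows why batch size is the easiest lever: activation memory scales linearly with it. The numbers below are purely illustrative (real usage depends on the model architecture, optimizer state, and framework overhead):

```python
def activations_mb(batch_size: int, height: int, width: int, channels: int,
                   bytes_per_value: int = 4, layer_factor: float = 1.0) -> float:
    """Rough, illustrative activation-memory estimate in MiB.

    `layer_factor` stands in for how many layer outputs are alive at once;
    it is a made-up knob for this sketch, not a real framework parameter.
    """
    values = batch_size * height * width * channels * layer_factor
    return values * bytes_per_value / 2**20


# Halving the batch size roughly halves this component of VRAM usage,
# which is why it is the first thing to try after an OOM.
```

This ignores model weights and optimizer state (which are fixed-size), but the batch-dependent part is usually what tips a borderline configuration over the edge.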
User avatar
torzdf
Posts: 2651
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 129 times
Been thanked: 622 times

Re: Out of Memory after 6 Hours of Steady Training

Post by torzdf »

Well, that sucks.

Not really sure what else to suggest, as this is not a complaint I have seen repeated elsewhere.

You can try enabling "Allow Growth" under training settings, but I think it's a long shot that it will solve your issue.

My word is final

User avatar
WoahNoah
Posts: 9
Joined: Thu Dec 16, 2021 9:28 pm
Has thanked: 2 times

Re: Out of Memory after 6 Hours of Steady Training

Post by WoahNoah »

I'll try it out and let you know if it works, otherwise, I'll leave it here for now. Thank you for your help, torzdf.

Locked