
AMD - CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Mon Jan 27, 2020 8:36 am
by komputerek

Hey there!

I am currently testing DeepFaceLab.
I own a Radeon R7 370 STRIX GAMING 4GB 256-bit.

It only seems to work at a resolution of 64 - with that I can set the batch size to 8-12.
But when I try to set the resolution to 128, even with the batch size as low as 0, I get "Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE".
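
My rough understanding of why the resolution hurts so much: every image tensor grows with the square of the resolution, so going from 64 to 128 means about four times the data per image before the model's own layers and optimizer state are even counted. A quick back-of-the-envelope illustration in Python (just the raw input batch, not a real VRAM estimate):

# Rough illustration only: size of one float32 input batch. Real training
# memory is dominated by model weights, activations and optimizer state,
# so treat this as a lower bound, not a prediction.
def batch_mib(batch_size, resolution, channels=3, bytes_per_value=4):
    return batch_size * resolution * resolution * channels * bytes_per_value / 1024 ** 2

print(batch_mib(10, 64))    # ~0.47 MiB of raw input data at resolution 64
print(batch_mib(10, 128))   # ~1.88 MiB - four times as much at resolution 128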

Does that mean that, no matter how low I set the batch size and other options, my card won't train at a resolution of 128?
And why does it say "VRAM = 3GB" while training, when my GPU in fact has 4GB of VRAM?
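
For what it's worth, you can check what OpenCL actually reports for the card with a few lines of Python. This is only a sketch using pyopencl, which is not part of DFL or Faceswap, so it would need to be installed separately; on many cards the maximum size of a single allocation is only a fraction of the total memory, which might be where the smaller number comes from, though I am not sure that is what the trainer is displaying.

import pyopencl as cl  # pip install pyopencl

# List every OpenCL device and the memory limits the driver reports.
for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(device.name)
        print("  total global memory  :", device.global_mem_size // 1024 ** 2, "MiB")
        print("  max single allocation:", device.max_mem_alloc_size // 1024 ** 2, "MiB")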

And most importantly - is it even worth the effort to train at a resolution of 64? Will the result be at least a little bit convincing after 300k-400k iterations?

Those are my example settings that work:

random_flip: True
resolution: 64
face_type: f
learn_mask: False
optimizer_mode: 1
archi: df
ae_dims: 32
ed_ch_dims: 10
lr_dropout: False
random_warp: False
true_face_training: False
face_style_power: 0.0
bg_style_power: 0.0
ct_mode: none
clipgrad: False
batch_size: 10


Re: Is it worth the effort to train on 64 resolution?

Posted: Mon Jan 27, 2020 10:55 am
by torzdf

Unfortunately we don't provide support for DFL. However, if you do choose to try Faceswap, then we will be happy to provide help where we can.


AMD R5 M335 4GB not working

Posted: Thu Mar 12, 2020 2:20 pm
by Deepbie

It runs on the Intel HD 520 but won't work on the AMD R5 M335 with 4GB VRAM. It says "Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE". How do I fix this? Help!


Re: AMD R5 M335 4GB not working

Posted: Fri Mar 13, 2020 11:21 am
by torzdf

Problem about 'Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE' during Training

Posted: Tue Apr 21, 2020 12:42 pm
by coregosu12

Help!! I have a problem! :o As the title says, I have a problem with mapping memory and OpenCL.

I am Korean, and I am sorry for my bad English wording.

My computer's GPU has 2GB, the RAM is 8GB, the CPU is an Intel i3 at 3.7GHz, and I am on 64-bit Windows.
I used the original Trainer and my batch size is 48.

I also have a question. I want to replace the face in a video of Alpha with the face of Beta. How do I set Alpha and Beta as inputs A and B?

The following is the output the program showed me.

Loading...
Setting Faceswap backend to AMD
04/21/2020 21:22:42 INFO Log level set to: INFO
04/21/2020 21:22:43 INFO Setting up for PlaidML
04/21/2020 21:22:43 INFO Setting GPU to largest available supported device. If you want to override this selection, run plaidml-setup from the command line.
04/21/2020 21:22:43 INFO Using GPU: ['opencl_nvidia_geforce_gtx_750_ti.0', 'opencl_nvidia_geforce_gtx_750_ti.0']
04/21/2020 21:22:43 INFO Successfully set up for PlaidML
Using plaidml.keras.backend backend.
04/21/2020 21:22:46 INFO Model A Directory: C:\Users\work\faceswap\faces\dahyeon
04/21/2020 21:22:46 INFO Model B Directory: C:\Users\work\faceswap\faces\yeong
04/21/2020 21:22:46 INFO Training data directory: C:\Users\work\faceswap\yeongmodel
04/21/2020 21:22:46 INFO ===================================================
04/21/2020 21:22:46 INFO Starting
04/21/2020 21:22:46 INFO Press 'Stop' to save and quit
04/21/2020 21:22:46 INFO ===================================================
04/21/2020 21:22:47 INFO Loading data, this may take a while...
04/21/2020 21:22:47 INFO Loading Model from Original plugin...
04/21/2020 21:22:47 INFO No existing state file found. Generating.
04/21/2020 21:22:47 INFO Opening device "opencl_nvidia_geforce_gtx_750_ti.0"
04/21/2020 21:22:48 INFO Creating new 'original' model in folder: 'C:\Users\work\faceswap\yeongmodel'
04/21/2020 21:22:48 INFO Loading Trainer from Original plugin...
04/21/2020 21:22:48 INFO Enabled TensorBoard Logging
Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
04/21/2020 21:22:50 ERROR Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
04/21/2020 21:22:50 CRITICAL Error caught! Exiting...
04/21/2020 21:22:50 ERROR Caught exception in thread: '_training_0'
04/21/2020 21:22:54 ERROR Got Exception on main handler:
Traceback (most recent call last):
File "C:\Users\work\faceswap\lib\cli.py", line 128, in execute_script
process.process()
File "C:\Users\work\faceswap\scripts\train.py", line 161, in process
self._end_thread(thread, err)
File "C:\Users\work\faceswap\scripts\train.py", line 201, in _end_thread
thread.join()
File "C:\Users\work\faceswap\lib\multithreading.py", line 121, in join
raise thread.err[1].with_traceback(thread.err[2])
File "C:\Users\work\faceswap\lib\multithreading.py", line 37, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\work\faceswap\scripts\train.py", line 226, in _training
raise err
File "C:\Users\work\faceswap\scripts\train.py", line 216, in _training
self._run_training_cycle(model, trainer)
File "C:\Users\work\faceswap\scripts\train.py", line 305, in _run_training_cycle
trainer.train_one_step(viewer, timelapse)
File "C:\Users\work\faceswap\plugins\train\trainer\_base.py", line 316, in train_one_step
raise err
File "C:\Users\work\faceswap\plugins\train\trainer\_base.py", line 283, in train_one_step
loss[side] = batcher.train_one_batch()
File "C:\Users\work\faceswap\plugins\train\trainer\_base.py", line 424, in train_one_batch
loss = self._model.predictors[self._side].train_on_batch(model_inputs, model_targets)
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\keras\engine\training.py", line 1216, in train_on_batch
self._make_train_function()
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\keras\engine\training.py", line 509, in _make_train_function
loss=self.total_loss)
File "C:\Users\work\faceswap\lib\model\optimizers.py", line 51, in get_updates
ms, vs, vhats = self.update_1(params)
File "C:\Users\work\faceswap\lib\model\optimizers.py", line 78, in update_1
ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
File "C:\Users\work\faceswap\lib\model\optimizers.py", line 78, in <listcomp>
ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\plaidml\keras\backend.py", line 1743, in zeros
return constant(0.0, shape=shape, dtype=dtype, name=_prepend_name_scope(name, 'zeros'))
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\plaidml\keras\backend.py", line 482, in constant
return variable(np_value, dtype=dtype, name=_prepend_name_scope(name, 'constant'))
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\plaidml\keras\backend.py", line 1735, in variable
with tensor.mmap_discard(_ctx) as view:
File "C:\Users\work\anaconda3\envs\faceswap\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 1252, in mmap_discard
mapping = _lib().plaidml_map_buffer_discard(ctx, self.buffer)
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 777, in _check_err
self.raise_last_status()
File "C:\Users\work\anaconda3\envs\faceswap\lib\site-packages\plaidml\library.py", line 131, in raise_last_status
raise self.last_status()
plaidml.exceptions.Unknown: Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
04/21/2020 21:22:54 CRITICAL An unexpected crash has occurred. Crash report written to 'C:\Users\work\faceswap\crash_report.2020.04.21.212254189152.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.


Re: Problem about 'Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE' during Training

Posted: Tue Apr 21, 2020 4:30 pm
by torzdf

Re: Problem about 'Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE' during Training

Posted: Wed Apr 22, 2020 1:42 am
by coregosu12

I couldn't solve my problem with those solutions.

And I still have a question about setting inputs A and B for the training process.


Re: Problem about 'Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE' during Training

Posted: Wed Apr 22, 2020 9:55 am
by torzdf

Ultimately, that error is telling you that you don't have enough GPU RAM. If you were unable to solve it with any of the links/googling, then sadly you just don't have enough VRAM.

A is the original face. B is the face you want to put on A.
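
If you run from the command line rather than the GUI, the mapping is roughly as follows. The paths are placeholders and the exact option names can differ between versions, so check python faceswap.py train -h for the flags your install supports:

python faceswap.py train -A <extracted_faces_of_A> -B <extracted_faces_of_B> -m <model_folder>

Here -A points at the folder of extracted faces of the original person, -B at the folder of faces you want to put onto them, and -m at the folder where the model is saved.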


Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Sun Aug 16, 2020 4:26 am
by waseemmd38

I saved and stopped training. When I tried to resume the same training again, I got an error.

8/16/2020 09:48:46 ERROR    Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

08/16/2020 09:48:46 CRITICAL Error caught! Exiting...
08/16/2020 09:48:46 ERROR    Caught exception in thread: '_training_0'
08/16/2020 09:49:06 ERROR    Got Exception on main handler:
Traceback (most recent call last):
File "C:\Users\SHAKEEL\faceswap\lib\cli\launcher.py", line 155, in execute_script
process.process()
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 161, in process
self._end_thread(thread, err)
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 201, in _end_thread
thread.join()
File "C:\Users\SHAKEEL\faceswap\lib\multithreading.py", line 121, in join
raise thread.err[1].with_traceback(thread.err[2])
File "C:\Users\SHAKEEL\faceswap\lib\multithreading.py", line 37, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 226, in _training
raise err
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 216, in _training
self._run_training_cycle(model, trainer)
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 305, in _run_training_cycle
trainer.train_one_step(viewer, timelapse)
File "C:\Users\SHAKEEL\faceswap\plugins\train\trainer\_base.py", line 316, in train_one_step
raise err
File "C:\Users\SHAKEEL\faceswap\plugins\train\trainer\_base.py", line 283, in train_one_step
loss[side] = batcher.train_one_batch()
File "C:\Users\SHAKEEL\faceswap\plugins\train\trainer\_base.py", line 424, in train_one_batch
loss = self._model.predictors[self._side].train_on_batch(model_inputs, model_targets)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\keras\backend.py", line 175, in __call__
self._invoker.invoke()
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 1455, in invoke
return Invocation(self._ctx, self)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 1464, in __init__
self._as_parameter_ = _lib().plaidml_schedule_invocation(ctx, invoker)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 777, in _check_err
self.raise_last_status()
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\library.py", line 131, in raise_last_status
raise self.last_status()
plaidml.exceptions.Unknown: Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
08/16/2020 09:49:06 CRITICAL An unexpected crash has occurred. Crash report written to 'C:\Users\SHAKEEL\faceswap\crash_report.2020.08.16.094846825457.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.

Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Sun Aug 16, 2020 8:47 am
by torzdf

You really should provide the full crash report when reporting failures ('C:\Users\SHAKEEL\faceswap\crash_report.2020.08.16.094846825457.log')

However, this means that you are out of GPU memory. Try lowering your batch size.


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Sun Aug 16, 2020 2:12 pm
by waseemmd38

How can I reduce the batch size?


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Sun Aug 16, 2020 8:34 pm
by abigflea

It is on the training screen, right below selecting the trainer.


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Mon Aug 17, 2020 10:58 am
by waseemmd38

I reduced the batch size, and again I got this error. Help!

Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
08/17/2020 16:20:58 ERROR Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
08/17/2020 16:20:58 CRITICAL Error caught! Exiting...
08/17/2020 16:20:58 ERROR Caught exception in thread: '_training_0'
08/17/2020 16:21:13 ERROR Got Exception on main handler:
Traceback (most recent call last):
File "C:\Users\SHAKEEL\faceswap\lib\cli\launcher.py", line 155, in execute_script
process.process()
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 161, in process
self._end_thread(thread, err)
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 201, in _end_thread
thread.join()
File "C:\Users\SHAKEEL\faceswap\lib\multithreading.py", line 121, in join
raise thread.err[1].with_traceback(thread.err[2])
File "C:\Users\SHAKEEL\faceswap\lib\multithreading.py", line 37, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 226, in _training
raise err
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 216, in _training
self._run_training_cycle(model, trainer)
File "C:\Users\SHAKEEL\faceswap\scripts\train.py", line 305, in _run_training_cycle
trainer.train_one_step(viewer, timelapse)
File "C:\Users\SHAKEEL\faceswap\plugins\train\trainer\_base.py", line 316, in train_one_step
raise err
File "C:\Users\SHAKEEL\faceswap\plugins\train\trainer\_base.py", line 283, in train_one_step
loss[side] = batcher.train_one_batch()
File "C:\Users\SHAKEEL\faceswap\plugins\train\trainer\_base.py", line 424, in train_one_batch
loss = self._model.predictors[self._side].train_on_batch(model_inputs, model_targets)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\keras\backend.py", line 175, in __call__
self._invoker.invoke()
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 1455, in invoke
return Invocation(self._ctx, self)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 1464, in __init__
self._as_parameter_ = _lib().plaidml_schedule_invocation(ctx, invoker)
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\__init__.py", line 777, in _check_err
self.raise_last_status()
File "C:\Users\SHAKEEL\MiniConda3\envs\faceswap\lib\site-packages\plaidml\library.py", line 131, in raise_last_status
raise self.last_status()
plaidml.exceptions.Unknown: Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
08/17/2020 16:21:13 CRITICAL An unexpected crash has occurred. Crash report written to 'C:\Users\SHAKEEL\faceswap\crash_report.2020.08.17.162058819006.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Mon Aug 17, 2020 11:45 am
by abigflea

At the bottom of that post:


 Crash report written to 'C:\Users\SHAKEEL\faceswap\crash_report.2020.08.16.094846825457.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
Process exited.

Can you post that "crash_report.2020.08.16.094846825457.log" ?


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Mon Aug 17, 2020 12:13 pm
by waseemmd38
Here is the crash report: crash_report.2020.08.17.173146614626.log (attached).

Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Tue Aug 18, 2020 1:34 am
by waseemmd38

Please reply.


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Tue Aug 18, 2020 7:18 am
by torzdf

In all honesty, a 2GB GPU is probably not going to be large enough to train a model.

You can try the lightweight model at a very low batch size (maybe 2?) and it may work. But equally, it may not.
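
As a rough sketch of what that looks like from the command line (paths are placeholders; run python faceswap.py train -h to confirm the option names on your version):

python faceswap.py train -A <faces_A> -B <faces_B> -m <model_dir> -t lightweight -bs 2

In the GUI it is the same idea: pick the Lightweight trainer and set the batch size to 2 on the training screen.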


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Tue Aug 18, 2020 1:57 pm
by waseemmd38

I have an issue with training. I stopped training and then re-trained the model, but an unexpected crash occurred. I was at 1001 iterations before I stopped; when I resumed the same project, the same crash happened. I thought it was due to the batch size, so I reduced the batch size to 2, but the same issue still occurs. Please help.


Re: Restarting training results in Unable to allocate device-local memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Tue Aug 18, 2020 2:46 pm
by torzdf

I don't know how else I can help here.

A 2GB GPU is just too small to train anything other than maybe the lightweight model on a low batch size.

Try that model, and if you still have problems, then I'm afraid your GPU just can't handle faceswap.

Locking, as there is nowhere else for this thread to go.


Problem with training: Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Posted: Wed Aug 19, 2020 12:52 am
by breezyhaze

The extraction process went well, but when I try to train the model, the following error appears:

Loading...
Setting Faceswap backend to AMD
08/18/2020 20:42:02 INFO Log level set to: INFO
08/18/2020 20:42:02 INFO Setting up for PlaidML
08/18/2020 20:42:02 INFO Using GPU(s): ['NVIDIA Corporation - GeForce RTX 2080 SUPER (experimental)', 'NVIDIA Corporation - GeForce RTX 2080 SUPER (supported)']
08/18/2020 20:42:02 INFO Successfully set up for PlaidML
08/18/2020 20:42:04 INFO Model A Directory: C:\Users\breez\Desktop\snoop
08/18/2020 20:42:04 INFO Model B Directory: C:\Users\breez\Desktop\hxm
08/18/2020 20:42:04 INFO Training data directory: C:\Users\breez\Desktop\model1
08/18/2020 20:42:04 INFO ===================================================
08/18/2020 20:42:04 INFO Starting
08/18/2020 20:42:04 INFO Press 'Stop' to save and quit
08/18/2020 20:42:04 INFO ===================================================
08/18/2020 20:42:05 INFO Loading data, this may take a while...
08/18/2020 20:42:05 INFO Loading Model from Dfaker plugin...
Using plaidml.keras.backend backend.
08/18/2020 20:42:05 INFO No existing state file found. Generating.
08/18/2020 20:42:05 INFO Opening device "opencl_nvidia_geforce_rtx_2080_super.0"
08/18/2020 20:42:05 INFO Loading Trainer from Original plugin...

Reading training images (A): 0%| | 0/1444 [00:00<?, ?it/s]
Reading training images (A): 40%|███▉ | 572/1444 [00:00<00:00, 5662.07it/s]
Reading training images (A): 92%|█████████▏| 1323/1444 [00:00<00:00, 6113.01it/s]

Reading training images (B): 0%| | 0/569 [00:00<?, ?it/s]
Reading training images (B): 27%|██▋ | 151/569 [00:00<00:00, 1494.72it/s]
08/18/2020 20:42:06 INFO Reading alignments from: 'C:\Users\breez\Desktop\snoop_alignments.fsa'
08/18/2020 20:42:06 INFO Reading alignments from: 'C:\Users\breez\Desktop\huoxian_alignments.fsa'
08/18/2020 20:42:07 ERROR Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE

Unable to map memory: CL_MEM_OBJECT_ALLOCATION_FAILURE
08/18/2020 20:42:07 CRITICAL Error caught! Exiting...
08/18/2020 20:42:07 ERROR Caught exception in thread: '_training_0'
08/18/2020 20:42:07 ERROR You do not have enough GPU memory available to train the selected model at the selected settings. You can try a number of things:
08/18/2020 20:42:07 ERROR 1) Close any other application that is using your GPU (web browsers are particularly bad for this).
08/18/2020 20:42:07 ERROR 2) Lower the batchsize (the amount of images fed into the model each iteration).
08/18/2020 20:42:07 ERROR 3) Use a more lightweight model, or select the model's 'LowMem' option (in config) if it has one.
Process exited.

I don't know why this happens. I also tried the Original trainer, which says that no alignments are needed, but it still won't work. Any idea why this happens?