an illegal memory access was encountered

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here



Locked
dheinz70
Posts: 43
Joined: Sat Aug 15, 2020 2:43 am
Has thanked: 4 times

an illegal memory access was encountered

Post by dheinz70 »

I've been getting this error a lot. The model runs for a while and then spits this out.

AMD 3800x
32 GB ram
2060 Super

Any ideas? This same hardware setup used to run like a top on the same (or heavier) models. No crash log is created.

2021-04-12 04:43:11.527466: F ./tensorflow/core/kernels/conv_2d_gpu.h:459] Non-OK-status: GpuLaunchKernel(ShuffleInTensor3Simple<T, 1, 2, 0>, config.block_count, config.thread_per_block, 0, d.stream(), config.virtual_thread_count, in.data(), combined_dims, out.data()) status: Internal: an illegal memory access was encountered
2021-04-12 04:43:11.527467: E tensorflow/stream_executor/dnn.cc:613] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(3176): 'cudnnConvolutionBackwardData( cudnn.handle(), alpha, filter_nd.handle(), filter_data.opaque(), output_nd.handle(), output_data.opaque(), conv.handle(), ToConvBackwardDataAlgo(algorithm_desc), scratch_memory.opaque(), scratch_memory.size(), beta, input_nd.handle(), input_data.opaque())'

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 623 times

Re: an illegal memory access was encountered

Post by torzdf »

This is most likely hardware/PSU related.

Without a full system output there is little more to add.
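
For anyone landing here with the same error, below is a rough sketch of how the basics could be gathered for a report. It is not the application's own system output, just a generic standard-library approach. The function name `collect_system_info` is made up for illustration, and it assumes `nvidia-smi` (which ships with the NVIDIA driver) is on the PATH. If it is not, the script says so instead of failing.

```python
import platform
import shutil
import subprocess
import sys


def collect_system_info():
    """Return a dict of basic host and GPU details (hypothetical helper, stdlib only)."""
    info = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }
    # nvidia-smi is installed with the NVIDIA driver; skip gracefully if it is absent.
    smi = shutil.which("nvidia-smi")
    if smi is None:
        info["gpu"] = "nvidia-smi not found on PATH"
        return info
    try:
        result = subprocess.run(
            [
                smi,
                "--query-gpu=name,driver_version,memory.total,temperature.gpu,power.draw",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
        info["gpu"] = result.stdout.strip()
    except (subprocess.SubprocessError, OSError) as err:
        # Covers a non-zero exit code, a timeout, or a failure to launch the tool.
        info["gpu"] = f"nvidia-smi failed: {err}"
    return info


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```

Running it once at idle and once while training is under load gives temperature and power draw readings to compare, which may help narrow down a suspected hardware or PSU problem.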

My word is final
