CUDA error "an illegal memory access was encountered": what does it mean?

[BUG] Illegal memory access was encountered #434

Comments

TrickyT1964 commented Apr 22, 2021

Fairly consistent crashing, will crash persistently for hours before being perfectly fine again

Temps stable @ 64C
Running a 1080 Ti rig, settings are on efficient. I was getting 41 MH/s on each card before it started crashing constantly for hours with this message:

[[2021-04-22 17:59:25.096961] [thread=0x00002908] [warning]]

[[2021-04-22 17:59:25.099981] [thread=0x00002c08] [warning]]

Version: STABLE v0.5.1.3
NVIDIA Drivers: 466.11


4runnerwanted commented Apr 22, 2021

I’m having this exact issue. Code is identical for my two 1070ti’s. The second card is still crushing it at 31.5 MH/s and now the first card can barely hold 27 MH/s.

This error comes up and crashes excavator within 5 minutes of applying the same OC that my second card is running smooth at.


nicehashdev commented Apr 25, 2021

These errors indicate that the OC is too high. Note that not every card is fully compatible with every optimization profile. You need to be lucky to have a good chip. To maximize the potential of your device, you have to apply a manual OC using OCTune. Check the Wiki; there is plenty of information and instructions on how to use OCTune.


"CUDA failure 77: an illegal memory access was encountered" over a simple dataset #2663

Comments

vermorel commented Nov 23, 2017

We are frequently facing CUDA failures with CNTK.

To make the problem easily reproducible, we have put together both a BrainScript script and a small binary dataset, attached to this ticket.

Here is the full output:

This is a blocking problem for us. Any help would be highly appreciated.


ke1337 commented Nov 24, 2017

You need to set environment variable CUDA_LAUNCH_BLOCKING=1 to get the precise cuda error location. Here’s the callstack with that:

The code path seems to be in gradient optimization in PlusNode's BackProp, when automatically reducing an input of 32×1000 to 32×1. I tried disabling gradient optimization by setting optimizeGradientAccumulation=false, and the problem seems to have gone away. I'll dig a bit more into this.
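For anyone who wants to apply the CUDA_LAUNCH_BLOCKING=1 suggestion from a script rather than a shell, here is a minimal Python sketch. It starts a child process with the variable set, so kernel launches in that process run synchronously and the error is reported at the call that caused it. The helper name `run_with_blocking_launches` is made up for illustration, and the child command below is only a stand-in that echoes the variable; substitute your actual training command.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(command):
    """Run `command` with CUDA_LAUNCH_BLOCKING=1 in its environment.

    Hypothetical helper: the variable is set only for the child process,
    the parent environment is left untouched.
    """
    env = dict(os.environ)              # copy, do not mutate os.environ
    env["CUDA_LAUNCH_BLOCKING"] = "1"   # make CUDA kernel launches synchronous
    return subprocess.run(command, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process (an assumption): it just prints the variable.
    # Replace this list with the real training command line.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('CUDA_LAUNCH_BLOCKING'))",
    ]
    result = run_with_blocking_launches(child)
    print(result.stdout.strip())
```

Synchronous launches slow execution down, so this is meant for debugging runs only.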

vermorel commented Nov 24, 2017

vermorel commented Nov 29, 2017

The option optimizeGradientAccumulation=false does not actually solve all the problems. We are facing crashes again. Attached are a small BrainScript script and a binary file to reproduce the failure.

When using CPU, we observe the failure:

Then, with GPU, the error message is:

Any help would be highly appreciated. Thanks!
