RAM Failure: Symptoms and Diagnosis
Summary
If you see random application crashes, kernel panics, and failures in
simple programs that have worked for a long time, it is possible you
are experiencing RAM failure. Here at frequal.com we had RAM go bad
in a server recently. The failures were initially infrequent but
became commonplace, preventing the server from functioning properly.
Once we recognized the problem, Memtest86+ and a quick RAM replacement
fixed it, but determining the cause was not easy.
Symptoms
When RAM starts going bad, what you write to an address is not what
you get back when you read it later. This causes random corruption of
data, programs that crash, and even kernel "oops"es and kernel
panics. If it's only a small amount of RAM that is corrupt then few
programs will fail, but something will eventually fail when that
memory gets used.
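To make the "write, then read back" idea concrete, here is a minimal sketch in Python. It is only an illustration of the principle, not a real memory tester: the function name, buffer size, and bit patterns are our own choices, and a user-space program only sees whatever physical pages the operating system happens to hand it, so it cannot target a suspect address the way Memtest86+ can. On healthy RAM it should report no mismatches.

```python
def find_mismatches(buffer_size=1024 * 1024, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern into a buffer, read it back, and list bytes that differ.

    Illustrative only: the buffer size and patterns are arbitrary assumptions,
    and the OS decides which physical memory backs the buffer.
    """
    buf = bytearray(buffer_size)
    mismatches = []
    for pattern in patterns:
        expected = bytes([pattern]) * buffer_size
        # Write phase: fill every byte of the buffer with the pattern.
        buf[:] = expected
        # Read phase: a quick whole-buffer comparison first, then a
        # byte-by-byte scan only if something came back different.
        if buf != expected:
            for offset, value in enumerate(buf):
                if value != pattern:
                    mismatches.append((offset, pattern, value))
    return mismatches


if __name__ == "__main__":
    bad = find_mismatches()
    if bad:
        for offset, wrote, read in bad:
            print(f"offset {offset}: wrote {wrote:#04x}, read back {read:#04x}")
    else:
        print("no mismatches found (this does not prove the RAM is good)")
```

A clean result here means very little, since the bad region may simply not have been allocated to the buffer. A mismatch, on the other hand, would be a strong hint that something is wrong.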
Some kernel panics may bring the system to a halt. If they recur
soon after the system has been off for a while (and so has had time to
cool), you can be more confident it isn't a heat-related issue.
Even small programs (like ls) may fail if they happen to
use the affected memory. If they work once but then fail on a
different attempt, this can be a sign of memory failure, since the
memory used will be different from run to run.
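One rough way to check for this kind of run-to-run inconsistency is to run the same small command many times and count how often it fails. The sketch below assumes Python 3 is available; the function name, the default of 50 runs, and the choice of ls as the default command are our own. Intermittent failures can also come from disk or software problems, so treat a non-zero count as a reason to run a proper memory test, not as proof.

```python
import subprocess
import sys


def count_failures(command, runs=50):
    """Run `command` repeatedly and return how many runs exited abnormally.

    A non-zero exit status counts as a failure. On POSIX systems a negative
    return code means the process was killed by a signal (for example a
    segmentation fault), which is also counted.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Use the command given on the command line, or fall back to ls.
    command = sys.argv[1:] or ["ls"]
    failed = count_failures(command)
    print(f"{failed} of 50 runs of {' '.join(command)} failed")
```

A command that has worked for years should fail zero times. If it fails on some runs and succeeds on others with nothing else changing, that matches the pattern described above.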
Diagnosis
Once you suspect RAM is failing, I recommend using Memtest86+. The easiest way to
use it is to find an Ubuntu CD. One of the options when booting from the
CD is to run a memory test. Let it run through at least one entire
test suite, so that the Pass column increases to 1. Preferably, let
it run overnight to get many successful passes.
If Memtest86+ shows errors, it is probably best to replace the RAM. New
RAM is inexpensive and you can upgrade at the same time. With the new
RAM in place, run Memtest86+ once more to confirm that your memory errors
are fixed.