
RAM Failure: Symptoms and Diagnosis

Summary

If you see random application crashes, kernel panics, and failures in simple programs that have worked for a long time, it is possible you are experiencing RAM failure. Here at frequal.com we had RAM go bad in a server recently. The failures were initially infrequent but became commonplace, preventing the server from functioning properly. Once we recognized the problem, Memtest86+ and a quick RAM replacement fixed it, but determining the cause was not easy.

Symptoms

When RAM starts going bad, what you write to an address is not what you get back when you read it later. This causes random corruption of data, programs that crash, and even kernel "oops"es and kernel panics. If it's only a small amount of RAM that is corrupt then few programs will fail, but something will eventually fail when that memory gets used.
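
The read-back idea above can be sketched in a few lines of Python. This is only an illustration of the principle, not a substitute for Memtest86+: a user-space program cannot choose which physical memory it gets, so a clean result proves little. The function names (mismatched_offsets, check_buffer), the buffer size, and the byte patterns are all assumptions made for the example.

```python
def mismatched_offsets(buf, pattern):
    """Return the offsets in buf whose byte value differs from pattern."""
    return [i for i, value in enumerate(buf) if value != pattern]


def check_buffer(buffer_size=1 << 20, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern into a fresh buffer, read it back, and
    return {pattern: [bad offsets]} for any pattern that did not survive."""
    failures = {}
    for pattern in patterns:
        # Write phase: fill the buffer with a single repeated byte.
        buf = bytearray([pattern]) * buffer_size
        # Read phase: every byte should still equal the pattern.
        bad = mismatched_offsets(buf, pattern)
        if bad:
            failures[pattern] = bad
    return failures
```

On healthy memory, check_buffer() returns an empty dictionary. Alternating patterns such as 0xAA and 0x55 are used here because together they set and clear every bit of each byte.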

Some kernel panics may bring the system to a halt. If they recur soon after the system has been powered off for a while, and has therefore had time to cool down, you can be more confident it isn't a heat-related issue.

Even small programs (like ls) may fail if they happen to use the affected memory. If they work once but then fail on a different attempt, this can be a sign of memory failure, since the memory used will be different from run to run.
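
One way to look for the "works once, fails later" pattern described above is to run the same small command many times and count the failures. The following is a minimal sketch, assuming Python 3.9 or later; the helper name count_failures and the default of 20 runs are hypothetical choices, and intermittent failures can of course have causes other than bad RAM.

```python
import subprocess


def count_failures(command: list[str], runs: int = 20) -> int:
    """Run command `runs` times and return how many runs did not exit with status 0."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # A non-zero return code covers ordinary errors; on POSIX systems a
        # negative value means the process was killed by a signal (e.g. a crash).
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, count_failures(["ls", "/"]) should be 0 on a healthy system. A small non-zero count that changes from one invocation to the next, for a command that has always worked, fits the symptom described here.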

Diagnosis

Once you suspect RAM is failing, I recommend using Memtest86+. The easiest way to use it is to find an Ubuntu CD: one of the options when booting from the CD is to run a memory test. Let it run through at least one entire test suite, so that the Pass column increases to 1. Preferably, let it run overnight to get many successful passes.

If Memtest shows errors, it is probably best to replace the RAM. New RAM is inexpensive and you can upgrade at the same time. With the new RAM in place, run Memtest once more to ensure that your memory errors are fixed.


Last modified on 1 Aug 2007 by AO

Copyright © 2024 Andrew Oliver