This post gives a short overview of how memory diagnostic tools help you troubleshoot RAM problems.

Physical memory failures are rare, but when they occur they can be fatal. Sometimes they not only crash the operating system but also corrupt data on the hard disk. In the worst case, the end result is a destroyed database, all because a single one of billions of memory cells confused a zero with a one. ECC (Error-Correcting Code) memory improves fault tolerance, but it is expensive, and so are the motherboards that support it. And, of course, ECC doesn't guarantee that RAM failures won't occur.
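To make the ECC point concrete, here is a toy sketch of single-error correction using a textbook Hamming(7,4) code. This is an illustrative assumption, not what any particular memory controller implements: real ECC memory typically protects 64-bit words with extra check bits and can detect, but not correct, double-bit errors. The sketch shows why a single flipped bit can be repaired, and why two flipped bits in the same word can slip through as silent corruption.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Index i holds codeword position i + 1; parity bits sit at positions 1, 2, 4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    bits = list(codeword)
    # The XOR of the positions of all set bits is 0 for a valid codeword;
    # with exactly one flipped bit it equals that bit's position.
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the bit the syndrome points to
    return [bits[2], bits[4], bits[5], bits[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    stored = hamming74_encode(data)

    one_flip = list(stored)
    one_flip[4] ^= 1  # a single memory cell confuses a zero with a one
    print(hamming74_decode(one_flip) == data)  # True: corrected

    two_flips = list(stored)
    two_flips[0] ^= 1
    two_flips[4] ^= 1  # two cells fail in the same word
    print(hamming74_decode(two_flips) == data)  # False: silently miscorrected
```

The extra parity bit used by SECDED codes would flag the second case as uncorrectable instead of miscorrecting it, but no code of this kind can cope with an arbitrary number of failed cells.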

Memory diagnostic tools exist for Windows. However, testing RAM while a full-blown operating system is loaded doesn't make much sense, because the memory occupied by the operating system and running programs can't be probed that way. Thus, it is important to use a standalone memory diagnostic tool that you can boot from a CD or a USB stick.

Of course, you can also use the BIOS's built-in memory test. You can typically choose between a quick and a thorough test. However, even if the BIOS reports that your memory is okay, this does not guarantee that all memory cells are working properly.

For one, BIOS memory tests usually can't find intermittent memory problems, that is, problems that only occur under specific conditions, such as when two adjacent memory cells affect each other. Moreover, old memory modules in particular sometimes fail only at certain temperatures. Frequent temperature changes make silicon brittle, causing microfractures in the chip. These microfractures often cause problems only at very specific temperatures because the chip expands unevenly as its temperature changes.

Hence, I recommend starting the memory diagnostic tool while the machine is still cold and then running the test for 20 minutes or so, until the computer has reached its operating temperature. Unfortunately, even then you can't be 100% sure that all memory cells are working properly. Because different memory diagnostic tools use different algorithms to probe memory cells, it doesn't hurt to get a second opinion from another tool.

The ideal way to diagnose memory is to write a value (1 or 0) to a memory cell, then write the opposite value to all adjacent cells, and finally read the original cell back to check whether it still holds the right value. This method verifies that writing to a memory cell doesn't affect adjacent cells, which is a common cause of intermittent errors.
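The idealized neighbor test can be sketched on a toy memory array. This is a simulation under stated assumptions, not how any real tool accesses hardware: `BitGrid` is a hypothetical 2D array of bits whose physical layout is fully known, "adjacent" is assumed to mean the four horizontal and vertical neighbors, and the `coupling` map models a fault in which writing one cell (the aggressor) also overwrites another (the victim).

```python
from itertools import product


class BitGrid:
    """Toy memory array of rows x cols bits with optional coupling faults (hypothetical model)."""

    def __init__(self, rows, cols, coupling=None):
        self.rows, self.cols = rows, cols
        self.bits = [[0] * cols for _ in range(rows)]
        # aggressor (row, col) -> victim (row, col): writing the aggressor
        # also forces its value into the victim cell
        self.coupling = coupling or {}

    def write(self, r, c, value):
        self.bits[r][c] = value
        victim = self.coupling.get((r, c))
        if victim is not None:
            vr, vc = victim
            self.bits[vr][vc] = value

    def read(self, r, c):
        return self.bits[r][c]


def neighbors(grid, r, c):
    """Yield the up/down/left/right neighbors of (r, c) that lie inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < grid.rows and 0 <= nc < grid.cols:
            yield nr, nc


def neighbor_test(grid):
    """Write a value to each cell, write its complement to all neighbors,
    then read the cell back. Returns the sorted cells that lost their value."""
    failing = set()
    for r, c in product(range(grid.rows), range(grid.cols)):
        for value in (0, 1):
            grid.write(r, c, value)
            for nr, nc in neighbors(grid, r, c):
                grid.write(nr, nc, 1 - value)
            if grid.read(r, c) != value:
                failing.add((r, c))
    return sorted(failing)


if __name__ == "__main__":
    print(neighbor_test(BitGrid(4, 4)))                    # []
    print(neighbor_test(BitGrid(4, 4, {(1, 2): (1, 1)})))  # [(1, 1)]
```

A coupling fault between adjacent cells, such as `{(1, 2): (1, 1)}`, is caught at `(1, 1)`, whereas a fault between distant cells, such as `{(0, 0): (3, 3)}`, goes unnoticed by this test. That is why the method depends on knowing which cells are physically adjacent.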

The problem is that different chip designs make it difficult to determine which memory cells are physically adjacent. Memory diagnostic tools therefore use strategies that approximate this method. Usually they fill the memory with certain patterns, verify that each pattern has been written correctly, and then repeat the process with the pattern's complement. This still doesn't guarantee that a memory chip is flawless, but it considerably increases the likelihood.
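The pattern-and-complement strategy can likewise be sketched against a simulated, byte-addressable memory. The `SimulatedMemory` class, its stuck-at fault model, and the default patterns are assumptions for illustration only; real tools use their own pattern sets and run against physical RAM.

```python
class SimulatedMemory:
    """Toy byte-addressable memory with optional stuck-at bit faults (hypothetical model)."""

    def __init__(self, size, stuck_bits=None):
        self._cells = bytearray(size)
        # address -> (bit index, value that bit is stuck at)
        self._stuck = stuck_bits or {}

    def __len__(self):
        return len(self._cells)

    def write(self, addr, value):
        if addr in self._stuck:
            bit, stuck_value = self._stuck[addr]
            if stuck_value:
                value |= 1 << bit               # bit always reads as 1
            else:
                value &= ~(1 << bit) & 0xFF     # bit always reads as 0
        self._cells[addr] = value

    def read(self, addr):
        return self._cells[addr]


def pattern_test(mem, patterns=(0x55, 0x0F, 0x00)):
    """Fill memory with each pattern and its complement, verifying after each fill.

    Returns the sorted addresses that failed at least once.
    """
    bad = set()
    for pattern in patterns:
        for value in (pattern, pattern ^ 0xFF):  # the pattern, then its complement
            for addr in range(len(mem)):
                mem.write(addr, value)
            for addr in range(len(mem)):
                if mem.read(addr) != value:
                    bad.add(addr)
    return sorted(bad)


if __name__ == "__main__":
    print(pattern_test(SimulatedMemory(1024)))                               # []
    print(pattern_test(SimulatedMemory(1024, {100: (3, 1), 700: (0, 0)})))  # [100, 700]
```

Because every bit is written both as 0 and as 1 (once by a pattern, once by its complement), a bit stuck at either value shows up as a mismatch on at least one pass.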

If the memory diagnostic tool finds an error, it is sometimes unclear which memory module contains the corrupt cells. In this case, the best way to find the faulty module is to test each module on its own by inserting the modules into the computer one at a time. If this isn't possible, for instance because the board requires an even number of modules, you can rotate the modules and check whether the memory diagnostic tool reports the error at different addresses. Another option is to replace a single module and then check whether the error recurs.
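The one-module-at-a-time approach boils down to simple set logic. The function below is a hypothetical helper, and it assumes a single faulty module whose errors reproduce on every run that includes it; intermittent faults, as described above, can break that assumption.

```python
def isolate_faulty_modules(runs):
    """Narrow down suspect modules from a series of memory test runs.

    runs: list of (installed_modules, passed) pairs, where installed_modules is an
    iterable of module labels and passed is True if the test reported no errors.
    Returns the modules present in every failing run and absent from every passing run.
    """
    failing = [set(mods) for mods, passed in runs if not passed]
    if not failing:
        return set()  # no failures observed, nothing to isolate
    suspects = set.intersection(*failing)
    for mods, passed in runs:
        if passed:
            suspects -= set(mods)  # a module in a clean run is cleared (under our assumption)
    return suspects


if __name__ == "__main__":
    runs = [
        ({"DIMM1", "DIMM2"}, False),  # both modules installed: error reported
        ({"DIMM1"}, True),            # DIMM1 alone: clean
        ({"DIMM2"}, False),           # DIMM2 alone: error reported
    ]
    print(isolate_faulty_modules(runs))  # {'DIMM2'}
```

An empty result despite failing runs suggests that no single module explains the errors, for example because of an intermittent fault or a problem elsewhere on the board.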

If you are uncertain whether a memory module is flawed, replacing it is usually the best option. It depends on how important the machine is, of course, but in most cases the crash of a production server costs more than new RAM. In any case, I recommend testing the memory of old machines every now and then, for instance when you have to reboot the server anyway. In my next post, I will review a free memory diagnostic tool that can be used for this purpose.

4 Comments
  1. Bob 9 years ago

    Two free standalone memory testers:

    Memtest86: http://www.memtest86.com/

    Memtest86+: http://www.memtest.org/

  2. steve 9 years ago

    We had an 8GB system that passed Memtest86+ but clearly had problems in Windows. We ran the Windows 7 memory test (Windows 7 F8 repair option), and the system failed it within a few minutes. We later found that a few bent address pins on the CPU socket were the cause. Does anyone know of a good memory test program for systems with more than 4GB?

  3. Bob, thanks! I just posted a review of Memtest86. Do you know if Memtest86+ supports systems with more than 4GB?

    Steve, did you try Memtest86 (without the +)? It supports more than 4GB.

  4. Bob 9 years ago

    As far as I know, Memtest86+ will test all memory reported by the BIOS. According to several posts on the Memtest86+ forum, you might run into issues where memory in the 3.2-4GB address range is remapped to 8GB and above. The standard suggestion is to always test memory modules individually to isolate the problem to a specific module. If they work individually but fail together, it may be a motherboard or PSU issue instead.

    @steve: I too have a G45 board with 8GB (4x2GB) that has sporadic issues that seem to be memory related, especially when resuming from S5 sleep. The problems disappear if I only use 4GB, so I'll take a look at the socket for bent contacts. Thanks for the suggestion.

© 4sysops 2006 - 2020
