I’ve put up a number of posts recently about our ReadyNAS, largely because I have been working on it extensively after a total failure. Here I’m just going to list the timeline of the failure and what happened. At this point I am very glad I chose the Business Edition rather than the Pioneer Edition, simply for the 5-year business-class warranty, which recently proved both necessary and convenient. The warranty certainly paid for itself.
- ReadyNAS Pro Business Edition was purchased in August 2010 with 8 WD20EADS drives with TLER on and parking disabled (6 installed with 2 hot swaps). The ReadyNAS was responsive, with lots of capacity and high speed. Great!
- Played with more and more utilities and systems. The ReadyNAS slowed down over time, and some services were removed at various points. Firmware updates were performed regularly. Drive use was fairly heavy due to continuous video archiving from the security system.
- Dropbox and Crashplan were installed, slowing the system down and eating memory. Sometimes the ReadyNAS https admin panel was unresponsive or unreachable.
- The ReadyNAS's lack of responsiveness became dramatically worse. A Fan CAS failure message showed up on the health screen with a yellow warning, along with a correlated temperature discrepancy on AUX. At this stage, changes made in the admin panel were not saved, and variables and modules that had previously been set became unset or failed outright. Error message below:
“Disk 1 WDC WD20EADS-32S2B0 1863 GB , C / 32 F , Write-cache ON OK
Disk 2 WDC WD20EADS-32S2B0 1863 GB , C / 32 F , Write-cache ON OK
Disk 3 WDC WD20EADS-32S2B0 1863 GB , C / 32 F , Write-cache ON OK
Disk 4 WDC WD20EADS-32S2B0 1863 GB , C / 32 F , Write-cache ON OK
Disk 5 WDC WD20EADS-32S2B0 1863 GB , C / 32 F , Write-cache ON OK
Disk 6 WDC WD20EADS-32S2B0 1863 GB , C / 32 F , Write-cache ON OK
Fan SYS 958 RPM OK
Fan CPU 2136 RPM OK
Fan CAS 0 RPM Out of Spec
Fan RPM OK
Fan RPM OK
Temp SYS 49 C / 120 F [Normal 0-65 C / 32-149 F] OK
Temp CPU 13 C / 55 F [Normal 0-60 C / 32-140 F] OK
Temp AUX 3 C / 37 F [Normal 0-0 C / 32-32 F] Out of Spec”
- Low free memory, and perhaps faulty memory, was thought to be at fault. Offline memory testing was performed and the memory tested fine. New memory was eventually purchased (PSD24G8002, see other post for details) and installed.
- The ReadyNAS continued to fail and be unresponsive, and eventually became dead to all outside input except for periodic ping responses. An OS re-install was performed at the direction of Netgear support. No action remedied the problems. Offline disk testing was NOT performed but, in retrospect, should have been. A factory default was not desired because we did not want to lose the data.
- A new ReadyNAS chassis was sent out. The drive holders have a slightly different design. Offline memory testing was performed and all was well (with the OEM memory). Drives were moved to the new chassis. This unit was also unresponsive. A factory default was performed after it was determined the critical data could be salvaged from other sources (Crashplan & Dropbox).
- After the factory default, disks 1 and 6 were flagged as faulty when the ReadyNAS performed its disk test during RAID array creation. Later extensive testing at a computer found disk 1 to have excessive bad sectors but no problems with disk 6. Both WD20EADS drives were warranty replaced by Western Digital.
- An XRAID2 was rebuilt with the 4 remaining drives. Several days of memory testing with the new Patriot RAM were performed on both the old and new ReadyNAS devices with 100% pass rates. Patriot RAM was installed in new ReadyNAS.
- Warranty drives came in and were installed. ReadyNAS completed building the array.
- Patriot RAM was installed in the new ReadyNAS and all is well at the moment. A review reveals that weekly file consistency checks were being performed, as well as regular RAID scrubbing. Neither check caught the errors. A new procedure has been put in place where the ReadyNAS is brought offline and offline disk testing is performed monthly. In addition, every 6 months all drives are removed and extensive low-level testing is performed at another workstation to validate disk function.
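For the six-monthly check at a workstation, one rough way to spot a drive like disk 1 is to look at the SMART attribute table. The sketch below is not the procedure we actually follow; it assumes the workstation has smartmontools installed and that the output of `smartctl -A` for each drive has been saved as text. The function name, the list of watched attributes, the zero threshold and the sample table are all made up for illustration (the sample values are not readings from our drives).

```python
# Hypothetical helper: flag sector-related SMART attributes with a non-zero raw value.
# Input is the text printed by `smartctl -A <device>`; this script does not run smartctl itself.

# Attribute names as smartmontools commonly prints them for ATA drives (assumed watch list).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Invented sample in the usual `smartctl -A` column layout, for demonstration only.
SAMPLE_SMART = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   180   180   140    Pre-fail  Always       -       412
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7300
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""


def suspicious_attributes(smartctl_output, max_raw=0):
    """Return {attribute_name: raw_value} for watched attributes whose raw value exceeds max_raw."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns and start with a numeric ID; the header row does not.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > max_raw:
                found[fields[1]] = int(raw)
    return found


if __name__ == "__main__":
    print(suspicious_attributes(SAMPLE_SMART))
```

A non-empty result is only a hint that a drive deserves a full surface scan; an empty result does not prove the drive is healthy.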
In the end, the errors seem to have stemmed from a gradual malfunction of the ReadyNAS which masked drive issues. It is also possible that early drive issues began to cause the ReadyNAS problems. In either case, the failure is disconcerting. The 8TB of data being archived was lost. Fortunately, we had the foresight to have this data concurrently duplicated at Dropbox and Crashplan, which turned out to be the saviors here.
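Since the health report quoted in the timeline was the first clear sign of trouble, it may be worth checking such reports automatically instead of waiting to notice them on the health screen. The following is a minimal sketch that assumes the report has been copied out as plain text in the same layout as the one above; the function name is hypothetical and this is not a ReadyNAS feature.

```python
# Hypothetical helper: list the sensors marked "Out of Spec" in a pasted health report.

# Fan and temperature lines copied from the health report quoted in the timeline.
SAMPLE_REPORT = """\
Fan SYS 958 RPM OK
Fan CPU 2136 RPM OK
Fan CAS 0 RPM Out of Spec
Temp SYS 49 C / 120 F [Normal 0-65 C / 32-149 F] OK
Temp CPU 13 C / 55 F [Normal 0-60 C / 32-140 F] OK
Temp AUX 3 C / 37 F [Normal 0-0 C / 32-32 F] Out of Spec
"""


def out_of_spec(report):
    """Return sensor names (first two words, e.g. 'Fan CAS') for lines ending in 'Out of Spec'."""
    flagged = []
    for line in report.splitlines():
        # Drop surrounding whitespace and any trailing quote mark left over from copy and paste.
        line = line.strip().rstrip('”"').strip()
        if line.endswith("Out of Spec"):
            flagged.append(" ".join(line.split()[:2]))
    return flagged


if __name__ == "__main__":
    for sensor in out_of_spec(SAMPLE_REPORT):
        print("Check:", sensor)
```

For the report above this flags the CAS fan and the AUX temperature, the same two lines that showed the warning on the health screen.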