Thursday, 6 June 2013

How to check hard disk failure in linux

smartctl is a command line utility designed to perform SMART tasks such as printing the SMART self-test and error logs, enabling and disabling SMART automatic testing, and initiating device self-tests. First, make sure S.M.A.R.T. support is enabled in the BIOS. Next, run the following command to see if your hard disks support S.M.A.R.T technology or not:

[root@vellore ~]# smartctl -i /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST3250318AS
Serial Number:    9VM8EC2E
LU WWN Device Id: 5 000c50 01a4a684d
Firmware Version: CC38
User Capacity:    250,058,268,160 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Jun  6 10:19:02 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

If not enabled by default, To enable SMART, run:

[root@vellore ~]# smartctl -s on -d ata /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

Run overall-health self-assessment test, enter:

[root@vellore ~]# smartctl -d ata -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

A sample output from failing hard disk: 

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   044   033   045    Old_age   Always   FAILING_NOW 56 (96 110 58 25)
 
The following will provide even more information about failing hard disk:

[root@vellore ~]# smartctl --attributes --log=selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       221288423
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7332
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       205719784
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8550
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   020    Old_age   Always       -       3611
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   046   046   000    Old_age   Always       -       54
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       68720526384
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   066   048   045    Old_age   Always       -       34 (Min/Max 20/37)
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       -       34 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   038   032   000    Old_age   Always       -       221288423
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   192   000    Old_age   Always       -       2109
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       269139830654254
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1152744560
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       177103330

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8550         -
# 2  Short offline       Completed without error       00%      8550         -