[SATLUG] Using the Data Recovery and HD maintenance tools

Robert Pearson e2eiod at gmail.com
Mon Jun 16 12:17:08 CDT 2008


On Mon, Jun 16, 2008 at 10:39 AM, herb cee <hc at lookcee.com> wrote:
> *** HD, maint & recovery
> Whew, thanks guys for the recent discussion on HD failure. I do know from a
> foot high stack of broke HDs that all HDs will fail at some future moment. I
> do not use A/C so all my equipment runs hot. I do try to stay backed up to a
> separate unit on my LAN but at times I could lose a month of generated data
> and changes. (BTW inside every HD is one or two of the strongest magnets in
> the world.)
>
> I am now trying to understand how to use the tools mentioned in the recent
> 'Data Recovery help!' thread. I would very much appreciate any help and
> pointers you may think of passing on. Below is as good as I can do in
> portraying where I currently sit.
>
> Special thanks to Robert P, great job of digging ...
> **[Prevention Tools]
> smartctl is part of the smartmontools package
> http://smartmontools.sourceforge.net/
> "The smartmontools package contains two utility programs (smartctl and
> smartd) to control and monitor storage systems using the
> Self-Monitoring, Analysis and Reporting Technology System (SMART)
> built into most modern ATA and SCSI harddisks. In many cases, these
> utilities will provide advanced warning of disk degradation and
> failure."
> The key phrase is "provide advanced warning".
> An ounce of prevention is worth a pound of cure...
>
> The SATLUG emails show several references to using Google and the
> search string "xcssa smartctl" to view examples of using smartctl. I
> was unable to find anything with that Google search string other than
> email references. XCSSA is at:
> http://xcssa.org/
> (X-otic Computer Systems of San Antonio) Lots of good information.
>
> *** Can someone familiar with Ubuntu-Gnome please tell me how I can use this
> tool to know when I should think of replacing my HDs?
> So far I opened Synaptic PM and installed the smartmontools package and
> TestDisk, hdparm was already installed.
>
> I ran the command Tweeks suggested but it returned 'smart' not on. I read
> the man pages for smartctl, (I had this silly urge to print it out and go to
> my grove of Oaks and read it to them)
>
> I read: * LINUX: Use the forms "/dev/hd[a-t]" for IDE/ATA devices ....
> -s VALUE, --smart=VALUE
> Enables or disables SMART on device. The valid arguments to
> this option are on and off. Note that the command '-s on'
> (perhaps used with the '-o on' and '-S on' options) should
> be placed in a start-up script for your machine, for example in
> rc.local or rc.sysinit. In principle the SMART feature settings
> are preserved over power-cycling, but it doesn't hurt to be
> sure.
>
> EXAMPLES
> smartctl -a /dev/hda
> Print all SMART information for drive /dev/hda (Primary Master).
>
> smartctl -s off /dev/hdd
> Disable SMART on drive /dev/hdd (Secondary Slave).
>
> smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
> Enable SMART on drive /dev/hda, enable automatic offline testing every
> four hours, and enable autosaving of SMART Attributes. This is a good
> start-up line for your system's init files. You can issue this command
> on a running system.
>
> * I tried to turn smart 'on' .. here are the results ...
> herb at Celf:~$ sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
> [sudo] password for herb:
> smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
> Smartctl open device: /dev/hda failed: No such file or directory
> herb at Celf:~$
>
> I don't have a clue how to use the CLI; it just plain scares me since I am
> aware that I dunno what I am doing! Can you tell me what I did wrong in the
> command string above please? Once I get 'smart' turned ON what should I run
> to test the drive?

*** You need to run fdisk -l (that is the letter "l" for list) to show
what disks you have and their names.
*** On my system this returns:

[root at white-knight ~]# fdisk -l

Disk /dev/hda: 60.0 GB, 60040544256 bytes
255 heads, 63 sectors/track, 7299 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          29      232911   83  Linux
/dev/hda2              30        7300    58400401+   5  Extended
/dev/hda5              30         977     7614778+  83  Linux
/dev/hda6             978        2703    13864063+  83  Linux
/dev/hda7            2704        2768      522081   82  Linux swap / Solaris
/dev/hda8            2769        7300    36399352+  8e  Linux LVM

Disk /dev/hdb: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1       14593   117218241   83  Linux
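
*** If you only want the disk names out of that listing, here is a
minimal sketch. The function name list_disks is my own invention, not
part of fdisk. It assumes the "Disk /dev/...:" header lines look like
the ones above. On newer kernels the same drives often show up as
/dev/sda, /dev/sdb and so on rather than /dev/hda, which would explain
a "No such file or directory" error for /dev/hda.

```shell
#!/bin/sh
# list_disks (hypothetical helper): read `fdisk -l` output on stdin and
# print one whole-disk device name per line, skipping partition rows.
list_disks() {
    awk '$1 == "Disk" && $2 ~ /^\/dev\// { sub(/:$/, "", $2); print $2 }'
}

# On a live system (needs root):
#   fdisk -l | list_disks
# Demo on a trimmed copy of the listing above:
list_disks <<'EOF'
Disk /dev/hda: 60.0 GB, 60040544256 bytes
/dev/hda1   *           1          29      232911   83  Linux
Disk /dev/hdb: 120.0 GB, 120034123776 bytes
/dev/hdb1   *           1       14593   117218241   83  Linux
EOF
```

Whatever names come back are the ones to hand to smartctl, for example
smartctl -a /dev/sda if the list shows /dev/sda.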

*** Then when I run smartctl for /dev/hda I get:

[root at white-knight ~]# smartctl -a /dev/hda
smartctl version 5.38 [i586-mandriva-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     QUANTUM FIREBALLP AS60.0
Serial Number:    196033339106
Firmware Version: A1Y.1400
User Capacity:    60,040,544,256 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Mon Jun 16 11:39:13 2008 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  34) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  35) minutes.

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   253   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   054   049   020    Pre-fail  Always       -       5868
  4 Start_Stop_Count        0x0032   093   093   008    Old_age   Always       -       4709
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   093   023    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   087   087   001    Old_age   Always       -       8765
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   093   093   008    Old_age   Always       -       4686
 13 Read_Soft_Error_Rate    0x000b   100   100   023    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   100   001   000    Old_age   Always       -       1750990032
196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x001a   001   001   000    Old_age   Always       -       2184

Warning: device does not support Error Logging
SMART Error Log Version: 0
No Errors Logged

Warning: device does not support Self Test Logging
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7154         -
# 2  Short offline       Completed without error       00%      7154         -

Device does not support Selective Self Tests/Logging
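
*** One way to read the attribute table: smartctl treats an attribute as
failed when its normalized VALUE drops to or below THRESH, and a THRESH
of 000 means the attribute never counts as failed. Here is a minimal
sketch of that check. The function name failing_attrs is hypothetical,
and it assumes each attribute is on a single unwrapped line, as smartctl
prints it on a terminal. The second demo row is made up for
illustration; it is not from either of my drives.

```shell
#!/bin/sh
# failing_attrs (hypothetical helper): read `smartctl -A` output on stdin
# and print attributes whose VALUE is at or below a non-zero THRESH.
# Columns: 1=ID 2=NAME 3=FLAG 4=VALUE 5=WORST 6=THRESH ... 10=RAW_VALUE
failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 && $4 + 0 <= $6 + 0 {
        printf "%s %s value=%d thresh=%d\n", $1, $2, $4 + 0, $6 + 0
    }'
}

# On a live system (needs root):
#   smartctl -A /dev/hda | failing_attrs
# Demo: the first row is from /dev/hda above (THRESH 000, so not flagged),
# the second row is invented to show what a hit looks like.
failing_attrs <<'EOF'
199 UDMA_CRC_Error_Count    0x001a   001   001   000    Old_age   Always       -       2184
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 412
EOF
# prints: 5 Reallocated_Sector_Ct value=30 thresh=36
```

No output means no attribute is currently at or below its threshold. A
Pre-fail attribute showing up here would be a good reason to plan a
replacement.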

*** And when I run smartctl for /dev/hdb I get:

[root at white-knight ~]# smartctl -a /dev/hdb
smartctl version 5.38 [i586-mandriva-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST3120026A
Serial Number:    3JT0TWJX
Firmware Version: 3.06
User Capacity:    120,034,123,776 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Mon Jun 16 11:39:19 2008 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   064   054   006    Pre-fail  Always       -       134382187
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       256821318
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3998
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1551
194 Temperature_Celsius     0x0022   039   055   000    Old_age   Always       -       39
195 Hardware_ECC_Recovered  0x001a   064   054   000    Old_age   Always       -       134382187
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   192   000    Old_age   Always       -       67
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 143 hours (5 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 16 b8 6d f0  Error: ICRC, ABRT 1 sectors at LBA = 0x006db816 = 7190550

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 0f b8 6d f0 00      00:03:43.978  READ DMA
  c4 00 08 07 b8 6d f0 00      00:03:43.976  READ MULTIPLE
  ec 00 00 00 00 00 b0 00      00:03:43.975  IDENTIFY DEVICE
  c4 00 08 ff b7 6d f0 00      00:03:43.973  READ MULTIPLE
  c4 00 08 f7 b7 6d f0 00      00:03:43.972  READ MULTIPLE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3708         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
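
*** To answer "what should I run to test the drive": start a self-test
with smartctl -t short (or -t long), wait the recommended polling time
shown above, then read the log with smartctl -l selftest. Here is a
minimal sketch for picking out log entries that did not finish cleanly.
The function name unclean_selftests is hypothetical, and it assumes the
log rows start with "# <number>" as in the output above.

```shell
#!/bin/sh
# unclean_selftests (hypothetical helper): read `smartctl -l selftest`
# output on stdin and print every log entry whose status is anything
# other than "Completed without error" (aborted, in progress, failed...).
unclean_selftests() {
    awk '/^# *[0-9]+ / && !/Completed without error/'
}

# On a live system (needs root):
#   smartctl -t short /dev/hdb      # start the test, then wait
#   smartctl -l selftest /dev/hdb | unclean_selftests
# Demo on the /dev/hdb log entry above (clean, so nothing is printed):
unclean_selftests <<'EOF'
# 1  Short offline       Completed without error       00%      3708         -
EOF
```

An aborted or interrupted test also shows up here, so a hit is a prompt
to look at the full log rather than proof of a bad drive.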

*** The main thing you are after here is a baseline "snapshot" to
compare future "snapshots" against.
*** Particularly if you start getting drive errors.
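
*** A minimal sketch of the snapshot comparison, assuming you save the
output of smartctl -A to a file now and again later. The function name
changed_since and the file names in the comments are hypothetical. It
only compares the first word of RAW_VALUE, and it expects one attribute
per line.

```shell
#!/bin/sh
# changed_since (hypothetical helper): compare two saved `smartctl -A`
# snapshots and print each attribute whose raw value differs, in the form
#   ID NAME old_raw -> new_raw
# $1 = baseline snapshot file, $2 = later snapshot file
changed_since() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        if (FILENAME == ARGV[1]) old[$1] = $10
        else if (old[$1] != $10) print $1, $2, old[$1], "->", $10
    }' "$1" "$2"
}

# Typical use (needs root for smartctl):
#   smartctl -A /dev/hda > hda-baseline.txt      # today
#   smartctl -A /dev/hda > hda-now.txt           # some weeks later
#   changed_since hda-baseline.txt hda-now.txt
```

Counters such as Power_On_Hours will always move. The ones to watch are
raw values like Reallocated_Sector_Ct and Current_Pending_Sector
climbing away from 0.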

Here is a real good starter "howto/goby" for using smartctl:
"Checking Hard Disk Sanity With Smartmontools (Debian & Ubuntu)"
http://www.howtoforge.com/checking-hard-disk-sanity-with-smartmontools-debian-ubuntu

*** My SMART was on from the first try. Go figure...

*** CLI disk commands I commonly use every day are:
fdisk -l  (show me what disks I have and their names)
df -h     (show me what disks are mounted, where, and how much space
          is used, in human-readable units such as MB and GB)
df -h .   (the period "." says only show me the disk for the directory I
          am currently in - very handy)
mount -l  (list mounted filesystems, with their labels)
mount     (list mounted filesystems)
umount    (unmount a filesystem)
mount -a  (mount everything listed in /etc/fstab)

>
> Here are the other examples in the Man:
>
> smartctl -t long /dev/hdc
> Begin an extended self-test of drive /dev/hdc. You can issue this
> command on a running system. The results can be seen in the self-test log
> visible with the '-l selftest' option after it has completed.
>
> smartctl -s on -t offline /dev/hda
> Enable SMART on the disk, and begin an immediate offline test of drive
> /dev/hda. You can issue this command on a running system. The results
> are only used to update the SMART Attributes, visible with the '-A'
> option. If any device errors occur, they are logged to the SMART error
> log, which can be seen with the '-l error' option.
>
> smartctl -A -v 9,minutes /dev/hda
> Shows the vendor Attributes, when the disk stores its power-on time
> internally in minutes rather than hours.
>
> smartctl -q errorsonly -H -l selftest /dev/hda
> Produces output only if the device returns failing SMART status, or if
> some of the logged self-tests ended with errors.
>
> smartctl -q silent -a /dev/hda
> Examine all SMART data for device /dev/hda, but produce no printed
> output. You must use the exit status (the $? shell variable) to learn if
> any Attributes are out of bound, if the SMART status is failing, if
> there are errors recorded in the self-test log, or if there are errors
> recorded in the disk error log.
>
> Thanks gang
> herb
> --
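
*** On that last man page example: the exit status is a bit mask, so it
helps to decode it. Here is a minimal sketch. The function name
explain_status is hypothetical, and the bit meanings are as I read them
in the RETURN VALUES section of the smartctl man page, so check them
against the man page for your version.

```shell
#!/bin/sh
# explain_status (hypothetical helper): print one line for each bit set
# in a smartctl exit status.
explain_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    while [ "$bit" -le 7 ]; do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) msg="command line did not parse" ;;
                1) msg="device open failed, no IDENTIFY data, or low-power mode" ;;
                2) msg="a SMART command failed or returned a bad checksum" ;;
                3) msg="SMART status says DISK FAILING" ;;
                4) msg="a prefail attribute is at or below threshold" ;;
                5) msg="some attribute was at or below threshold in the past" ;;
                6) msg="the device error log contains errors" ;;
                7) msg="the self-test log contains errors" ;;
            esac
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Typical use (needs root for smartctl):
#   smartctl -q silent -a /dev/hda
#   explain_status $?
# Demo with a status of 64, which has only bit 6 set:
explain_status 64
# prints: bit 6: the device error log contains errors
```

A drive with an entry in its error log, like my /dev/hdb above, would be
expected to set bit 6 even though its overall health check says PASSED.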

