[SATLUG] RAID5 Recovery - ddrescue: 2nd round

David Labens adlabens at swbell.net
Mon Aug 24 06:29:01 CDT 2009


Good morning, all!  Day 2 of ddrescue is underway.  The program has been running for somewhere in the neighborhood of 20 hours, and it is apparently still making progress, although slowly (but I'm patient and want it to run its full course).  Here is the current read-out from the server:

root@RCH-SERVER:/# ddrescue --max-retries=3 /dev/sdb /dev/sda resclog1

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   250057 MB,  errsize:   1835 kB,  errors:      40
Current status
rescued:   250058 MB,  errsize:    442 kB,  current rate:        0 B/s
   ipos:   168577 MB,   errors:     864,    average rate:       23 B/s
   opos:   168577 MB
Copying bad blocks... Retry 3

It's on "Retry 3" and the command was "--max-retries=3" - so I presume that means it will be finished sometime today, while I am at work.  I was up a lot later last night than I wanted to be (not because of this, but the kids), so I may not do anything with it tonight except report how it finished - which will be the very least that I will do under any circumstances.  
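Once the run ends, the logfile itself can say how much is still unreadable. Here is a minimal sketch, assuming the classic ddrescue logfile layout ('#' comment lines, one two-field current-status line, then "pos size status" lines in hex, with '-' marking bad areas). The function name bad_bytes is made up for illustration:

```shell
# Sum the sizes of all areas still marked bad ('-') in a ddrescue logfile.
# ASSUMPTION: classic logfile layout, i.e. "pos size status" lines with hex values.
# The body runs in a subshell "( ... )" so its variables stay local.
bad_bytes() (
    total=0
    while read -r pos size status || [ -n "$pos" ]; do
        # Skip blank lines and '#' comment lines.
        case $pos in
            ''|'#'*) continue ;;
        esac
        # Skip the two-field current-status line (it has no third field).
        [ -n "$status" ] || continue
        if [ "$status" = "-" ]; then
            total=$(( total + $size ))   # shell arithmetic accepts 0x... hex
        fi
    done < "$1"
    echo "$total"
)
```

Something like "bad_bytes resclog1" would then print the number of bytes still flagged bad, which should roughly agree with the errsize figure on the screen.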

Just like the Energizer Bunny - it's still going!

David Labens

San Antonio, TX

--- On Sun, 8/23/09, David Labens <adlabens at swbell.net> wrote:

From: David Labens <adlabens at swbell.net>
Subject: Re: [SATLUG] RAID5 Recovery - ddrescue: 2nd round
To: "The San Antonio Linux User's Group Mailing List" <satlug at satlug.org>
Date: Sunday, August 23, 2009, 9:09 PM

I HAD thought that the server was stuck in one place, but upon further review, and comparing screen captures along the way, I see that it is actually making progress...

Notice that each of the 3 readings shows that the ipos & opos (matching) are actually changing and moving higher.  This is good, as it shows me that the process is progressing.  However (& I'm not too concerned about this), the speed is showing an "average rate" of 53 B/s - 53 BYTES per second (a hair over 400 bits per second).  That is about half the speed it was running when it was "splitting error areas", and now it's "copying bad blocks" - & it's just on "retry 1".
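To sanity-check that rate: a byte is 8 bits, so the conversion is a single multiplication. A minimal shell sketch (the function name bytes_to_bits is just for illustration):

```shell
# Convert a rate in bytes per second to bits per second (8 bits per byte).
bytes_to_bits() {
    echo $(( $1 * 8 ))
}

bytes_to_bits 53    # the "average rate" ddrescue reports; prints 424
```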

SO, I'm going to pretty much call it a night and let this thing "do that voo doo that it do so well" (to borrow and butcher a line), and I'll post whatever the screen says in the morning!  No use in trying to get it to go any faster when I'm certain it's already pedaling as fast as it can!

Here's the printout from progress on the current process:

root@RCH-SERVER:/# ddrescue --max-retries=3 /dev/sdb /dev/sda resclog1

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   250057 MB,  errsize:   1835 kB,  errors:      40
Current status
rescued:   250058 MB,  errsize:    442 kB,  current rate:        0 B/s
   ipos:   131264 MB,   errors:     864,    average rate:       53 B/s
   opos:   131264 MB
Copying bad blocks... Retry 1



root@RCH-SERVER:/# ddrescue --max-retries=3 /dev/sdb /dev/sda resclog1

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   250057 MB,  errsize:   1835 kB,  errors:      40
Current status
rescued:   250058 MB,  errsize:    442 kB,  current rate:        0 B/s
   ipos:   130459 MB,   errors:     864,    average rate:       54 B/s
   opos:   130459 MB
Copying bad blocks... Retry 1



root@RCH-SERVER:/# ddrescue --max-retries=3 /dev/sdb /dev/sda resclog1

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   250057 MB,  errsize:   1835 kB,  errors:      40
Current status
rescued:   250058 MB,  errsize:    442 kB,  current rate:        0 B/s
   ipos:    26575 MB,   errors:     864,    average rate:       76 B/s
   opos:    26575 MB
Copying bad blocks... Retry 1


David Labens

San Antonio, TX

--- On Sun, 8/23/09, David Labens <adlabens at swbell.net> wrote:

From: David Labens <adlabens at swbell.net>
Subject: Re: [SATLUG] RAID5 Recovery - ddrescue: 2nd round
To: "The San Antonio Linux User's Group Mailing List" <satlug at satlug.org>
Date: Sunday, August 23, 2009, 6:59 PM

Sam (& all),

Before I go any further than my last posting, let me show you what is now (after several hours of errands & dinner) on the screen...

root@RCH-SERVER:/# ddrescue --max-retries=3 /dev/sdb /dev/sda resclog1

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   250057 MB,  errsize:   1835 kB,  errors:      40
Current status
rescued:   250058 MB,  errsize:    442 kB,  current rate:        0 B/s
   ipos:    26575 MB,   errors:     864,    average rate:       76 B/s
   opos:    26575 MB
Copying bad blocks... Retry 1

This makes it look like the initial error size of 1835 kB is now 442 kB, which is about 76% smaller than it was originally - & down below half a MB on a 250 GB drive.  So, I'm VERY happy with its progress so far.  But, it still has to finish!
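For anyone who wants to check that percentage, here is a minimal sketch using awk for the division (the function name pct_reduction is made up for illustration; the two numbers are the errsize values from the screen above):

```shell
# Percentage by which the error size shrank: (before - after) * 100 / before.
pct_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_reduction 1835 442    # errsize in kB, before and after; prints 75.9
```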



But, it's still working, so I am still letting it do its thing.  I'll update when new progress occurs.



THANK YOU!
David Labens

San Antonio, TX

--- On Sun, 8/23/09, Samuel Leon <satlug at net153.net> wrote:

From: Samuel Leon <satlug at net153.net>
Subject: Re: [SATLUG] RAID5 Recovery - ddrescue: 2nd round
To: "The San Antonio Linux User's Group Mailing List" <satlug at satlug.org>
Date: Sunday, August 23, 2009, 5:13 PM

I will have to step out soon and won't be back until 9pm. If you are confident that all of your data was properly copied, then reconnect everything and bring the computer up.  Might also make sure that all your disks are linked to the right device names as needed (sda, sdb, etc.).

Then make sure that sdc at least has the raid superblock on it:
mdadm --examine /dev/sdc
If that brings back what looks like a raid drive then we should be good (you could probably actually do that before you reconnect all of the other drives)

then force the array to reassemble:
mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sda1 /dev/sdb1 /dev/sdc1

(Make sure those drive names are right; I copied them from an earlier post. Also, if you don't get some output saying that the array has been started, then you need to stop right here.  The output you get should be similar to:
mdadm: forcing event count in /dev/hdd1(2) from 228 upto 232
mdadm: clearing FAULTY flag for device 1 in /dev/md0 for /dev/hdd1
mdadm: /dev/md0 has been started with 2 drives (out of 3).
)

That should bring it up in degraded mode with one failed drive. So then do:
mdadm -D /dev/md0

and that will list some info and at the bottom you will see what drive is still removed.  So add it to the array with:

mdadm --add /dev/md0 /dev/sdX

But if more than one drive is still missing from the array at this point, never use mdadm --add, because something is wrong and that will make it worse.

If added correctly, the array should start to resync.  Let it finish, and make sure the filesystem on /dev/md0 stays unmounted.  Once it is done resyncing, check the filesystem (if it is ext3) with:
e2fsck -D -f /dev/md0

You will probably get some errors and have to press "y" a hundred times. Any files that are lost will be put in your lost+found folder and will be given their inode number as the file name.

Then you should be done.
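The "never --add if more than one drive is missing" check above can be sketched as a small guard. This is only a sketch, and it assumes the device table at the bottom of the mdadm -D output marks each empty slot with a State of "removed". The function names count_removed and safe_to_add are made up for illustration:

```shell
# Count array slots reported as "removed" in `mdadm -D` output read on stdin.
# ASSUMPTION: empty slots appear as table lines whose last word is "removed".
count_removed() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that harmless.
    grep -c '[[:space:]]removed[[:space:]]*$' || true
}

# Succeed only when exactly one slot is missing, the one case where
# adding a drive back is expected to be safe.
safe_to_add() {
    [ "$1" -eq 1 ]
}

# Usage (left commented out so nothing touches a real array):
#   removed=$(mdadm -D /dev/md0 | count_removed)
#   if safe_to_add "$removed"; then
#       echo "one drive missing: OK to run mdadm --add"
#   else
#       echo "stop: $removed drives missing"
#   fi
```

The guard only reads the report; it never runs mdadm --add itself, so the final decision stays with whoever is at the keyboard.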

Sam
-- _______________________________________________
SATLUG mailing list
SATLUG at satlug.org
http://alamo.satlug.org/mailman/listinfo/satlug to manage/unsubscribe
Powered by Rackspace (www.rackspace.com)