This is a text-only version of the following page on https://raymii.org:
---
Title       : 	Linux software raid, rebuilding broken raid 1
Author      : 	Remy van Elst
Date        : 	14-04-2014
URL         : 	https://raymii.org/s/blog/Linux_software_raid_rebuilding_broken_raid_1.html
Format      : 	Markdown/HTML
---



Last week Nagios alerted me about a broken disk in one of my clients testing
servers. There is a best effort SLA on the thing, and there were spare drives of
the same type and size in the datacenter. Lucky me. This particular data center
is on biking distance, so I enjoyed a sunny ride there.

<p class="ad"> <b>Recently I removed all Google Ads from this site due to their invasive tracking, as well as Google Analytics. Please, if you found this content useful, consider a small donation using any of the options below:</b><br><br> <a href="https://leafnode.nl">I'm developing an open source monitoring app called  Leaf Node Monitoring, for windows, linux & android. Go check it out!</a><br><br> <a href="https://github.com/sponsors/RaymiiOrg/">Consider sponsoring me on Github. It means the world to me if you show your appreciation and you'll help pay the server costs.</a><br><br> <a href="https://www.digitalocean.com/?refcode=7435ae6b8212">You can also sponsor me by getting a Digital Ocean VPS. With this referral link you'll get $100 credit for 60 days. </a><br><br> </p>


Simply put, I needed to replace the disk and rebuild the raid 1 array. This
server is a simple Ubuntu 12.04 LTS server with two disks running in raid 1, no
spare. Client has a tight budget, and with a best effort SLA not in production,
fine with me. Consultant tip, make sure you have those things signed.

The `_` in the `cat /proc/mdstat` tells me the second disk (`/dev/sdb`) has
failed:

    
    
    Personalities : [raid1] [raid6] [raid5] [raid4]
    md0 : active raid1 sda1[0] sdb1[1]
          129596288 blocks [2/2] [U_]
    

`U` means up, `_` means down [[source]][2]

First we remove the disk from the RAID array:

    
    
    mdadm --manage /dev/md0 --remove /dev/sdb1
    

Make sure the server can boot from a degraded RAID array:

    
    
    grep BOOT_DEGRADED /etc/initramfs-tools/conf.d/mdadm
    

If it says true, continue on. If not, add or change it and rebuild the initramfs
using the following command:

    
    
    update-initramfs -u
    

(Thank you [Karssen][3])

We can now safely shut down the server:

    
    
    shutdown -h 10
    

Replacing the disk was an issue on itself, it is a [Supermicro 512L-260B][4]
chassis where the disks are not in a drive bay, rather they are screwed in from
the bottom. Therefore the whole server needs to be removed from the rack (no
rails...) when replacing the disk.

Normally I would replace them while the server is on, but this server has no hot
swap disks so that would be kind of an issue in a full rack.

After that, boot the server from the first disk (via the BIOS/UEFI). Make sure
you boot to recovery mode. Select the root shell and mount the disk read/write:

    
    
    mount -o remount,rw /dev/sda1
    

Now copy the partition table to the new (in my case, empty) disk:

    
    
    sfdisk -d /dev/sda > sfdisk /dev/sdb
    

This will erase data on the new disk.

Add the disk to the RAID array and wait for the rebuilding to be complete:

    
    
    mdadm --manage /dev/md0 --add /dev/sdb1
    

This is a nice progress command:

    
    
    watch cat /proc/mdstat
    

It will take a while on large disks:

    
    
    Personalities : [raid1] [raid6] [raid5] [raid4]
    md0 : active raid1 sda1[0] sdb1[1]
          129596288 blocks [2/2] [U_]
          [=>...................]  recovery = 2.6% (343392/129596288) finish=67min speed=98840K/sec
    
    unused devices: <none> 
    

   [1]: https://www.digitalocean.com/?refcode=7435ae6b8212
   [2]: https://raid.wiki.kernel.org/index.php/Mdstat#.2Fproc.2Fmdstat
   [3]: http://blog.karssen.org/2013/01/04/booting-an-ubuntu-server-with-a-degraded-software-raid-array/
   [4]: http://www.supermicro.com/products/chassis/1U/512/SC512L-260.cfm

---

License:
All the text on this website is free as in freedom unless stated otherwise. 
This means you can use it in any way you want, you can copy it, change it 
the way you like and republish it, as long as you release the (modified) 
content under the same license to give others the same freedoms you've got 
and place my name and a link to this site with the article as source.

This site uses Google Analytics for statistics and Google Adwords for 
advertisements. You are tracked and Google knows everything about you. 
Use an adblocker like ublock-origin if you don't want it.

All the code on this website is licensed under the GNU GPL v3 license 
unless already licensed under a license which does not allows this form 
of licensing or if another license is stated on that page / in that software:

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

Just to be clear, the information on this website is for meant for educational 
purposes and you use it at your own risk. I do not take responsibility if you 
screw something up. Use common sense, do not 'rm -rf /' as root for example. 
If you have any questions then do not hesitate to contact me.

See https://raymii.org/s/static/About.html for details.