Following this guide you'll be able to create a hard-disk-based RAID 5 array and perform various operations on it (add a new disk, replace a disk, do recovery, etc.).
First, identify your disks from /dev/disk/by-id/
Setup array --
for i in <your identified disk names in /dev/disk/by-id/, space separated>; do cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue --sector-size 4096 -d - -c aes-xts-essiv:sha256 --key-size 256 create raid-$i /dev/disk/by-id/$i; done
Enter and remember your passphrase -- there is no way to recover the data if you forget it.
mdadm -C /dev/md/archive -l 5 --home-cluster=archive --homehost=any -z <your target disk size in KB>K -p left-symmetric -x 0 -n 3 -c 512K --data-offset=8K -N archive -k resync /dev/mapper/raid-{<your identified disk names in /dev/disk/by-id/, comma separated>}
Choose your <your target disk size in KB> carefully. Otherwise, in the future you may not be able to add marginally smaller disks. Study the market for available disk sizes (the exact disk size, not approximate) and set this value accordingly. For example, for 10TB HDDs, a value of 9765623976K is safe enough to add a 10TB disk from another vendor. -c 512K -- this chunk size may not be optimal for your setup; other values may improve performance.
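To check the exact size of a disk you already have (a minimal sketch; replace <your disk> with one of your /dev/disk/by-id/ names):
blockdev --getsize64 /dev/disk/by-id/<your disk>    # size in bytes
echo $(( $(blockdev --getsize64 /dev/disk/by-id/<your disk>) / 1024 ))    # size in KB, for comparison with the -z value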
Create FS --
mkfs.xfs -m rmapbt=0,reflink=0 /dev/md/archive
Integrity testing --
Note that raid5 and raid6 do not check data consistency on assembly automatically (therefore corrupt data will not be detected). I've found that XFS's crc32 feature is also not good enough for this purpose (it only covers metadata, so it can only detect major corruption). Therefore you must use a file integrity checker (like Aide) and regularly test all the data of the archive with it.
You may also use 'mdadm --action=check /dev/md/archive', or both aide and --action=check.
Aide setup --
Version needed: v0.19 minimum.
This is the aide.conf config file that you need to use --
database_in=file:<aide dir path>/init.dbgz
# aide -i writes the new database here; the sed command after the init run switches this to current-next.dbgz
database_out=file:<aide dir path>/init.dbgz
database_attrs=sha3_512
gzip_dbout=true
warn_dead_symlinks=true
# RAID is faster with multithread, but too many threads will slow it down.
num_workers=<your specific values>
report_url=file:<aide dir path>/report-latest.log
report_level=added_removed_entries
report_format=json
report_detailed_init=false
report_append=true
<mount path of your RAID array> ftype+sha3_512
<aide dir path> is any directory of your liking.
Fill your RAID array with some data (once you have assembled it) and initialize aide (TO BE RUN ONCE ONLY) --
aide -i -c <aide dir path>/aide.conf -L info
CURDATE=`date +%s`
mv <aide dir path>/init.dbgz <aide dir path>/${CURDATE}.dbgz
sed -i -r "s|^database_in=.*|database_in=file:<aide dir path>/${CURDATE}.dbgz|" <aide dir path>/aide.conf
sed -i -r 's|^database_out=.*|database_out=file:<aide dir path>/current-next.dbgz|' <aide dir path>/aide.conf
The additional mv and sed commands are a mechanism to retain old aide databases and reports for the record, which may be useful if things go wrong, and also to 'rotate' the database.
Verify/report --
aide -C -c <aide dir path>/aide.conf -L info
mv <aide dir path>/report-latest.log <aide dir path>/report-`date +%s`.log
Update command --
aide -u -c <aide dir path>/aide.conf -L info
CURDATE=`date +%s`
mv <aide dir path>/current-next.dbgz <aide dir path>/${CURDATE}.dbgz
sed -i -r "s|^database_in=.*|database_in=file:<aide dir path>/${CURDATE}.dbgz|" <aide dir path>/aide.conf
mv <aide dir path>/report-latest.log <aide dir path>/report-${CURDATE}.log
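If you prefer, the update-and-rotate steps above can be wrapped in a small script (a sketch only; AIDE_DIR is a placeholder for your <aide dir path>):
#!/bin/sh
# Sketch: update the aide database, then rotate the previous database and report.
AIDE_DIR="<aide dir path>"
aide -u -c "$AIDE_DIR/aide.conf" -L info
CURDATE=`date +%s`
mv "$AIDE_DIR/current-next.dbgz" "$AIDE_DIR/${CURDATE}.dbgz"
sed -i -r "s|^database_in=.*|database_in=file:$AIDE_DIR/${CURDATE}.dbgz|" "$AIDE_DIR/aide.conf"
mv "$AIDE_DIR/report-latest.log" "$AIDE_DIR/report-${CURDATE}.log"
Note that aide typically exits non-zero when it finds changes, so don't run this under 'set -e'.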
List contents in aide database --
aide --list -c <aide dir path>/aide.conf -L info
To assemble --
Always assemble RO first, and disassemble if your work was only for reading. Then check /proc/mdstat to see whether a resync is pending, and plan accordingly if it is (see 'In case resync=pending --'). You can keep the RAID array in RO, but NEVER write to it until the resync is complete.
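A quick way to check for a pending resync (the exact wording in /proc/mdstat may vary by kernel version):
cat /proc/mdstat
grep -i pending /proc/mdstat    # a pending resync typically shows up as resync=PENDING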
Occasionally rewrite all of the disks to prevent bit rot, one disk at a time, using the command --
dd if=/dev/sd... of=/dev/sd... bs=1M conv=notrunc iflag=fullblock
This command is safe to run even if there is a power loss (I've tested this).
Next, assemble the array RO, verify using aide, and run 'Force fsck --', then proceed to the next disk. A variant of the dd command with progress output is shown below.
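For example, to rewrite a single disk in place with progress output (a sketch; status=progress is a GNU dd option, and the same device must be given as both input and output):
dd if=/dev/disk/by-id/<your disk> of=/dev/disk/by-id/<your disk> bs=1M conv=notrunc iflag=fullblock status=progress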
Do a full SMART test on each disk occasionally using smartctl
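A sketch of running a long SMART self-test on each disk and checking the results afterwards (smartctl is from smartmontools; the test runs in the background inside the drive):
for i in <your identified disk names in /dev/disk/by-id/, space separated>; do smartctl -t long /dev/disk/by-id/$i; done
smartctl -a /dev/disk/by-id/<your disk>    # after the test completes, check the self-test log and attributes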
for i in <your identified disk names in /dev/disk/by-id/, space separated>; do cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue --sector-size 4096 -d - -c aes-xts-essiv:sha256 --key-size 256 create raid-$i /dev/disk/by-id/$i; done
Run 'Force fsck --' occasionally.
mdadm -A -o /dev/md/archive /dev/mapper/raid-*
-o is optional; it assembles the array RO.
Check /proc/mdstat for issues
mount -o logbufs=8,logbsize=256k,noquota,noatime,ro /dev/md/archive /mnt/archive
,ro is optional; it mounts the FS read-only.
Occasionally verify using aide.
Occasionally run 'Force fsck --'.
To write data --
I have done power failure tests to ensure that existing data remains intact in the event of a power loss during a write operation.
Regenerate the database using aide after returning the archive to RO; this will also generate a report. In the report, check for changed checksums of old files; if there are changes, start a recovery operation. A sketch of the overall write sequence follows.
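A sketch of the overall write sequence, based on the assemble and mount commands above without the read-only options (adjust the mount path to yours):
mdadm -A /dev/md/archive /dev/mapper/raid-*
cat /proc/mdstat    # make sure no resync is pending before writing
mount -o logbufs=8,logbsize=256k,noquota,noatime /dev/md/archive /mnt/archive
# ... copy your data into /mnt/archive ...
umount /mnt/archive
mdadm -S /dev/md/archive
Then reassemble RO and regenerate the aide database as described above.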
To stop the RAID array --
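A sketch of the stop sequence, assuming the FS is mounted at /mnt/archive (the cryptsetup step is optional and only needed if you also want to close the encrypted mappings):
umount /mnt/archive
mdadm -S /dev/md/archive
for i in <your identified disk names in /dev/disk/by-id/, space separated>; do cryptsetup remove raid-$i; done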
Mount a crashed XFS FS RO (without replaying the journal) --
This is useful when you want to read your data quickly but the XFS FS will not mount without a long recovery operation.
mount -o logbufs=8,logbsize=256k,noquota,noatime,ro,norecovery /dev/md/archive /mnt/archive
In case resync=pending --
This message can be seen in /proc/mdstat.
To recover from this status, mount the array RO.
Check database using aide (see above commands).
If everything is correct, resync the RAID array (this can be done by normally mounting the RAID array RW; a sketch follows below). NOTE: this'll take a long time.
I've tested power failure during a resync operation and it does not cause any data loss.
If the database check fails, follow 'Action plan after failed inconsistency test (as per aide or mdadm --check) --'
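A sketch of triggering the resync, based on the assemble and mount commands above without the read-only options:
mdadm -A /dev/md/archive /dev/mapper/raid-*
mount -o logbufs=8,logbsize=256k,noquota,noatime /dev/md/archive /mnt/archive
cat /proc/mdstat    # the resync should now be running; wait for it to complete before further writes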
To replace a failed disk --
mdadm -S /dev/md/archive
Then continue with 'Force assemble a failed array (rw) --' and the remaining steps of 'To replace a disk (e.g. you determined that it's about to fail) --' (skipping the --fail step, since the disk has already failed).
To replace a disk (e.g. you determined that it's about to fail) --
Power loss during rebuilding/recovery was tested and it does not cause any data loss.
First assemble the array and then check using aide.
mdadm /dev/md/archive --fail <device which is about to fail; must be one of the /dev/mapper/raid-* devices>
mdadm -S /dev/md/archive
execute 'Force assemble a failed array (rw) -- '
Check using aide.
Set up the new disk using cryptsetup.
mdadm /dev/md/archive -a /dev/mapper/raid-<new disk name as set up in cryptsetup>
Stop the array once recovery is complete and power off the device.
Verify using aide
Run 'Force fsck -- '
To add a new disk --
Power loss during a rebuilding/recovery/reshape operation was tested and it does not cause any data loss.
Update the aide database.
First assemble the array.
Set up the new disk using cryptsetup.
mdadm /dev/md/archive -a /dev/mapper/raid-<new disk name as set up in cryptsetup>
mdadm /dev/md/archive -G -n <total number of devices, i.e. the previous count plus 1>
Wait for resync to complete
Verify using aide.
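Note that growing the array does not grow the filesystem; once the reshape is complete you would typically also grow XFS on the mounted array (a sketch, assuming the mount path used above):
mount -o logbufs=8,logbsize=256k,noquota,noatime /dev/md/archive /mnt/archive
xfs_growfs /mnt/archive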
Force assemble a failed array (ro) --
mdadm -A -o -f -R /dev/md/archive <remaining member devices, space separated; in this setup these are the /dev/mapper/raid-* devices>
Force assemble a failed array (rw) --
mdadm -A -f -R /dev/md/archive <remaining member devices, space separated; in this setup these are the /dev/mapper/raid-* devices>
Force fsck (mdadm level) --
mdadm --action=check /dev/md/archive
Check dmesg and /sys/block/md<int>/md/mismatch_cnt (a non-zero value) for issues; example commands follow.
NOTE: This takes a long time.
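For example, to monitor the check and inspect the result (md<int> is the kernel name of your array, visible in /proc/mdstat):
cat /proc/mdstat    # shows the check progress
cat /sys/block/md<int>/md/mismatch_cnt    # a non-zero value after the check indicates inconsistent stripes
dmesg | grep -i raid    # look for md/raid errors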
Action plan after failed inconsistency test (as per aide or mdadm --check) --
Verify the corruption --
mdadm --action=check /dev/md/archive
Start the array using the commands in 'Force assemble a failed array (ro) --', removing one block device at a time, and check the data using aide. Once you've found the culprit drive (i.e. the one whose removal makes aide stop complaining), remove it using the procedure in 'To replace a disk (e.g. you determined that it's about to fail) --'. Then, depending on your findings, you may add a new HDD, or re-add the removed drive after destroying the data on it (and after verifying that the HDD is fixed), using the same commands as in 'To replace a disk (e.g. you determined that it's about to fail) --' (i.e. continue after the removal procedures). A sketch of the one-device-at-a-time check follows.
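A sketch of the one-device-at-a-time check for a 3-disk array (raid-disk1/2/3 are placeholders for your /dev/mapper/raid-* devices; repeat, leaving out a different device each time):
mdadm -S /dev/md/archive
mdadm -A -o -f -R /dev/md/archive /dev/mapper/raid-disk2 /dev/mapper/raid-disk3    # disk1 left out
mount -o logbufs=8,logbsize=256k,noquota,noatime,ro /dev/md/archive /mnt/archive
aide -C -c <aide dir path>/aide.conf -L info
umount /mnt/archive
mdadm -S /dev/md/archive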
Get details of an array --
mdadm -D /dev/md/archive