Lazy Raid - Sunday

Sometimes technology is like magic. I wrote the underlying code for this program. I tested it, iterated over it, and added new features as I went. But still, at the end of the day, when I purposefully delete a file and then run the command to recover it from the parity blocks and the other files on different drives, I get positively giddy when the file reappears. It's like magic, except that it's a well-understood process merely obscured behind a command line interface.

For testing purposes I created 3 drives of 5 GB each, then placed on them some large files totalling about 5.3 GB. I added the drives to the LazyRaid configuration and told it to generate parity bits, and it spit out about 2.7 GB of parity spread across all three drives. That's single-drive redundancy for roughly half the space a full mirror copy of the data would require. You can get even bigger space savings when using more drives (ParitySize = FileSize/(NumDisks-1)).
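To make the arithmetic and the recovery trick concrete, here is a minimal sketch in Ruby. It is not the LazyRaid code: `parity_size` and `xor_blocks` are hypothetical helper names, and the example simplifies the layout to two data blocks with parity on a third drive, whereas LazyRaid spreads parity across all drives.

```ruby
# Hypothetical helper (not part of LazyRaid): estimated parity size for
# single-drive redundancy, using the formula from the post:
#   ParitySize = FileSize / (NumDisks - 1)
def parity_size(file_size, num_disks)
  raise ArgumentError, "need at least 2 disks" if num_disks < 2
  file_size / (num_disks - 1).to_f
end

# Hypothetical helper: XOR any number of equal-length binary strings
# together, byte by byte, and return the result as a binary string.
def xor_blocks(*blocks)
  length = blocks.first.bytesize
  unless blocks.all? { |b| b.bytesize == length }
    raise ArgumentError, "blocks must be the same length"
  end
  result = Array.new(length, 0)
  blocks.each do |block|
    block.each_byte.with_index { |byte, i| result[i] ^= byte }
  end
  result.pack("C*")
end

# The test setup from the post: about 5.3 GB of files on 3 drives.
puts parity_size(5.3, 3)

# Simplified layout (an assumption): data on two drives, parity on a third.
disk_a = "hello world!".b
disk_b = "lazy raid ok".b
parity = xor_blocks(disk_a, disk_b)

# "Lose" disk_a, then rebuild it from the parity and the surviving block.
recovered = xor_blocks(parity, disk_b)
puts recovered == disk_a
```

The recovery works because XOR is its own inverse: XORing the parity with every surviving block cancels them out and leaves only the missing one.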

The code is available on GitHub and requires you to compile a Ruby C extension for your machine.
https://github.com/pcorliss/LazyRaid

Challenges:
  • Ruby is slow - I posted mid-week about some challenges with Ruby's lack of an XOR function for Strings. The code I posted was slow but workable for small datasets. However, when working with files that can be up to 2 GB in size, a slow XOR function just isn't going to cut it. I ended up writing a Ruby C extension to take care of the heavy lifting, since Ruby just wasn't up to the task. I'll post a little more with some speed comparisons next week. Perhaps I can stir up the hornet's nest in the Ruby community to generate some traffic and maybe a more elegant solution.
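For readers wondering what a pure-Ruby String XOR looks like, here is a rough sketch. It is not the code from the earlier post or from the C extension: `xor_strings`, `xor_streams`, and `CHUNK_SIZE` are hypothetical names, and the tiny chunk size is only there to keep the demonstration readable.

```ruby
require "stringio"

# Hypothetical pure-Ruby XOR of two equal-length binary strings.
# Every byte becomes a Ruby Integer on the way through, which is
# where the time goes on large inputs.
def xor_strings(a, b)
  raise ArgumentError, "length mismatch" unless a.bytesize == b.bytesize
  a.unpack("C*").zip(b.unpack("C*")).map { |x, y| x ^ y }.pack("C*")
end

# Assumed chunk size, tiny for demonstration; a real run would use
# something far larger so a 2 GB file is never loaded whole.
CHUNK_SIZE = 4

# XOR two streams of equal length chunk by chunk, writing into `out`.
def xor_streams(io_a, io_b, out)
  while (chunk_a = io_a.read(CHUNK_SIZE))
    chunk_b = io_b.read(CHUNK_SIZE)
    raise ArgumentError, "streams differ in length" if chunk_b.nil?
    out.write(xor_strings(chunk_a, chunk_b))
  end
  raise ArgumentError, "streams differ in length" unless io_b.eof?
  out
end

a = "abcdefgh".b
b = "12345678".b
parity = xor_streams(StringIO.new(a), StringIO.new(b), StringIO.new(String.new)).string

# XORing the parity with one input gives back the other.
puts xor_strings(parity, b) == a
```

The `StringIO` objects stand in for open files here; anything that responds to `read` and `write` would work the same way.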

Features Missed:
  • Double Disk Failure Redundancy - RAID 6 uses Galois field calculations in addition to XOR to compute its parity. Unfortunately, I wasn't really up to the task of implementing that this week, considering I was working only on basic functionality and struggling with Ruby's speed limitations.
  • FUSE Integration - As the project moved forward, integrating it into the OS seemed less important, and I headed instead towards running it as a command line app.
  • Background Parity Calculations - I just ran out of time on this one. Although adding some sort of IO monitoring and throttling wouldn't be too difficult, it wasn't as high on the priority list as some of the other items.