Jens Axboe's blog - Cheaper SSD reliability?
Jun. 18th, 2009
12:10 pm - Cheaper SSD reliability?
In earlier blog entries, I praised the Intel X25-E for its performance. I also have high hopes for the reliability of the drive. By reliability, I refer to data integrity as well as endurance. It's not that I have much information to back this up, but I know that Intel have put a lot of testing into these drives.
So while the X25-E is extremely nice, it's also very pricey. Recently I needed a few more SSDs for testing purposes, and despite public begging on this blog, Intel hasn't sent me any more drives. As I'm sure most are well aware, the cheaper SSDs have mostly been utter crap. Even most expensive SSDs have been crap, mostly due to using that infamous JMicron flash controller that would have done more good as nice sand on the beach instead of being manufactured into silicon. Now there are other alternatives though, and the Indilinx controller looked like a good option. OCZ recently introduced a Vertex series that uses this controller. Not only does it perform better, it's also not 80s ATA tech. It has NCQ and TRIM support, which is very nice.
I went out and bought a few of these for testing. One I put in the laptop and the other in a test box. Performance is good; even random 4KB writes actually work, which is where the crap SSDs fall apart. However, as opposed to the Intel drive, I didn't have a lot of faith in the reliability of these drives. Early firmwares were plagued with errors, and even the just-released v1.30 firmware fixes issues that seem like rather basic functionality. An example of that would be mishandling ATA commands with zero sector count. But I decided to give them the benefit of the doubt.
A few days ago, I was working on the laptop at night as usual. Pushing out a few changes from my block git repository, git complained of a corrupt pack file. The pack file in question was from when I last repacked the repo back in February. It's read-only, about 380MB in size, and thus hasn't been written to since it was created some 4 months ago. I usually don't keep backups of my laptop data, since it's just a development environment and all my source is safe with git on a public server. As it just so happened, I had tested the new btrfs format a week earlier. In doing so, Chris asked me to keep an image of the drive so we could debug any potential problems with the new format. So I went and fetched the pack file from the backup and compared the two. The backup file was, as expected, fine. Looking into the nature of the corruption (basically finding out who to blame for it), I found out that the corruption started 64519680 bytes into the file. So that's nicely 512-byte aligned, but not 4KB aligned. The corruption spanned 16KB in total. So far, so good. What I found out next was even more interesting: within the corrupted region, every other byte was correct, and every other byte was 0xff!
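For anyone wanting to repeat this kind of comparison, here is a minimal Python sketch of the three checks described above: where the damage starts and ends, how the start offset is aligned, and whether the damaged region has the "every other byte is 0xff" pattern. It is not the tooling used for this post; the function names `diff_span`, `alignment` and `alternating_ff` are made up for illustration, and it assumes both files fit in memory.

```python
from typing import Dict, Optional, Tuple


def diff_span(good: bytes, bad: bytes) -> Optional[Tuple[int, int]]:
    """Return (start, end) covering every differing byte, or None if equal.

    Only the common length of the two buffers is compared; end is exclusive.
    """
    n = min(len(good), len(bad))
    diffs = [i for i in range(n) if good[i] != bad[i]]
    if not diffs:
        return None
    return diffs[0], diffs[-1] + 1


def alignment(offset: int) -> Dict[int, bool]:
    """Report whether offset is a multiple of 512 bytes and of 4096 bytes."""
    return {size: offset % size == 0 for size in (512, 4096)}


def alternating_ff(bad: bytes, start: int, end: int) -> bool:
    """True if every other byte of bad[start:end] is 0xff.

    Both parities are tried, since the 0xff bytes may sit on the even or
    the odd positions of the region.
    """
    if end - start < 2:
        return False
    region = bad[start:end]
    return any(all(b == 0xFF for b in region[p::2]) for p in (0, 1))
```

In practice one would load the backup copy and the suspect copy (for example with `pathlib.Path.read_bytes()`) and pass them in as `good` and `bad`. For a pack file of several hundred megabytes, a chunked comparison would be kinder to memory and a lot faster than this byte-by-byte loop. Note that `diff_span` can report a slightly smaller region than what was really overwritten, since a byte that happened to be 0xff in the original does not show up as a difference.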
That type of corruption just reeks of drive problems. I reported this issue to OCZ about a week ago. First-level support quickly replied and passed the issue on to the engineers, but I have yet to hear anything from that side. I've kept the drive as-is in case they want to inspect it. I'm not keeping my hopes up though, and I'm glad I'm not the OCZ engineer tasked with fixing it.
Meanwhile I put the other Vertex in the laptop and recreated my git tree. No issues seen so far, but suffice to say that my confidence level in these drives isn't that high. I'll be keeping backups if I put anything interesting on the laptop!