Cheaper SSD reliability, continued - Jens Axboe's blog
Jun. 22nd, 2009
09:22 pm - Cheaper SSD reliability, continued
So after twice promising me to get info from 'an engineer' in a 10-day time span, I pushed OCZ again today. The answer is that it's likely "bad blocks" on the drive, and they offered to exchange it. Now, I don't know the internal secret sauce of their flash chip striping, but a bad flash block may explain the issue. And personally I care a lot less about this specific drive than about the larger issue at hand, which is: Can we trust these SSD drives? From this experience, the answer so far is unequivocally no. If the flash block/page/erase block is indeed bad (or partially bad), I want to know! Don't just send me all ones in the data, that's not acceptable. If their drive/fw doesn't do error handling, I'm quite sure the customers would like to know this fact.
Apparently Indilinx does the firmware for these drives for at least two manufacturers. Given that the SSD consumer market is steaming ahead at this point, I'm guessing there's a huge rush to reduce the time to market. A safe bet would be that the firmware is perhaps a little too rushed in this case. Coincidentally, the bad drive is running fw 1.10. Version 1.30 lists this little juicy fix among the others: "Read fail handling". I'll try 1.30 on this drive and re-read the data, just to see what happens.
For the time being, I can't recommend using Indilinx-based drives anywhere except for throwaway data. If they can't even tell you when the data has gone bad, then they really can't be used for much else. At least use btrfs with data checksums enabled, then you could catch a problem like this. Yes, I did run btrfs on my Vertex, and yes, I did disable datacsum to avoid the extra CPU use on my laptop... My personal recommendation would be to stick with Intel or Samsung SSD drives where data matters.
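To make the checksum point concrete, here is a rough sketch of how per-block checksums catch a drive that silently hands back all ones. It is an illustration, not how btrfs does it: the 4096-byte block size, the function names, and the use of `zlib.crc32` are all assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this example


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one CRC-32 per block of data (stand-in for a filesystem checksum)."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def find_bad_blocks(data, expected, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Checksums are recorded while the data is known to be good.
original = bytes(range(256)) * 64          # 16 KiB, i.e. four blocks
saved_sums = block_checksums(original)

# Simulate the failure mode: one block comes back as all ones,
# and the drive reports no error.
damaged = bytearray(original)
damaged[BLOCK_SIZE:2 * BLOCK_SIZE] = b"\xff" * BLOCK_SIZE

bad = find_bad_blocks(bytes(damaged), saved_sums)
print("blocks failing verification:", bad)
```

The read itself succeeds in both cases; only the comparison against checksums stored at write time shows that the second block is no longer what was written. That is the CPU cost I traded away on the laptop.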