Storage Upgrade! Western Digital easystore 14 TB × 3
Wednesday, November 25, 2020

Introduction

My 42 TB storage is running out and Black Friday is around the corner. Deals are already out online. This sounds like a pretty good time for an upgrade. I need something that can last for a few more years. The plan? Same as the last time. Shuck external drives to get high capacity drives for cheap. Can't be a data hoarder without the capacity to store data, right?

The setup from 2017

Let's take a trip back for a sec, so we can compare past to present. Those WD (Western Digital) easystore drives have consistently been a nice way to get high-capacity drives for cheap. Back in 2017, I needed more storage because some external 2 and 3 TB drives weren't enough. I wanted something overkill. Something that would last me for a few years. So I bought 4 of the 8 TB models for $160 each. That's just a bit above $20/TB when tax is considered. I have tweets with pictures covering these and what shucking them looks like. You can see those below:

In all 4 of those, there was a WD Red 8 TB internal drive which can be plugged into a computer. It is then treated as an internal hard drive as opposed to an external one. To my surprise, the external ones can keep up with me recording gameplay to them over USB 3.0. Raw 1920x1080 60 fps footage via Fraps and Dxtory. So I kept one drive to use as an external drive, and popped the rest into my server. I had a laptop at the time as my gaming computer, so making that last drive an internal one was not an option.

"So, why do you want so much storage Clara?"

For context on why I want so much storage, I am a data hoarder. I am obsessed with data. When I play games, I record every match. In university, I salvage and back up all data from as many courses and semesters as possible. I am into photography as well, and always capture in RAW. Those files are several times larger than your average JPG. I don't just store them, though. Recording and shooting in RAW gives me full control of my content in post, which I prefer. Data hoarding is a hobby, and something I quite enjoy doing. But it's also a way of archiving my life. Organising it, watching it grow over time, and looking back is nice.

The Deal

Now then, let's jump back to 2020. 8 TB is kind of puny now. There are higher-capacity options out there. I've seen up to 18 TB offered on Best Buy's site, in the form of an 18 TB external drive. At first, I thought this had the best deal, so I bought two of them for $330 each ($18.33/TB). A few days later, I checked back and saw this magic:

Black Friday sales do this kind of thing a lot. It's kind of funny to see the 14 TB drive cheaper than the 12 TB one. Another funny example was on Amazon where a 500 GB Samsung SSD was $4 more than the 250 GB variant. But I digress. When I saw this, I cancelled my order of the two 18 TB HDDs and went with three of the 14 TB ones. They were even available for pickup that very same day. Sweet.

They intentionally limited the number of purchases to 1 per customer. The way around this is to make a Best Buy Business Account. It lets you purchase up to 3 via that. The price came out much better than those 18 TB drives would have. $190 × 3 is $570. The original deal I had on the 18 TB drives was 2 × $330, being $660. Saving $90 and getting 42 TB rather than 36 TB was a steal. Plus, it was around $13.57/TB.
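
If you want to redo that comparison for other deals, here's a small sketch. The `price_per_tb` helper is a made-up name for illustration, and the prices are the pre-tax figures quoted above.

```shell
#!/bin/sh
# price_per_tb PRICE_USD CAPACITY_TB -> dollars per TB, two decimals.
# Hypothetical helper; not part of any tool mentioned in this post.
price_per_tb() {
    awk -v price="$1" -v tb="$2" 'BEGIN { printf "%.2f\n", price / tb }'
}

echo "18 TB at \$330: \$$(price_per_tb 330 18)/TB"
echo "14 TB at \$190: \$$(price_per_tb 190 14)/TB"

# Totals for each order, using integer shell arithmetic.
echo "2 x 18 TB: \$$((330 * 2)) for $((18 * 2)) TB"
echo "3 x 14 TB: \$$((190 * 3)) for $((14 * 3)) TB"
```

awk does the division because plain shell arithmetic only handles integers.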

Boxes holding the 14 TB HDDs.

So, what now?

Drive Dumping

I'm still contemplating how to configure the drives. However, while I do that, I wanted to experiment with something. So I hopped on Arch Linux and blasted dd on the drives. But, rather than wiping them clean, I did the opposite. I wanted to see how well a backup of an almost empty drive would compress (they come with installers for Windows and Mac). In my Christmas Deathmatch Production Procedure post, I mentioned a 4.63 GB file being compressed down to a mere 773 KB. So I wanted to beat that. As an experiment, it's pretty useless, I know. But it would be nice to have a raw image of the drive's original state upon purchase.

Let's assume those three drives are connected and are identified as /dev/sda, /dev/sdb, and /dev/sdc. Let's also assume you are the superuser (run su...).

UNIX Command
dd if=/dev/sda bs=1M | pv -s $(blockdev --getsize64 /dev/sda) | gzip -9 > "wd14tb_SERIAL_NUMBER.img.gz"
Obviously, I don't have a drive lying around that can hold 3 disk images of 14 TB. So compression has to be applied on-the-fly while reading the bytes off the disk directly. The UNIX shell makes this easy. Just pipe dd into gzip via the |. Why gzip? It's for the sake of speed. I'll extract and pipe it into another algorithm like bz2 or xz later. Running xz on the piped output of dd does work. But it's more CPU intensive, and it slowed down the transfer rate. Dumping these drives would've taken 4 days rather than 23 hours if I took that route.
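
If you'd like to try the same pipeline without a spare 14 TB drive, here's a rough sketch that runs it against a small stand-in file. The file names and the 16 MiB size are assumptions for illustration, and pv is left out so it runs with only standard tools.

```shell
#!/bin/sh
# Stand-in for a real disk: a short non-zero header followed by 16 MiB of zeros.
# On real hardware, "$src" would be a device such as /dev/sda (as root).
workdir=$(mktemp -d)
src="$workdir/fake_disk.img"
{ printf 'pretend partition table'; head -c 16777216 /dev/zero; } > "$src"

# Same shape as the command above: read raw bytes, compress on the fly.
dd if="$src" bs=1M 2>/dev/null | gzip -9 > "$workdir/fake_disk.img.gz"

# Verify the round trip: the decompressed bytes must match the source exactly.
if gunzip -c "$workdir/fake_disk.img.gz" | cmp -s - "$src"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi

original_size=$(wc -c < "$src" | tr -d '[:space:]')
compressed_size=$(wc -c < "$workdir/fake_disk.img.gz" | tr -d '[:space:]')
echo "round trip: $roundtrip, $original_size -> $compressed_size bytes"
rm -rf "$workdir"
```

The cmp check is worth keeping for a real dump too, as long as the source drive is still attached and unmodified.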

Watching this work in tmux, with the dump itself, top, and ls combined with watch all running side by side, was pretty cool too:

Example of running tmux to observe compression progress live.


After some time, it finished. These were running in parallel so having my computer sit for a day was enough to get all three drive contents dumped. The speed could've been faster, I suppose. But it'll do.

dd and pv output post-transfer of a single drive.
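
I ran the three dumps in separate tmux panes, but the same parallelism can be sketched with plain background jobs. This is a minimal version on small stand-in files, not the exact commands I ran. The `disk_a`/`disk_b`/`disk_c` names are made up for the example.

```shell
#!/bin/sh
# Three small stand-in "drives"; on real hardware these would be
# /dev/sda, /dev/sdb, and /dev/sdc.
workdir=$(mktemp -d)
for name in a b c; do
    head -c 4194304 /dev/zero > "$workdir/disk_$name.img"
done

# Start one dd | gzip pipeline per drive in the background...
for name in a b c; do
    dd if="$workdir/disk_$name.img" bs=1M 2>/dev/null \
        | gzip -9 > "$workdir/disk_$name.img.gz" &
done
# ...then block until every background pipeline has exited.
wait

# Count how many of the resulting archives pass gzip's integrity test.
valid=0
for archive in "$workdir"/*.img.gz; do
    gzip -t "$archive" && valid=$((valid + 1))
done
echo "valid dumps: $valid"
rm -rf "$workdir"
```

Each pipeline is single-threaded, so three of them at once only pay off when the CPU has at least three cores free and the drives sit on separate ports.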


Hilariously, trying to get the compression ratio and the original size from gzip directly fails because the size causes an integer overflow:

14000519643136 % 4294967296 = 3221225472. Oops, someone used a uint32_t.
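
The gzip format stores the uncompressed size in a 32-bit trailer field (ISIZE), so anything past 4 GiB wraps around. Here's a small sketch of that wraparound, plus a peek at the same field on a file small enough not to wrap. The `small.gz` name is made up, and the od call assumes a little-endian machine.

```shell
#!/bin/sh
# ISIZE holds the uncompressed size modulo 2^32.
disk_bytes=14000519643136
reported=$(( disk_bytes % 4294967296 ))
echo "size gzip reports for the 14 TB image: $reported bytes"

# The last 4 bytes of any .gz file are ISIZE, stored little-endian.
# od -tu4 decodes in host byte order, hence the little-endian assumption.
workdir=$(mktemp -d)
head -c 1000000 /dev/zero | gzip -9 > "$workdir/small.gz"
isize=$(tail -c 4 "$workdir/small.gz" | od -An -tu4 | tr -d '[:space:]')
echo "ISIZE of a 1000000-byte input: $isize"
rm -rf "$workdir"
```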


Take note, the serial numbers are edited out. I've replaced them with SERIAL_NUMBER_1, SERIAL_NUMBER_2, and so on. With that noted, here's a fa (fl -a) of the directory with all three drive image dumps:

GZip'd filesizes for all 3 drive dumps.


Interesting. None of them were tampered with prior to being dumped, and they all contain the exact same files. So I'm guessing there are some other very minor differences between them that are causing the different file sizes.

Right. That was phase 1. Now for phase 2. Let's extract each of these and repack them into bz2. This gz→bz2 repacking is as simple as:

UNIX Command
gunzip < "wd14tb_SERIAL_NUMBER.img.gz" | pv -s 14000519643136 | bzip2 -9c > "wd14tb_SERIAL_NUMBER.img.bz2"
Sure, it's possible to just bzip2 the gzip'd file. But compressing a file twice is almost never a good idea. So, we have to extract it and then let bzip2 recompress it. Again, one of the beauties of piping is that we don't need a 14 TB drive to hold the decompressed file midway. It's just passed into bzip2.
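
Before deleting the gzip'd copy, it's worth checking that both archives decompress to the same bytes. Here's a minimal sketch of the repack plus that check on a small stand-in image. The file names and the 8 MiB size are assumptions for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Stand-in image: a short non-zero header followed by 8 MiB of zeros.
{ printf 'header bytes'; head -c 8388608 /dev/zero; } > "$workdir/disk.img"
gzip -9c "$workdir/disk.img" > "$workdir/disk.img.gz"

# Repack: decompress into a pipe and recompress, so the raw image
# never has to be written to disk in between.
gunzip < "$workdir/disk.img.gz" | bzip2 -9c > "$workdir/disk.img.bz2"

# Compare checksums of the decompressed streams from both archives.
gz_sum=$(gunzip -c "$workdir/disk.img.gz" | cksum)
bz_sum=$(bunzip2 -c "$workdir/disk.img.bz2" | cksum)
if [ "$gz_sum" = "$bz_sum" ]; then
    repack=ok
else
    repack=mismatch
fi
echo "repack: $repack"
rm -rf "$workdir"
```

On a real 14 TB image this check means decompressing both archives once more, which takes a while but costs no extra disk space.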

So, why use bzip2 over xz? Simple. I'm under the assumption most of this data is just 00 bytes repeating over and over. This is one of the very few scenarios where bz2 beats both gzip and xz, and by a fairly large margin. If you want a ratio comparison of pure 00's being crammed into each of these algorithms, here you go:

File           Params   Size (bytes)   Ratio
8GiB.bin       -        8589934592     1:1
8GiB.bin.gz    -9       8336315        1030.42:1
8GiB.bin.bz2   -9       6030           1424533.09:1
8GiB.bin.xz    -9e      1249556        6874.39:1

So yeah. bz2 is objectively better in this scenario. And I'm pretty sure that xz is better in almost every other case.
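
If you want to reproduce the comparison yourself, here's a scaled-down sketch. It uses 64 MiB of zeros rather than the 8 GiB from the table, so the exact sizes and ratios will differ. The `zero_size` helper is a made-up name for this example.

```shell
#!/bin/sh
# Number of zero bytes to compress; the table above used 8 GiB.
size=67108864

# zero_size COMPRESSOR [ARGS...] -> compressed size in bytes.
# The compressor must read stdin and write stdout. Hypothetical helper.
zero_size() {
    head -c "$size" /dev/zero | "$@" | wc -c | tr -d '[:space:]'
}

gz_bytes=$(zero_size gzip -9)
bz_bytes=$(zero_size bzip2 -9)
echo "gzip -9:  $gz_bytes bytes"
echo "bzip2 -9: $bz_bytes bytes"
```

Adding `zero_size xz -9e` gives the third row, provided xz is installed and the machine has the memory to spare for that preset.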

With that out of the way, my assumption about the bytes was correct... so behold:

Finalised gz→bz2 results.

Beautiful. Around a 1190709:1 compression ratio for all three drives (original size: 14000519643136 bytes), and a file 1155x smaller than the gzip'd version. It feels redundant and useless, but now I have image files of the original states that these drives were in upon shipment. And, I have them in a very compact state. I'm curious what is different between them. But, I'm not going to go into that in this blog post.

Drive Shucking

Now we get to the fun part. Let's shuck those drives.

These were somewhat more difficult to shuck compared to the earlier models. Honestly, it could just be that my memory is bad. There weren't many pins though. I just slipped some guitar picks through and then a screwdriver and it was over. Here's the inside of one of them:

It's a WD140EDFZ. Notice something? It's not explicitly stating that it's a WD Red drive this time. Instead, it's white-labelled. Supposedly, even back in 2017, some of the easystores contained a white-label drive instead of a red-label one. There are some posts regarding these on r/DataHoarder. I'm not going to go into an analysis of it myself.

For the curious, here's a screenshot from CrystalDiskInfo:

5400 RPM. Hmm. I've read reports that it can hit up to 7200 RPM when it's under load. But, again, I'm not going to go into an analysis of it to prove or disprove that. Since I'm just looking for raw storage capacity, this is good. Also, the Power On Count is "5". This screenshot is from the first time I personally plugged this drive in and powered it on.

Anyways, I eventually got the other two enclosures shucked and pulled the drives out. Here are some pictures of those.

There were reports online that you would have to block a specific pin for the HDD to work as an internal drive. For me, this was not the case. I plugged them in and saw my 3 × 14 TB of storage. I formatted them to btrfs and started structuring data from previous drives.

As for what to do with this storage, raw archival. My media drive has been running low because multiple friends have recently started recording multi-perspective videos for my YouTube channel. This'll keep me going for a few more years. I'll upgrade once again when more storage space is inevitably needed. Redundancy also has to be considered.




Clara Nguyễn
Hi! I am a Vietnamese/Italian mix with a Master's Degree in Computer Science from UTK. I have been programming since I was 6 and love to write apps and tools to make people's lives easier. I also love to do photography and media production. Nice to meet you!

