IO Traces: AnandTech Storage Bench (ATSB)

Our in-house suite of storage traces has been a staple of our SSD reviews for almost a decade. The current versions of the AnandTech Storage Bench (ATSB) Light and Heavy tests were introduced in 2011, but soon proved to be insufficient for stressing newer SSDs that were faster and larger. The Destroyer was added in 2013 to provide a much longer and more intense test, and in 2015 we started running the Light and Heavy tests twice: once on an empty drive, and again on a full drive to show the worst-case performance drop.

While these IO traces were recorded a number of years ago, checking them against a modern environment shows that the mix of random vs sequential transfers is almost identical: IO patterns involved in ordinary consumer PC usage haven't changed drastically in the past decade. Downloading and installing a recent AAA video game involves a lot more total IO, but the character of that IO is still pretty similar.

The way we recorded the tests also removes any historic queue depth limits: IO tracing at the OS level enables the capture of dozens or hundreds of IOs at once, so that playback on modern NVMe systems feels the full brute force of the IO required. In practice, queue depths during ordinary desktop usage are usually in the single digits, and on fast NVMe drives like Intel's Optane SSD the queue depths experienced by the drive will tend to be even a bit lower. TRIM and flush commands are relatively uncommon during ordinary consumer workloads and are often highly variable and inconsistent in their timing, so they are not present in the test. This was more of an issue with older SSDs anyway. The same goes for enterprise drives, most of which handle their own data management.

The IO traces used for the ATSB tests originally recorded the precise timing of every IO operation. During the playback process, the idle times in the trace are cut short to a maximum of 25ms each. This is necessary for the test duration to be reasonable, but it does mean that drives don't get long stretches of idle time to use for garbage collection and flushing SLC caches. In aggregate, though, the drives still spend a lot more time idle than busy: during a relatively quick 7 hour run of The Destroyer, a fast NVMe SSD will only spend a bit more than 2 hours busy with one or more pending IO operations. The rest of the time is accounted for by a very large number of short idle periods.
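As a rough illustration of the idle-time capping described above, the sketch below shortens every idle gap in a recorded trace to at most 25ms. The `TraceOp` record, its field names, and the use of recorded service times to decide when the drive is idle are assumptions made for this example; the actual ATSB playback tool is not described here.

```python
from dataclasses import dataclass

MAX_IDLE_S = 0.025  # 25 ms cap on each idle period, as stated in the text


@dataclass
class TraceOp:
    # Hypothetical trace record; field names are illustrative only.
    issue_time: float  # seconds since the start of the recorded trace
    duration: float    # recorded service time in seconds


def compress_idle(ops, max_idle=MAX_IDLE_S):
    """Return new issue times with every idle gap capped at max_idle.

    An idle gap is the time between the completion of all earlier
    operations and the issue of the next operation.
    """
    new_times = []
    shift = 0.0       # total idle time removed so far
    busy_until = 0.0  # latest completion time seen in the original timeline
    for op in sorted(ops, key=lambda o: o.issue_time):
        gap = op.issue_time - busy_until
        if gap > max_idle:
            shift += gap - max_idle
        new_times.append(op.issue_time - shift)
        busy_until = max(busy_until, op.issue_time + op.duration)
    return new_times


if __name__ == "__main__":
    trace = [TraceOp(0.0, 0.01), TraceOp(1.0, 0.01), TraceOp(1.005, 0.01)]
    print(compress_idle(trace))
```

In this toy trace the 0.99 second pause after the first operation shrinks to 25ms, while the third operation, which was issued while the second was still pending, keeps its original 5ms offset from it.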

AnandTech Storage Bench IO Traces
                         The Destroyer      Heavy           Light
Reads                    38.83 million      2.17 million    373k
Writes                   10.98 million      1.78 million    460k
Total IO Operations      49.8 million       3.99 million    832k
Total Data Read          1583.02 GiB        48.63 GiB       17.97 GiB
Total Data Written       875.62 GiB         106.32 GiB      23.25 GiB
Typical Test Duration    7 hours (NVMe),    1h 15m          25m
                         12 hours (SATA)

Aside from the updated hardware and Windows versions, the one big change to how the ATSB tests are run is for the sake of better automation. Previously, the drives being tested were secure erased before each ATSB trace playback, and for the Light and Heavy test runs on a full drive we would erase the drive, fill it, then play back the trace. But secure erasing drives is a hassle when Windows is involved, so this meant manually rebooting to Linux five times for the five trace runs. Our new method is to run all the trace tests in sequence: starting from a freshly-erased empty drive, we run the Light test, then the Heavy test and The Destroyer, then fill the drive with sequential writes and run the Light and Heavy tests again. This means that our "empty drive" runs of the Heavy test and The Destroyer now start on a lightly-(ab)used drive rather than a freshly-erased one, which may bring down the scores somewhat. The full-drive run of the Heavy test will also be somewhat affected by the preceding Light test, whose mix of IO disturbs the preconditioning left by the full-drive sequential write pass. In exchange, this eases our testing process and saves my sanity somewhat.

As always, we're still giving the drives plenty of idle time (10 minutes) between test runs and after the sequential write pass. The speed of today's NVMe drives means that the automated sequence of running all five stages of the ATSB tests now takes about the same ~12 hours that a SATA drive typically needed to complete just The Destroyer. Batching the ATSB tests together makes for more effective time usage when running these tests overnight, and less sleep lost to manual intervention.
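The batched run order can be written down as a simple schedule. This is only a sketch of the sequence described in the text: the stage labels and the `build_schedule` helper are hypothetical, and it assumes no idle period is needed straight after the secure erase or after the final trace run.

```python
IDLE_SECONDS = 10 * 60  # 10 minutes of idle time, per the text

# Stage order as described in the text; the labels are illustrative.
STAGES = [
    "secure_erase",
    "light_empty",
    "heavy_empty",
    "destroyer",
    "sequential_fill",
    "light_full",
    "heavy_full",
]


def build_schedule(stages=STAGES, idle_seconds=IDLE_SECONDS):
    """Insert an idle period between test runs and after the fill pass."""
    schedule = []
    last = len(stages) - 1
    for index, stage in enumerate(stages):
        schedule.append(("run", stage))
        # Assumption: no idle after the erase or after the final run.
        if stage != "secure_erase" and index != last:
            schedule.append(("idle", idle_seconds))
    return schedule


if __name__ == "__main__":
    for kind, value in build_schedule():
        print(kind, value)
```

Under these assumptions the five idle periods add up to 50 minutes, a small part of the roughly 12 hour batch.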

The performance scores presented for each ATSB test include the overall average throughput during the test, and several subscores breaking down average and 99th percentile latency for both read and write operations. We also report the total energy usage during the test, which is a good indication of overall efficiency. Low-power drives that take a long time to complete the test and drives that complete it quickly but with high power draw will both suffer in the energy usage rankings.
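To make these metrics concrete, here is a minimal sketch of how such scores could be derived from raw measurements. The function names, the nearest-rank percentile method, and the fixed-interval power sampling are all assumptions for illustration; the article does not specify how the actual analysis is implemented.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of samples."""
    return sum(values) / len(values)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def average_data_rate(total_bytes, elapsed_seconds):
    """Overall throughput in bytes per second across the whole test."""
    return total_bytes / elapsed_seconds


def energy_joules(power_samples_watts, sample_interval_s):
    """Approximate energy as the sum of power samples times the
    sampling interval (assumes evenly spaced samples)."""
    return sum(power_samples_watts) * sample_interval_s


if __name__ == "__main__":
    latencies_us = list(range(1, 101))
    print(mean(latencies_us), percentile_nearest_rank(latencies_us, 99))
    # A slow 2 W drive and a fast 5 W drive can use the same energy.
    print(energy_joules([2.0] * 100, 1.0), energy_joules([5.0] * 40, 1.0))
```

The last two calls show why energy is a useful efficiency measure: a 2 W drive that takes 100 seconds and a 5 W drive that takes 40 seconds both consume 200 J.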

[Charts: ATSB The Destroyer. Average Data Rate; Average Latency (overall, read, write); 99th Percentile Latency (overall, read, write); Energy Usage]

[Charts: ATSB Heavy. Average Data Rate; Average Latency (overall, read, write); 99th Percentile Latency (overall, read, write); Energy Usage]

[Charts: ATSB Light. Average Data Rate; Average Latency (overall, read, write); 99th Percentile Latency (overall, read, write); Energy Usage]

PCMark 10 Storage Benchmarks

At the end of 2019, UL released a major update to their popular PCMark 10 benchmark suite, adding storage performance tests that had been conspicuously absent for over two years. These new storage benchmarks are similar to our AnandTech Storage Bench (ATSB) tests in that they are composed of traces of real-world IO patterns that are replayed onto the drive under test.

PCMark 10 provides four different storage benchmarks. The Full System Drive, Quick System Drive and Data Drive benchmarks cover similar territory to our ATSB Heavy and Light tests, and all three together take about as long to run as the ATSB Heavy and Light tests combined. The Drive Performance Consistency Test is clearly meant to one-up The Destroyer and also measure the worst-case performance of a drive that is completely full. Due to time constraints, we are not yet attempting to add the Drive Performance Consistency Test to our usual test suite. This is a line we've drawn to ensure the timely arrival of reviews.

PCMark 10 Storage Tests
Test Name                        Data Written
Data Drive                       15 GB
Quick System Drive               23 GB
Full System Drive                204 GB
Drive Performance Consistency    23 TB + 3x drive capacity

The primary subscores for the PCMark 10 Storage benchmarks are average bandwidth and average latency for read and write IOs. These are combined into an overall score by computing the geometric mean of the bandwidth score and the reciprocal of the latency score.
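The combination step can be sketched as follows. This follows only the description above (geometric mean of bandwidth and reciprocal latency); the function name, the units, and the `scale` factor are assumptions, and UL's actual scaling constants are not given here.

```python
import math


def overall_score(bandwidth_mb_s, latency_us, scale=1.0):
    """Geometric mean of the bandwidth and the reciprocal of the latency.

    `scale` is a hypothetical constant for mapping the result onto a
    convenient score range; it does not affect relative rankings.
    """
    if bandwidth_mb_s <= 0 or latency_us <= 0:
        raise ValueError("bandwidth and latency must be positive")
    return scale * math.sqrt(bandwidth_mb_s * (1.0 / latency_us))


if __name__ == "__main__":
    base = overall_score(400.0, 100.0)
    faster = overall_score(800.0, 100.0)   # double the bandwidth only
    both = overall_score(800.0, 50.0)      # double bandwidth, halve latency
    print(base, faster / base, both / base)
```

Because of the geometric mean, doubling bandwidth alone raises the score by a factor of about 1.41, while doubling bandwidth and halving latency together doubles it.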

Full System Drive Benchmark

The Full System Drive Benchmark covers a broad range of everyday tasks: booting Windows and starting applications and games, using Office and Adobe applications, and file management. The "Full" in the name does not mean that each drive is filled or that the entire capacity of the drive is tested. Rather, it only indicates that all of the PCMark 10 Storage sub-tests are included in this test.

[Charts: PCMark 10 Storage - Full. Overall Score; Average Bandwidth; Average Latency]

Quick System Drive Benchmark

The Quick System Drive Benchmark is a subset of the Full System Drive Benchmark, running only 6 out of the 23 sub-tests from the Full test.

[Charts: PCMark 10 Storage - Quick. Overall Score; Average Bandwidth; Average Latency]

Data Drive Benchmark

The Data Drive Benchmark is intended to represent usage a secondary or portable drive may be subject to. This test simulates copying around files, but does not simulate the IO associated with launching and running applications from a drive.

[Charts: PCMark 10 Storage - Data. Overall Score; Average Bandwidth; Average Latency]

70 Comments

  • thestryker - Monday, February 1, 2021

    The explanations along with all of the data was a great way to show insight into both the why and how for the new bench setup.

    The only thing I'd like to see is either 900p/905p Optane drives added in and wishful thinking would be p5800x. Even though they're at relatively unattainable prices due to low volume, discontinued and/or being enterprise they do represent nand alternatives with a rather different performance profile. Hopefully Intel will opt to bring another consumer version out once they have broad PCIe 4+ support across their consumer product line.
  • Billy Tallis - Monday, February 1, 2021

    The 900P and 905P are in line for their turn on the testbed. Those two and the WD Black AN1500 will be the first drives I use the new Quarch PAM for, since that currently only has a fixture for the PCIe add-in card form factor. (I did also run some tests on one of the Optane drives while experimenting to develop this suite, but they haven't run through the final version of the full suite.)

    I do think it's a lot more likely that I'll get a P5800X sample than a Micron X100. But it wouldn't surprise me if Intel holds off on press samples until they're ready for it to be reviewed on an Intel platform.
  • thestryker - Monday, February 1, 2021

    Great to hear, I figured that they just hadn't had their turn yet.

    I've assumed that Intel + PCIe 4+ drives were all going to be waiting on Intel's own platforms which is why I still hold out hope for a future consumer Optane.
  • Greg100 - Monday, February 1, 2021

    I too would love to see tests of the Intel P5800X and Micron X100 on this new test platform.
    In fact, these drives interest me most for OS and software installation.

    The 900P and 905P are rather a historical curiosity nowadays due to their slow sequential transfer rates.
  • p1esk - Tuesday, February 2, 2021

    AFAIK 905p still destroys any modern drive in random reads/writes. Also, it does not degrade like other drives when their cache is full.
  • Oxford Guy - Tuesday, February 2, 2021

    Shame about the ridiculous pricing, though.

    Intel also refused to validate its 'storage' drives like the 800p, 900p, and 905p for use as a disk cache. This site posted some interesting results for the 118GB 800p. It made it seem like that drive might actually be relevant for people on a tight budget who need a lot of storage and don't mind the noise of a mechanical hard disk. But, Intel's site has very clear statements saying that only the 'memory' Optane products (which seem to be very pointless for consumers) are supported for cache use.
  • ksec - Monday, February 1, 2021

    OH Thank You. Coming here just to comment on P5800X. I am wondering on the power usage the idle time of SSD is so much higher than what I expected.
  • pexxie - Monday, February 1, 2021

    Wow, very thorough. Phenomenal work.
    Would there be any interest in testing synchronized writing? (i.e. bypassing the device's volatile write-back cache). In linux this can be done with the oflags sync or dsync.
    E.g. 4K Random Writes: "dd if=/dev/urandom of=/testfile bs=4k count=10k oflag=sync"
    Without the oflag, or using oflag=direct; you'd be using the write-back cache, which looks great but comes with reliability risk. See a write-up here: https://www.postgresql.org/docs/current/wal-reliab...
  • linuxgeex - Tuesday, February 2, 2021

    That's not 4K random writes you're testing. It's sequential writing of pseudorandom data generated by the kernel.
  • pexxie - Wednesday, February 3, 2021

    I think I get what you're saying/writing. It's random data, but not being written to random locations on the disk. Any clue how to do the latter? Gosh, that's a plot-thickener of note for me. :-P How random is "random"? :-O
