Samsung has announced a new prototype key-value SSD that is compatible with the first industry standard API for key-value storage devices. Earlier this year, the Object Drives working group of the Storage Networking Industry Association (SNIA) published version 1.0 of the Key Value Storage API Specification. Samsung has added support for this new API to their ongoing key-value SSD project.

Most hard drives and SSDs expose their storage capacity through a block storage interface, where the drive stores fixed-size blocks (typically 512 bytes or 4kB), each identified by a Logical Block Address that is usually 48 or 64 bits wide. Key-value drives extend that model so that a drive can support variable-sized keys instead of fixed-sized LBAs, and variable-sized values instead of fixed 512B or 4kB blocks. This allows a key-value drive to be used more or less as a drop-in replacement for software key-value databases like RocksDB, and as a backend for applications built atop key-value databases.
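
To make the contrast concrete, here is a minimal Python sketch of the two interface shapes. It is a toy in-memory model, not the SNIA Key Value Storage API or Samsung's library: the class names `BlockDevice` and `KeyValueDevice` and their method names are hypothetical, chosen only for illustration.

```python
class BlockDevice:
    """Toy block device: fixed-size blocks addressed by an integer LBA.

    Hypothetical model for illustration only.
    """

    def __init__(self, num_blocks, block_size=4096):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._blocks = {}  # LBA -> bytes of exactly block_size

    def write(self, lba, data):
        if not 0 <= lba < self.num_blocks:
            raise ValueError("LBA out of range")
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._blocks[lba] = bytes(data)

    def read(self, lba):
        if not 0 <= lba < self.num_blocks:
            raise ValueError("LBA out of range")
        # Unwritten blocks read back as zeros in this toy model.
        return self._blocks.get(lba, bytes(self.block_size))


class KeyValueDevice:
    """Toy key-value device: variable-sized keys map to variable-sized values.

    Hypothetical model for illustration only; real drives impose size limits.
    """

    def __init__(self):
        self._store = {}  # key bytes -> value bytes

    def store(self, key, value):
        self._store[bytes(key)] = bytes(value)

    def retrieve(self, key):
        return self._store[bytes(key)]  # raises KeyError if the key is absent

    def delete(self, key):
        self._store.pop(bytes(key), None)


# Block interface: the caller must pad or split data to the block size
# and keep track of which LBA holds what.
blk = BlockDevice(num_blocks=8)
blk.write(0, b"hello".ljust(blk.block_size, b"\x00"))

# Key-value interface: the caller hands over the key and value as they are.
kv = KeyValueDevice()
kv.store(b"user:42", b"hello")
```

With the block interface, the mapping from application objects to LBAs lives in host software (a filesystem or database); with the key-value interface, that mapping is the drive's job.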

Key-value SSDs have the potential to offload significant work from a server's CPUs when used to replace a software-based key-value database. More importantly, moving the key-value interface into the SSD itself means it can be tightly integrated with the SSD's flash translation layer, cutting out the overhead of emulating a block storage device and layering a variable-sized storage system on top of that. This means key-value SSDs can operate with much lower write amplification and higher performance than software key-value databases, with only one layer of garbage collection in the stack instead of one in the SSD and one in the database.
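
A rough way to see why stacking layers hurts: write amplification compounds multiplicatively, since every byte the database writes to the block device is itself amplified again by the SSD's flash translation layer. The sketch below assumes that simple model. The function name and the factors in the example are illustrative placeholders, not measurements of any real drive or database.

```python
def stacked_write_amplification(*layer_factors):
    """Return the end-to-end write amplification of a storage stack.

    Each argument is the write amplification factor of one layer
    (bytes written by that layer / bytes it was asked to write).
    Assumes the simple model in which the factors multiply.
    """
    total = 1.0
    for factor in layer_factors:
        if factor <= 0:
            raise ValueError("write amplification factors must be positive")
        total *= factor
    return total


# Hypothetical numbers, for illustration only.
database_wa = 5.0  # software key-value database on a block SSD
ssd_ftl_wa = 2.0   # flash translation layer underneath it
layered = stacked_write_amplification(database_wa, ssd_ftl_wa)

# A key-value SSD has a single layer doing garbage collection.
kv_ssd_wa = 3.0    # also a made-up figure
single_layer = stacked_write_amplification(kv_ssd_wa)

print(f"layered stack: {layered:.1f}x, single layer: {single_layer:.1f}x")
```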

Samsung has been working on key-value SSDs for quite a while, and they have been publicly developing open-source software to support KV SSDs for over a year, including the basic libraries and drivers needed to access KV SSDs as well as a sample benchmarking tool and a Ceph backend. The prototype drives they have previously discussed have been based on their PM983 datacenter NVMe drives with TLC NAND, using custom firmware to enable the key-value interface. Those drives support key lengths from 4 to 255 bytes and value lengths up to 2MB, and it is likely that Samsung's new prototype is based on the same hardware platform and retains similar size limits.
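
As an illustration of what those limits mean for software, the following sketch checks a key-value pair against the sizes quoted above before it would be sent to a drive. The function and constant names are hypothetical, and treating "2MB" as 2 MiB is an assumption; the actual limits of the new prototype have not been confirmed.

```python
# Limits quoted for the earlier PM983-based prototypes.
MIN_KEY_BYTES = 4
MAX_KEY_BYTES = 255
# Assumption: "2MB" is interpreted here as 2 MiB (2 * 1024 * 1024 bytes).
MAX_VALUE_BYTES = 2 * 1024 * 1024


def fits_prototype_limits(key, value):
    """Return True if key and value fall within the quoted size limits.

    Only an upper bound is quoted for values, so no minimum value
    length is enforced here.
    """
    key_ok = MIN_KEY_BYTES <= len(key) <= MAX_KEY_BYTES
    value_ok = len(value) <= MAX_VALUE_BYTES
    return key_ok and value_ok


# A 3-byte key is too short; a 4-byte key is accepted.
print(fits_prototype_limits(b"abc", b"payload"))
print(fits_prototype_limits(b"abcd", b"payload"))
```

An application with larger objects would have to split them across several keys on the host side, much as it would split them across blocks today.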

Samsung's Platform Development Kit software for key-value SSDs originally supported only their own software API, but now additionally supports the vendor-neutral SNIA standard API. The prototype drives are currently available to companies that are interested in developing software to use KV SSDs. Samsung's KV SSDs probably will not move from prototype status to mass production until after the corresponding key-value command set extension to NVMe is finalized, so that KV SSDs can be supported without needing a custom NVMe driver. The SNIA standard API for key-value drives is a high-level, transport-agnostic API that can support drives using NVMe, SAS or SATA interfaces, but each of those protocols needs to be extended with key-value support.

48 Comments

  • shayne.oneill - Friday, September 6, 2019 - link

    "Relational databases are built from key-value stores"

    Uh, no. No they are not, they most definitely are not KV stores. You can implement a KV store on a relational database, but under the hood they look nothing like a KV store.
  • sercand - Friday, September 6, 2019 - link

    Most RDBMSs are really built from key-value stores like RocksDB. For example, CockroachDB has a great blog post about how SQL engines work with key-value stores: https://www.cockroachlabs.com/blog/sql-in-cockroac...
  • dudedsy@gmail.com - Thursday, September 5, 2019 - link

    I mean, this is a RocksDB replacement. Relational databases like MySQL still need a storage layer, which often is RocksDB. This doesn't replace the database, it replaces the storage layer interface to the drive.
  • lkcl - Friday, September 6, 2019 - link

    https://www.linkedin.com/pulse/lies-damn-statistic...

    be careful. rocksdb is a pile of s**t.
  • FunBunny2 - Sunday, September 8, 2019 - link

    "be careful. rocksdb is a pile of s**t."

    ah, dreadful partisanship there.
  • lkcl - Sunday, September 8, 2019 - link

    not really - do the research, you'll find that most benchmarks are done in an incompetent (and thus misleading) fashion, such as only running the test for 30 minutes and yet claiming full read coverage, when, plainly, at the data rate published *by* the tester, 30 minutes times the SSD's data rate is physically impossible to read the entire dataset. yet, somehow, they claim to have managed it... turns out that there's a large RAM cache which they failed to mention, which invalidates the entire benchmark.

    however in the case of RocksDB it's much worse than that: there are technical design flaws which it takes a huge amount of expertise to even fully understand. for that, you want the analysis of one of the world's leading experts - a core developer on OpenLDAP - Howard Chu.
  • trissylegs - Thursday, September 5, 2019 - link

    CockroachDB implements a Postgres database on top of a key-value store (using RocksDB).
    So you should be able to use these with Cockroach (if implemented).
  • insufferablejake - Friday, September 6, 2019 - link

    Not really. Relational DBs expose a relational schema; this doesn't dictate how they actually persist their data. E.g., if B-Trees are used [ref. https://www.sqlite.org/fileformat.html] then you can replace the 'pages' or nodes in a B-Tree with the 2MB pages on an SSD.
  • Urthor - Saturday, September 7, 2019 - link

    Exactly. A relational database just means "expose a relational schema" but people keep thinking that all relational databases "must" use the same under the hood technologies as MySQL or else they are not relational.

    As long as you expose that relational schema, your actual method of abstracting physical media to the relational tables is totally irrelevant.
  • Sivar - Thursday, September 5, 2019 - link

    I am curious how the move from fixed block-oriented to dynamic-key-oriented storage was implemented in firmware. Isn't page size (smallest programmable unit) fixed at manufacture time? 4 bytes seems a little small for a physical page of flash memory.
