KIOXIA CM6 PCIe 4.0 SSD Review

by Adam Armstrong

NVMe SSDs have taken over as the top performers across the board. They started off by making huge leaps over SAS and SATA drives, but in recent years have been eking out only a little more performance with each iteration. PCIe 3.0 imposes an upper limit, and current drives are hitting it. But now with the second-generation AMD EPYC 7002 CPUs, PCIe 4.0 is here, and KIOXIA is leveraging it with the CM6 SSDs.

KIOXIA CM6

KIOXIA announced the new PCIe 4.0 drives, the CM6 and CD6, as a demo at the last Flash Memory Summit, back in the long-forgotten times of physical events. At the time, they were the first PCIe 4.0 SSDs, and as of this writing the CM6 series may still be one of the few, if not the only, PCIe 4.0 SSDs for the enterprise. The big deal with the new drives is higher performance: they are quoted as hitting up to 6.9GB/s and 1.4 million IOPS read. Those are some impressive theoretical numbers. The drives also come with in-band NVMe-MI, a persistent event log, and namespace granularity.
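To put the quoted 6.9GB/s in context, here is a minimal sketch of the raw line bandwidth of a four-lane link. It assumes the standard per-lane transfer rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b encoding, and it ignores packet and protocol overhead, so real-world throughput lands somewhat below these ceilings. The function name is ours, for illustration only.

```python
# Raw PCIe link bandwidth per direction, before packet/protocol overhead.
GT_PER_S = {"3.0": 8.0, "4.0": 16.0}  # per-lane transfer rate in GT/s
ENCODING = 128 / 130                  # 128b/130b line encoding (Gen3 and Gen4)


def link_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Return line bandwidth in decimal GB/s for one direction of the link."""
    bits_per_second = GT_PER_S[gen] * 1e9 * ENCODING * lanes
    return bits_per_second / 8 / 1e9


for gen in ("3.0", "4.0"):
    print(f"PCIe {gen} x4: {link_bandwidth_gb_s(gen, 4):.2f} GB/s")
```

Under these assumptions a Gen3 x4 link tops out just under 4GB/s, which is why drives on that interface cluster around 3.5GB/s, while a Gen4 x4 link leaves room for the CM6's quoted 6.9GB/s.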

KIOXIA CM6 bottom

The KIOXIA CM6 is a U.3 form factor drive, conformant with SFF-TA-1001, which allows it to be used in tri-mode-enabled backplanes. The CM6 comes in a wide range of capacities, from 800GB all the way to 30.72TB. The drives come in read-intensive (CM6-R) and mixed-use (CM6-V) flavors depending on users’ needs. On top of the use-case-specific models, there are a variety of secure versions, including a Sanitize Instant Erase (SIE) model, a Self-Encrypting Drive (SED) model, and a FIPS 140-2 (Level 2) model. The drive is also dual-ported to provide high availability.

KIOXIA CM6 side

For this review we are looking at a KIOXIA CM6-V at 6.4TB of capacity.

KIOXIA CM6 PCIe 4.0 Specifications

Rows with a single value apply to both models.

Model | CM6-R (Read-Intensive) | CM6-V (Mixed Use)
Form Factor | 2.5-inch, 15mm Z-height
Capacity | 960GB, 1.92TB, 3.84TB, 7.68TB, 15.36TB, 30.72TB | 800GB, 1.6TB, 3.2TB, 6.4TB, 12.8TB
Interface | PCIe Gen3 / Gen4, 1×4 and 2×2
Compliance | PCIe 4.0 and NVMe 1.4
NAND Type | KIOXIA BiCS FLASH 96-layer 3D TLC
Sequential Read (Gen3) | up to 3,500MB/s | up to 3,500MB/s
Sequential Read (Gen4) | up to 6,900MB/s | up to 6,900MB/s
Sequential Write (Gen3) | up to 3,100MB/s | up to 3,100MB/s
Sequential Write (Gen4) | up to 4,200MB/s | up to 4,200MB/s
Random Read (Gen3) | up to 800K IOPS | up to 800K IOPS
Random Read (Gen4) | up to 1.4M IOPS | up to 1.4M IOPS
Random Write (Gen3) | up to 155K IOPS | up to 290K IOPS
Random Write (Gen4) | up to 170K IOPS | up to 350K IOPS
Power Consumption | Active: 20W; Idle: <5W
Endurance | 1 DWPD for 5 years | 3 DWPD for 5 years
Uncorrectable BER | 1 sector per 10^17 bits read
MTTF / AFR | 2.5M hours / 0.35%
Operating Temperature | 0 to 70°C
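The endurance and reliability rows can be turned into more tangible numbers with a little arithmetic. The sketch below is ours, not KIOXIA's rating method: it assumes a 365-day year when converting DWPD into total terabytes written, and a constant failure rate (exponential model) when converting MTTF into an annualized failure rate. The vendor's official TBW ratings may be derived differently.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours (assumption: no leap days)


def terabytes_written(capacity_tb: float, dwpd: float, years: float = 5) -> float:
    """Total TB written over the warranty period at the rated drive writes per day."""
    return capacity_tb * dwpd * 365 * years


def afr_from_mttf(mttf_hours: float) -> float:
    """Annualized failure rate, assuming a constant (exponential) failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mttf_hours)


# The 6.4TB CM6-V reviewed here (3 DWPD) and a 7.68TB CM6-R (1 DWPD)
print(f"CM6-V 6.4TB:  {terabytes_written(6.4, 3):,.0f} TB written over 5 years")
print(f"CM6-R 7.68TB: {terabytes_written(7.68, 1):,.0f} TB written over 5 years")
print(f"AFR at 2.5M hours MTTF: {afr_from_mttf(2.5e6):.2%}")
```

Under these assumptions the AFR works out to roughly 0.35%, in line with the spec sheet, and the 6.4TB mixed-use drive is good for about 35PB of writes over its five-year rating.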

Performance

Testbed

Our new PCIe Gen4 Enterprise SSD reviews leverage a Lenovo ThinkSystem SR635 for application tests and synthetic benchmarks. The ThinkSystem SR635 is a well-equipped single-CPU AMD platform, offering CPU power well in excess of what’s needed to stress high-performance local storage. It is also the only platform in our lab (and one of the few on the market currently) with PCIe Gen4 U.2 bays. Synthetic tests don’t require a lot of CPU resources but still leverage the same Lenovo platform. In both cases, the intent is to showcase local storage in the best light possible that aligns with storage vendor maximum drive specs.

PCIe Gen4 Synthetic and Application Platform (Lenovo ThinkSystem SR635)

  • 1 x AMD 7452 (2.35GHz x 32 Cores)
  • 8 x 64GB DDR4-3200MHz ECC DRAM
  • CentOS 7.7 1908
  • ESXi 6.7u3

PCIe Gen3 Application Platform (Lenovo ThinkSystem SR850)

  • 4 x Intel Platinum 8160 CPU (2.1GHz x 24 Cores)
  • 16 x 32GB DDR4-2666MHz ECC DRAM
  • 2 x RAID 930-8i 12Gb/s RAID Cards
  • 8 NVMe Bays
  • VMware ESXi 6.7u3

PCIe Gen3 Synthetic Platform (Dell PowerEdge R740xd)

  • 2 x Intel Gold 6130 CPU (2.1GHz x 16 Cores)
  • 4 x 16GB DDR4-2666MHz ECC DRAM
  • 1 x PERC 730 2GB 12Gb/s RAID Card
  • Add-in NVMe Adapter
  • Ubuntu-16.04.3-desktop-amd64

Being the first set of reviews on a new platform, we’ve included past drive results, which are close but not 100% apples-to-apples comparisons since they were tested on an older platform. Our synthetic test differences won’t skew results much, but the application workloads running on the single-CPU AMD platform versus the quad-CPU Intel platform may, to some degree. In our MySQL tests one of the new Gen4 KIOXIA products did take the lead, but in SQL Server latency was average. With only two Gen4 drive reviews published so far, we don’t have a significant amount of comparable data, but it is something to keep in mind when viewing these results. We’ve also ramped up our synthetic tests to take advantage of the faster SSDs, now showing test results with higher peak thread counts.

Testing Background and Comparables

The StorageReview Enterprise Test Lab provides a flexible architecture for conducting benchmarks of enterprise storage devices in an environment comparable to what administrators encounter in real deployments. The Enterprise Test Lab incorporates a variety of servers, networking, power conditioning, and other network infrastructure that allows our staff to establish real-world conditions to accurately gauge performance during our reviews.

We incorporate these details about the lab environment and protocols into reviews so that IT professionals and those responsible for storage acquisition can understand the conditions under which we have achieved the following results. None of our reviews are paid for or overseen by the manufacturer of equipment we are testing. Additional details about the StorageReview Enterprise Test Lab and an overview of its networking capabilities are available on those respective pages.

Application Workload Analysis

In order to understand the performance characteristics of enterprise storage devices, it is essential to model the infrastructure and the application workloads found in live-production environments. Our benchmarks for the KIOXIA CM6 are therefore MySQL OLTP performance via SysBench and Microsoft SQL Server OLTP performance with a simulated TPC-C workload. For our application workloads, each drive runs 4 identically configured VMs.

SQL Server Performance

Each SQL Server VM is configured with two vDisks: a 100GB volume for boot and a 500GB volume for the database and log files. From a system-resource perspective, we configured each VM with 8 vCPUs and 64GB of DRAM, and leveraged the LSI Logic SAS SCSI controller. While our Sysbench workloads tested previously saturated the platform in both storage I/O and capacity, the SQL test is looking for latency performance.

This test uses SQL Server 2014 running on Windows Server 2012 R2 guest VMs, and is stressed by Quest’s Benchmark Factory for Databases. StorageReview’s Microsoft SQL Server OLTP testing protocol employs the current draft of the Transaction Processing Performance Council’s Benchmark C (TPC-C), an online transaction-processing benchmark that simulates the activities found in complex application environments. The TPC-C benchmark comes closer than synthetic performance benchmarks to gauging the performance strengths and bottlenecks of storage infrastructure in database environments. Each instance of our SQL Server VM for this review uses a 333GB (1,500 scale) SQL Server database and measures the transactional performance and latency under a load of 15,000 virtual users.

SQL Server Testing Configuration (per VM)

  • Windows Server 2012 R2
  • Storage Footprint: 600GB allocated, 500GB used
  • SQL Server 2014
    • Database Size: 1,500 scale
    • Virtual Client Load: 15,000
    • RAM Buffer: 48GB
  • Test Length: 3 hours
    • 2.5 hours preconditioning
    • 30 minutes sample period

For our SQL Server transactional benchmark, the KIOXIA CM6 placed fourth overall with 12,633.6 TPS, though it was only 10.6 TPS under the top performer.
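To show how tight that pack is, the two figures above can be turned into a relative gap. This is a small sketch using only the numbers quoted in this review; the helper name is ours.

```python
def relative_gap(score: float, deficit: float) -> tuple[float, float]:
    """Return (top score, deficit as a fraction of the top score)."""
    top = score + deficit
    return top, deficit / top


top_tps, gap = relative_gap(12_633.6, 10.6)
print(f"Top performer: {top_tps:.1f} TPS; the CM6 trails by {gap:.3%}")
```

A gap of less than a tenth of a percent suggests the SQL Server TPS result is bounded more by the workload than by any one drive, which is why latency is the more telling metric in this test.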

KIOXIA CM6 SQL TPS

In SQL Server average latency, the CM6 came in at 5.5ms, the same as its cousin, the CD6 SSD.

Sysbench Performance

The next application benchmark consists of a Percona MySQL OLTP database measured via SysBench. This test measures average TPS (Transactions Per Second), average latency, and average 99th percentile latency as well.

Each Sysbench VM is configured with three vDisks: one for boot (~92GB), one with the pre-built database (~447GB), and the third for the database under test (270GB). From a system-resource perspective, we configured each VM with 8 vCPUs, 60GB of DRAM and leveraged the LSI Logic SAS SCSI controller.

Sysbench Testing Configuration (per VM)

  • CentOS 6.3 64-bit
  • Percona XtraDB 5.5.30-rel30.1
    • Database Tables: 100
    • Database Size: 10,000,000
    • Database Threads: 32
    • RAM Buffer: 24GB
  • Test Length: 3 hours
    • 2 hours preconditioning 32 threads
    • 1 hour 32 threads
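The per-VM disk layouts described for both application tests add up quickly across four VMs. The sketch below totals them against the 6.4TB drive under review. It assumes every vDisk, including boot, lives on the drive being tested, which the text does not state explicitly, and it treats the approximate Sysbench sizes as exact.

```python
VMS_PER_DRIVE = 4
DRIVE_GB = 6400  # 6.4TB CM6-V, decimal GB

# Per-VM vDisk sizes in GB, as listed in the test descriptions
SQL_VDISKS = {"boot": 100, "database_and_logs": 500}
SYSBENCH_VDISKS = {"boot": 92, "prebuilt_db": 447, "db_under_test": 270}


def footprint_gb(vdisks: dict[str, int], vms: int = VMS_PER_DRIVE) -> int:
    """Total allocated capacity across all VMs, in GB."""
    return sum(vdisks.values()) * vms


for name, vdisks in (("SQL Server", SQL_VDISKS), ("Sysbench", SYSBENCH_VDISKS)):
    total = footprint_gb(vdisks)
    print(f"{name}: {total} GB allocated, {total / DRIVE_GB:.1%} of the drive")
```

Under those assumptions the Sysbench VMs occupy about half of the drive and the SQL Server VMs a little over a third.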

Looking at our Sysbench transactional benchmark, the KIOXIA CM6 had 8,632 TPS, again taking fourth in our comparable pack.

KIOXIA CM6 Sysbench TPS

With Sysbench average latency the CM6 took fourth once again at 14.82ms.

For our worst-case scenario latency (99th percentile) the CM6 stayed where it is comfortable, in fourth place, with 29.86ms.

VDBench Workload Analysis

When it comes to benchmarking storage devices, application testing is best, and synthetic testing comes in second place. While not a perfect representation of actual workloads, synthetic tests do help to baseline storage devices with a repeatability factor that makes it easy to do apples-to-apples comparisons between competing solutions. These workloads offer a range of different testing profiles ranging from “four corners” tests and common database transfer size tests to trace captures from different VDI environments. All of these tests leverage the common vdBench workload generator, with a scripting engine to automate and capture results over a large compute testing cluster. This allows us to repeat the same workloads across a wide range of storage devices, including flash arrays and individual storage devices. Our testing process for these benchmarks fills the entire drive surface with data, then partitions a drive section equal to 25% of the drive capacity to simulate how the drive might respond to application workloads. This differs from full-entropy tests, which use 100% of the drive and take it into steady state. As a result, these figures will reflect higher sustained write speeds.

Profiles:

  • 4K Random Read: 100% Read, 128 threads, 0-120% iorate
  • 4K Random Write: 100% Write, 128 threads, 0-120% iorate
  • 4K Random Read (high load): 100% Read, 512 threads, 0-120% iorate
  • 4K Random Write (high load): 100% Write, 512 threads, 0-120% iorate
  • 64K Sequential Read: 100% Read, 32 threads, 0-120% iorate
  • 64K Sequential Write: 100% Write, 16 threads, 0-120% iorate
  • 64K Sequential Read (high load): 100% Read, 64 threads, 0-120% iorate
  • 64K Sequential Write (high load): 100% Write, 64 threads, 0-120% iorate
  • Synthetic Database: SQL and Oracle
  • VDI Full Clone and Linked Clone Traces
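As a rough illustration of what a "0-120% iorate" sweep and the 25% test region mean in practice, the sketch below generates a ladder of target I/O rates from a peak figure and sizes the test region for the 6.4TB drive. The 10% step size and the 800,000 IOPS peak are placeholders of ours, not values taken from our scripts or results, and the function names are hypothetical.

```python
def iorate_targets(peak_iops: float, step_pct: int = 10, max_pct: int = 120):
    """Return (percent, target IOPS) pairs from step_pct up to max_pct of peak."""
    return [
        (pct, round(peak_iops * pct / 100))
        for pct in range(step_pct, max_pct + 1, step_pct)
    ]


def footprint_tb(drive_tb: float, fraction: float = 0.25) -> float:
    """Size of the partitioned test region as a fraction of drive capacity."""
    return drive_tb * fraction


HYPOTHETICAL_PEAK_IOPS = 800_000  # placeholder, not a measured result
targets = iorate_targets(HYPOTHETICAL_PEAK_IOPS)
print(f"{len(targets)} load points, from {targets[0]} to {targets[-1]}")
print(f"Test region on a 6.4TB drive: {footprint_tb(6.4):.1f}TB")
```

Pushing the requested rate past 100% is what exposes the "peak then drop-off" behavior called out in several of the results: the drive is asked for more than it can deliver, and latency climbs while IOPS stall or fall.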

Comparables:

In our first VDBench Workload Analysis, Random 4K Read, the KIOXIA CM6 turned around with an impressive performance peaking at 846,288 IOPS at a latency of 150µs. This puts the drive in the top spot.

KIOXIA CM6 4K read

The new PCIe 4.0 drives can withstand a higher load, and we would be remiss if we didn’t push them a bit harder to see what they can do. With a Random 4K Read high load, the CM6 was able to peak at 1,507,564 IOPS at a latency of 337.9µs, much better than its CD6 counterpart.

KIOXIA CM6 4K read high

For Random 4K write, the CM6 took third overall. It ran with sub-100µs latency until about 490K IOPS and peaked at 548,169 IOPS at a latency of 226.4µs.

Random 4K write high load saw the CM6 go on to peak at 549,103 IOPS at a latency of 922µs, trailing the CD6 this time.
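The 4K peak figures above hang together neatly under Little's Law, which relates throughput, latency, and the number of requests in flight (in flight = IOPS × latency). The sketch below applies it to the four 4K results. It assumes each thread in the profile keeps one I/O outstanding, which is our reading of the thread counts rather than something the test configuration spells out.

```python
def outstanding_ios(iops: float, latency_us: float) -> float:
    """Little's Law: average requests in flight = arrival rate x time in system."""
    return iops * latency_us / 1_000_000


# (profile, thread count, peak IOPS, latency in microseconds) from this review
RESULTS = [
    ("4K random read", 128, 846_288, 150.0),
    ("4K random read (high load)", 512, 1_507_564, 337.9),
    ("4K random write", 128, 548_169, 226.4),
    ("4K random write (high load)", 512, 549_103, 922.0),
]

for name, threads, iops, latency_us in RESULTS:
    in_flight = outstanding_ios(iops, latency_us)
    print(f"{name}: ~{in_flight:.0f} I/Os in flight vs {threads} threads")
```

In each case the implied queue depth sits just under the thread count. It also explains the write results: quadrupling the threads barely moved IOPS (548K to 549K), so latency had to rise roughly fourfold, which is what a saturated device looks like.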

Switching over to sequential workloads, the CM6 had a chance to shine once more, taking the top spot in 64K read with a peak score of 97,779 IOPS or 6.11GB/s at a latency of only 325µs.

KIOXIA CM6 64k read

High load 64K sequential read saw a placement similar to 4K read, with the CM6 peaking at 101,018 IOPS or 6.3GB/s at a latency of 629µs.

KIOXIA CM6 64K read high

64K write showed the CM6 with a strong peak score, though performance fell off after the peak, and it came in third. Peak performance was about 49K IOPS or 3.1GB/s at a latency of about 50µs.

High load 64K sequential write saw the CM6 have a higher peak but drop in performance afterward. The CM6 peaked at about 49K IOPS or 3.1GB/s at a latency so low we can barely see it before dropping off.
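For readers who want to relate the IOPS and bandwidth figures in the sequential tests, the conversion is just IOPS multiplied by the transfer size. The sketch below assumes a 64K transfer means 64KiB (65,536 bytes) and reports both binary MiB/s and decimal MB/s. The quoted GB/s values in this section line up with the MiB/s result divided by 1,000, which is our inference from the numbers rather than a documented convention.

```python
def throughput_mib_s(iops: float, block_kib: int = 64) -> float:
    """Throughput in MiB/s for a given IOPS and transfer size in KiB."""
    return iops * block_kib / 1024


def throughput_mb_s(iops: float, block_kib: int = 64) -> float:
    """Throughput in decimal MB/s for a given IOPS and transfer size in KiB."""
    return iops * block_kib * 1024 / 1_000_000


for name, iops in (("64K read", 97_779), ("64K read (high load)", 101_018)):
    print(
        f"{name}: {throughput_mib_s(iops):,.0f} MiB/s, "
        f"{throughput_mb_s(iops):,.0f} MB/s"
    )
```

In strictly decimal units the read results are slightly higher than the quoted figures, which narrows the gap to the 6.9GB/s specification.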

Our next set of tests is our SQL workloads: SQL, SQL 90-10, and SQL 80-20. Starting with SQL, the KIOXIA CM6 took second place overall with a peak of 266,458 IOPS at a latency of 119µs.

For SQL 90-10 the CM6 took second once again with a peak performance of 265,276 IOPS at a latency of 119.2µs.

SQL 80-20 gave the CM6 a chance to show off by coming in first with a peak performance of 263,819 IOPS at a latency of 119.4µs.

Next up are our Oracle workloads: Oracle, Oracle 90-10, and Oracle 80-20. Starting with Oracle, the CM6 came in first once more with a peak performance of 271,230 IOPS at a latency of 128.6µs.

Oracle 90-10 had the CM6 place second with a peak performance of 202,341 IOPS at a latency of only 107.4µs.

The CM6 slid right by the competition to take first once again in Oracle 80-20 with a peak of 206,733 IOPS at a low latency of 104.7µs.

Next, we switched over to our VDI clone test, Full and Linked. For VDI Full Clone (FC) Boot, the CM6 took the top spot with 223,668 IOPS and a latency of 153.5µs.

For VDI FC Initial Login the CM6 slipped to third with a peak performance of 154,836 IOPS at a latency of 189µs.

Our VDI FC Monday Login benchmark saw the CM6 stay in third with a peak of 98,867 IOPS with a latency of 158.4µs.

For VDI Linked Clone (LC) Boot, the KIOXIA CM6 went back to the top spot with a peak score of 115,058 IOPS at a latency of 137.7µs.

VDI LC Initial Login is a bit hard to read in the chart but the CM6 landed in the middle of the pack with a peak of 38,848 IOPS at a latency of 202.4µs before dropping off some.

Finally, VDI LC Monday Login had the CM6 once again perform best with a peak score of 96,008 IOPS and a latency of 162.5µs.

Conclusion

The KIOXIA CM6 was one of, if not the, first PCIe 4.0 SSDs for the enterprise. The new drives come with the promise of higher performance, in this case up to 6.9GB/s and up to 1.4 million IOPS. The CM6 is dual-ported, adding a level of high availability to the drive. The drive comes in a wide range of capacities, from 800GB up to a whopping 30.72TB with 9 capacity options in between. The CM6 has both a read-intensive and a mixed-use model, with 1 and 3 DWPD respectively. The SSD also comes with a variety of secure model options.

For performance we ran our usual barrage of Application Workload Analysis and VDBench with a few exceptions. We had to skip the Houdini test as the test platform is Intel and the KIOXIA drives would be handicapped by the Gen3 ports. On VDBench we added in a higher load test to stress the new drives a bit more since they are designed to handle it.

In our Application Workload Analysis, we ran SQL Server and Sysbench. With SQL Server the CM6 took fourth in both TPS and average latency with 12,633.6 TPS and 5.5ms, still a very good score. With Sysbench the drive again hung out in fourth place across the board with 8,632 TPS, 14.82ms average latency, and 29.86ms in our worst-case scenario latency.

In VDBench the drive really shined. The CM6 was the best performer in several of our benchmarks. Basic highlights include 846K IOPS in 4K read, 1.5 million IOPS in 4K read high load, 548K IOPS in 4K write, 549K IOPS in 4K write high load, 6.1GB/s in 64K read, 6.3GB/s in 64K read high load, and 3.1GB/s in both 64K write and 64K write high load. SQL saw peaks of 266K IOPS, 265K IOPS in SQL 90-10, and 264K IOPS in SQL 80-20. Oracle gave us peaks of 271K IOPS, 202K IOPS in Oracle 90-10, and 207K IOPS in Oracle 80-20. VDI FC gave us 224K IOPS boot, 155K IOPS Initial Login, and 99K IOPS in Monday Login. VDI LC saw 115K IOPS boot, 39K IOPS Initial Login, and 96K IOPS Monday Login.

This review and that of the CD6 take a specific look at PCIe 4.0 and the future of storage devices as more enter the market. There aren’t many server vendors producing front-to-back support for PCIe 4.0, with Lenovo being the only one in our lab as of this writing. Lenovo was quick to seize on all of the advantages the 2nd generation AMD EPYC 7002 processors offer, anticipating storage products like the KIOXIA CM6. For KIOXIA, this puts them in the interesting spot of being ahead of others while the full potential of their drive can only be met with newer, AMD-based servers (until Intel decides to jump in the game as well). For now, the CM6 will still work in legacy gear and will be ready to unleash more performance as companies upgrade.
