With the advent of solid-state disk technology, the shrinking cost of high-speed volatile memory, the low cost of SATA, and the reliability of SAS, organizing and integrating new storage technologies optimally has become more difficult. Opportunities exist to place frequently accessed “hot” data on lower latency, faster media, while leaving rarely accessed data on higher latency, lower cost media.
By Kimberly Robinson, Performance Engineer, LSI Corporation
With all these choices available to IT storage professionals, there is an enormous opportunity to use cost, performance, and capacity metrics innovatively to determine the ideal location for user data. This article will discuss how storage tiering can significantly improve performance and reliability in mixed storage environments, complementing and enhancing the host operating system cache, while optimizing costs.
Today’s servers provide many different functions, and while we can make generalizations, each application produces its own unique workload characteristics. In addition, performance needs depend on the current load and any QoS requirements. While storage environments have more options than ever before, allowing for better customization, they also add complexity and make storage performance capacity planning more difficult. Performance capacity planning requires knowledge of application I/O traits, capacity and performance growth requirements, disk and storage performance characteristics, data protection needs, and company budget.
More Options = More Control
Modern storage controllers have a plethora of options: new and compound RAID types, premium features, advanced cache options, and variants of hardware offloading to suit every budget. Today’s advanced embedded processors have made intelligent storage controllers even more capable, allowing them to extend their capabilities and transform as new technologies emerge.
Disk technologies are no exception to the boom of new options. Serial Attached SCSI (SAS) was designed to integrate both SATA and SAS, so both interfaces can be combined to create a custom storage backend based on individual cost and performance needs. SATA’s popularity lies primarily in its excellent cost per capacity; from a performance standpoint, however, it provides the lowest overall performance. SAS provides significantly higher performance and improved reliability, but at a higher cost. Another option is Solid State Disk (SSD) technology, which comes in both SATA and SAS interfaces. SSDs provide far higher random performance than rotating media, but at a higher price point.
Adding to the complexity are the performance results for different RAID types. Optimizing for your workload requires understanding your specific I/O characteristics and knowing how to map them onto the ideal RAID type based on your availability needs.
If we focus on RAID 10, you can see clearly that some disk types are better suited to some applications than to others, with SSDs costing on average about 6.5x more than 15K RPM 6Gb SAS drives. Yet not every application sees a 6.5x increase in real-world performance.
Every IT professional has probably asked at one point or another, “How many rotating disks does it take to deliver the same performance as a high-performing SSD?” It’s a useful thought exercise; in the real world, however, it is likely that only a portion of the actual storage capacity will be accessed at any given time. Cache architectures have been successfully designed around this assumption for decades. What if you could build your storage out of different storage media with different cost and performance characteristics?
It’s Getting Too Hard
Storage vendors recognize that, with the prevalence of non-uniform media architectures, tiering provides the best of all worlds. Storage tiering is a simple concept: place the most frequently used data on the fastest available media, while leaving cold data on slower media. Tiering differs from caching in that the capacity of all participating logical disks can be used for user data storage. While this is not a new concept, it has traditionally not been part of the storage intelligence; the disruptive technology of SSDs has brought about new opportunities.
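The placement idea can be illustrated with a minimal Python sketch. It is not any vendor’s algorithm: the fixed-size “extent” granularity, the `Tier` type, the tier names and capacities, and the `place_extents` function are all hypothetical. The sketch simply assigns the most frequently accessed extents to the fastest tier until it fills, then moves on to the next tier. Each extent lives on exactly one tier, which reflects the point that tiering, unlike caching, uses the capacity of every tier for user data.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A hypothetical storage tier; a lower rank means faster media."""
    name: str
    rank: int
    capacity_extents: int


def place_extents(access_counts, tiers):
    """Greedily place the hottest extents on the fastest tier with free space.

    access_counts: dict mapping extent id -> accesses observed (assumed input).
    tiers: list of Tier objects whose combined capacity covers all extents.
    Returns a dict mapping extent id -> tier name.
    """
    if not access_counts:
        return {}
    ordered = sorted(tiers, key=lambda t: t.rank)
    if len(access_counts) > sum(t.capacity_extents for t in ordered):
        raise ValueError("not enough total capacity for all extents")

    # Hottest first; ties are broken by extent id so the result is deterministic.
    hottest_first = sorted(access_counts, key=lambda e: (-access_counts[e], e))

    placement = {}
    idx = 0
    free = ordered[0].capacity_extents
    for extent in hottest_first:
        while free == 0:  # current tier is full, fall through to the next one
            idx += 1
            free = ordered[idx].capacity_extents
        placement[extent] = ordered[idx].name
        free -= 1
    return placement


# Illustrative numbers only: three tiers and five extents.
tiers = [Tier("ssd", 0, 1), Tier("sas", 1, 2), Tier("sata", 2, 4)]
counts = {0: 5, 1: 900, 2: 40, 3: 40, 4: 1}
placement = place_extents(counts, tiers)
print(placement)
```

In this toy run the single busiest extent lands on the SSD tier, the next two on SAS, and the rest on SATA. A real controller would also have to weigh migration cost and data protection, which the sketch ignores.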
Let’s look at an example of how storage tiering could help in a database environment. The ACME corporation is designing a new SQL Server, and based on their past experience they know the following information.
- 4 Terabytes of storage
- 3% is hot (~125GB) and accessed 65% of the time.
- 6% (~250GB) is accessed intermittently, 25% of the time.
- The remaining 91% is cold data accessed 10% of the time.
- The database is accessed in 8KB sizes, with a read to write ratio of about 2:1.
- Eight slots are available for disks.
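The capacity and access figures in this list can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the 4TB total is taken as 4096GB (an assumption), the 91% cold share is derived as what is left after the 3% and 6% classes, and the label “warm” for the intermittently accessed data is ours. “Access density” here means a class’s share of accesses divided by its share of capacity.

```python
TOTAL_GB = 4 * 1024  # assumption: 4 TB taken as 4096 GB

# (share of capacity, share of accesses) for each class of data in the example
classes = {
    "hot": (0.03, 0.65),
    "warm": (0.06, 0.25),   # "accessed intermittently" in the list
    "cold": (0.91, 0.10),   # remainder: 100% - 3% - 6%
}


def summarize(total_gb, classes):
    """Return capacity in GB and relative access density for each class."""
    rows = {}
    for name, (capacity_share, access_share) in classes.items():
        rows[name] = {
            "capacity_gb": total_gb * capacity_share,
            # 1.0 would mean the class is accessed exactly in proportion to its size.
            "access_density": access_share / capacity_share,
        }
    return rows


summary = summarize(TOTAL_GB, classes)
for name, row in summary.items():
    print(f"{name:>4}: {row['capacity_gb']:7.1f} GB, "
          f"{row['access_density']:5.2f}x average access density")
```

The hot class works out to roughly 123GB accessed at about 22 times the average density, while the cold class is accessed at about a tenth of the average. That skew is what makes a small, fast tier worthwhile.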
An ideal cost-sensitive solution that gets us to 4TB would be a logical device that provides the required performance for each tier of data, in terms of both I/Os per second and response time.
Let’s consider the homogeneous disk alternatives.
Below is a summary comparison of the three storage infrastructure options. Clearly, the tiered option not only provides a lower cost per database transaction, but also produces over six times the IOPS capability of a SATA-only solution, and more than three times that of the pure SAS solution, with more capacity than the other proposals.
Many solutions can be built based on performance, cost, capacity, or real-estate limitations. This is only one example that clearly highlights the cost benefits of a tiered solution. Of course, this could be done manually, presuming that you already know exactly which files will be highly utilized, that you can separate them physically onto different media, and that the hot data is not transient or dynamic.
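When sizing a tier for I/Os per second, the read-to-write ratio matters because mirrored writes cost more back-end operations than reads. The sketch below assumes the common rule of thumb that a RAID 10 host write turns into two back-end writes (one per mirror) while a host read turns into one back-end read. It ignores controller cache, queuing, and response time, and the `host_iops` function name is hypothetical.

```python
def host_iops(backend_iops, read_fraction, write_penalty=2):
    """Estimate host-visible IOPS from the raw back-end IOPS of a disk group.

    backend_iops: combined random IOPS the disks can deliver (assumed input).
    read_fraction: share of host I/Os that are reads, between 0 and 1.
    write_penalty: back-end operations per host write (2 is the usual
                   rule of thumb for RAID 10, used here as an assumption).
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    backend_ops_per_host_io = read_fraction + write_fraction * write_penalty
    return backend_iops / backend_ops_per_host_io


# The example database has a read-to-write ratio of about 2:1.
read_fraction = 2 / 3
efficiency = host_iops(1.0, read_fraction)
print(f"host IOPS per back-end IOPS: {efficiency:.2f}")
```

Under these assumptions, a 2:1 workload on RAID 10 sees about 75% of the raw back-end IOPS, so each tier needs correspondingly more raw capability than the host-level target suggests.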
Pros & Cons of Storage Tiering
Storage tiering offers the best of all worlds. By leveraging multiple types of media, costs and performance can be optimized, and precious server real estate can be saved. Consider this: achieving the same number of database IOPS as in the tiering example would require over fifty SATA drives, significantly increasing the power and rack space requirements. Intelligent tiering can allow for a dynamic environment where frequently accessed data is continuously and automatically placed on the fastest media. Possibilities may even exist where the most critical data can be placed on high availability volumes, or data accessed by geographically distant sites can be copied to local storage facilities.
Despite the benefits of tiering, there are a few drawbacks to consider. Although the job of identifying and properly storing the hot data is done automatically, a storage subsystem that meets your current and growing requirements should still be custom designed by a storage professional. Another potential disadvantage is that under a storage tiering model, although the logical volume appears as a single disk, it may be spread across multiple physical disk groups. Using hardware RAID protection reduces the possibility of data loss.
We are in the era of a perfect storm of technology: dramatically increasing storage capacity needs, more disk options than ever before, higher performance demands from the rising popularity of digital business transactions, increasing processing density, and the need for greater protection of our most valuable asset, data. Tiering allows you to take advantage of the low cost of SATA storage, the security and reliability of enterprise SAS, and the high performance of SSDs, all in one bundle.
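One way to picture the dynamic behavior described above is a heat score that is refreshed every measurement interval. The Python sketch below is an illustration under stated assumptions, not how any particular controller works: the interval (“epoch”), the decay factor of 0.5, and the `update_heat` and `hottest` functions are all hypothetical. Old activity fades each epoch, so data that was hot yesterday gives way to data that is hot today.

```python
def update_heat(heat, epoch_counts, decay=0.5):
    """Return new heat scores: decayed old heat plus accesses seen this epoch.

    heat: dict mapping extent id -> previous heat score.
    epoch_counts: dict mapping extent id -> accesses in the latest epoch.
    decay: fraction of the old score that carries over (assumed value).
    """
    extents = set(heat) | set(epoch_counts)
    return {e: heat.get(e, 0.0) * decay + epoch_counts.get(e, 0) for e in extents}


def hottest(heat, n):
    """Return the n hottest extent ids, ties broken by id for determinism."""
    return sorted(heat, key=lambda e: (-heat[e], e))[:n]


# Illustrative access counts for two epochs.
heat = update_heat({}, {"a": 100, "b": 10})
first_pick = hottest(heat, 1)    # extent "a" is hottest after the first epoch

heat = update_heat(heat, {"b": 80, "c": 5})
second_pick = hottest(heat, 1)   # "a" has cooled to 50, "b" has risen to 85

print(first_pick, second_pick)
```

The candidates returned by `hottest` would be the ones to promote to the fastest tier. A production design would also need to limit how often data moves, which this sketch leaves out.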
Kimberly Robinson works as a performance engineer for LSI Corporation’s Engenio Storage Division. She has been working on optimizing enterprise storage solutions for major OEMs for over ten years.