StorageReview takes the first of several comprehensive looks at tagged command queuing and its implications for the desktop as well as the server world. Does TCQ rate? How does it mesh with RAID arrays? Is SATA TCQ as effective as SCSI TCQ? Find the answers to these questions and more in SR’s latest!
See also How We Test Drives
See also Western Digital Raptor WD740GD Review
See also Seagate Cheetah 10K.6 Review
Introduction

Over the last few years, Western Digital has maintained a virtual vice-lock on the high-performance, high-capacity desktop and enthusiast markets. The venerable WD Caviar series has combined enviable speed and capacity with reasonable prices. However, aside from a relatively obscure, short-lived SCSI line, when it came to the lucrative enterprise arena the firm simply watched from a distance as titans such as Seagate, Maxtor, and Hitachi battled for market share. A little over a year ago, WD tested the enterprise waters with the introduction of the world's first 10,000 RPM ATA drive, the Raptor WD360GD. The Raptor paired SCSI-class mechanics with the new and relatively inexpensive Serial ATA interface in an attempt to undercut the rather hefty premiums that SCSI subsystems demanded. StorageReview's performance results, however, revealed that while the WD360GD delivered world-class single-user results, its multi-user performance remained unimpressive when contrasted with existing 10k RPM SCSI units.
The WD360GD lacked a key element that the SCSI world has enjoyed for years: tagged command queuing (TCQ), a feature that intelligently reorders requests to minimize actuator movement. In September 2003, Western Digital announced the follow-up Raptor WD740GD, a second-generation unit that brought a host of improvements to the line. Though the doubling of the Raptor's capacity to 74 gigabytes is the most visible improvement, the most intriguing is undoubtedly the implementation of TCQ.
TCQ In Brief
Not to be confused with operating-system reordering and optimization, tagged command queuing is a hardware-level process designed to streamline the delivery of data in highly-random accesses under heavy loads. Without TCQ, a drive can only accept a single command at a time. It thus operates on a first-come, first-served basis, completing requests in the order they are received. This is not always the most effective way to service data requests, especially in an intensive, non-localized environment.
Through the process of tagged command queuing, a host adapter adds special tags to individual commands. The drive itself, privy to its own physical layout of sectors across three dimensions, can take into account rotation and seek distances and reorder commands to serve them more efficiently. Requested data is thus returned to the controller in a more streamlined manner; it can then use the additional information it added earlier to transparently return the data to the operating system.
Consider the diagram to the left. In a traditional, non-queued paradigm, the drive would accept the request for data piece A, move the actuator and retrieve it, accept the request for B, retrieve it, then move to piece C. A drive that can buffer and queue requests, however, would be able to retrieve A, then opt to retrieve C first, followed by B, resulting in a net savings of time in completing these three requests.
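To make the savings concrete, here is a minimal sketch of the A/B/C scenario in Python, assuming a toy cost model in which servicing a request costs one unit of actuator travel per cylinder; the cylinder numbers are purely illustrative and are not drawn from any real drive.

```python
# Toy model: actuator travel under FIFO servicing vs. queue reordering.
# Cylinder positions are illustrative, not taken from a real drive.

def seek_cost(path, start=0):
    """Total actuator travel, in cylinders, to service requests in order."""
    total, pos = 0, start
    for cylinder in path:
        total += abs(cylinder - pos)
        pos = cylinder
    return total

requests = {"A": 100, "B": 900, "C": 150}  # request -> target cylinder

fifo = seek_cost([requests[r] for r in ("A", "B", "C")])    # first-come, first-served
queued = seek_cost([requests[r] for r in ("A", "C", "B")])  # drive serves nearby C before B

print(f"FIFO travel:      {fifo} cylinders")    # 100 + 800 + 750 = 1650
print(f"Reordered travel: {queued} cylinders")  # 100 + 50 + 750 = 900
```

Real implementations go further than this distance-only model, weighing rotational position as well as seek length when choosing the next command, but the principle is the same: less time spent positioning means more requests serviced per second.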
TCQ must be supported by both the controller and the hard drive itself. It was introduced to the SCSI world as early as 1990 and was formally codified into the SCSI-2 standard by 1994. The feature rapidly proved itself invaluable in the world of multi-user servers and is today consistently deployed across virtually all host adapters and disks. Likewise, TCQ was formally implemented in the 1998 ATA-4 standard. Unlike SCSI devices, however, ATA drives simply were not used in enterprise applications where features like hot-swappability and low access times were paramount. Further, the traditional ATA stronghold, single-user machines, just did not benefit from TCQ; indeed, in many cases the additional imposed overhead actually reduced rather than enhanced performance in these areas. As a result, the feature went largely ignored by the industry.
Today, however, the advent of Serial ATA, its associated hot swap features, and its promised interoperability with the upcoming Serial Attached SCSI (SAS) standard has resulted in a brightening future for ATA in the enterprise. The forthcoming SATA II standard includes provisions to incorporate tagged command queuing a la ATA-4’s standard. Native SATA drive architectures such as the Seagate Barracuda 7200.8 and Maxtor MaXLine III tout the inclusion of “Native SATA” tagged command queuing, or “Native Command Queuing” (NCQ) for short. NCQ’s fundamental paradigm is identical to that of tagged command queuing; the NCQ moniker simply differentiates the SATA II standard from the existing ATA-4 model.
With its deep research and development pockets, Seagate was the only manufacturer to avoid a less expensive and faster-to-market PATA-to-SATA bridge for its first SATA products. For financial and temporal reasons, other manufacturers such as Western Digital introduced their first products with bridged operation. The Raptor WD740GD is one of these designs. While the practical ramifications are negligible (it is bottom-line performance that counts, after all!), the Raptor's bridge prevents it from using the SATA II NCQ standard. Thus, to implement tagged command queuing into its budding enterprise-oriented line in a timely fashion, Western Digital opted to include ATA-4-style TCQ in the Raptor. Fortunately for WD, the firm has received an enthusiastic response from many controller manufacturers. Most firms designing NCQ-enabled SATA host adapters are also incorporating Raptor-style queuing. One such manufacturer is Promise Technology.
The Tests
In this first of what will be several articles examining the effects of SATA’s tagged command queuing, we will take a look at how the upcoming Promise FastTrak TX4200 compares to the currently shipping, non-TCQ-enabled FastTrak S150 TX4. The relationship between these two controllers is especially interesting as the TX4200 is simply a FastTrak S150 TX4 with added TCQ code. The S150 TX4, in turn, is simply a RAID-enabled SATA150TX4, the SR Testbed’s long-standing reference SATA controller. A direct contrast between the two Promise RAID controllers can thus isolate the effects of TCQ from other variables.
The Promise FastTrak TX4200 features:
- 4 Serial ATA Ports for up to 4 drives
- RAID 0/1/10 and JBOD
- 32-Bit / 33-66 MHz PCI Operation
- NCQ & SATA TCQ Support
TCQ, of course, has been around for some time in the SCSI world- all current host adapters, RAID controllers, and hard drives support a very mature implementation. To discover what disadvantages, if any, SATA TCQ suffers when contrasted with more established SCSI solutions, results from a Mylex AcceleRaid 170 RAID controller paired with up to four 73 GB Seagate Cheetah 10K.6 drives have been included in these tests.
The Mylex AcceleRaid 170 features:
- 1 68-pin LVD Port for up to 15 drives
- RAID levels 0, 1, 0+1, 3, 5, 10, 30, 50, JBOD
- 32 MB ECC SDRAM Cache
- 32-bit / 33 MHz PCI Operation
Though TCQ confers benefits even when a single drive operates under heavy random loads, its true potential shines when there are also multiple actuators to work with. Hence, the tests that follow also take a hard look at the scaling provided by arrays in both multi-user and single-user scenarios- our first formal take on RAID in over two years.
In the following tests, Testbed3’s hardware and benchmarks sort out the multiple dimensions of potential performance drivers:
- How does TCQ benefit multi-user and single-user performance?
- How does TCQ affect a RAID array’s ability to scale performance upwards as more drives are added?
- How does SATA TCQ stack up against SCSI’s implementation?
- How does a RAID array scale under increasingly heavy random I/O?
- What benefits does a RAID array deliver to the highly-localized I/O that dominates non-server (single-user) use?
Since these tests take advantage of the standard SR testbed, let us take a moment to consider a potential limitation of the machine’s hardware, the 33 MHz, 32-bit PCI slot.
Limitations of the PCI Bus
The 133 MB/sec limit of the standard 32-bit, 33 MHz PCI bus may be of concern to some, especially those seeking, for various reasons, to maximize sequential transfer rates. The practical real-world limit sits slightly below that threshold: STR tests associated with the results below top out at 126 MB/sec. A single Raptor in its outer zone can push nearly 72 MB/sec while a Cheetah 10K.6 can do 69 MB/sec; it takes only two of either to saturate the PCI bus.
Let us take a closer look, however, at just how important STR is in the majority of applications. The StorageReview File Server DriveMark generates an average transfer size of 22 kilobytes. In other words, the average generated I/O operation in the suite consists of repositioning the actuator to the desired location followed by the reading or writing of 22 KB of data. In the same vein, the SR Office DriveMark's average transfer size is 23 KB. The SR High-End DriveMark, based on a suite of applications that includes video and audio editing, is the only test that reaches significantly beyond these sizes, generating a relatively high 69.5 KB transfer per I/O.
A single Raptor WD740GD, with its maximum transfer rate of 72 MB/sec, can transfer 22 KB in roughly 0.3 milliseconds.
A PCI-throttled RAID0 array, capped at 126 MB/sec, can transfer 22 KB of data in roughly 0.17 milliseconds.
The diagram to the right illustrates the relationship between positioning and transfers in typical single- and multi-user scenarios. Observe how the time spent positioning the actuator and platter (red) dominates the relatively small amount of time spent reading/writing the data itself (yellow). Even an asymptotic case of an infinite transfer rate unleashed through an infinitely fast bus would only eliminate the yellow portion of the total time it takes to service one request.
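A back-of-the-envelope calculation underscores the point. The sketch below assumes a 4.5 ms average seek (a plausible figure for a 10,000 RPM drive, not a quoted spec) and combines it with the half-revolution average rotational latency and the 72 MB/sec outer-zone transfer rate and 22 KB request size cited above.

```python
# Rough service-time breakdown for one random 22 KB request on a 10k RPM drive.
# The seek figure is an assumption; rotational latency is half a revolution.

avg_seek_ms = 4.5                          # assumed average seek time
avg_rotation_ms = 0.5 * 60_000 / 10_000    # half a revolution at 10,000 RPM = 3.0 ms
transfer_ms = 22 / (72 * 1024) * 1000      # 22 KB at 72 MB/sec ~ 0.30 ms

total_ms = avg_seek_ms + avg_rotation_ms + transfer_ms
print(f"positioning: {avg_seek_ms + avg_rotation_ms:.1f} ms")
print(f"transfer:    {transfer_ms:.2f} ms ({transfer_ms / total_ms:.0%} of total)")
```

Under these assumptions the transfer itself accounts for roughly 4% of the request's service time; even an infinitely fast bus could claw back only that sliver.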
Therefore, while the PCI bus can limit sequential transfer rates, its practical effect in capping real-world speed in typical use is not nearly as significant as one may believe at first blush. As a result, the scaling demonstrated in this article also represents the increases one will gain from arrays operating on higher-speed buses.
Our Third Raptor WD740GD Sample
The evaluation sample provided to SR by Western Digital for our review published last January was manufactured on December 4, 2003. For this review, WD sent us four more samples, all dated March 4, 2004. Though much of the focus of this article rests on multi-drive arrays, for control purposes it was necessary to retest a single drive from this new batch on our reference Promise SATA150TX4 controller. Some differences arise:
Note the differences in the final digit of the extended MDL designation when comparing our second and third samples. The December unit ends with a zero while the March unit concludes with a 1. Why the change? First, we should point out that all manufacturers quietly and regularly refresh the firmware on all their drives after initial release, either to correct bugs or to tweak performance as piles of configuration experience pour in.
Second, as has been painfully obvious over the past several months, SATA command queuing has proven to be a constantly moving target while drive and controller manufacturers alike continue to tweak their products. As controller manufacturers such as Pacific Digital, Silicon Image, and Promise Technology continue to develop pre-release adapter samples, drive manufacturers such as Western Digital are forced to re-optimize firmware to obtain the best results. The same has been true in the opposite direction. Though Western Digital announced the WD740GD in September of 2003 and though the units widely available through the channel since last December feature TCQ functionality, the Raptor team is nonetheless driven to regularly reassess the drive's potential companion host adapters and retune firmware accordingly. The result is the "00FLA1" revision, a unit better suited for the state of today's TCQ-enabled host adapters, albeit with a very slight drop in certain performance measures.
Glancing at the figures above reveals that while there are differences, they are for all intents and purposes trivial. One simply will not notice the difference in speed in subjective use. Attempts to specifically procure the earlier 00FLA0 revision would likely prove frustrating and fruitless. We would not sweat the difference.
A Word on Organization
Presenting the following results can be quite daunting. Many different dimensions of performance emerge when one attempts to form the “big picture.” How does performance increase when all other variables save queue depth remain constant? What kind of benefits result from adding more drives to an array? How does choosing mirroring over striping affect performance? The list of questions runs on. As a result, we have avoided use of our standard “HTML-generated” graphs in favor of static graphs. Hopefully, they accurately convey the myriad of information to be gleaned.
Without further ado, let us take a look at some results!
Multi-User Performance as Queue Depths Increase

First, a look at how performance scales under the highly-random StorageReview File Server DriveMark as queue depths (load) increase. For information on the FS DriveMark, please review this explanation. For comparison purposes, results for Testbed3's standard Promise SATA150TX4 SATA controller and Adaptec AHA-29160 SCSI controller have been included in the single-drive scenario. The RAID0 results presented below feature a 64 KB stripe size.
By the time the load reaches 4 outstanding I/Os, the 29160 pulls to the head of the pack, edging past the AcceleRaid. The FastTrak S150 TX4 and SATA150TX4 continue to perform identically while the TX4200, still bringing up the rear, manages to close some of the gap.
A depth of 16 yields considerable changes. The Adaptec vaults to the front of the pack with an impressive 237 I/Os per second. The Mylex continues to scale well, but the TX4200/Raptor pair’s TCQ implementation starts to make its presence felt and places just slightly behind.
When we finally reach a heavy load of 64 outstanding requests, a clear hierarchy emerges. Adaptec SCSI host adapters have always been popular, reliable choices. The 29160 also proves to be a great performer by leaving the other controllers in the dust under heavy load. Though the TX4200 and AcceleRaid fail to reach such lofty heights, they nonetheless deliver the scaling under load that one would expect from subsystems that feature TCQ. By contrast, the two non-TCQ ATA controllers bring up the rear of the pack, a marked contrast from their swift performance when only one request remains outstanding.
When depths hit 16, the S150 TX4’s slope levels out, indicative of a lower rate of scaling than the AcceleRaid and the TX4200. Finally, at 64 I/Os the AcceleRaid and the TX4200 enjoy a significant advantage over the S150 TX4 with the TX4200 achieving a high of 385 I/Os per second.
The non-TCQ S150 TX4 falls behind the other two adapters by the time queues reach 16. When depths hit 64 outstanding I/Os, the TX4200 once again sets itself ahead of the others.
Multi-User Performance as the Number of Drives Increases

Contrasting performance between controllers as the number of drives in a RAID0 array increases also yields telling results. The graphs that follow demonstrate the effects of adding more disks while holding all other variables constant. The results presented below feature a 64 KB stripe size.
When a second drive is added to the array, however, the AcceleRaid scales a bit less than the ATA controllers; all three place very close to one another.
By the time we reach a drive count of three and four, the AcceleRaid's gentle slope keeps it behind both Promise controllers. The increased actuator count allows the S150 TX4 to overcome its lack of TCQ functionality and pull ahead of the AcceleRaid as well as the TX4200, though by relatively small margins.
The SCSI AcceleRaid offers better performance than the SATA TX4200 with just one drive. As more disks are added to the array, however, the TX4200 scales better; its greater slope culminates with a 13% lead over the AcceleRaid in a 4-drive array.
Multi-User Mirroring

Results culled from RAID0 arrays as drive counts increase are interesting from an academic standpoint mainly due to the linear nature in which one can add independent actuators. Admittedly, however, practical use of RAID0 in production servers is quite limited: performance gains are more than offset by the significant increase in risk. Should one drive fail in a four-drive striped array, all data would be lost.

Mirroring (RAID1) is a much more likely scenario. In such an array, every piece of data is written to at least two disks. While writes must occur in unison, reads do not; as a result, intelligently-designed RAID controllers do offer a performance increase by issuing independent reads across multiple actuators. RAID1 delivers the benefits of redundancy should one drive fail while also offering improved performance through two separate read mechanisms. RAID01 offers redundancy across two sets of RAID0 arrays: should one array fail, data remains preserved on the other. RAID10 mirrors the data on two drives, then stripes the resulting array with another mirrored pair. Both RAID01 and RAID10 enhance performance with two write and up to four read mechanisms. The Mylex controller offers both RAID01 and RAID10 while the Promise units incorporate RAID10.
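For readers curious how a striped mirror actually lays out data, the following sketch maps a logical offset to physical drives in a hypothetical four-drive RAID10 array using the 64 KB stripe size from our tests; real controllers vary in layout details, so treat this only as an illustration of the concept.

```python
# Hypothetical four-drive RAID10 layout: two mirrored pairs, striped.
# Stripe ordering and mirror placement differ between real controllers.

STRIPE_KB = 64      # stripe size used in the article's tests
MIRROR_PAIRS = 2    # four drives = two mirrored pairs

def raid10_targets(logical_kb):
    """Return (drive, physical offset in KB) for both copies of a logical offset."""
    stripe = logical_kb // STRIPE_KB
    pair = stripe % MIRROR_PAIRS                          # which pair owns this stripe
    offset = (stripe // MIRROR_PAIRS) * STRIPE_KB + logical_kb % STRIPE_KB
    return [(2 * pair, offset), (2 * pair + 1, offset)]  # primary and mirror

# A request at logical offset 200 KB lands on stripe 3, owned by pair 1:
print(raid10_targets(200))  # [(2, 72), (3, 72)]
```

Every write must hit both drives of the owning pair, but a read at the same offset can be satisfied by whichever of the two actuators is better positioned; that asymmetry is the source of the independent-read benefit described above.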
As one would expect, striping a pair of mirrored arrays (RAID10) delivers even greater performance benefits. Here the modest S150 TX4 actually jumps out to a significant early lead at a relatively light load of 4 outstanding requests. Things pretty much even out by the time load hits 16; by 64, the S150 TX4’s gentler improvements lead it to fall behind somewhat. In a RAID10 configuration, the TX4200 does not exhibit the same odd drop between 16 and 64. Rather, it scales as expected.
Single-User Performance

Though evidence has been presented to the contrary, a combination of overzealous marketing as well as a general lack of knowledge has resulted in the proliferation of RAID among power users running single-user workstations. While a considerable argument may be made for the redundancy provided by RAID1, the increases in transfer rates and high-I/O random access performance delivered by RAID0 simply do not benefit most non-server uses. Likewise, though the latest Raptor delivers outstanding single-user performance, Western Digital has worked to incorporate TCQ functionality in the drive not to widen its single-user lead over the competition but rather to ensure that the Raptor stakes its claim as a viable alternative in the traditionally SCSI-based server world.
Even so, we realize that many readers worldwide have been eagerly awaiting results for the Raptor mated with an appropriate TCQ controller. StorageReview’s Desktop DriveMarks offer an unrivaled opportunity to assess just how much difference RAID arrays make when it comes to non-server, single-user applications. Here we will take a look at how RAID, TCQ, SCSI, and SATA all impact performance. The RAID0 results presented below feature a 64 KB stripe size.
On the SCSI side of things, the Adaptec AHA-29160 and Mylex AcceleRaid 170 switch places. While the 29160 proves itself the superior performer in single-drive, multi-user scenarios, the AcceleRaid delivers better single-user scores across the board. Note, however, that even when handicapped by TCQ operation, the Raptor manages to hold onto its lead over the Cheetah in three out of the four single-user tests. Western Digital’s experience in designing great single-user read-ahead and write-back buffer strategies continues to shine through.
Let us move on and examine how the RAID controllers scale in each access pattern as drive counts increase.
Upgrading to three drives reveals significantly diminishing returns. Moving to four disks actually causes scores to regress. While some may argue that the testbed's 32-bit, 33 MHz PCI bus limits the gains achieved by striping, readers should remember that, especially at these I/O levels, such limits are not significant. Regardless of drive count, the S150 TX4's simpler, streamlined operation grants it a considerable performance advantage over the other two controllers.
Single-User Mirroring
In all other access patterns, the three controllers exhibit very slight changes here and there, with no significant trend discernible. Note that once again the lowly non-TCQ S150 TX4 delivers the best performance in all mirrored combinations.
Conclusions

From the plethora of data presented above, we can draw several conclusions:

1. SATA TCQ and SATA RAID have the potential to deliver benefits to the server market just as great as those of SCSI TCQ and SCSI RAID. The Promise FastTrak TX4200 and the Mylex AcceleRaid 170 are, respectively, entry-level RAID controllers for the SATA and SCSI interfaces. The former, in fact, is little more than Promise's current FastTrak S150 TX4 controller with SATA TCQ (and NCQ) functionality added.
Unlike other major hard disk players, Western Digital does not have an established SCSI-based product line to protect. As a result, the firm seeks a competitive advantage through offering SCSI-style mechanics and functionality at prices associated with the more cost-conscious ATA interface. A quick check at the time of this writing with StorageReview sponsor HyperMicro prices 73 GB Cheetah 10K.6s at $339 each, Raptor WD740GDs at $219 each, and the AcceleRaid 170 at $379. When released this August, the Promise TX4200’s price will be on par with that of the S150 TX4 that it is meant to displace. It runs $159 at HyperMicro.
Hence, the following pricing (excluding cables and accessories) arises: four 73 GB Cheetah 10K.6 drives plus the AcceleRaid 170 total $1,735, while four Raptor WD740GDs plus the TX4200 come to $1,035 - roughly 40% less for the SATA subsystem.
While WD has delivered a solution that can match a SCSI-based solution’s speed and scalability, one must also keep in mind the key factors of infrastructure and reliability. As with TCQ itself, SATA’s support hardware such as backplanes, all-in-one solutions, and the like remain in their infancy when contrasted to the maturity and longevity of SCSI hardware. Also keep in mind that while Western Digital claims an enterprise-class 1.2 million hour MTTF spec and backs the Raptors with a 5-year warranty, the line is still new and remains relatively unproven compared to established solutions such as Seagate’s Cheetah series. Finally, remember that the prices listed above represent the cost of the storage subsystem alone- factoring in the total cost of server hardware when motherboards, CPU, and RAM are considered can dilute the difference significantly.
In the end, the potential for SATA to invade the entry- and mid-level server market is there. The performance is definitely there. If the Raptor’s reliability proves comparable to the competition and if the infrastructure/support hardware surface, WD will have a viable contender.
2. Command queuing is meant to assist multi-user situations, not single-user setups. With the recent release of Intel’s 9xx chipsets, pundits and enthusiasts everywhere have been proclaiming that command queuing is the next big thing for the desktop. Wrong. As evidenced by the disparities between the FastTrak S150 TX4 and TX4200 (otherwise identical except for the latter’s added TCQ functionality), command queuing introduces significant overhead that fails to pay for itself performance-wise in the highly-localized, lower-depth instances that even the heaviest single-user multitasking generates. It is becoming clear, in fact, that the maturity and across-the-board implementation of TCQ in the SCSI world is one of the principal reasons why otherwise mechanically superior SCSI drives stumble when compared to ATA units. Consider that out of the 24 combinations yielded from the four single-user access patterns, one-to-four drive RAID0 arrays, and RAID1/10 mirrored arrays presented above, the non-TCQ S150 TX4 comes out on top in every case by a large margin. TCQ is only meant for servers, much like the technology mentioned just below.
3. RAID helps multi-user applications far more than it does single-user scenarios. The enthusiasm of the power user community combined with the marketing apparatus of firms catering to such crowds has led to an extraordinarily erroneous belief that striping data across two or more drives yields significant performance benefits for the majority of non-server uses. This could not be farther from the truth! Non-server use, even in heavy multitasking situations, generates lower-depth, highly-localized access patterns where read-ahead and write-back strategies dominate. Theory has told those willing to listen that striping does not yield significant performance benefits. Some time ago, a controlled, empirical test backed what theory suggested. Doubts still lingered: irrationally, many believed that results would somehow be different if the array were based on a SATA or SCSI interface. As shown above, the results are the same. Save your time, money, and data: leave RAID for the servers!
We’re far from finished here! Competing SATA TCQ products from Pacific Digital Corp. and Highpoint Technologies are currently available while Silicon Image has a chipset yet to be incorporated into a shipping product. We’ll continue to work with other controller manufacturers to bring readers Raptor TCQ results paired with a variety of products. SATA NCQ is also just now entering prime time. As always, StorageReview will be there.