TCO Comparison of Flash-Powered Cloud Architectures vs. Traditional Approaches

Recently, GridIron and Brocade announced a new joint Reference Architecture for large-scale, cloud-enabled clustered applications that delivers record performance and energy savings. While the specific configuration validated by Demartek was for clustered MySQL applications, the architecture and its benefits apply equally to other cluster configurations such as Oracle RAC and Hadoop. The announcement is available here: GridIron Systems and Brocade Set New 1 Million IOPS Standard for Cloud-based Application Performance, and Demartek's evaluation report is available online at: GridIron & Brocade 1 Million IOPS Performance Evaluation Report.

Let us take a closer look at the Total Cost of Ownership (TCO) profile of the Reference Architecture vis-à-vis the alternatives. For the OpEx component, we'll use power consumption as the sole metric.

Requirements:

  • Total IOPS needed from the cluster = 1 Million Read IOPS and 500,000 Write IOPS
  • Total capacity of the aggregate database = 50 TB

Assumptions:

  • Cost of a server with the requisite amount of memory, network adapters, 4x HDDs RAIDed, etc. = $3,000
  • Number of Read/Write IOPS out of a server with internal/local disks = 500
  • Power consumption per average server = 500 Watts
  • It takes a watt to cool a watt; in other words, if a server consumes 500 Watts, it takes another 500 Watts to cool that server
  • Cost of Power: USA commercial pricing average of $0.107/kWh
  • The cost of the many Ethernet switch ports vs. the few Fibre Channel switch ports is assumed to be equivalent and will be excluded from the calculations.

Option 1: Traditional Implementation Using Physical Servers

In this scenario, the IOPS requirement, rather than the total database capacity, determines the number of servers needed; the arithmetic is sketched in the code after the list.

  • Number of servers (with spinning HDDs) required to hit 1 Million IOPS = 1,000
  • Assuming 40 servers per rack, total number of Racks = 1,000/40 = 25 Racks
  • Cost of the server infrastructure = 1,000 * 3,000 = $3,000,000
  • Power consumed by the servers = 500 * 1,000 = 500 kW
  • Power required for cooling = 500 kW
  • Total power consumption = 1000 kW
  • Annual OpEx based on power consumption = $0.107 * 1000 * 24 * 365 = $937,320
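
For readers who want to check the arithmetic, here is a minimal Python sketch of the Option 1 math using the assumptions listed above (the same pattern is repeated for Options 2 and 3 below):

```python
# Option 1: 1,000 traditional servers with internal HDDs
# Shared assumptions from the list above
SERVER_COST_USD = 3_000       # cost per server
SERVER_POWER_W = 500          # power draw per server, in watts
COOLING_FACTOR = 2.0          # "a watt to cool a watt": total power = 2x the IT load
POWER_COST_PER_KWH = 0.107    # USA commercial average electricity price
HOURS_PER_YEAR = 24 * 365

servers = 1_000
capex = servers * SERVER_COST_USD                               # $3,000,000
total_kw = servers * SERVER_POWER_W * COOLING_FACTOR / 1_000    # 1,000 kW including cooling
opex = POWER_COST_PER_KWH * total_kw * HOURS_PER_YEAR           # annual power cost
print(capex, total_kw, round(opex))                             # 3000000 1000.0 937320
```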

Option 2: Traditional Implementation Using Physical Servers AND PCIe Flash Cards in Each of the Servers

In this scenario, the total database capacity (limited by the capacity of the PCIe flash cards) determines the number of servers required, rather than the IOPS each server can deliver. The arithmetic is sketched in the code after the list.

  • Capacity of each PCIe flash card = 300 GB
  • Two PCIe cards will be used to RAID/mirror per server
  • Number of servers required to get to 50TB total = 167
  • Assuming 40 servers per rack, total number of racks required = 5 Racks
  • Cost of the server infrastructure = 167 * 3,000 = $501,000
  • Cost of the PCIe flash cards ($17/GB) = 2 * 167 * 300 * 17 = $1,703,400
  • Total cost of server infrastructure including flash = $2,204,400
  • Power consumed by the servers = 500 * 167 = 83 kW
  • Power required for cooling = 83 kW
  • Total power consumption = 166 kW
  • Annual OpEx based on power consumption = $0.107 * 166 * 24 * 365 = $155,595
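
The Option 2 figures follow the same pattern, except that flash capacity, not IOPS, drives the server count; a self-contained sketch:

```python
import math

# Option 2: servers with two mirrored 300 GB PCIe flash cards each
# (server cost/power and electricity rate repeat the shared assumptions above)
SERVER_COST_USD, SERVER_POWER_W, COOLING_FACTOR = 3_000, 500, 2.0
POWER_COST_PER_KWH, HOURS_PER_YEAR = 0.107, 24 * 365
CARD_CAPACITY_GB, CARDS_PER_SERVER, FLASH_COST_PER_GB, DATABASE_TB = 300, 2, 17, 50

servers = math.ceil(DATABASE_TB * 1_000 / CARD_CAPACITY_GB)                       # 167
server_capex = servers * SERVER_COST_USD                                          # $501,000
flash_capex = servers * CARDS_PER_SERVER * CARD_CAPACITY_GB * FLASH_COST_PER_GB   # $1,703,400
total_kw = servers * SERVER_POWER_W * COOLING_FACTOR / 1_000   # 167 kW (the post rounds 83.5 kW down to 83 kW)
opex = POWER_COST_PER_KWH * total_kw * HOURS_PER_YEAR
print(server_capex + flash_capex, total_kw, round(opex))
# 2204400 167.0 156532 -- the post's $155,595 figure uses the rounded 166 kW value
```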

Option 3: Implementation Using GridIron-Brocade Reference Architecture

Two GridIron OneAppliance FlashCubes will be used for a mirrored HA configuration.  Each FlashCube has 50TB of Flash.

  • Number of servers required = 20
  • Rack Units of the two FlashCubes = 2 * 5 = 10 RU
  • Total number of Racks = 1 Rack
  • Cost of the server infrastructure = 20 * 3,000 = $60,000
  • Cost of the FlashCubes = 2 * 300,000 = $600,000
  • Total cost of the server infrastructure including flash = $660,000
  • Power consumption per FlashCube = 1,100W
  • Power consumed by the servers and FlashCube = 20 * 500 + 2 * 1,100 = 12.2 kW
  • Power required for cooling = 12.2 kW
  • Total power consumption = 24.4 kW
  • Annual OpEx based on power consumption = $0.107 * 24.4 * 24 * 365 = $22,871
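
And a sketch of the Option 3 math:

```python
# Option 3: 20 servers plus two mirrored GridIron OneAppliance FlashCubes
# (server cost/power and electricity rate repeat the shared assumptions above)
SERVER_COST_USD, SERVER_POWER_W, COOLING_FACTOR = 3_000, 500, 2.0
POWER_COST_PER_KWH, HOURS_PER_YEAR = 0.107, 24 * 365
FLASHCUBE_COST_USD, FLASHCUBE_POWER_W = 300_000, 1_100

servers, flashcubes = 20, 2
capex = servers * SERVER_COST_USD + flashcubes * FLASHCUBE_COST_USD                # $660,000
it_load_kw = (servers * SERVER_POWER_W + flashcubes * FLASHCUBE_POWER_W) / 1_000   # 12.2 kW
total_kw = it_load_kw * COOLING_FACTOR                                             # 24.4 kW including cooling
opex = POWER_COST_PER_KWH * total_kw * HOURS_PER_YEAR
print(capex, total_kw, round(opex))                                                # 660000 24.4 22871
```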

Comparison Summary of Different Approaches

                                Traditional     Traditional with PCIe Flash     GridIron-Brocade Reference Architecture
Number of servers               1,000           167                             20
Number of Racks                 25              5                               1
CapEx of Infrastructure         $3,000,000      $2,204,400                      $660,000
Power Consumption (kW)          1,000           166                             24.4
OpEx* (just based on power)     $937,320        $155,595                        $22,871
*The difference in the management costs of 1,000 servers vs. 20 servers will be equally dramatic, but is not included in the calculations above.

Normalized Comparison of Different Approaches

By normalizing the values in the comparison table (the traditional approach is set to 100% and the other values are expressed relative to it), we get the following graph. It is clear from the graph that both CapEx and OpEx are dramatically lower with the GridIron-Brocade Reference Architecture.
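
Concretely, normalization just divides each option's value by the traditional option's value; a small Python sketch using the figures from the comparison table:

```python
# Normalize each metric so that the traditional approach = 100%
metrics = {
    "CapEx ($)":  {"Traditional": 3_000_000, "PCIe Flash": 2_204_400, "GridIron-Brocade": 660_000},
    "Power (kW)": {"Traditional": 1_000,     "PCIe Flash": 166,       "GridIron-Brocade": 24.4},
    "OpEx ($)":   {"Traditional": 937_320,   "PCIe Flash": 155_595,   "GridIron-Brocade": 22_871},
}
for metric, values in metrics.items():
    base = values["Traditional"]
    print(metric, {name: f"{100 * v / base:.1f}%" for name, v in values.items()})
# CapEx: Traditional 100%, PCIe Flash ~73.5%, GridIron-Brocade 22.0%
# Power and OpEx: Traditional 100%, PCIe Flash ~16.6%, GridIron-Brocade ~2.4%
```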

Normalized Comparison of Different Approaches to Building Large Clusters

Clarifying Some Differences Between Network-Based Flash Caching and SSD SANs

George Crump, of Storage Switzerland, published a great article titled Cost Effectively Solving Oracle Performance Problems to which Kaminario (K) responded with some thoughts of their own on why SSD SANs are a better choice for solving Oracle performance problems compared to network-based flash caching.  We disagree with some of the points made by Kaminario:

Storage Switzerland: The challenge is that typically these vendors have limited experience in delivering the types of storage services that Oracle administrators have become accustomed to.
K: This is no longer true. Today, Kaminario K2 offers sophisticated SAN features important to Oracle administrators, including lightning-fast snapshots and non-disruptive operations.

There is no such thing as a non-disruptive deployment of a new storage array. In a production environment, halting a system to perform data migration, validate that migration, and then restart the environment can be time- and resource-intensive. Converting existing scripts and operating procedures to use a new vendor's snapshot features can be equally complicated and risky. With GridIron's transparent network-based deployment, no changes are required to business processes or applications and there is no data migration involved; it is truly non-disruptive!

K: The idea of caching is to quickly serve data that was served before. Most Oracle read performance problems are random read or single blocks. This is where mechanical disk storage is limited. If Oracle needs to read a block, that block needs to be in the cache appliance to improve performance. But will that block be in the cache? Only if that block is being used a lot (was served before). We call these hot-blocks. BUT, if these blocks are hot, they will already be in the Oracle internal cache and therefore Oracle will not read them. You’d end up double caching without a lot of improvements, unless the entire SAN data is placed in the caching appliance.

The difference in size between the Oracle block cache and the dataset is so vast that the Oracle block cache cannot effectively hold anything but the hottest blocks. SAN-based caches can be scaled to many times the size of the largest memory footprints (impossible to construct using server DRAM) while remaining a fraction of the dataset size and of the (implicitly larger) physical storage footprint. Sophisticated caching algorithms based on performance feedback are the key to making caches effective for large datasets. GridIron customers who have evaluated server-based, storage-based and network-based flash caching can testify to the advantages that proper algorithms bring to the picture.

K: Some applications are dynamic in nature with real random access. You will not get much improvement from a caching solution compared with placing the entire database on a Flash array. We have seen this with Oracle Flash Cache vs. placing the entire tablespace on SSD.

Holding a dataset and making that dataset available at high bandwidth are entirely different problems. Data held on a single SSD is available only at the bandwidth of that single drive and the queue depth of that drive's controller. Data within a storage array populated with SSDs is limited by the RAID structure: RAID 5 cannot deliver more bandwidth than 4 disks and cannot deliver higher IOPS than 4 disks (assuming perfectly spread random reads). The GridIron architecture can spread data over 100 disks per appliance to deliver concurrent bandwidth at levels not available from primary storage arrays.
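
To make the spreading argument concrete, here is a toy Python calculation; the per-drive figures are illustrative assumptions only, not measured values:

```python
# Toy model: aggregate read bandwidth and IOPS scale with the number of drives
# actually serving the reads (per-drive figures below are illustrative assumptions)
def aggregate(drives: int, mb_per_sec_per_drive: float, iops_per_drive: int):
    return drives * mb_per_sec_per_drive, drives * iops_per_drive

print(aggregate(4, 250, 30_000))    # data confined to a 4-drive group: ~1 GB/sec, 120K IOPS
print(aggregate(100, 250, 30_000))  # data spread over 100 drives: ~25 GB/sec, 3M IOPS
```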

K: I agree that caching will improve writes BUT not as much as placing an entire database on Kaminario K2. There is no doubt there.

The write throughput of spinning disks is higher than that of SSDs. If a database is bound by write throughput, there is no doubt you would be wasting money on an all-flash array.

K: Finally, if you really like your existing SAN and want to keep it, use a solution like Kaminario K2 mirrored to the existing SAN. Use ASM or OS mirroring to assure that writes go to both storages, but reads are served only from K2 (prefer-read option). Then you will get both read and the write improvements. This solution will work for every Oracle application.

There are several issues with this approach:

  1. Preferred read is an Oracle 11g/ASM function and is not available for Oracle 10 or earlier.
  2. The doubling of the bandwidth and IOPS coming out of the servers leads to other performance problems. The goal is to have processors do more Oracle work, not spend twice as much effort moving data around.
  3. Finally, the ASM silvering process to integrate a new all-flash array into an existing storage environment can take upwards of 4 hours for a 50TB data warehouse, during which the Oracle server is so busy with the silvering process that it is practically offline, i.e., NOT highly available. Here are the details of the silvering process:
    • The K2 array has a write bandwidth of 8GB/sec
    • The Oracle server will need to silver the K2 plex by copying data from the main SAN array and writing it to the K2 array resulting in:
      • Sustained reads fully saturating the primary SAN – a Tier-1 SAN will saturate at 4GB/sec.
      • An Oracle server load of 4GB/sec for reading the data AND 4GB/sec for writing the data. This will essentially saturate the Oracle server.
      • Streaming writes of 4 GB/sec to the K2 array while it is silvering
    • The 4GB/sec number is actually pretty ambitious for a server to sustain; let's say that it does so…
    • That translates to approximately 12TB/hour of silvering
    • So… when you install your K2 array and start the ASM silvering, you will bring the Oracle server and the primary storage array to their knees and take them essentially offline for 4+ HOURS
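
The 4+ hour estimate reduces to simple arithmetic; a short Python sketch using the bandwidth figures quoted in the list above:

```python
# Rough ASM silvering-time estimate from the figures quoted above
DATASET_TB = 50
SUSTAINED_COPY_GB_PER_SEC = 4      # reads from the primary SAN = writes to the new K2 plex

tb_per_hour = SUSTAINED_COPY_GB_PER_SEC * 3_600 / 1_000            # 14.4 TB/hour at the raw 4GB/sec rate
print(round(tb_per_hour, 1), round(DATASET_TB / tb_per_hour, 1))   # ~3.5 hours at best
# The post assumes ~12 TB/hour of effective throughput, i.e. roughly 4+ hours for 50 TB.
```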

Contrast the above approach with GridIron's SAN accelerator, which starts boosting performance immediately when it is turned on and:

  • Learns and stores new data in RAM
  • De-stages the data to flash in the background making the flash write time a non-issue
  • NEVER overloads the server OR the primary storage array with superfluous “saturating” reads that degrade performance.

GridIron Systems TurboCharger is the only solution in the market today that can be deployed without disrupting business operations and without requiring any changes to operational processes.

Warp Speed Big Data – 1 Million IOPS using MLC

For anyone who ever doubted that MLC could deliver high performance – welcome to a new frontier! Here at GridIron, we have boldly gone where no company has gone before by being the first company to use MLC to drive one million IOPS. This is good news for us, obviously, but it is also good news for all those IT folks out there who are struggling to balance the performance challenges of Big Data and databases with efficiency and cost savings.

When we look at simple economics, it is clear that MLC, not SLC or eMLC, is the direction in which high volume Flash technology is headed. Already we see the falling price of MLC bringing it into alignment with the price of hard disk. That’s why we here at GridIron think it makes perfect sense to boldly direct engineering resources to developing Big Data solutions that incorporate MLC.

To ensure we are not delusional, we invited some independent third parties in to take a look at what we have accomplished. They have helped us confirm that we have indeed made MLC history. You’ll hear more about the specifics in upcoming posts.

We have repeatedly verified that we can run systems with production database loads at one MILLION IOPS (LOVE that number). Users can expect server consolidation of at least 10:1 and a reduction in power consumption of a staggering 60%. We are excited about what this performance breakthrough means for MLC technology and for the value it will bring to Big Data.

Warp speed for Big Data is here!