XtremIO (or AFA) + Standardized Compute vs Exadata – A Perspective on Risk Management

I was recently fortunate enough to learn a lot about Exadata while researching Data Warehouse infrastructure solutions with copy management and orchestration. I want to share more detail on my perspective regarding the operational risk of deploying a known traditional stack such as XtremIO (or other AFA) + Standardized Compute vs Exadata. We start from the position that deploying either solution for Oracle workloads carries operational risk and unknowns. Here are my thoughts and questions:

  • If both solutions have unknowns – which to me equals risk – then the question is how we qualify the unknowns to see where the most risk lies. I built the table below as a way to think through the potential risks across the IT operational focus areas that I think matter to most organizations. This is entirely my own thought process, and I am open to constructive criticism.
  • Here are a few lingering questions that I still have:
    • Compatibility – Between Infra and Application/Platform – I can see how XtremIO (or other AFA) + Standardized Compute can be a concern due to vendor interop/compatibility issues and tuning effort. However, I see Exadata as a much riskier solution given its strict requirements for DBMS version and potentially application code. To those who would say that an engineered solution could be a “hammer” to force the application/DBMS lifecycle, I would say that in my experience these “hammers” are effective only when compliance requirements with financial penalties, such as Sarbanes-Oxley GCCs, are the driver. To me there is a much larger unknown in how we will manage application-to-DB-to-infra code dependencies in the future on Exadata than in the compatibility management we could be doing today.
    • Patching/Code Upgrades – From what I have read, the patch process for Exadata requires maintenance at three layers with three different tools: DB Hosts (Firmware/OS and Oracle GI/RDBMS), InfiniBand Network, and Storage Nodes. I cannot confirm from Oracle whether these are bundled into one single patch or whether each is applied separately at the different layers. It appears from Oracle’s own documents and several blogs that the rolling patch process at a minimum involves quarterly updates at 3 different layers, could take up to 2 hrs per storage node (14 hrs+ on a ½ rack), and requires good Linux CLI skills. HOWEVER, I need your help verifying this. See the links under References below for the sources I found. Note – To manage this complexity, Oracle has a tool called OPLAN that summarizes the available patch strategies and, once you have made your choice, generates step-by-step instructions telling you exactly which commands to execute to apply a bundle patch in your environment. This should limit errors and reduce preparation time, but you still have to patch each layer separately and run the commands manually.
    • Patching/Code Upgrades – If Exadata is a single infra-to-platform stack, would we need an additional Exadata for patch testing and, more importantly, application testing? Unless you are considering OVM – which to my understanding is not widely deployed – is there a way to partition a test environment on the production frame? And even if so, would the interaction between OVM, Flash Cache, and Storage Cells still warrant a separate test Exadata system?
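
To make the patch-window figures above easier to sanity check, here is a minimal sketch of the arithmetic. It uses only the numbers quoted in this post. The count of 7 storage nodes in a ½ rack is my assumption, inferred from the 14 hr figure, and the function names are hypothetical.

```python
# Rough rolling-patch arithmetic using only the figures quoted in this post.
HOURS_PER_STORAGE_NODE = 2    # "up to 2 hrs per storage node"
STORAGE_NODES_HALF_RACK = 7   # assumption: implied by the 14 hr half-rack figure
PATCH_LAYERS = 3              # DB hosts, InfiniBand network, storage nodes
PATCH_CYCLES_PER_YEAR = 4     # quarterly updates


def rolling_storage_patch_hours(nodes, hours_per_node):
    """Rolling patching handles one node at a time, so the durations add up."""
    return nodes * hours_per_node


def patch_events_per_year(layers, cycles):
    """Upper bound if every layer is patched separately in every cycle."""
    return layers * cycles


if __name__ == "__main__":
    window = rolling_storage_patch_hours(STORAGE_NODES_HALF_RACK, HOURS_PER_STORAGE_NODE)
    events = patch_events_per_year(PATCH_LAYERS, PATCH_CYCLES_PER_YEAR)
    print(f"Storage-layer rolling patch window: {window} hrs")
    print(f"Separate patch events per year (worst case): {events}")
```

If Oracle does bundle the layers into one patch event, the worst-case count drops from 12 to 4 per year, which is exactly the point I would like help verifying.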

One additional observation – We have strived to abstract platforms/applications from infrastructure for the last 5-8 years through OS virtualization because we see great value in the flexibility while still achieving enough standardization. This trend will accelerate in the coming years as containers further abstract platforms/applications from infrastructure by removing dependencies on OSs. By marrying the infrastructure/OS with the platform under one code matrix and HW/SW architecture, Exadata goes in the opposite direction from where the industry is heading.

Risk in Unknowns

Each IT operational area below is rated for XtremIO (or other AFA) + Standardized Compute and for Exadata X5.

Scalability
  • Low. The ins and outs of capacity and cost scalability are pretty well understood across both platforms. Risk depends on the ratio of standardized compute to Exadata x86 CPU. Depending on initial vs long-term needs, standardized compute may be slightly higher due to the Oracle licensing model, which assumes you can turn off Exadata cores; how practical this is in the long run is unknown.

Availability
  • Low. Both solutions provide known and well-established availability architectures.

Compatibility – Between Infra Components
  • XtremIO (or other AFA) + Standardized Compute: Low to Moderate. We have all experienced compatibility issues between components from multiple vendors in the infra stack. If your preferred vendors are partners of each other and have extensive joint support and escalation agreements, you will be in better shape. IBM, Oracle, and EMC come to mind.
  • Exadata X5: Low. Exadata provides an engineered solution with a fixed support matrix delivered completely by Oracle. Our expectation is that there is very low risk between infra components in the stack.

Compatibility – Between Infra and Application/Platform
  • XtremIO (or other AFA) + Standardized Compute: Moderate. Some have experienced compatibility issues with heterogeneous vendor stacks that require experienced resources and committed vendors to resolve. Additionally, there is a fair amount of abstraction between the infra and platform layers, so supportability is at least good.
  • Exadata X5: Moderate to Major. Exadata will dictate strict compliance between Exadata code levels and the Oracle RDBMS version required by applications. Exadata marries the platform layer to the infrastructure/OS layer and spreads the DB code throughout the storage, i.e. the storage firmware is also DBMS code?

Patching/Code Upgrades
  • XtremIO (or other AFA) + Standardized Compute: Low. Known support and processes. Small patch test environments can be spun up through virtualization or on smaller footprints to isolate new code from production and from infrastructure. Most common AFA storage software is relatively easy to install, and full regression testing is usually done with multiple vendor stacks.
  • Exadata X5: Moderate. Patching relies on a multi-step process and a Linux CLI skill set? Requires a separate Exadata system to test patches? Exadata could require more patches than traditional stacks – quarterly x 3?

Vendor Involvement / Escalation
  • Low. Most large enterprise vendors have known and expected support and escalation processes. I am assuming that large enterprise vendors have the focused Enterprise account teams (Sales, Presales, and Support) needed to escalate and resolve issues.
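
For anyone who wants to tally the ratings above, here is a minimal sketch that turns them into numbers. The ordinal scale (Low = 1, Moderate = 2, Major = 3), the midpoint scoring for ranges such as "Low to Moderate", and the equal weighting of every area are all my own illustrative assumptions. Areas with a single rating are assumed to apply to both solutions.

```python
from statistics import mean

# Hypothetical ordinal scale; the numbers are illustrative only.
SCALE = {"Low": 1, "Moderate": 2, "Major": 3}


def score(rating):
    """Map 'Low' or a range like 'Low to Moderate' to a number (range midpoint)."""
    parts = [p.strip() for p in rating.split(" to ")]
    return mean(SCALE[p] for p in parts)


# (area, AFA + Standardized Compute rating, Exadata X5 rating).
# Single-rating areas are assumed to apply to both solutions.
TABLE = [
    ("Scalability", "Low", "Low"),
    ("Availability", "Low", "Low"),
    ("Compatibility - Infra Components", "Low to Moderate", "Low"),
    ("Compatibility - Infra and App/Platform", "Moderate", "Moderate to Major"),
    ("Patching/Code Upgrades", "Low", "Moderate"),
    ("Vendor Involvement / Escalation", "Low", "Low"),
]


def totals(rows):
    """Sum the scores per solution, weighting every area equally."""
    afa = sum(score(a) for _, a, _ in rows)
    exadata = sum(score(e) for _, _, e in rows)
    return afa, exadata


if __name__ == "__main__":
    afa_total, exadata_total = totals(TABLE)
    print(f"AFA + Standardized Compute: {afa_total}")
    print(f"Exadata X5: {exadata_total}")
```

With these assumed values the totals come out to 7.5 vs 8.5. Equal weighting is a simplification, so if patching or application compatibility matters more in your environment, weight those rows accordingly.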

Based on the analysis above, I would say that deploying XtremIO (or other AFA) + Standardized Compute carries less risk and fewer unknowns than deploying Exadata in an environment that already has some maturity in operating and managing the XtremIO (or other AFA) + Standardized Compute stack.

I welcome your comments on perspectives I might have missed. 

References:
http://www.oracle.com/webfolder/technetwork/exadata/maa-bestp/patching/patch.pdf
http://www.exadata-certification.com/2014/12/things-to-do-before-applying-bundle.html
http://www.exadata-certification.com/2015/04/what-should-be-order-of-exadata-machine.html
http://www.exadata-certification.com/2015/04/exadata-patching-download-extract.html
http://www.exadata-certification.com/2015/05/exadata-patching-cell-server.html
http://www.exadata-certification.com/2015/04/exadata-patching-infiniband-switch.html
http://www.oracle.com/us/support/library/certified-platinum-configs-1652888.pdf
http://uhesse.com/2014/12/20/exadata-patching-introduction/
https://www.micoresolutions.com/reduce-risk-oracle-exadata-patching-upgrades/
http://blog.oracle-ninja.com/wp-content/uploads/2014/04/colvin_exadata_patching_deep_dive_IOUG_2014.pdf
