Published/updated: February 2015
By Tony Lock
Rapid data growth, a greater emphasis on operational risk and developments in storage hardware are combining to call into question whether some traditional data protection solutions can still cope with evolving needs. Against this background, a blend of approaches is required to effectively manage data-related risks going forward.
Increased pressure on data protection and recovery
You probably don’t need reminding of the ever-increasing amount of data that’s currently generated across your organisation on a daily basis. This includes structured data from core business applications, along with documents and objects that are more unstructured in nature arising from the use of office tools, communication and collaboration systems, creative solutions, and so on. Throw cloud services into the mix and the chances are that your storage systems are starting to feel the pressure.
To make matters worse, expectations across your business are also undoubtedly rising. Users nowadays consider pretty much all forms of data to be ‘business critical’. They expect you to look after it and make sure it’s available on a 24 x 7 basis. In the unfortunate event of it going off-line for some reason, they want their data back and accessible in the shortest possible time.
As a result of all this, within IT you are almost certainly having to place a higher degree of emphasis on availability and recovery today than you have done in the past. At the same time, traditional approaches to data protection may not be serving you as well as they used to, and RAID technology in particular is increasingly falling into this category. In the remainder of this document we will explore why this is the case, and look at ways to overcome the challenges.
Traditional RAID is falling short
Almost all organisations use some form of RAID somewhere in their infrastructure to protect data held on hard disk drives. It can be implemented in various ways that each have their pros and cons, but the basic idea underpinning all RAID systems is the spreading of redundant data across multiple drives to prevent data loss and ensure continuity of service if a disk fails. When this happens, protection is reinstated automatically while the system is still running once a replacement disk is installed.
Over the years, RAID has become an integral part of the way we protect data at a fundamental level, and it has generally proved to be both effective and convenient. So what’s changed?
Higher capacity disks mean lengthier rebuilds
The storage capacity of individual disks continues to increase steadily as technology companies find ways of fitting more data onto a single drive. This is generally good news from a cost perspective as this trend brings with it a reduced cost per unit of storage. The need for fewer moving parts can also simplify systems and lead to a corresponding reduction in administrative overhead.
But the news isn’t so good when it comes to the impact of increased drive capacity on RAID configurations. This is because the traditional method by which RAID rebuilds occur following a failure can take a significant amount of time, and the higher the capacity of the disk(s) involved, the longer it takes.
Furthermore, while rebuilds are in progress, performance of the overall storage system, and therefore service levels experienced by users, can be negatively impacted. In addition, and perhaps more significantly, the storage system is potentially exposed until a rebuild completes. A second disk failure before the first one is fully recovered could compromise the integrity of the entire RAID array, leading to both downtime and potential loss of data.
In summary, higher capacity disks translate to longer RAID rebuild times, which in turn leads to greater business risk.
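To put rough numbers on this, the following minimal sketch estimates a best-case rebuild time from disk capacity, and the chance of a further disk failing while the rebuild runs. The sustained rebuild rate, the annual failure rate and the array size are illustrative assumptions rather than figures from our research, and the function names are hypothetical. Real rebuilds are often slower because the array continues to serve users at the same time.

```python
HOURS_PER_YEAR = 24 * 365


def estimate_rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Best-case hours needed to rewrite one whole disk at a sustained rate."""
    capacity_bytes = capacity_tb * 1e12        # decimal terabytes, as drives are sold
    rate_bytes_per_s = rebuild_mb_per_s * 1e6
    return capacity_bytes / rate_bytes_per_s / 3600


def second_failure_probability(remaining_disks: int,
                               annual_failure_rate: float,
                               window_hours: float) -> float:
    """Chance that at least one surviving disk fails during the rebuild window.

    Assumes independent failures at a constant rate, which is a simplification.
    """
    survive_one = (1 - annual_failure_rate) ** (window_hours / HOURS_PER_YEAR)
    return 1 - survive_one ** remaining_disks


# Assumed values, for illustration only.
ASSUMED_REBUILD_MB_PER_S = 50.0
ASSUMED_ANNUAL_FAILURE_RATE = 0.03
ASSUMED_REMAINING_DISKS = 7

for capacity_tb in (1, 4, 8):
    hours = estimate_rebuild_hours(capacity_tb, ASSUMED_REBUILD_MB_PER_S)
    risk = second_failure_probability(
        ASSUMED_REMAINING_DISKS, ASSUMED_ANNUAL_FAILURE_RATE, hours
    )
    print(f"{capacity_tb} TB disk: ~{hours:.1f} h rebuild, "
          f"{risk:.4%} chance of another failure in that window")
```

Under these assumptions the rebuild time scales directly with disk capacity, and the exposure window grows with it.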
But this isn’t the only way in which some of the trends we have been discussing are leading to RAID-related challenges.
More disks and aging kit mean higher risk
While the capacity of individual disks is increasing year-on-year, the rate of change is nowhere near the growth rates you are likely to be experiencing in terms of overall data volumes. There are also situations in which it is preferable to use a higher number of lower capacity disks (rather than fewer larger ones) for performance reasons – e.g. to maximise the degree of parallel activity.
Pull these factors together and the reality is that as data volumes grow, the number of physical hard disks required in your storage systems invariably increases too, despite the fact that each device can hold more. And this matters because the probability of experiencing a disk failure rises with the number of physical disks deployed, roughly in proportion while the failure rate of each individual disk is small.
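A quick way to see how disk count drives risk is to compute the chance that at least one disk in a population fails over a given period. The sketch below assumes independent failures and a 3% chance of any one disk failing in a year; that figure is an illustrative assumption, not a measurement, and the function name is hypothetical.

```python
def prob_any_failure(num_disks: int, per_disk_prob: float) -> float:
    """Chance that at least one of num_disks fails, assuming independence."""
    return 1 - (1 - per_disk_prob) ** num_disks


# Assumed per-disk annual failure probability, for illustration only.
ASSUMED_ANNUAL_FAILURE_PROB = 0.03

for n in (1, 10, 50, 100):
    p = prob_any_failure(n, ASSUMED_ANNUAL_FAILURE_PROB)
    print(f"{n:>3} disks: {p:.1%} chance of at least one failure per year")
```

For small populations the result grows almost in step with the number of disks. As the count climbs the figure approaches certainty, which is why larger estates should plan for rebuilds as a routine event rather than an exception.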
Aggravating the situation, the average age of equipment is escalating. You would not be unusual if you were expecting the storage equipment you currently have in place to remain in service for longer than in the past, with a life expectancy of 4, 5 or even 6 years and upwards. Given that the likelihood of failure increases as devices get older, this multiplies the chances of problems occurring, and therefore the frequency at which RAID rebuilds become necessary.
All of this means more risk of both data loss and service interruption or degradation.
Acknowledging the RAID challenge
If you are unsure of how much you should be taking some of these challenges seriously, the results of a recent Freeform Dynamics research study might help. More than a third of the 403 IT professionals taking part considered RAID rebuild times to be a major risk to the business, with a further quarter regarding the issue as a significant distraction.
While the remainder didn’t see any problems in this area, we have to bear in mind that a proportion of these probably hadn’t thought about it enough to acknowledge the challenges. If you live with constraints and issues for long enough, the tendency is to just accept them as normal to the point where you have stopped even looking for better ways of doing things.
The fact that you have read this far suggests that you are aware of the need for review and action. So, if you have experienced problems already, or see the potential for challenges to emerge as data volumes and variety continue to grow, then stay with us as we look at a few different ways of approaching resilience and recovery.
Options to improve data protection
Our research at Freeform Dynamics tells us time and time again that data protection is an area that often doesn’t receive the attention it deserves. The tendency is to put solutions in place then consider the problem to be solved. Not only does this mean that protection measures drift out of line with business requirements, especially given the pace of change we have mentioned, but that it’s all too easy to fail to keep up with the way in which the solution side of the equation is developing.
If it has been a while since you performed an objective review of how you protect data in your organisation, then it may be time to pause and assess where you are.
A good place to start is with data classification and requirements definition. While users might claim that everything is critical, the reality is that needs will vary in relation to factors like performance, resilience, tolerance of data loss and required recovery times in the event of a failure. Once you understand the requirements of different data sets, you can start to look at appropriate protection measures more objectively. Let’s look at some of the options you might consider while doing this.
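As an illustration of how a classification exercise might be captured, the sketch below records two of the factors mentioned above (tolerance of data loss and required recovery time) for each data set and maps them to a protection approach. The thresholds, tier descriptions and example data sets are all hypothetical; your own review would set them according to business requirements.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    max_data_loss_minutes: float   # how much recent data the business can afford to lose
    max_recovery_minutes: float    # how long the business can wait to get access back


def suggest_protection(ds: DataSet) -> str:
    """Map requirements to a protection approach using hypothetical thresholds."""
    if ds.max_data_loss_minutes <= 5 and ds.max_recovery_minutes <= 15:
        return "HA hot standby with automatic failover"
    if ds.max_recovery_minutes <= 240:
        return "RAID plus replication to a second system"
    return "RAID with periodic snapshots"


# Example data sets, invented for illustration.
catalogue = [
    DataSet("order processing", max_data_loss_minutes=1, max_recovery_minutes=10),
    DataSet("shared documents", max_data_loss_minutes=60, max_recovery_minutes=120),
    DataSet("project archive", max_data_loss_minutes=1440, max_recovery_minutes=2880),
]

for ds in catalogue:
    print(f"{ds.name}: {suggest_protection(ds)}")
```

Even a simple table like this makes it easier to challenge the claim that everything is critical, because each data set has to state what it actually needs.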
More effective use of RAID
Despite the traditional challenges, RAID still has a role to play. As you review its use, however, it’s important to address the exposure represented by elongated rebuild times. Here are some approaches now available to help:
The following advice from a participant in our research serves as a good reminder:
“Always look at the applications using RAID. For example, mission critical applications are at their most vulnerable during a RAID rebuild, so rather than use RAID 5, it would be better to switch to RAID 6. I always find it best to refer back to the basic principles of RAID before making these kinds of decisions.”
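The trade-off behind that advice can be shown with the basic arithmetic of the two levels: RAID 5 gives up one disk’s worth of capacity to parity and survives a single disk failure, while RAID 6 gives up two and survives two. The sketch below is a simplified comparison; the function name and the example array size are hypothetical.

```python
def raid_summary(level: int, num_disks: int, disk_tb: float) -> dict:
    """Usable capacity and tolerated disk failures for RAID 5 or RAID 6."""
    parity_disks = {5: 1, 6: 2}   # capacity given up to parity, in whole disks
    min_disks = {5: 3, 6: 4}      # smallest array each level supports
    if level not in parity_disks:
        raise ValueError("only RAID 5 and RAID 6 are covered by this sketch")
    if num_disks < min_disks[level]:
        raise ValueError(f"RAID {level} needs at least {min_disks[level]} disks")
    return {
        "usable_tb": (num_disks - parity_disks[level]) * disk_tb,
        "tolerated_failures": parity_disks[level],
    }


# Example array, invented for illustration: eight 4 TB disks.
for level in (5, 6):
    print(f"RAID {level}: {raid_summary(level, num_disks=8, disk_tb=4.0)}")
```

With RAID 6, a second disk failure during a rebuild does not by itself take the array down, at the cost of one more disk’s worth of capacity.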
But RAID alone is never going to meet all requirements, so let’s take a look at some of the options when it comes to a system or site failure.
Data availability during a system or site failure
Recovering from the failure of a single disk can be challenging, but there are data sets that are so critical to your business operations that it is essential the data is available even if a storage system fails or your site is lost. In the past, setting up systems and networks to provide disaster recovery (DR) capabilities was complex and expensive. As a consequence, such solutions were only available to one or two business critical applications in the largest organisations.
But some vendors can now offer you storage arrays with functional capabilities designed to ensure that in the event of a disaster your storage can failover automatically to a second system running at another location. This could be in another office, computer room or data centre belonging either to your organisation or one run by your chosen service provider.
The second option, using a site run by a service provider, is particularly appealing if you work in a smaller business where your IT runs only in one place. Indeed, storage systems able to provide remote DR failover capabilities at an affordable price, without the need for highly specialised skills, are something that has rarely been seen in small and mid-sized organisations.
There are now several options available to provide automated storage failover in the event of a disaster, and some are offered as ready-to-buy solutions from a number of vendors. It is important that you ensure any solutions proffered are able to operate as seamlessly as possible with your IT infrastructure.
Technologies that allow data to be mirrored, snapshotted or replicated between systems, or even locations (e.g. across multiple data centres, or between your data centre and the cloud) are now widely available. Within this space, so-called ‘High Availability’ (HA) solutions allow you to create a ‘hot standby’ of your storage environment so that applications can failover automatically with little or no downtime should a serious failure occur.
DR / HA storage approaches include:
The bottom line
It will be clear from our discussion that a blend of approaches is required to strengthen, streamline and future-proof your storage systems. As you explore the options, however, beware of niche suppliers who sometimes over-position point products. It’s not that you should necessarily avoid them, but larger players who offer a broad range of storage options are more likely to give you balanced advice.
The overriding imperative, however, is to appreciate that the world has moved on in terms of requirements and technology, so it’s critical to review your current setup and make sure you are in a good position to handle what the future has in store.