Roughly speaking, availability management is what your organisation does to ensure IT services stay up when expected/agreed – and continuity management is what you do if (nonetheless) something goes wrong. If it goes horribly, horribly wrong of course, you have disaster recovery. All of these terms share one core characteristic – that anyone who expects IT to just work is either relatively new to the trade, or in the wrong job. What’s somewhat harder to grapple with – OK, let’s face it, a lot harder – is deciding exactly what to do with this statement of fact.
The IT department’s remit is to deliver services at an agreed level of availability, amongst other characteristics. This is usually expressed in terms of percentage uptime, with the number of nines being the headline figure, although when the downtime falls can matter as much as how much of it there is. So, 99% availability might sound quite attractive; when it is considered as 3.65 days of downtime in a year, however, attention turns to whether those days are smack in the middle of a reporting cycle.
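For anyone who wants to check the arithmetic behind the 99% figure, here is a minimal sketch in Python. It assumes a non-leap year and a service that is expected to be up around the clock; the function name and the list of targets are purely illustrative, not taken from any standard or tool.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 24x7 service hours


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    # Illustrative targets: two, three, four and five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = annual_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} hours/year ({hours / 24:.2f} days)")
```

At 99% this works out to 87.6 hours, which is the 3.65 days quoted above. What the calculation cannot tell you is when those hours will fall, which is rather the point.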
I do know how easy it is to get this wrong. When I first became an IT manager, in my inexperience I blithely continued the dubious practice of running backups on a Monday morning, rendering most of IT inaccessible for a good couple of hours. I know, I know, in hindsight this was barking mad. Nobody ever complained, perhaps because it was just the way things had always been done.
Thinking from the perspective of how systems have traditionally been built, availability need not be hard. In the ‘silo’ model where an application runs on a database, which in turn depends on an operating system running on a server somewhere, it is straightforward to ask the question, “What availability characteristics are we looking for?” and design the system accordingly using any one of a number of techniques – clustering, replication, failover to hot/warm standby and so on.
We know in practice however that up-front cost can often scupper any best intentions when it comes to availability. It’s a couple of years since we wrote the report “The Application Availability Gamble” but I doubt things have changed that much. In this report, we highlighted how “much of the exposure leading to high failure rates comes about because system availability is only considered towards the end of the project lifecycle.” In other words, availability is treated like the poor cousin of IT – last – with the inevitable consequences in terms of failure rates.
So, just as common sense isn’t always that common, best practice isn’t necessarily common practice. The question is, what happens to availability if we start to take into account some of the very real trends we’re seeing in IT today?
Virtualisation needs to get a mention, not least because in principle, it brings its own availability tools to the party. Because it is easier to manipulate a virtual machine than a real one, it is more straightforward to keep it available – by taking a live snapshot of the machine and copying it to a safe place, for example (try doing that to a physical box without matter transference). Virtualisation vendors also offer dynamic VM movement tools such as vMotion from VMware, or Live Migration from Microsoft. We know systems managers like this stuff because they tell us, for example, how easy it becomes to take down a physical server for maintenance when they can shift any workloads onto a different box without users even noticing.
Virtualisation could become its own worst enemy when it comes to availability however. By its nature, virtualisation makes it very easy to create VMs – and judging by the lack of forethought that can go into system design, it’s not hard to imagine environments where it looks like the sorcerer’s apprentice has taken over.
A second trend is less about the services themselves, and more about how and where they are delivered. The remark, “But that’s not how it’s meant to be used” is reasonably common in support circles, and admittedly it is galling when business users have unrealistic or previously unspecified expectations around the applications and services they use.
However, the move towards more distributed workforces, with an increasing reliance on smartphones and other such devices, means we might need to reset our expectations on what we mean by availability. From here, it looks like the concept of ‘office hours’ is becoming increasingly blurred, as are the concepts of ‘office equipment’ and even ‘office applications’, the latter illustrated by someone sending me an email via their personal account, as it was too large to get through the corporate firewall. Perhaps I need to spend less time on trains and more time in an office, to remind myself that we’re not all BlackBerry-lunking extras from Trigger Happy TV. But I doubt it.
What about hosted services? Believe the hype and you’d start to think that all this availability stuff will soon become a thing of the past, given that all our existing kit will very quickly be replaced and transformed by a raft of Internet-based offerings. Now, before you can say tosh and poppycock, it is worth making the point that many of us are today relying on hosted services far more than we were just a few years ago.
The availability angle on the Web is, of course, that we are playing with fire. When was the last time you checked the T’s and C’s on your hosted email, for example? No, I thought not. The chances are – and many companies have already been caught out on this – that liability clauses will not be stacked in your favour, particularly if you are depending on ‘free’ (advertising-funded) services for blogging, email and indeed document creation and sharing.
While this is not the place for glib answers, it is worth at least considering some glib questions. As one IT professional recently said to us, “People don’t want computers, they want what computers do.” Just how clear are you on the services IT should be providing to your organisation, wherever they are sourced? If you find this question difficult to answer, perhaps it’s worth spending a bit of time working out what these services are, and exactly what is seen as acceptable when it comes to availability levels. With availability, hope is not a strategy.
[As a postscript, it was with a certain feeling of glee that I received “Internal Server Error” from an ITIL Web site, when I was doing a bit of desk research for this article.]
Content Contributors: Jon Collins