Published/updated: November 2011
‘Big data’ is one of those ‘buzz’ terms that is difficult to avoid, at least for anybody who reads IT-related publications or receives vendor marketing materials. From that perspective, ‘big data’ certainly is big. And as is typically the case with buzz terms, behind the hype lie not only serious technology developments but also serious business issues (and their potential solutions). But where are we when it comes to actual adoption of those technologies? And when does data become ‘big data’?
Let’s start with the last question. To many, the answer is simple: data becomes ‘big’ when you’re talking volumes measured in terabytes (TB), petabytes (PB), exabytes (EB) and beyond (yottabytes, anyone?). Others apply criteria such as the number of records, transactions or files; if the number is mind-bogglingly large, then surely it must be ‘big data’. But that misses a crucial point about ‘big data’ – it’s not just about quantity, but also about the nature of the data. Put differently: simply scaling up your existing systems to accommodate ever larger data volumes doesn’t necessarily mean you’re dealing with ‘big’ data. The ‘big’ comes into play when you’re looking at all the data that resides outside your existing, typically well-structured, systems and wondering how on earth you can harness it for business benefit.
Next, let’s get another definitional issue out of the way: ‘structured’ versus ‘unstructured’ data. Categorising ‘structured’ data tends to be less contentious because this is data that exists in tabular form, typically in relational database management systems (RDBMS). And then there’s the rest. To call it ‘unstructured’ applies only in the sense that this data doesn’t reside in traditional RDBMSs. But there are many shades of grey. For example, the data in system or web logs is usually very well structured, and each data element has a known definition and set of characteristics. Likewise, data from a social media stream, like Twitter, is well-structured in some ways (e.g. defined length of message, use of operators such as @ or #), and yet is totally unstructured in others (e.g. the content of the message could be anything). Email, documents, spreadsheets and presentations also fall into this category to a certain degree; it all depends on the context in which they are stored. Then there are blogs, pictures, videos and all kinds of other data elements which your organisation may well wish to understand better but doesn’t yet capture within existing systems.
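To make the ‘shades of grey’ point concrete, consider a hypothetical tweet-like record (the field names below are illustrative, not Twitter’s actual schema). Some aspects are rigidly structured and machine-checkable, while the message body is free text that requires mining:

```python
import json

# Hypothetical tweet-like record: the envelope is well structured
# (known field names, a hard message-length limit, @/# conventions),
# but the message body itself could contain anything.
raw = ('{"user": "@example", "created_at": "2011-11-07T09:30:00Z", '
       '"text": "Loving the new release! #bigdata"}')

record = json.loads(raw)

# Structured aspects: enforceable, well-defined constraints.
length_ok = len(record["text"]) <= 140       # defined message length
handle_ok = record["user"].startswith("@")   # operator convention

# Unstructured aspect: meaning (e.g. topics) must be mined from free text.
hashtags = [w for w in record["text"].split() if w.startswith("#")]
print(hashtags)  # ['#bigdata']
```

The same record is thus both ‘structured’ (its envelope) and ‘unstructured’ (its content), which is why the binary label is of limited use.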
Of course, there’s certainly no getting away from data growth. A recent survey* conducted by Freeform Dynamics shows that most organisations are seeing data volumes increase, with ‘unstructured’ data for many looking set to grow even faster than their ‘structured’ data, as illustrated in Figure 1.
This is the first checkpoint for an organisation wondering whether ‘big data’ applies to it: how important would it be to unlock value from the ‘unstructured’ data it has available? This applies regardless of whether such data is created or collected inside the organisation, or arrives through external feeds. Most organisations have a strong gut feeling that there ‘must be something in there’ but lack the means to get at ‘it’.
But how can you justify investing in technology and setting aside resources when you don’t even know what you’re looking for? Especially when it may not even be clear which technology might be best suited to this data equivalent of ‘panning for gold’.
This classic ‘chicken-and-egg’ situation is most likely one of the key reasons why few companies today are extracting value from information held outside systems designed for handling structured data, as shown in Figure 2.
Nevertheless, this situation is set to change as developments in advanced storage, access and analytics come into play. Together, these help to address ‘big data’ problems not just by being able to crunch lots of data but also by doing so in innovative ways. And in many instances they use commodity hardware and open source software, which in turn can reduce the cost of entry, provided of course the relevant skills are available within the organisation.
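The ‘innovative ways’ referred to here centre on processing models popularised by open source frameworks such as Hadoop, which split work into independent ‘map’ tasks and then aggregate the results in a ‘reduce’ step so the load can be spread across commodity machines. The following is a toy single-machine sketch of that idea (the function names are illustrative), not a real distributed implementation:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # 'Map': emit (key, value) pairs independently for each input chunk.
    # Because chunks are independent, this phase parallelises trivially.
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    # 'Reduce': group values by key and aggregate them.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data gets bigger", "data about data"]
word_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
print(word_counts["data"])  # 3
```

The economic point in the text follows from this structure: because each map task is self-contained, work can be farmed out to many cheap commodity servers rather than one expensive machine.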
A detailed discussion of ‘big data’ technologies is beyond the scope of this brief article. But before any technology decisions are even considered, business and IT executives need to identify whether they’re really in ‘big data’ territory in the first place, by applying the ‘three Vs’ test: volume, variety, and velocity.
As described in Figure 3, the three Vs together capture the essence of ‘big data’. This is of course just an artificial construct, albeit a very useful one that is becoming increasingly accepted. How an organisation applies the definitions and criteria is a matter for discretion. What’s most important is having – or gaining – a clear understanding of the organisation’s data landscape, how that data is used today, and whether there is a reasonable chance that further useful information might be extracted, above and beyond what’s already being done.
* Online survey conducted during first half of November 2011; 122 responses. Organisation size ranging from under 50 employees to 50,000 and above. Respondents mainly IT professionals, with 55% based in the UK, 17% in the rest of Europe, 16% in the USA, and 12% in the rest of the world.