First published: November 2011
The definitions outlined below were used during a research study conducted by Freeform Dynamics in November 2011, using The Register news site as a vehicle to collect responses (via a web survey).
The aim was to put the solutions often discussed under the ‘Big Data’ umbrella into a broader context. At the time the research was conducted, however, no clear definition of the term ‘Big Data’ was being used consistently across the industry. Rather than risk falling foul of ambiguity, confusion and assumption, we therefore described the solution types in which we were interested using generic language, as follows (these are taken directly from the questionnaire):
General purpose RDBMS servers: Oracle, SQL Server, MySQL, DB2, etc.
High performance RDBMS configurations: Clusters, grids, etc.
OLAP multi-dimensional database systems: For high volume aggregation and drill-down analysis, e.g. in a data warehouse context.
Write once read many (WORM) databases: For storing rapid high-volume feeds, logs, call records, etc., where data is rarely or never updated following initial capture.
Rule-based stream processing engines: For inspecting real-time feeds to identify patterns, anomalies, etc., and create events.
In-memory databases: Where the entire database, not just a cache, is held in machine memory to maximise speed.
Scale-out storage architectures: Storage systems able to scale linearly through adding software, processing and storage elements integrated into standard building blocks under the control of a common management layer.
Distributed indexing and search: For implementing fast distributed object storage and retrieval on scale-out storage architectures (including the so-called ‘No SQL’ approach).
Distributed data analytics engines: For executing parallel distributed data analysis on scale-out storage architectures.
Legacy databases and file systems: No longer contemporary, but still underpin older mainframe, mini and Unix-based applications.
While the above list is certainly not exhaustive, it gave us a feel for how relevant participants in the study considered options across the broad spectrum of storage and database management technologies to be.