In recent years the importance of analytics has grown dramatically. Indeed, an increasing number of organizations now want to use real-time analytics to make operational business decisions. The challenge is the amount of work – often repetitive work – that analysts and data scientists must carry out to prepare their data for the workloads they want to run against it.
This drudgery was bad enough when we built data warehouses and ran BI (business intelligence) queries against them to look for historical trends on which to base predictions. That kind of backward-looking reporting was yesterday, though.
Today, the value of information has never been higher, hence the demand for real-time data analytics – but that risks making the drudgery far broader in scope. Real-time analytics means that the data scientists and analysts doing the work cannot wait days for data to be extracted, cleansed, transformed into suitable formats, collated, and finally made available to be worked on. In short, they need an IT infrastructure built to enable them to work as analysts, not as data cleansers. They need a more responsive data ecosystem.
As a consequence, IT systems need to support five key requirements through a combination of hardware, software and processes:
- Be able to find the data needed. This is clearly the first step in any analysis. Finding the right data quickly and easily is an absolute necessity, especially when multiple sources of information may need to be accessed.
- Be certain that the data is both correct and fully up to date. This is no small issue: unless the information being processed reflects the current state of the business, using data science to steer the real-time enterprise may not work.
- Eliminate or reduce the need for format conversions on import/export and for data reconciliation. Data scientists shouldn’t have to spend time reconciling conflicting data sources or making sure that the formats of data from different sources align, for example where one system refers to a product as ‘XYZ-987’ whereas another calls it ‘P00987-XYZ’. In the past this process was known as ETL (Extract, Transform, Load).
- Be able to copy data quickly to new spaces or be able to analyze it in situ. When you have found the data, you need somewhere to store it where it can be worked on. Performance is, of course, critical – especially if the data sets are large.
- Be sure that the data is always available when required. It’s likely that some work sets will have to be kept for long-term use, perhaps with the data being regularly updated. Thus resilience and reliability, along with flexibility and security, will be essential.
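
To make the reconciliation problem in the third requirement concrete, here is a minimal Python sketch of the kind of key normalization an analyst otherwise ends up doing by hand. It assumes just two reference formats, inferred from the ‘XYZ-987’ and ‘P00987-XYZ’ examples above; the function name `canonical_product_key` and both patterns are hypothetical illustrations, not a description of any real system.

```python
import re

# Hypothetical formats, inferred only from the two example references:
#   system A: "XYZ-987"     (letters, hyphen, number)
#   system B: "P00987-XYZ"  ("P", zero-padded number, hyphen, letters)
_PATTERNS = [
    re.compile(r"^(?P<family>[A-Z]+)-(?P<number>\d+)$"),
    re.compile(r"^P(?P<number>\d+)-(?P<family>[A-Z]+)$"),
]


def canonical_product_key(reference: str) -> tuple[str, int]:
    """Map a product reference from either system to one shared key."""
    text = reference.strip().upper()
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            # int() drops the zero padding, so "00987" and "987" agree.
            return match["family"], int(match["number"])
    # Fail loudly rather than guess at an unknown format.
    raise ValueError(f"Unrecognized product reference: {reference!r}")


if __name__ == "__main__":
    a = canonical_product_key("XYZ-987")
    b = canonical_product_key("P00987-XYZ")
    print(a == b)  # both references resolve to the same product key
```

Every additional source system means another pattern and another round of testing. That maintenance burden is what an infrastructure that reconciles data before it reaches the analyst is meant to remove.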
There is another factor that is important but easy to overlook: data governance. There have always been legal limits on how an organization is allowed to use the data it holds, but today legislation such as GDPR and other privacy requirements make it essential that data be used responsibly. Add in the expectations of customers and shareholders, and data security and governance cannot be ignored.
Getting points one to five right can take much of the drudgery out of analytics, although some challenges are likely to remain. Even so, improving the organization’s ability to use data is essential to business success.