In too many organisations, the data retention policy has historically boiled down to ‘keep everything forever’, or something very close to it. As disks proliferate and backup tapes pile up, however, such an approach is not sustainable over the longer term. Some organisations are already struggling, and even those that have taken a more discerning approach are being challenged by the rate at which data, and unstructured data in particular, is growing.
So what can be done? Throwing yet more storage at the problem may alleviate the immediate symptoms, but it isn’t the basis for a longer-term fix. Beyond the cost of additional capacity and the space needed to house new storage devices in the data centre or computer room, escalating requirements mean that more effort is needed to manage everything. The choice then becomes growing the data management team or risking loss of control, and neither is desirable in today’s environment.
Some ideas and approaches can help, however. They rest on two principles: implementing a more selective retention regime, and making sure that the data that is retained is stored as efficiently as possible:
Data classification: The principle here is that not all data is equal in terms of importance and value, and by distinguishing between different classes or categories, it is possible to develop more objective and selective retention policies.
With the classification approach, you can define how long to keep certain types of documents, transaction data and so on, which allows you to dispose of data once the required retention period has passed. How you classify depends on your business, and it is not always necessary to be exhaustive. A lot can be achieved, for example, simply by identifying data that does not need to be kept at all, or that can be discarded after a short period.
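To make the idea concrete, a retention schedule can be as simple as a mapping from data class to retention period, plus a check for whether an item has outlived its period. The following is a minimal sketch under stated assumptions: the class names and periods in RETENTION_SCHEDULE and the function is_due_for_disposal are hypothetical, chosen purely for illustration, and real values would come from your own business and regulatory requirements.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: data class -> how long to keep it.
# Classes and periods are illustrative assumptions, not recommendations.
RETENTION_SCHEDULE = {
    "transient": timedelta(days=0),       # no need to keep at all
    "working_copy": timedelta(days=90),   # discard after a short period
    "invoice": timedelta(days=365 * 7),   # assumed multi-year obligation
}


def is_due_for_disposal(data_class: str, created: date, today: date) -> bool:
    """Return True once an item's retention period has elapsed.

    Items whose class is not in the schedule are kept (fail safe), so an
    incomplete classification never triggers accidental deletion.
    """
    period = RETENTION_SCHEDULE.get(data_class)
    if period is None:
        return False
    return today >= created + period


if __name__ == "__main__":
    today = date(2024, 6, 1)
    print(is_due_for_disposal("working_copy", date(2024, 1, 1), today))
    print(is_due_for_disposal("invoice", date(2024, 1, 1), today))
```

Treating unclassified data as ‘keep’ reflects the point that classification need not be exhaustive: you can start by classifying only the obviously disposable material and leave everything else untouched.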
Document versioning: Sometimes there is a need to retain all versions of a document through the various stages of drafting, review, approval and subsequent revision. This may be the case in highly regulated environments, for example.
Often, however, all that really needs to be kept is the final version. Similarly, the need to hang on to correspondence and copies of other communications leading up to a final document or transaction will vary immensely depending on the industry and the specific scenario. By understanding these differences and putting appropriate policies in place, supported by solutions such as workflow and document management systems, the volume of data to be stored can again be reduced.
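A versioning policy of this kind can be expressed as a simple rule over a document’s version history. The sketch below assumes a hypothetical Version record with an is_final flag and a hypothetical versions_to_keep function: in regulated contexts every version is retained, otherwise only the most recent final version is kept. It illustrates the policy logic only and is not modelled on any particular document management product.

```python
from dataclasses import dataclass


@dataclass
class Version:
    number: int      # sequence number within the document's history
    is_final: bool   # hypothetical flag set on approval


def versions_to_keep(versions: list[Version], regulated: bool) -> list[Version]:
    """Hypothetical policy: keep all versions when regulated, else the latest final one."""
    ordered = sorted(versions, key=lambda v: v.number)
    if regulated:
        return ordered
    finals = [v for v in ordered if v.is_final]
    # Nothing finalised yet: keep everything rather than discard work in progress.
    if not finals:
        return ordered
    # Keep only the most recent final version (covers later revisions).
    return [finals[-1]]


if __name__ == "__main__":
    history = [Version(1, False), Version(2, True), Version(3, False), Version(4, True)]
    print(versions_to_keep(history, regulated=False))
```

Keeping everything when no version has been finalised is a deliberately cautious design choice, so that drafts are never pruned before a document has actually been approved.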
Storage optimisation: When it is necessary to retain data, you want to make sure this is done efficiently. Techniques such as de-duplication can dramatically reduce storage requirements, for example by preventing a document that was sent as an attachment to several email recipients from being stored once per recipient in an email archive.
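The email attachment example can be illustrated with content addressing: each item is identified by a hash of its content, so identical content is stored once and every mailbox simply references it. This is a minimal sketch assuming whole-file de-duplication with SHA-256 via Python’s standard hashlib; DedupStore is a hypothetical class, and commercial products often de-duplicate at the block or chunk level rather than per file.

```python
import hashlib


class DedupStore:
    """Hypothetical whole-file, content-addressed store: identical content is kept once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # content digest -> content (stored once)
        self._refs: dict[str, str] = {}     # item name -> content digest

    def put(self, name: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self._blobs.setdefault(digest, content)
        self._refs[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._refs[name]]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    attachment = b"Q3 report" * 1000  # 9,000 bytes
    store = DedupStore()
    for recipient in ["alice", "bob", "carol"]:
        store.put(f"{recipient}/report.pdf", attachment)
    print(store.stored_bytes())
```

Three recipients receive the same attachment, yet the store holds its bytes only once; without de-duplication the archive would hold three copies.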
Then there is storage tiering, which is based on the principle of saving cost by holding data on media and devices that are ‘good enough’, but no more. At one end of the spectrum, for example, frequently accessed critical data may be stored on high-performance, resilient disk, while at the other, historical data that is rarely accessed may be placed on cheap commodity storage, on tape, or even in cloud storage.
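A tiering policy often comes down to a placement rule based on how recently data was accessed. The following sketch assumes three hypothetical tiers and age thresholds (30 days and 365 days), and a hypothetical choose_tier function that pins critical data to the fastest tier; real storage management software applies its own, typically richer, heuristics.

```python
from datetime import date

# Hypothetical tiers, from most to least expensive, with the maximum number
# of days since last access that each tier is intended for (assumed values).
TIERS = [
    ("performance_disk", 30),
    ("commodity_disk", 365),
    ("archive", None),  # tape or cloud: anything older
]


def choose_tier(last_accessed: date, today: date, critical: bool = False) -> str:
    """Pick the cheapest tier that is 'good enough' for the item's access pattern."""
    if critical:
        # Assumed rule: critical data always stays on the fastest tier.
        return TIERS[0][0]
    age_days = (today - last_accessed).days
    for tier, max_age in TIERS:
        if max_age is None or age_days <= max_age:
            return tier
    return TIERS[-1][0]  # fallback if no threshold matched


if __name__ == "__main__":
    today = date(2024, 6, 1)
    print(choose_tier(date(2024, 5, 20), today))
    print(choose_tier(date(2020, 1, 1), today))
```

Running a rule like this periodically over the estate, and migrating anything whose chosen tier has changed, is essentially the automated distribution and migration that storage management tools offer.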
Solutions in the areas we have discussed will often help to reduce management overhead. Modern content management and workflow systems allow policy implementation to be automated, and the latest storage management software can simplify administration through virtualisation techniques and the automatic distribution and migration of data to the most appropriate location (e.g. tier or device).
It is also worth bearing in mind that getting your act together on storage will not only reduce cost and risk, but also increase the chances of users actually being able to find the data they need to make business decisions, so the benefits extend well beyond the IT department.