A Business Understanding of Data Disposition
Understanding the Why, When and How of Data Disposition.
A lot of data management, like a lot of household management, is about keeping things tidy. The problem is that data is abstract and not physical. You can see that the contents of your kitchen cabinet are a mess that needs to be tidied up, but it is very difficult to see, or even conceptualize, that there is a data mess. A particular area of data management where this becomes important is “data disposition”, which essentially means getting rid of data. That is what we are going to look at in this article.
What Is Data Disposition?
Even the term “data disposition” sounds a little clumsy. Many people prefer to say “data deletion”, but this does not really capture the concept either. We are talking about getting rid of data, and it can indeed be deleted, but it can also be anonymized in a way that renders it useless for normal processing. We will explore why you would want to do that shortly.
There are other, somewhat bizarre, terms used for data disposition, and “data lifecycle management” is quite a common one. This term gives the impression of managing the entire lifecycle of data, which is complex and whose definition is itself disputed. Yet when people say “data lifecycle management” they nearly always mean data disposition.
Why Dispose of Data?
Years ago data disposition was not seen as a priority. Data could be left in place as long as it was useful. Even if it was not kept in particular processing environments it could be stored in backups that could be recovered on demand. Today, however, things are different. Here are some major reasons we need to get rid of data.
Cost: It is often said that storage is cheap, but huge amounts of data will quickly add up to a meaningful expense. This is particularly true of Cloud environments where everything is constantly metered and charged back. Furthermore, there is always some processing that gets done on any data hosted in the Cloud, and the unnecessary processing is a further expense item. Getting rid of data reduces these costs, or at least prevents them from rising.
An important point about cost savings and cost avoidance is that they can be quantified. Staff who undertake data disposition can prove how much money they are saving, which is a really good metric to impress executives with.
Risk: Does the enterprise really want to keep all its data? In many cases it does not, because data represents risk. In the USA most people understand that financial information should be kept for 7 years to satisfy the IRS. But when that statutory period expires it is wise to get rid of the information to prevent unnecessary exposure to tax investigations. This is not to suggest covering up malfeasance, but to reduce risk in terms of legal entanglements. eDiscovery in civil cases is another big risk, and not having data that needs to be defended is better than having to construct a defense for it. So if there is no obligation to keep data, then it is wise to dispose of it.
Efficiency: Data needs to be governed and managed, and this takes effort. Having many copies of the same legal agreement with different file names, in different states such as draft, finalized, and revised, all stored in different locations, is a recipe for chaos. Structured data can suffer from the same issues, with a given database table being unnecessarily duplicated for one-off needs and never cleaned up afterwards. This mess rapidly accumulates. It confuses staff searching for data and leads others to use the wrong data.
When Do We Dispose of Data?
There are other reasons why data needs to be disposed of, but these are the major ones. Given this, when is data to be disposed of?
Here we have a problem, because data is not “one size fits all”. There are different kinds of data that have different business uses, and these dictate when it needs to be disposed of. Some data is subject to laws and regulations that dictate how long it must be kept, or the circumstances under which it must be disposed of. Yet other data is so voluminous, e.g. data from sensors that are continuously creating data, that it cannot be allowed to exceed certain storage limits.
What this means is that each type of data needs to be identified, where a type is a kind of data that has its own data disposition rules. This is not easy, or intuitive. A top-down approach is to list all the rules from laws and regulations, and for each area of the business to figure out which of the data it manages falls under any of these rules. Typically, this requires a “champion” model where someone in each business area leads the effort for their area. The champions have to be trained, and coordinated by some higher-level authority like Risk, Legal, or Data Governance.
A bottom-up approach is for each area to decide what data it is going to dispose of and under what circumstances. The top-down approach only works for data that is subject to laws and regulations, so it only partially de-risks the enterprise's data. The bottom-up approach closes this gap. However, it requires a commitment to data management that many business units do not have.
A hybrid approach is to figure out centrally the different types of data in the enterprise and when to dispose of them. This goes beyond data subject to laws and regulations, and leads to policies for disposition. It still may not fully cover all types of data, but a good deal will be covered. This approach requires a strong Data Governance function to support the business units.
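Whichever approach is used, the outcome is essentially a retention schedule: a list of data types, each with its own disposition rule. The following is a minimal sketch of that idea in Python. The data type names and retention periods are hypothetical and chosen only for illustration. They are not legal guidance, and a real schedule would also need rules that are not purely time-based, such as storage limits.

```python
from datetime import date

# Hypothetical retention schedule: data type -> years to keep after creation.
# The types and periods are illustrative assumptions, not legal guidance.
RETENTION_YEARS = {
    "financial_record": 7,
    "marketing_contact": 2,
    "sensor_reading": 1,
}


def disposal_date(data_type: str, created_on: date) -> date:
    """Return the first date on which data of this type may be disposed of."""
    years = RETENTION_YEARS[data_type]
    try:
        return created_on.replace(year=created_on.year + years)
    except ValueError:
        # created_on was 29 February and the target year is not a leap year.
        return created_on.replace(year=created_on.year + years, day=28)


def is_due(data_type: str, created_on: date, today: date) -> bool:
    """True once the retention period for this item has expired."""
    return today >= disposal_date(data_type, created_on)
```

With a schedule like this, a scheduled job could call `is_due("financial_record", created_on, date.today())` for each item and pass the ones that are due to whatever disposition process applies to that type.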
How Is Data Disposed of?
As noted above, data can simply be deleted. This is by far the most common method. But personal and confidential data can be anonymized. This is done when data is still needed for some other purpose beyond normal usage, e.g. analytics.
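To make the idea of anonymization a little more concrete, here is a small sketch in Python. It assumes a hypothetical customer record with the fields shown. It drops the direct identifiers and generalizes the exact age into a band, keeping only what an analytics team might still need. Whether a result like this is truly anonymous depends on the data set as a whole, so this should be read as an illustration of the principle and not as a complete technique.

```python
def age_band(age: int) -> str:
    """Generalize an exact age into a ten-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def anonymize(record: dict) -> dict:
    """Return a new record that keeps only generalized, non-identifying fields."""
    # Direct identifiers (name, email, customer_id) are simply not copied.
    return {
        "region": record["region"],
        "age_band": age_band(record["age"]),
        "order_total": record["order_total"],
    }


# Hypothetical input record, for illustration only.
customer = {
    "customer_id": 1001,
    "name": "Jane Example",
    "email": "jane@example.com",
    "age": 34,
    "region": "Northeast",
    "order_total": 250.0,
}
anonymized = anonymize(customer)
```

The anonymized record can still be counted and aggregated by region and age band, but it is no longer useful for normal processing such as contacting the customer.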
For structured data the process of disposition can be quite technical. For instance, a subset of records in a database has to be deleted, and the exact filtering conditions have to be applied to identify the right records. Then, the deletion process has to be proven to have worked. The number of records deleted must be logged along with when the process was carried out, and other information. This can be tricky as the information about the deleted records cannot be retained – you cannot search for records that were supposed to be deleted by using the data that is now supposedly deleted.
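The steps just described can be sketched with Python's built-in `sqlite3` module. The table names (`customer_orders`, `disposition_log`), the columns, and the cutoff rule are all assumptions made for this example. The point to notice is that the log records the rule that was applied, the counts, and the time of the run, but nothing from the deleted records themselves.

```python
import sqlite3
from datetime import datetime, timezone


def setup_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical order records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_orders (id INTEGER PRIMARY KEY, created_on TEXT)"
    )
    conn.execute(
        "CREATE TABLE disposition_log ("
        "run_at TEXT, table_name TEXT, rule TEXT, "
        "targeted INTEGER, deleted INTEGER, remaining INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customer_orders (created_on) VALUES (?)",
        [("2015-06-01",), ("2016-01-20",), ("2023-09-05",)],
    )
    conn.commit()
    return conn


def dispose_before(conn: sqlite3.Connection, cutoff: str) -> dict:
    """Delete orders created before the cutoff date and log the outcome."""
    where = "created_on < ?"
    count_sql = f"SELECT COUNT(*) FROM customer_orders WHERE {where}"

    # 1. Count the records the filter identifies before touching them.
    targeted = conn.execute(count_sql, (cutoff,)).fetchone()[0]
    # 2. Delete using exactly the same filter.
    deleted = conn.execute(
        f"DELETE FROM customer_orders WHERE {where}", (cutoff,)
    ).rowcount
    # 3. Re-run the count to show that nothing matching the rule is left.
    remaining = conn.execute(count_sql, (cutoff,)).fetchone()[0]

    # 4. Log the rule, the counts, and the time, but no deleted record content.
    conn.execute(
        "INSERT INTO disposition_log VALUES (?, ?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            "customer_orders",
            f"created_on < {cutoff}",
            targeted,
            deleted,
            remaining,
        ),
    )
    conn.commit()
    return {"targeted": targeted, "deleted": deleted, "remaining": remaining}
```

Calling `dispose_before(setup_demo_db(), "2017-01-01")` removes the two older demo orders and writes one row to the log. Evidence that the deletion worked comes from re-running the rule and finding no matches, not from keeping a copy of what was deleted.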
Disposition activities have to be scheduled, and this of course requires resources and effort. However, as we have seen, data disposition really cannot be avoided today.
Businesspeople are going to be increasingly involved in data disposition as time goes by and computing resources become more expensive, so this is one area of data management where we can expect to see a lot of growth.



Excellent breakdown of why data disposition matters today. The cost angle is especially compelling since a lot of orgs dunno how much they're actually burning on unused cloud storage. I've seen teams keep staging environments with production-size datasets for years, and when those finally got cleaned up the quarterly cloud bill dropped almost 15%. What makes this trickier, though, is proving compliance with the deletion.