Data Quality Issue Management for Businesspeople
Everyone is aware of the divide between IT and the business. How it originated, why it persists, and how it can be fixed are topics that have been debated for years – with limited success. This divide pops up in different places with different impact, and one of them is the area of data quality.
Data quality is the discipline of making sure that data actually represents what it is supposed to, and can be used for its intended purposes. Think of it as the health of the data. But sometimes things go wrong with data, and when that happens it is important to detect the issue and to fix it.
What Is Data Issue Management?
This is where the divide between IT and the business comes in. IT is oriented to technology as the solution to problems, and technology can be used to detect data quality issues. Not every data quality issue can be detected with technology, but many can be, using modern data quality tools (more recently rebranded as “data observability” tools).
But what happens when a data quality issue is spotted by one of these tools, or as still often happens, by a businessperson looking at data in an output of a system? There is no magic technology that can automate the resolution of the problem – it is pretty much dependent on human effort. So at this point, IT usually either drops out of the picture completely, or takes a subordinate role, waiting to be told what to do by the business.
What has to happen next is that the data issue is driven to a resolution – a process known as Data Issue Management. Data Issue Management is not easy to define because it is a series of steps rather than a single activity. Not every step is needed for every data issue, but in aggregate they are:
Issue Confirmation: Is the “data issue” really an issue or not? Perhaps it is a misunderstanding.
Known Issue: Is the data issue one that has been seen before? If so, is there a known resolution that can be applied, such that the issue is fixed right away?
Impact Assessment: What impact does the data issue have? The immediate impact, if any, is important to know, so that propagation and knock-on effects can be stopped.
Notification: Who needs to know about the data issue, and who can help in its resolution? Most importantly, who will coordinate the overall Data Issue Management process?
Analysis: Understanding the cause of the data issue, and its impact beyond anything immediate.
Resolution Design: The agreed-on fix for the data issue.
Handover: The handover of the resolution to the staff who will implement it. Resolutions can involve process re-engineering, training and education of people, and technical fixes. A resolution could be a mix of them all.
Closure: Informing the stakeholders involved in Data Issue Management that the data issue has been resolved.
This list is high-level and each step can be quite complex in its own right.
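For readers who track issues in a tool, the steps above can be sketched as a simple record that logs which steps were carried out. This is only an illustration, not a prescribed design: the names DataIssue, Step, record, and skipped are hypothetical, and the rule that steps may be skipped but not revisited out of order is an assumption made for the sketch rather than something the process requires.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    """The steps listed above, in the order they are given."""
    CONFIRMATION = "Issue Confirmation"
    KNOWN_ISSUE = "Known Issue"
    IMPACT_ASSESSMENT = "Impact Assessment"
    NOTIFICATION = "Notification"
    ANALYSIS = "Analysis"
    RESOLUTION_DESIGN = "Resolution Design"
    HANDOVER = "Handover"
    CLOSURE = "Closure"


@dataclass
class DataIssue:
    """Hypothetical tracking record for a single data issue."""
    description: str
    coordinator: str  # who drives the overall process
    history: list = field(default_factory=list)  # (Step, note) pairs
    closed: bool = False

    def record(self, step: Step, note: str = "") -> None:
        """Log a completed step. Steps may be skipped but not reordered (assumption)."""
        if self.closed:
            raise ValueError("issue is already closed")
        order = list(Step)
        if self.history:
            last = self.history[-1][0]
            if order.index(step) <= order.index(last):
                raise ValueError(f"{step.value} cannot follow {last.value}")
        self.history.append((step, note))
        if step is Step.CLOSURE:
            self.closed = True

    def skipped(self) -> list:
        """Return the steps that were never recorded for this issue."""
        done = {step for step, _ in self.history}
        return [s for s in Step if s not in done]


# A known issue with a ready fix can go almost straight to closure.
issue = DataIssue("Birth dates in the future", coordinator="data steward")
issue.record(Step.CONFIRMATION, "Confirmed with the report owner")
issue.record(Step.KNOWN_ISSUE, "Seen before; known resolution applied")
issue.record(Step.CLOSURE, "Stakeholders informed")
print([s.value for s in issue.skipped()])
```

Keeping the list of skipped steps visible is a deliberate choice in this sketch: it makes it easy to see afterwards whether a step was left out on purpose or simply forgotten.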
It is also important to note that Data Issue Management leads to the design of a resolution. Who implements that actual resolution will vary. If it is technical, then IT staff will implement it. If it is a failure of training, then perhaps Human Resources will develop new training modules. It is the nature of Data Issue Management that the resolutions can be very diverse.
Local vs. Systemic Data Issues
Another wrinkle of Data Issue Management is that some data issues can be very localized, whereas others are systemic.
A localized data issue is one that occurs in a single environment like a business unit or system. Its cause and effects are completely confined to this context. As such it is easy for the team involved to run the entire Data Issue Management process and not involve anyone else.
Systemic data issues are data issues that cross organizational and/or system boundaries. They are a result of the fact that a great deal of data flows across enterprises, often in ways that are not fully understood.
Data Issue Management for systemic data issues can be tricky. This is because at the point of origin, the “data issue” may be unimportant and not considered to be an issue. It only manifests as an issue with a significant negative impact much further downstream in the flow of data. Thus, the staff who are impacted are not empowered to fix the root cause of the problem, as they have no authority over the team where the root cause lies.
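The difference between localized and systemic issues can be illustrated with a small sketch that compares who owns the data where the issue was noticed with who owns the data where it originated. Everything here is hypothetical: the dataset names, the owning teams, and the functions upstream_of and classify_issue are invented for illustration, and treating "same owner" as "localized" is a simplifying assumption that ignores downstream effects elsewhere.

```python
def upstream_of(lineage: dict, start: str) -> set:
    """Return `start` plus every dataset that feeds it, directly or indirectly.

    `lineage` maps a dataset name to the names of the datasets it is built from.
    """
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(lineage.get(name, []))
    return seen


def classify_issue(owners: dict, lineage: dict,
                   observed_in: str, root_cause_in: str) -> str:
    """Classify an issue as "localized" or "systemic" (simplified assumption).

    Localized: the team that sees the issue also owns the root cause.
    Systemic: the root cause sits upstream with a different owner.
    """
    if root_cause_in not in upstream_of(lineage, observed_in):
        raise ValueError("root cause is not upstream of where the issue was seen")
    if owners[observed_in] == owners[root_cause_in]:
        return "localized"
    return "systemic"


# Hypothetical flow: CRM contacts feed a customer master, which feeds a report.
owners = {
    "crm_contacts": "Sales Operations",
    "customer_master": "Data Engineering",
    "regulatory_report": "Finance",
}
lineage = {
    "regulatory_report": ["customer_master"],
    "customer_master": ["crm_contacts"],
}

# Finance sees the problem, but the cause lies with Sales Operations.
print(classify_issue(owners, lineage, "regulatory_report", "crm_contacts"))
```

In the example, Finance feels the impact but has no authority over the CRM data, which is the situation that calls for an enterprise-level process.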
Systemic data issues require an enterprise-level response that includes strong governance. Appropriate Data Issue Management processes can then be built. Unfortunately, this is rarely done, and in many enterprises systemic data issues persist for years. Ultimately, it is a failure of Data Governance leadership. Data Governance units cannot wait to be told what to do about systemic data issues; they should develop the Data Issue Management policies, standards, and processes to address them.
Data Products
The evolution of data engineering teams, and especially the development of data products by these teams, has broken with more traditional approaches to Data Issue Management. Decades ago, in the mainframe era, in-house production control teams oversaw many Data Issue Management processes. These teams ran production jobs, and when an error was discovered, they would coordinate the response. Today, data engineering teams that build data pipelines also run them. When these data pipelines produce data products, the expectation is that any data issue in the data product will be handled by the data engineering team involved.
This approach has provided more clarity about who will coordinate Data Issue Management. Of course, a data issue in a data product may have been caused by an upstream data source outside of the control of the data engineering team. However, there is at least an understanding of who will drive the Data Issue Management process.
Fragmentation and Lack of Definition
Despite the promising developments in the area of data products, Data Issue Management remains fragmented in many organizations. Data issues are often not even recognized as such, and Data Issue Management processes – to the extent they exist – are developed ad hoc in different units. Not all of the steps in Data Issue Management that were outlined above are followed, so these processes do not always generate good resolutions. Data Issue Management depends on people and processes much more than on technology, which is why IT is much less involved. This may also account for the fragmentation and lack of definition we see.
Ultimately, Data Governance within each organization will determine how successful Data Issue Management will be.


