Understanding Data Acquisition for Third Party Data
A clear explanation of the fundamentals of the Data Acquisition Cycle.
Data Acquisition is a relatively new area of Data Management that has come to prominence in recent years. It is the process by which Third Party Data is brought into the enterprise and is particularly important for businesspeople to understand.
We reviewed Third Party Data in an earlier article, Understanding Data Acquisition and Third-Party Data. In short, it is data that is created outside the enterprise by other organizations and that the enterprise needs to bring into its own data environment in order to use it.
IT versus The Business
At first it might seem that Data Acquisition is a purely technical matter that IT can take care of. Part of it certainly is, but a great deal concerns the business. The diagram below summarizes the entire Data Acquisition Cycle. The only “pure” IT part is ingestion, which comes at the very end and is the loading of data from outside the enterprise into what is usually a data lake.
The Data Acquisition Cycle
Of course, IT will be involved throughout the cycle, but the cycle is essentially business-driven. This really needs to be appreciated by businesspeople. Unfortunately, businesspeople sometimes simply identify “data” with “databases” that are maintained by IT, and think that it is all the responsibility of IT. But IT cannot be expected to be responsible for data content, which is something the business needs to address, although IT can provide some support.
The Details of The Data Acquisition Cycle
Let us now look at the Data Acquisition Cycle in more detail.
Use Case Crystallization: The description of the use case to be solutioned. It will identify any data that has to be acquired from outside the enterprise.
Information Asset Design: The end product that has to be produced as a result of the implementation of the use case, and any asset that has to be developed to produce it. Today, the great majority of use cases are analytical in nature, so models including AI and their outputs are the information assets. These assets lie outside of the scope of data acquisition, but what is important is that they provide a more detailed understanding of the data that is required in data acquisition.
Data Sourcing: The identification of possible sources of external data, including the data vendors that can supply this data.
Data Vendor Engagement: Data vendors can be contacted based on the list developed during data sourcing. The objective is to determine if they can actually supply the data that is required, and how to proceed with them.
Data Profiling: A dataset obtained from a data vendor is examined to understand its semantics and high-level data quality.
Data Evaluation: The dataset is subject to testing to determine the extent to which it is relevant for the use case.
Proof of Concept: Sometimes, the evaluation is turned into a proof of concept (PoC) where considerably more development is undertaken to find out if the use case can be solutioned. This is the equivalent of a small development project.
Business Case Development: The use case is augmented with results of the data evaluation and/or proof of concept. A more detailed cost-benefit analysis with risk analysis is created for executive review.
Executive Approval: The business case is reviewed by executives who will have to fund it. They will approve or reject the business case.
Vendor Negotiation: At this point it is clear what data is needed from the data vendor. Procurement will lead the contracting effort to finalize the supply of data from the data vendor. Occasionally, these details have all been worked out previously and this step is not required.
Implementation Specification: The design of the ingestion process.
Ingestion Setup: The ingestion process is implemented.
First Successful Implementation: The dataset is loaded via the ingestion process for the first time. Testing is undertaken to prove the ingestion was successful.
Production Turnover: The ingestion process is turned over to the staff responsible for production operations, who integrate it into the broader solution for the use case.
The actual data acquisition cycle can vary from project to project and enterprise to enterprise, but the above outline captures the most common steps.
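For readers who think in terms of workflows, the steps above can be sketched as an ordered list of stages. This is only an illustration, not a prescribed implementation: the names Stage, OPTIONAL_STAGES, plan_cycle and next_stage are invented here, and the sketch assumes that only the Proof of Concept and Vendor Negotiation steps can be skipped and that a rejected business case ends the cycle.

```python
from enum import Enum


class Stage(Enum):
    """The steps of the cycle, in the order listed above."""
    USE_CASE_CRYSTALLIZATION = 1
    INFORMATION_ASSET_DESIGN = 2
    DATA_SOURCING = 3
    DATA_VENDOR_ENGAGEMENT = 4
    DATA_PROFILING = 5
    DATA_EVALUATION = 6
    PROOF_OF_CONCEPT = 7
    BUSINESS_CASE_DEVELOPMENT = 8
    EXECUTIVE_APPROVAL = 9
    VENDOR_NEGOTIATION = 10
    IMPLEMENTATION_SPECIFICATION = 11
    INGESTION_SETUP = 12
    FIRST_SUCCESSFUL_IMPLEMENTATION = 13
    PRODUCTION_TURNOVER = 14


# Assumption: only the steps described as "sometimes" or "occasionally"
# applicable are treated as skippable.
OPTIONAL_STAGES = {Stage.PROOF_OF_CONCEPT, Stage.VENDOR_NEGOTIATION}


def plan_cycle(skip=frozenset()):
    """Return the ordered stages for one project, omitting skipped optional ones."""
    not_optional = set(skip) - OPTIONAL_STAGES
    if not_optional:
        names = ", ".join(sorted(s.name for s in not_optional))
        raise ValueError(f"these stages cannot be skipped: {names}")
    return [stage for stage in Stage if stage not in skip]


def next_stage(current, plan, approved=True):
    """Return the stage after `current`, or None when the cycle ends.

    A rejected business case ends the cycle at Executive Approval.
    """
    if current is Stage.EXECUTIVE_APPROVAL and not approved:
        return None
    index = plan.index(current)
    return plan[index + 1] if index + 1 < len(plan) else None
```

A project that goes straight from evaluation to the business case would call plan_cycle(skip={Stage.PROOF_OF_CONCEPT}) and then walk the plan with next_stage. Keeping the plan as plain data makes it easy for each enterprise to vary the cycle, as noted above.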
Why Does All This Have to be Formalized?
It is not enough to understand that there is a Data Acquisition Cycle. It is also necessary to centralize and standardize it.
This is a departure from the old way of looking at data, and it may seem tempting for businesspeople to reach out directly to data vendors to find out what the vendors can offer. Many years ago nobody gave much thought to data, and staff operated under the impression that if they could access data within the enterprise they could use it for anything. This attitude became extended to external data, such that if businesspeople paid for data then there was no problem, and they had no need to consult anyone else.
However, Data Acquisition does need to be governed and managed for the following reasons:
1. Licensed Internal Data Is Not Necessarily Available
Businesspeople will likely look within the enterprise for data that they would otherwise need to source externally. But any licensed data that exists in the enterprise will be subject to the contractual terms under which it was licensed. These terms can easily limit how the data can be used, to the point that it must not be used in solutioning the new use case. The businesspeople involved may know nothing of these terms and may just find a way to get hold of the data, thus breaching the contract.
2. Businesspeople May Try to Contact Data Vendors Directly
It may seem a simple matter for project personnel to contact data vendors directly to find out what they have. However, this may circumvent contracts the enterprise may already have with data vendors. It may also compromise the enterprise if no confidentiality agreement is in place with the data vendor. Further, the businesspeople involved may not be aware of a data vendor that the enterprise has decided it cannot do business with, perhaps because the practices of the data vendor are considered risky.
3. PII May Be Involved
Today most businesspeople are aware that personal information (PII, or personally identifiable information) needs special treatment, which may deter them from trying to license such data on their own. That may still happen, but what is more likely is that the businesspeople may think the data they are buying is not PII, when in reality it does contain PII. For instance, a list of small businesses may include sole proprietorships. Under some data privacy laws, data about sole proprietorships counts as PII because it pertains to a specific individual.
4. Regulatory Impact
Use case crystallization is the point at which the use case is understandable in detail. This means that the use case can be judged in terms of data and AI regulations. For instance, it should be fairly easy to determine if the use case does or does not involve High Risk AI as defined in the Artificial Intelligence Act of the European Union. Similarly, the use case can be judged as to whether it must comply with the California Privacy Rights Act.
5. Data and AI Ethics
Though these are not strictly legal concerns, the use case can also be judged in terms of data and AI ethics. Its alignment with the stated values of the enterprise, and with frankly decent behavior, should be confirmed.
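As a rough illustration, the five concerns above could be turned into a simple check that runs before anyone contacts a data vendor. Everything in this sketch is hypothetical: the AcquisitionRequest and VendorRegister records, their fields, and the governance_issues function are invented for the example and do not describe any particular tool or any enterprise's actual policy.

```python
from dataclasses import dataclass, field


@dataclass
class AcquisitionRequest:
    """Hypothetical record a project team submits before contacting a vendor."""
    use_case: str
    vendor: str
    reuses_licensed_dataset: bool = False
    licence_permits_use_case: bool = False
    may_contain_personal_data: bool = False
    regulations_reviewed: bool = False
    ethics_reviewed: bool = False


@dataclass
class VendorRegister:
    """Hypothetical central register kept by the Data Acquisition function."""
    blocked: set = field(default_factory=set)
    under_nda: set = field(default_factory=set)


def governance_issues(request, register):
    """Return the open issues for a request, one per concern listed above."""
    issues = []
    # 1. Licensed internal data is not necessarily available.
    if request.reuses_licensed_dataset and not request.licence_permits_use_case:
        issues.append("licence terms do not cover this use case")
    # 2. Direct vendor contact: blocked vendors and missing confidentiality.
    if request.vendor in register.blocked:
        issues.append("vendor is on the do-not-engage list")
    elif request.vendor not in register.under_nda:
        issues.append("no confidentiality agreement with vendor")
    # 3. PII may be involved, even when the team does not expect it.
    if request.may_contain_personal_data:
        issues.append("privacy review required")
    # 4. Regulatory impact.
    if not request.regulations_reviewed:
        issues.append("regulatory review outstanding")
    # 5. Data and AI ethics.
    if not request.ethics_reviewed:
        issues.append("ethics review outstanding")
    return issues
```

An empty list means the request raises none of the five concerns. Any other result is a prompt for a conversation with the Data Acquisition function and is not an automatic rejection.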
These risks and problems can be mitigated with a centralized Data Acquisition function that incorporates strong governance. Unfortunately, the project mindset tends to make a project team want to cut all external dependencies so they can meet their deadlines. Left to themselves, the team that came up with the use case will not want to engage such a Data Acquisition function.
This points to the need for a Data Acquisition policy that lays down how Data Acquisition will be done in the enterprise. Of course, execution is important, and no Data Acquisition function should be bureaucratic, slow moving, and inefficient. In recent years the practices of Data Acquisition have matured and are much better understood, so enterprises should be able to create these functions that are business-facing and successful.