Understanding Data Acquisition and Third-Party Data
Data has increasingly become a business in its own right over the past couple of decades, and it is important for data practitioners and businesspeople alike to understand the economic and legal implications involved.
Nowhere is this more important than in what is often called “Third-Party Data,” which is intimately linked with the process of Data Acquisition. This article looks at what is involved in both.
Where Did Third-Party Data Come From?
When computers first became widespread, data was thought of as a byproduct of processing and not something valuable in its own right. All data was linked to the internal systems that automated hitherto manual processes, meaning that the data was generated by the enterprise itself. Of course, this was never entirely the case, as some data, such as tax rates and country codes, always had to come from outside.
This began to change when some industries realized they needed to get data to run certain specialized systems. For example, Bloomberg L.P. was founded in 1981 to supply data about financial markets to financial services institutions. This included lists of financial instruments like stocks and bonds, and their prices on stock exchanges. Financial services institutions had to buy this data from Bloomberg.
This situation was replicated in other enterprises where there was a need for external data to drive operational processes. Data vendors emerged to meet these needs, and enterprises purchased the data from them. There was really no alternative, as the enterprises requiring the data could not produce it themselves in a cost-effective manner.
The result was that a set of “traditional” data vendors grew up that supplied enterprises with data they needed for specialized use cases, the majority of which were for operational systems. Some of the use cases were uncommon, like securities trading platforms. Others were more widespread like credit information about businesses or individuals for loan origination.
The Rise of Big Data
This situation persisted until about 2010, when “Big Data” came onto the scene. In reality, the size of the data mattered less than the new software that enabled advanced analytics and data science.
There had always been reporting from computer systems, but with Big Data technology, reporting gradually shifted to predictive and prescriptive analytics. However, enterprises found that the information available from their operational systems was not sufficient for these analytics. Other data from outside was needed, such as weather data, surveys of customer sentiment, econometric data, consumer prices, and so on. It was by understanding patterns in this external data, integrated with operational systems data, that enterprises could unlock real value from their analytics.
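As a toy illustration of the kind of integration just described, the sketch below joins internal sales figures with third-party weather data on a shared date key to look for a pattern. All values, dates, and variable names are invented for the example.

```python
# Internal operational data: date -> units sold (invented values).
sales = {
    "2024-06-01": 120,
    "2024-06-02": 45,
    "2024-06-03": 130,
}
# Third-party data: date -> rainfall in mm (invented values).
weather = {
    "2024-06-01": 0.0,
    "2024-06-02": 18.5,
    "2024-06-03": 1.2,
}

# Integrate the two sources on the shared date key.
combined = [(d, sales[d], weather[d]) for d in sorted(sales) if d in weather]

# Inspect the pattern: do rainy days depress sales?
rainy_days = [units for _, units, rain in combined if rain > 5.0]
dry_days = [units for _, units, rain in combined if rain <= 5.0]
print(f"avg sales, rainy: {sum(rainy_days) / len(rainy_days):.0f}")  # 45
print(f"avg sales, dry:   {sum(dry_days) / len(dry_days):.0f}")      # 125
```

Trivial as it is, the example shows why external data matters: the pattern is invisible in the sales records alone.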
The Data Lake
One very important concept from Big Data that enabled the use of Third-Party Data was the Data Lake: a large storage environment where almost any kind of data can be stored. Previously, there had really only been Data Warehouses, which had a fixed data architecture that had to be specified in advance. By contrast, nothing needed to be prespecified in order to put data into a Data Lake; the data was simply stored as files.
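This “nothing prespecified” idea is often called schema-on-read, and it can be sketched in a few lines of Python: a vendor file lands in the lake as-is, and structure is imposed only when the data is read. The directory layout, file name, and field names below are illustrative assumptions, not any standard.

```python
import csv
from pathlib import Path

# Landing zone: files are stored as-is, with no schema declared up front
# (the path and field names are illustrative, not a standard).
lake = Path("lake/landing/weather_vendor")
lake.mkdir(parents=True, exist_ok=True)

# "Ingestion" is just placing the vendor file in the lake, untouched.
(lake / "2024-06-01.csv").write_text("station,temp_c\nKJFK,21.5\nEGLL,17.0\n")

def read_temps(day: str) -> dict:
    """Schema-on-read: structure is imposed only at query time."""
    with open(lake / f"{day}.csv", newline="") as f:
        return {row["station"]: float(row["temp_c"]) for row in csv.DictReader(f)}

print(read_temps("2024-06-01"))  # {'KJFK': 21.5, 'EGLL': 17.0}
```

The contrast with a Data Warehouse is that here the schema lives in the reading code, not in the storage layer, so a new vendor file can land in the lake without any prior modeling work.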
So, Data Lakes combined with the new analytics technologies enabled enterprises to develop solutions for huge numbers of use cases that required Third-Party Data. Demand for Third-Party Data skyrocketed in the period 2010-2020.
Data Chaos
As often happens, technology enablement and strong demand were not matched by data management practices. Different units of an enterprise went out and purchased data without any central coordination. The same data might be purchased independently by different units, sometimes at different prices. Legal and Procurement departments might not be involved, so terms and conditions were poorly understood. And the Third-Party Data might be ingested into all kinds of different systems, making further distribution difficult.
Data Acquisition
It was in order to fix these problems – or prevent them from arising – that the practice of Data Acquisition emerged. Data Acquisition seeks to manage the entire lifecycle of all Third-Party Data, from initial identification, through distribution within the enterprise, to final discontinuation. It is much more than mere ingestion, which is only the technical movement of data from outside the enterprise to inside it.
Data Acquisition is centered on the Data Lake, as this is where all Third-Party Data lands when it first comes into the enterprise. From there, it can be further distributed in a controlled fashion, per the standard operating procedures established for Data Acquisition.
Usually, a Data Acquisition Manager and team oversee all of this. The processes are standardized and all relevant partners are involved: Legal reviews the terms and conditions of all data contracts to ensure they are acceptable, and Procurement ensures pricing is acceptable and that standard Procurement processes are followed. The Data Acquisition team also determines whether any Third-Party Data being requested is already available, and ensures that data management needs like data privacy and dataset cataloging are met.
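As a minimal sketch of one of these checks, the hypothetical registry below records data contracts and lets the team see whether a requested dataset is already licensed before a new purchase is approved. All class, field, vendor, and dataset names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DataContract:
    vendor: str
    dataset: str
    owner_unit: str          # business unit that sponsored the purchase
    legal_reviewed: bool     # Legal has approved the terms and conditions
    redistributable: bool    # terms allow sharing with other units

@dataclass
class AcquisitionRegistry:
    contracts: list = field(default_factory=list)

    def register(self, contract: DataContract) -> None:
        self.contracts.append(contract)

    def find_existing(self, vendor: str, dataset: str):
        """Return an existing contract for this dataset, if any."""
        for c in self.contracts:
            if c.vendor == vendor and c.dataset == dataset:
                return c
        return None

registry = AcquisitionRegistry()
registry.register(DataContract("AcmeWeather", "daily-temps", "Marketing",
                               legal_reviewed=True, redistributable=True))

# A second unit requests the same dataset: reuse instead of re-buying.
existing = registry.find_existing("AcmeWeather", "daily-temps")
if existing and existing.redistributable:
    print(f"Reuse contract held by {existing.owner_unit}")
```

In practice this role is played by a dataset catalog, but the point is the same: a single registry makes duplicate purchases visible before money is spent.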
Beyond Third-Party Data
While our focus here has been on Third-Party Data – data purchased from data vendors – there is actually a much wider range of external data used by modern enterprises. Such data may be sourced directly by scraping the Internet, or it may be the result of surveys that are sent out. There are other modalities too. What is important is that this data is still handled by Data Acquisition. It is striking how Data Acquisition emerged from nowhere over the course of a few years to become such an important part of data management.



I loved Malcolm’s post, as per usual, but this one magnified what I recognize as an “often invisible” blind-spot data topic of opportunity: invisible because its most typical use is embedded in hardware, software, testing, performance, innovation and compliance “back-end” systems, smart factories, robotic execution, oversight, and IoT sensor delights.
What I had lost sight of, until this post, was the pace of growth of this DAQ (Data Acquisition) sector across all party types. I don’t pay much attention to the Industry 4.0 side of the house. As with everything AI touches, Malcolm C helped me connect the dots to the agentic elephant in the room, and how it may explode…
Meanwhile, back at the Pedro C #dataninja ranch, I’m still too often helping to align on what a Customer vs. Supplier rec should be. 🤣🥷
Is it just me?? Talk #data to me, curiouser me would like to know!