How Data Became Everything
A Journey from Obscurity to Global Dominance, and Why the Journey Matters
Over a period of roughly seventy years, data rose from being viewed as irrelevant, indeed despised, to being the most valuable economic resource apart from energy. In fact, much of the transition occurred in just the 35 years between 1990 and 2025.
How and why did this happen? For those of us who lived it, who were always proponents of data, it is an outcome far beyond anything we could have dreamt of. Let us try to understand what happened.
In The Beginning
We will stipulate that “data” is the stored representation of facts held in computerized systems. This is not a broad enough definition, but it will suffice for the present discussion.
There was a time when data, using this definition, did not exist for normal economic affairs. Record-keeping – the tracking of economic things and events – was done using paper and ink in ledger books. Clerks, sometimes called “computers”, performed the record keeping. This state of affairs persisted for centuries until 1965.
It was in 1965 that the IBM System/360 range of computers became widely available to organizations. There had been computers before that, but they had mostly been confined to defense, intelligence, and institutional research.
The new computers could be purchased or rented. Renting computer time was known as “time sharing”. A company that could not afford to buy a computer could rent time on a computer owned by another organization when the latter was not using it. Time sharing quickly and vastly expanded the range of businesses able to use computers.
The Programming Revolution
The new computers were used to automate hitherto manual processes of record-keeping. For instance, updating a bank’s ledger book when a customer came into a branch to withdraw or deposit money was replaced by a program. The program was a set of instructions for how to update the now electronic record – the data – about the customer’s banking activities.
Once a program was successfully written, it could be executed any number of times and would perform the same set of steps each time. A human clerk could make mistakes, but a fully debugged program would not. If a bank based on manual processes expanded its customer base it would need to hire more clerks to keep up, but if it used computers it could scale up much more cheaply by adding computing power.
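A toy sketch of such a program, in modern Python with invented account records, shows the idea: the same steps are applied, without variation, every time the program runs.

    # Toy sketch of an automated ledger update (hypothetical record layout).
    # The "data" is the stored balance; the program applies the same checks
    # and updates every time it runs.
    accounts = {"12345": {"name": "A. Smith", "balance": 500.00}}

    def deposit(account_id, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        accounts[account_id]["balance"] += amount

    def withdraw(account_id, amount):
        account = accounts[account_id]
        if amount <= 0 or amount > account["balance"]:
            raise ValueError("invalid withdrawal")
        account["balance"] -= amount

    deposit("12345", 100.00)
    withdraw("12345", 50.00)
    print(accounts["12345"]["balance"])  # 550.0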
The Age of Operational Systems
The incontrovertible economic advantages of shifting from manual processing to computerized automation unleashed a massive shift in which nearly every clerical process was moved into computerized architecture. This was the paradigm while mainframes were the dominant architecture from 1965 to 1982.
Where was data in all of this? Data was certainly recognized as being vital. In fact, the “Technology” or “Information Technology” departments of those days were often called “Data Processing”. But the data was simple, and was viewed as no more than what was needed to run the formerly manual operations. Each manual process was converted to a computer system in isolation from all the others. Its data was its data. There was no interaction with any other data or any other system. Today this is known as a “siloed” architecture; in the past it was called “stovepipes”.
The siloed systems made data very uninteresting. There was not much to think about beyond what each system needed in terms of data.
The Rise of Databases
And yet there were people who thought about data. They realized that although data was held as individual files, some of this data was duplicated across different files, and some of it had relationships to data in other files. This gave rise to the notion of a “database”: a set of related data all stored in one place. A database would be more efficient and would support more complex processing than the standalone files in use up to that point.
It was all very nice in theory, but it was just theory. Eventually, however, relational databases emerged. As with computer hardware, these tended to be developed first for defense and intelligence, then emerged into the mainstream economy, with financial services among the early adopters. There were database architectures other than relational databases, but relational databases became dominant, so we will focus on them.
Data Steps Out on Its Own
Relational databases were complex enough to need to be designed well. Designing a data store was something new. E. F. (“Ted”) Codd, who contributed a great deal to the theory of relational databases, came up with rules for good relational database design for update processing. Later, in the 1970’s, Peter Chen developed a visual notation for what Codd had formulated, so a standard graphical representation of a database design became possible. This visual representation of the components of a database and their relationships was revolutionary. Purely logical, abstract, immaterial data structures could be much more easily managed by humans.
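As a concrete, entirely hypothetical illustration of the kind of design these ideas encouraged, the following Python sketch uses the built-in sqlite3 module to create two related tables and query across their relationship; the customer and account tables are invented for illustration, not drawn from Codd’s or Chen’s own examples.

    import sqlite3

    # A minimal sketch of a relational design (hypothetical tables).
    # Customer facts are stored once; each account row refers to its
    # customer by key rather than repeating the customer's details.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            city        TEXT
        );
        CREATE TABLE account (
            account_id  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            balance     REAL NOT NULL
        );
    """)
    conn.execute("INSERT INTO customer VALUES (1, 'A. Smith', 'Springfield')")
    conn.execute("INSERT INTO account VALUES (100, 1, 250.00)")

    # The relationship that an entity-relationship diagram would draw as a
    # line between two boxes is expressed here as a join on the shared key.
    for row in conn.execute("""
        SELECT c.name, a.account_id, a.balance
        FROM customer c JOIN account a ON a.customer_id = c.customer_id
    """):
        print(row)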
For the first time data was seen as something in its own right, rather than a byproduct of automation. Costly mistakes in database design quickly persuaded businesses to invest in data modeling – the design of databases. Data as a discipline had been established.
Technology Advances
In 1982 personal computers (“PC’s”) emerged. They spread like wildfire throughout organizations of all kinds. The turbulence of the period from 1982 to 1987 saw many changes, including the adoption of relational databases in PC environments, and especially the housing of databases on dedicated servers – specialized hardware reserved for database workloads.
But by now there were very few manual processes that remained to be automated. That work had been done, so attention shifted to improving what was in place and upgrading it to keep in sync with changes in the business. A computer program or a database structure very much reflects a need at a point in time. As a business evolves, its programs and databases need to evolve with it. This is actually far more difficult than creating another point-in-time solution from scratch, which is why we tend to see systems and databases being created anew rather than changed.
While this was going on, some people began to think of data as an asset that could be used outside of automated operations. During the late 1980’s, and especially in the 1990’s, this viewpoint slowly gained ground. The concept of data warehouses emerged. These were database environments that took data from automated systems, transformed it, integrated it, and organized its historical aspects. Data warehouses were strictly for analyzing and reporting data, and performed no process automation. Bill Inmon and Ralph Kimball contributed specialized design patterns for data warehouses that were widely adopted.
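To make the contrast with operational databases concrete, here is a minimal sketch of the kind of dimensional, or “star schema”, design associated with Kimball’s work, again in Python with sqlite3; the fact and dimension tables and their contents are invented for illustration.

    import sqlite3

    # A minimal star-schema sketch: one fact table of sales events
    # surrounded by dimension tables describing date and product.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(19950101, 1995, 1), (19950201, 1995, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, 'widget'), (2, 'gadget')])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(19950101, 1, 3, 30.0), (19950201, 1, 5, 50.0), (19950201, 2, 1, 99.0)])

    # Historical, analytical questions become simple aggregations over the facts.
    for row in conn.execute("""
        SELECT d.month, p.product_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY d.month, p.product_name
    """):
        print(row)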
But it was not just data per se. New technologies emerged that supported data warehouses. These technologies made it easier to move data, to detect data quality issues, and to process historical changes in data.
The Tide Had Turned
The focus was now on using data from operational systems to manage organizations. One example is Sam Walton of Walmart, who pioneered this approach beginning in the 1970’s, using data to understand customer behavior and inventory movement. The results were self-evident, with Walmart growing from a small Arkansas retailer into one of the world’s largest companies.
The realization that data was not just a byproduct of automation, but could contain golden nuggets of information that might improve revenue or reduce expenses took hold in the 1990’s. Data-centricity was on the rise, and process-centricity was declining. But much more was yet to come.
The Internet
The major technological innovation of the 1990’s was the Internet. It coincided with the understanding of data as a strategic asset, and had a profound impact – slowly at first, but quickly accelerating.
Organizations found that they could interact with data outside of their proprietary technological environments. They could even reach individual consumers via the Internet. This was something new and strange. The volumes of data available via the Internet also grew rapidly, which was equally unexpected. Much of this data was “unstructured data”, such as documents, images, video, and audio. It had not been used very much in traditional data processing, but it suddenly became important.
And then of course the excesses happened. Vast amounts of capital were fed into poorly thought-out ventures that had only a vague plan for revolutionary disruption of some market or other. This capital allocation alone frightened the executives of large, traditional, well-established businesses, and they started pouring money into internal Internet projects, hoping to survive what they thought would be a massive onslaught from dot.com startups.
Few people really understood what was happening, and many believed the ridiculous value propositions of the ill-conceived startups, but in March 2000 it all came to an end with the NASDAQ market crash. The excesses were cleared out, and the hitherto oppressed “bricks and mortar” crowd exacted a terrible revenge, punishing the IT departments of their enterprises for all the money they had wasted.
The Dark Age
A real Dark Age followed from 2000 to 2005, with the events of 9/11 only adding to it. On the surface it seemed that everything had gone back to a much leaner version of the early 1990’s, with an acceptance of the importance of data, but little appetite to make any bold claims about it.
And yet, under the surface things were developing. There was a realization that the Internet was more than the “dot.com bust” and that there were now very successful companies, such as Google and Amazon, that had data at the heart of their business models. This was something totally new. Another realization was that data was its own thing, with its own special problems, needs, questions, solutions, and body of expertise. Data was not technology. Multiple data warehouse projects had failed despite flawless technology; the failures were caused by all kinds of problems in the data that were either not understood or simply ignored.
This came to a head in 2005, when the new discipline of Data Governance burst onto the scene. It was now understood that if data was a valuable resource it had to be managed like one. Just as Human Resources departments oversaw the management of people in an enterprise, so Data Governance was to do the same for data.
Big Data and The Cloud
A big boost for Data Governance came with the 2008 Global Financial Crisis. Governments realized that the data they had been using for regulatory and economic affairs was flawed. Despite vast amounts of money being spent by governments, none of them had been able to predict the GFC. Data now came under intense regulatory scrutiny. Exactly what that achieved is open to debate, but it did raise the profile of data in governmental circles. The perception of the importance of data gained even more ground.
Then another technological revolution happened. Echoing the rise of relational databases, new architectures for managing ultra large-scale datasets emerged. The needs of Defense and Intelligence to quickly process vast amounts of signals intelligence could not be met by traditional relational databases. So instead, completely new architectures were created using “junk hardware” that was cheap and could fail, but could easily be replaced. Parallel processing with built-in redundancy meant tasks could keep running and data would not be lost when junk hardware failed. The new architectures could scale simply by adding more hardware in a way not possible before. New kinds of databases held data in designs that were highly optimized for querying. Google and their kin also adopted and popularized this new paradigm that became known as “Big Data”.
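The divide-and-recombine idea at the heart of these architectures can be sketched on a single machine with Python’s multiprocessing module; this toy word count only hints at the pattern, since real platforms spread the same work across many cheap servers and replicate the data for redundancy.

    from collections import Counter
    from multiprocessing import Pool

    # Toy illustration of the "map, then reduce" pattern behind Big Data
    # platforms: each worker counts words in its own chunk of the data in
    # parallel, and the partial results are merged at the end.

    def count_words(chunk_of_lines):
        counts = Counter()
        for line in chunk_of_lines:
            counts.update(line.split())
        return counts

    def parallel_word_count(lines, workers=4):
        # Split the input into one chunk per worker and process them in parallel.
        chunks = [lines[i::workers] for i in range(workers)]
        with Pool(workers) as pool:
            partial_counts = pool.map(count_words, chunks)
        # Reduce: merge the partial counts into a single result.
        total = Counter()
        for partial in partial_counts:
            total.update(partial)
        return total

    if __name__ == "__main__":
        data = ["data is the new oil", "data fuels analytics", "oil fuels engines"]
        print(parallel_word_count(data).most_common(3))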
The original idea of “Big Data” was that it would be used for ultra-large-scale datasets, at the petabyte level. But the new technologies involved were very appealing even for much smaller amounts of data. The experience with the Internet, combined with more and more sophisticated analytics, meant that traditional data warehouses were no longer adequate. A data warehouse was built to satisfy known queries against data, and could take years to develop. But what happened when analytics were unpredictable and results were needed in short timeframes?
The result was data lakes. These were environments built on the Big Data approach and technology. They could quickly ingest any dataset and integrate it with other data, making it available to analytical tools that permitted rapid insights.
Connectivity to data lake environments was increasingly via the Internet. The Internet is often represented in diagrams as a cloud, and this paradigm came to be called “Cloud Computing”. Companies found that it was cheaper to rent data storage and processing capacity from providers than to continue running their own highly expensive data centers.
At first it was difficult to move to the Cloud, but it got easier, and by 2015 the vast migration was well under way. Organizations could now manage much more data and many more kinds of data, and perform new kinds of analytics, than they had been able to before. Data-centricity was firmly established. The number of organizations with data at the center of their business models rapidly expanded. In 2017 The Economist magazine declared that data was the “New Oil” – the world’s most valuable resource.
Data Privacy
Throughout all this time, data was not something that mattered to ordinary people. It sounded academic, or corporate, and not part of their normal lives. That began to change in 2013 when Edward Snowden leaked many classified NSA documents. Max Schrems, a law student in Austria, then sued Facebook in Ireland, complaining that Snowden’s leak showed that Schrems’ personal data was being shared by Facebook with the NSA. Transfer of personal data from the European Union to the US had been covered by a Safe Harbor arrangement up to this point. In 2015 the European Court of Justice found in favor of Schrems, and the Safe Harbor provisions were invalidated.
The EU then came up with the General Data Protection Regulation (GDPR). This was a landmark. For a long time there had been a very tiny minority of professionals who were concerned about the management of personal information, but they had generally been ignored. Now data privacy was front and center, and individual people quickly became aware of its importance. Data – or at least personal data – became a major concern for ordinary people.
More legislation followed, such as the California Consumer Privacy Act, and commercial enterprises were forced to greatly improve the ways in which they handled personal data.
The Rise of Artificial Intelligence
As Big Data gained ground, advanced analytics became possible. At first this meant data scientists building custom models to make predictions, rather like the models used in academic research. The data scientists were highly qualified, very intelligent, and quite expensive. Slowly, however, it was realized that the commercial world is not like academic research. No hypotheses needed to be formulated and modeled. Instead, software tools could look for patterns in data that might have business value. Over time, the data scientists were increasingly replaced by software tools and developers.
This was the rise of Machine Learning (ML). It enabled a huge expansion in the advanced analytics that organizations could undertake. The focus shifted to gathering and preparing the data to be fed into the ML tools and making the outputs available to business users. This raised the profile of data even more. Data had to be quickly gathered, understood, and qualified for use.
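As a small illustration of this shift from hand-built hypotheses to off-the-shelf pattern finding, the following sketch applies scikit-learn’s k-means clustering to invented customer data; the segments it finds are not stated in advance, they simply emerge from the data.

    import numpy as np
    from sklearn.cluster import KMeans

    # Invented example data: each row is a customer described by
    # (number of purchases, average spend). No hypothesis is stated up front;
    # a generic clustering tool is simply asked to find groupings.
    rng = np.random.default_rng(0)
    frequent_small = rng.normal(loc=[20, 15], scale=2, size=(50, 2))
    rare_large     = rng.normal(loc=[3, 200], scale=[1, 20], size=(50, 2))
    customers = np.vstack([frequent_small, rare_large])

    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)

    # The discovered cluster centers suggest two behavioral segments that an
    # analyst can then interpret and act on.
    print(model.cluster_centers_)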
As this was happening, generative AI burst onto the scene with ChatGPT in late 2022, followed by GPT-4 in 2023. A major goal of AI was to replace the legions of developers needed to build the data pipelines involved in analytics, and thereby expand the use of data. However, AI also made use of the vast amount of unstructured data that had become available via the Internet. Everybody could now derive much more value from AI analyzing this data than had been possible when only search was available. Yet there were other consequences. Data storage and compute costs, though greatly reduced on a unit basis, rapidly expanded on an aggregate basis. In fact, data and its processing are now becoming inseparable from energy. Data has turned the “New Oil” metaphor upside down because it is a major and rapidly growing consumer of energy. This cannot be reversed, as AI is needed to drive the economy and data fuels AI.
You Are Here
Which brings us to 2025 and where we are now. Nearly everything in our economy depends on data.
No scientific approach could tell us why this is the current reality, because the current reality is the product of a set of historical processes. It was not generated by uniformitarian processes that have always been happening in the same way for all time.
Is any of this important? It is. The reason is that much of the past remains embedded in the present: not only hardware and software, but also ideas about how we should manage and use data. The environment we have today is not something that recently came into existence where nothing existed before. Rather, it is a product of the historical development we have outlined, and that history shapes how we deal with the current state, which is becoming ever more complex. But that is a topic for another day.