Have you ever wondered if you are data? Probably not. It sounds like a ridiculous question, yet there are people who, indirectly, think they really are data. Let’s explore why this is.
What is Data?
If anyone is to be reduced to data, we should first have a good idea of what data is. Data has the following properties:
1. It is stored in a physical location.
2. It is composed of symbols, where a symbol is a sign that by convention represents something else.
3. Data exists to be processed. There has to be some kind of mechanism that uses the data as an input and produces an output. The nature of the output depends on the nature of the data that is input and the processing mechanism.
There may be additional properties, but that seems to be enough to establish a distinction between data and everything else.
Is DNA Data?
This brings us to the question of what DNA (deoxyribonucleic acid) is. In a biochemical sense it is an acid composed of an alternating sugar (deoxyribose) phosphate “backbone” with one of four kinds of nitrogenous base attached to every repeating sugar-phosphate unit. In normal cells DNA exists as two paired antiparallel strands, forming a double helix. I could go on, but the details are not relevant here.
Just as the meaningful content of a book cannot be reduced to paper and printer’s ink, neither can the information content of DNA be reduced to a heap of biochemical jargon. Let’s look at how it lines up with data’s properties.
1. DNA is physical and it stores information.
2. This one is a bit trickier.
a. DNA is composed of symbols. These are sequences of 3 nucleotides, known as codons. Each codon represents an amino acid or a processing signal. With 4 kinds of base there are 64 possible codons (4 × 4 × 4), and 20 amino acids that are normally found in living organisms.
b. There is a convention for which amino acid (or which processing signal) each 3-nucleotide sequence represents, and it seems to hold, with a few variations, across all living organisms. This is called the “genetic code”.
3. DNA’s information content gets processed in protein synthesis. A sequence of DNA is copied as RNA, which preserves the information content. The RNA is then used to generate a protein – a string of amino acids – by the cellular machinery that exists for this purpose. So, data is input to the process, and a protein is the output.
From this, we can conclude that DNA holds data in the same sense that the solid state drive on my PC holds data.
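To make the comparison concrete, here is a minimal Python sketch of the three properties above. It is an illustration, not a bioinformatics tool. The function names are my own, the codon table lists only a handful of the 64 entries, and the input is treated as the coding strand, so "copying to RNA" is reduced to swapping T for U.

```python
from itertools import product

# Partial genetic code, keyed by RNA codon. Only a few of the 64 entries
# are listed; a real table covers all of them.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}


def count_codons():
    """Count every 3-letter sequence that can be built from 4 bases."""
    return len(list(product("ACGT", repeat=3)))


def transcribe(dna):
    """Copy DNA to RNA. Simplification: the input is assumed to be the
    coding strand, so the RNA is the same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read the RNA three letters at a time and look each codon up in the
    table, stopping at the first stop signal."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError if codon not listed
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


rna = transcribe("ATGTTTGGCAAATAA")
print(count_codons())  # 64
print(rna)             # AUGUUUGGCAAAUAA
print(translate(rna))  # ['Met', 'Phe', 'Gly', 'Lys']
```

The dictionary is the "convention", the string is the stored symbols, and the two functions are the processing mechanism. Without the functions and the table, the string does nothing.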
Genotype vs. Phenotype
The “genotype” is the entire set of genetic information (data) that an organism possesses. The “phenotype” is the actual individual organism as it appears. Think of the genotype as the recipe for an omelette and the phenotype as the omelette on your plate.
Which brings us to popular conceptions of DNA. It is now quite common for organizations to say “It’s in our DNA”, which is somewhat weird as organizations do not have DNA. This seems to reflect a popular belief that people are wholly or largely the product of their DNA – the product of their data.
However justified or not this belief is, it seems to be widely held. Indeed, the popular conception of DNA elevates it to the level of magic pixie dust. There are countless science fiction stories that revolve around DNA, with DNA producing rather outlandish scenarios. In the realm of reality, DNA is used to establish identity in legal matters, and to establish a predisposition to particular diseases. Other popular beliefs exist, some of them quite toxic, such as DNA being responsible for determining intelligence levels.
But Is It So?
If we accept that DNA is data, then we can perhaps get a better appreciation of it by looking at it from a data perspective.
Data exists to be processed. The cellular machinery that processes DNA is incredibly complex. Once the outputs are produced, they enter into metabolic pathways that are also complex and eventually lead to a final end state. Even this is a gross simplification as there are yet other biological processes that are involved.
Data in a business system is similar. Think of a payroll system that begins with data about an employee, which gets processed by the system to finally end up with funds deposited in the employee’s bank account, as well as a number of deductions that get sent to other bank accounts. It is not the data about the employee alone that is responsible for what happens.
Similarly, it is not as if we have a gene for “X” and “X” just happens to us. There must be specific metabolic pathways (and more) to actually produce “X”. I can add new data elements to my dataset for the employee payroll, but nothing will be done with them unless I add programming logic to process them.
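The payroll point can be sketched in a few lines of Python. Everything here is hypothetical: the field names, the flat tax rate, and the run_payroll function are invented for illustration and do not describe any real payroll system.

```python
def run_payroll(employee):
    """Hypothetical payroll step. Only the fields named in this function
    have any effect on the output."""
    gross = employee["hours"] * employee["hourly_rate"]
    tax = round(gross * employee["tax_rate"], 2)
    return {"net_deposit": round(gross - tax, 2), "tax_remitted": tax}


base = {"hours": 40, "hourly_rate": 25.0, "tax_rate": 0.2}

# Add a new data element. No processing logic refers to it.
extended = dict(base, bonus=500.0)

print(run_payroll(base))      # {'net_deposit': 800.0, 'tax_remitted': 200.0}
print(run_payroll(extended))  # same result: the bonus field is ignored
```

The bonus sits in the record, but until someone writes logic that reads it, the outcome does not change.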
How Human Is DNA?
Some data is generic in the sense that it is widely used in many different applications. A company will have many uses for its customer data and product data. This seems to have a parallel in DNA. Early genome comparisons put the number of protein-coding genes at about 30,000 in both humans and mice (more recent estimates for humans are closer to 20,000), and found that only about 300 of these are uniquely human – the rest have counterparts in the mouse genome.
However, there is a lot of controversy around this point. The amount of DNA in the human genome that codes for proteins is only a small portion of the DNA in the human genome. The remaining “junk” DNA does seem to have some functionality, like regulating gene expression, but there is not as clear a picture as for protein coding genes.
But maybe DNA is old hat. Dr Michael Levin’s research team at Tufts University has found that the electrical fields of cells may be just as important. These electrical fields form something like collective cellular networks which exhibit a form of proto-cognition – a distributed intelligence that allows tissues to make decisions about shape, repair, and function. It is difficult to see how this could function without data, but any data involved almost certainly cannot be in DNA.
So, Are You Data?
Yes, we do have a data component – DNA – but it is only part of what makes each one of us what we are. And it may be rather a mundane component, only responsible for enzymes and structural proteins. In the end, nothing can be just data because data by itself is simply inert.