Storage as a Communication Problem
Updated: Mar 19, 2022
In the novel Death's End, author Liu Cixin describes the difficulty of storing information that will last for a hundred million years. USB sticks and solid state drives would fail after a few decades at most - a CD would not last much longer. Papyrus and paper would rot or degrade after a few centuries or millennia under the best of conditions. Eventually he lands on the use of stone - massive carvings robust enough to outlast a geological age.
Concrete and stone are extremely robust - the Pont du Gard is an ancient Roman aqueduct built in the first century AD in the south of France (near the town of Nîmes, which has a functional Roman amphitheatre that still hosts bullfights and Rammstein concerts to this day). One of the interesting things about these structures is seeing the graffiti left by travellers over the centuries - a bit of a "Mike wuz here" from 1598. But as robust as they are, concrete and stone are impermanent. Nature can only remove from the message as the seconds pass - given enough time even the most solid stone will weather away and crumble to dust.
As Richard Hamming and many other information theorists have noted, storage is a type of communication problem. Here our storage medium transmits a signal across time. As with a radio transmitter or a telegraph array the signal is subject to degradation and noise before reaching its audience. Book pages yellow, or become water damaged, or are torn. Transistors corrode. Languages to decode the original message become lost* and software protocols move on. Statues are worn to gravel and erode down to sand.
Information from a message can only decrease, or at best stay constant, as we move through time. We can state this more formally by noting that the successive states of a transmitted message can be treated as a Markov chain, which satisfies the data processing inequality: no later stage of the chain can carry more information about the original than an earlier stage did. This raises an interesting point. Take the great wealth of all knowledge that exists in the world at a point in time, from all of the books and magazines in libraries down to the sensation that a hummingbird feels as it drinks from a flower. How can this be communicated to the future, or at least how can the important things be preserved?
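As a sketch of the formal statement (the symbols are notation assumed here for illustration): let X be the original message, and let Y_1 and Y_2 be what survives of it at an earlier and a later time. If the later copy depends on the original only through the earlier copy, the three form a Markov chain, and the mutual information I with the original can only shrink or stay the same:

```latex
X \to Y_1 \to Y_2
\quad \Longrightarrow \quad
I(X; Y_2) \le I(X; Y_1)
```

Equality is the best case for any storage medium: it holds only when the later copy retains everything the earlier copy still carried about the original.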
Writers and historians will compress an intractable amount of information about the world into a much smaller representation - a book, a song, a sculpture, or some other imperfect facsimile of a tiny subset of reality. And yet even these representations are lost to history. How many books lie unread on shelves, or sit in bookstore warehouses waiting to be rediscovered? As a result, how many ideas and inventions need to be reinvented rather than recycled?
This is a necessary consequence of communication. The world contains too much information, even at a single instant, to ever be recorded and transmitted - even for a subset of a subset of a niche section of an obscure facet. In a one-paragraph short story Borges once wrote of an infinite map perfectly recreating the world almost down to the cartographer on its surface - how much harder would the job be for an author writing an infinite world onto their pages?**
So how can we prevent important information from being lost to the sands of time? Imagine that instead of a single path of information we have one that branches and rejoins further down the line to be cross-referenced, or one which itself branches further to generate redundancy.
In deep learning models we could call this a skip connection or branched redundancy. Information here no longer has a single point of failure and so is more robust to being lost. As the signal is repeated and refreshed, it is able to travel much farther in time without being lost. You can see this a little in research. One of the core parts of a paper is a compressed summary of the field as it has come before. This "repeater" action allows the ideas to be refreshed and transferred to the "frontier" of the current state of the field.
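The effect of repeating and refreshing a signal can be illustrated with a toy calculation. This is a minimal sketch under assumed conditions, not a model of any real medium: each copy independently survives a time step with probability p_step, and whenever at least one copy survives it is re-copied back to the full number of copies. The function names and the numbers are hypothetical.

```python
# Hypothetical toy model: every name and number here is an illustrative assumption.


def survival_single_path(p_step: float, steps: int) -> float:
    """Chance that one unrefreshed copy survives every time step."""
    return p_step ** steps


def survival_with_repeaters(p_step: float, steps: int, copies: int) -> float:
    """Chance the message survives when `copies` independent copies exist
    and any surviving copy is re-copied back to full strength each step."""
    # The message is lost in a step only if every copy fails in that step.
    p_any_copy_survives = 1.0 - (1.0 - p_step) ** copies
    return p_any_copy_survives ** steps


if __name__ == "__main__":
    p, steps = 0.9, 100
    print(f"single path: {survival_single_path(p, steps):.6f}")
    for copies in (2, 3, 5):
        print(f"{copies} copies: {survival_with_repeaters(p, steps, copies):.6f}")
```

With a 90% chance of surviving each step, a single path almost never lasts 100 steps (a probability of roughly 0.00003), while three refreshed copies survive about 90% of the time. The redundancy does nothing for the fidelity of any one copy; it only makes total loss much less likely.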
None of this means that repetition, branching or transmission is lossless - much useful and relevant information is dropped by copying or compression, both through the interpretation of the reader and the necessities of time, space, and the contemporary lens.
"They say you die twice. Once when you stop breathing and the second, a bit later on, when somebody mentions your name for the last time." Ideas will die eventually too - when they are no longer transmitted, refreshed, or are unable to be carried on to the future. An idea spreads, and grows, and eventually dies by the same dynamic as an epidemiological outbreak propagating through a population. An R naught for information.
It is a cliche that there is nothing new under the sun. If you read a top paper published in the 1950s you will likely find a result that would not look out of place in a leading journal today. So many things fall by the wayside as time moves on - common objects and knowledge collect below the surface like artefacts hidden from the light of day by the accumulated dust of centuries. Excavating the foundation is often as important as building the new structure above. There are many gems, bones, treasures, and trash hidden below the surface.
* There have been recent attempts to use machine learning approaches for deciphering lost languages: http://people.csail.mit.edu/j_luo/assets/publications/DecipherUnsegmented.pdf
** Lewis Carroll covers the same idea, wittily remarking that the map has never been rolled out as the farmers complained it would shut out the sunlight.