The Unreasonable Effectiveness of Deep Learning in Decision Making Systems
In The Unreasonable Effectiveness of Mathematics in the Natural Sciences, Eugene Wigner declared 'The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve'. What he referred to was the seemingly shocking ability of mathematical concepts to describe, quite accurately, phenomena far removed from any particular origin or inspiration for their development. In 2019 a paper by Blake Richards et al. in Nature Neuroscience described the unreasonable effectiveness of considering neurological processes within frameworks developed in computer science for artificial neural networks. The authors proposed chemical methods of backpropagation, among other things.
In this blog I'll argue that we can go further than that: by looking at common features - the shared mathematical structure - in a variety of systems we can draw inspiration for explaining how decision making occurs more generally. This is pure conjecture - and an unconvincing one before evidence - but we can note a number of similarities between effective decision making systems such as deep neural networks (both artificial versions running on software, and biological networks comprised of wet and squishy cells) and other forms of large-scale decision making such as committees, business structures, and electoral systems. Each of these forms has evolved in response to pressures to become efficient at making decisions in a complex and uncertain world. The resulting structures share striking similarities: they are often hierarchical, comprised of networks of simple decision 'cells', abstract information between lower and higher levels, and are capable of extremely complex reactions.
For decision making systems we care about three things: the structure, the objective, and the loss. The structure refers to how the system (a web of neurons, a computer circuit, a deep learning system, a company, or a polity) is organised (the number of nodes, hierarchies, the interconnections between nodes, and the strength of these interconnections). The objective is the target outcome. As economists, we naturally like to think of this in terms of a maximisation of money or the more abstract concept of utility (well-being). However, it can be more general: correct prediction of a statistical forecast, classification of photos as a zebra or horse, victory in an election, or eating a piece of chocolate are each valid objectives. The loss measures how far an outcome deviates from the objective and how 'painful' the system finds this deviation (a business is unlikely to care about missing a target by a dollar, but may care about missing it by millions; I won't care much about whether a delicious looking chip is dark or milk chocolate, but I will be concerned if it turns out to be something else small, dark, and pellet shaped).
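To make the idea of a loss concrete, here is a minimal sketch in Python. It assumes one particular shape for the 'pain': misses inside a tolerance band cost nothing, and larger misses are penalised quadratically. The function name, the tolerance parameter, and the quadratic penalty are all illustrative assumptions, not something the framework above prescribes.

```python
def tolerant_loss(outcome, target, tolerance):
    """Hypothetical loss: ignore small misses, penalise large ones quadratically."""
    miss = abs(outcome - target)           # how far the outcome deviates from the objective
    excess = max(0.0, miss - tolerance)    # only the part beyond the tolerance band 'hurts'
    return excess ** 2


# A business with a $1,000,000 target that shrugs off misses under $1,000.
missed_by_a_dollar = tolerant_loss(999_999, 1_000_000, tolerance=1_000)
missed_by_half = tolerant_loss(500_000, 1_000_000, tolerance=1_000)
```

Under these assumptions the one-dollar miss produces no loss at all, while the half-million miss produces a very large one. Other shapes (absolute, asymmetric, or all-or-nothing) would describe different systems equally well.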
These three factors are enough to describe a decision making system. However, we are interested in this system being able to learn and restructure in response to outcomes to become more effective at making decisions. Two additional factors enable this: the update energy, and the update time. The first refers to how much a system reacts or is updated in response to a loss (in AI terms, we'd consider this the hyperparameters of the gradient descent algorithm, such as the learning rate; in business, an organisational restructure or change in tactics). The second is how quickly the system can react to an observation, or how long one 'cycle' of decision making takes to occur (e.g. one iteration of backpropagation, a company-wide email from the CEO or circulated decisions from executive staff). Neither of these two factors needs to be constant over time or circumstances: it is possible to move and react urgently in response to an emergency, or in a leisurely way for long-term planning.
In an active environment this decision making system faces events: at various levels in the hierarchy information is perceived, analysed and abstracted, and a decision is made. The objective is evaluated, and in response to the associated loss the system restructures. Repeated over many cycles this process should minimise the associated loss and make the structure more efficient at achieving the objective. This describes a generic process by which - in this conjectured, over-idealised example - decision making systems can learn in general, but it's worth looking at a specific example.
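The cycle described above can be sketched in a few lines of Python. This is a deliberately tiny illustration under stated assumptions: the 'structure' is a single connection strength, the loss is the squared error, and the update rule is plain gradient descent. The names learn, update_energy, and cycles are hypothetical, chosen to mirror the vocabulary used here.

```python
def learn(events, weight=0.0, update_energy=0.05, cycles=200):
    """Repeatedly perceive events, evaluate the loss, and restructure."""
    for _ in range(cycles):                      # each pass is one decision 'cycle'
        for observation, target in events:
            decision = weight * observation      # structure: one connection strength
            error = decision - target            # deviation from the objective
            gradient = 2 * error * observation   # slope of the squared-error loss
            weight -= update_energy * gradient   # restructure, scaled by update energy
    return weight


# Events in which the 'right' decision is always three times the observation.
events = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned_weight = learn(events)
```

With these events the weight settles very close to 3. Raising update_energy makes each restructure more drastic (past a point the system overshoots and never settles), while lowering it makes learning slower but steadier.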
Consider a company comprised of many different departments and levels: from the ground-level staff interacting directly with customers, to middle managers and executives overseeing sections, and the board. Formal and informal interconnections exist between the individuals in this company across hierarchies. This forms the structure. The objective of the company is unsurprisingly to maximise its profit, and the loss is the extent to which it misses a target. The company perceives its sales and business information directly at the lowest level of the hierarchy. This information is abstracted to a few key components and transmitted higher in the organisation (the number of sales is important, the well-being of the customer's wife is not, although this information may have come up in conversation with the employee). This information is further abstracted by middle managers and combined with the results of other sections of the company. Eventually, this reaches the board, at which point the information is at a highly abstracted level (individual sales to customers may not be presented, but aggregate quarterly sales may be). The board evaluates how well it has met the objective and the perceived gap between the outcome and the target. From this it makes decisions, which could be to expand into a new market or increase/reduce production of stock. The organisation updates and restructures (e.g. through systematic organisational changes, or memos run through executives) and - ideally - becomes more efficient at meeting the objective.
It should be noted that what happens in the real world is not as neatly described as above: organisations, neural networks, and other decision making systems are more complex than a simple hierarchical system of abstraction. However, there is something to be said for the fact that we can describe these disparate business, computer, and biological systems with common components. Where similarities occur in the world, it does not necessarily mean that the systems were generated through the same process. However, mathematics is an unreasonably effective tool where similarities and analogies do exist: the existence of commonalities between these structures means that there are possibly insights to be gleaned by looking at them through the frameworks and innovations that have come out of modern deep learning research. It's something we neither understand nor deserve.