Where's Wally and a Swiss Cheese View of Deep Learning
Updated: Mar 29, 2022
One of the ways I like to look at Deep Neural Networks is as a series of filters. A filter removes unwanted components from a signal by blocking their passage - a bit like a paper coffee filter keeps the grounds out of the final steaming morning cup of Java. ☕
To consider why this might be useful let's look at the classic puzzle book Where's Wally by Martin Handford. The goal of the game is to find Wally in a noisy and crowded scene.
The fun is that this is usually pretty hard. There are so many tiny details that get in the way.
Now let's consider the dual problem of looking for everything that is not Wally.
This 'Wally filter' makes the task trivial. To paraphrase Sherlock Holmes, "Once you eliminate the not-Wally, everything that remains, no matter how improbable, must be the Wally."
So how does this relate to deep learning? Instead of one filter, imagine that we layer a bunch of filters over each other, each removing a certain part of the original image.
Finding Wally now becomes the task of correctly layering filters. Deep Neural Networks simply allow these filters to be learnable based on multiple training examples.
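This layering idea can be sketched in a few lines of numpy. Here the "scene" is a toy 4x4 grid, the filters are hand-picked masks, and Wally's position is an assumption for illustration - in a real network the filter values would be learned from training examples rather than written by hand.

```python
import numpy as np

# A toy "scene": a 4x4 grid of feature intensities.
# Assume (for illustration) that Wally lives at position (1, 2).
scene = np.arange(16, dtype=float).reshape(4, 4)

# Each "layer" is a mask that passes some regions and blocks others.
# These masks are hand-picked; a deep network would learn them instead.
mask_rows = np.zeros((4, 4)); mask_rows[1, :] = 1.0   # keep only row 1
mask_cols = np.zeros((4, 4)); mask_cols[:, 2] = 1.0   # keep only column 2

def apply_filters(x, filters):
    """Layer the filters over each other, each removing part of the input."""
    for f in filters:
        x = x * f
    return x

filtered = apply_filters(scene, [mask_rows, mask_cols])
# Only the (1, 2) entry survives the stacked filters: everything
# that is not Wally has been removed, so finding him is trivial.
wally_pos = np.unravel_index(np.argmax(filtered), filtered.shape)
```

Stacking the two masks is the whole trick: each one on its own still leaves clutter, but their composition passes exactly one location.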
This approach additionally gives a nice way of viewing why skip connections are so useful. A skip connection combines a filtered input with its unfiltered version.
Returning to Where's Wally - imagine if we accidentally learned a filter that removes Wally from the image. Our model is ruined!
But with a skip connection this information can still reach the next layer unimpeded. The relevant information is preserved and can pass on to our final prediction.
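Continuing the toy numpy sketch: suppose a layer accidentally learns a filter that zeroes out exactly the spot where Wally sits (the position and mask below are made up for illustration). Adding the unfiltered input back in - the skip connection - lets that information survive the bad layer.

```python
import numpy as np

# Toy scene again, with Wally assumed to be at position (1, 2).
x = np.arange(16, dtype=float).reshape(4, 4)

# An accidentally harmful filter: it blocks precisely the Wally entry.
bad_filter = np.ones((4, 4)); bad_filter[1, 2] = 0.0

def layer(inp):
    # A plain filtering layer: Wally is removed from its output.
    return inp * bad_filter

def layer_with_skip(inp):
    # Skip connection: combine the filtered output with the
    # unfiltered input, so blocked information still gets through.
    return layer(inp) + inp

without_skip = layer(x)
with_skip = layer_with_skip(x)
# without_skip has lost the Wally entry entirely;
# with_skip still carries it on to the next layer.
```

The addition is the entire mechanism: even if the learned filter multiplies Wally by zero, the identity path contributes him back, so no single bad layer can destroy the signal.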
This gives rise to what I like to call a 'Swiss Cheese View' of Deep Learning. The Swiss Cheese Model is intimately familiar to those who work in risk management, representing the idea that for a risk to eventuate it must pass through multiple layers of contingencies.
In deep learning it is instead information that needs to make it through multiple layers. If it is blocked by a layer then it is lost to the output of the model.
Correctly designing and stacking the layers of cheese so that our target information - and only our target information - makes it to the end is the aim of the Deep Learning game.
Shekhar (Shakes) Chandra memorably used a Wally finding algorithm to teach a course in Pattern Recognition at the University of Queensland.