How Much Information Can a Small Image Contain?
CIFAR-10 and CIFAR-100 (from the Canadian Institute For Advanced Research) are two classic datasets for training and testing image recognition algorithms. They contain labelled images in 10 and 100 categories respectively, drawn as subsets of the 80 Million Tiny Images dataset. Each image is 32x32 pixels with 3 colour channels - truly tiny!
Despite being tiny, these images have the potential to contain a lot of information. An interesting question is: how many different ways could we encode a pattern within one of these images? Each image is 32x32 pixels with 3 colour channels (Red, Green, Blue), each channel taking one of 256 values (8 bits).
Let's simplify that a little and consider black and white images. If we have a single-pixel message (i.e. we just want to place one meaningful pixel in the image) we have 32x32 = 1,024 ways of doing it. For two pixels, there are 1024x1023/2 = 523,776 ways (divided by two because swapping the pixels doesn't change the meaning). Extending this out, we have:
C(1024,0) + C(1024,1) + ... + C(1024,1024) = 2^1024 ≈ 1.8 x 10^308
different possible messages (covering message lengths from 0 pixels all the way to all 1,024 pixels in the image), using the combinatorial identity that the binomial coefficients of n sum to 2^n. This is an absurdly large number and, coincidentally, close to the overflow limit of many calculators.
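As a quick sanity check on that arithmetic (a small Python sketch, not part of the original text), we can verify the one- and two-pixel counts and the binomial-sum identity directly:

```python
import math

PIXELS = 32 * 32  # 1,024 pixel positions in a CIFAR image

# Number of ways to place a k-pixel black-and-white message
assert math.comb(PIXELS, 1) == 1_024
assert math.comb(PIXELS, 2) == 523_776

# Summing over every message length 0..1024 gives 2^1024,
# by the identity sum_k C(n, k) = 2^n
total_messages = sum(math.comb(PIXELS, k) for k in range(PIXELS + 1))
assert total_messages == 2 ** PIXELS

print(f"2^1024 has {len(str(total_messages))} decimal digits")  # → 309
```

Python's arbitrary-precision integers handle these quantities exactly - 2^1024 is roughly where double-precision floating point (and hence many calculators) overflows.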
Now consider that we still need to include our colour values in the equation. Each of the three colour channels can be represented in 8 bits, allowing for:
(2^8)^3 = 2^24 = 16,777,216
different colours at each pixel. Taking this into account - 2^24 choices at each of the 1,024 pixel positions, or (2^24)^1024 = 2^24576 in total - we can see a truly gigantic number of ways to encode information in the tiny image.
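The colour arithmetic can be sketched in a few lines (my own illustration of the numbers above):

```python
# Each channel stores 8 bits; three channels per pixel
colours_per_channel = 2 ** 8                    # 256
colours_per_pixel = colours_per_channel ** 3    # (2^8)^3 = 2^24

# Every one of the 1,024 pixel positions can independently
# take any of those colours
pixels = 32 * 32
total_encodings = colours_per_pixel ** pixels   # (2^24)^1024 = 2^24576

assert colours_per_pixel == 2 ** 24 == 16_777_216
assert total_encodings == 2 ** 24_576
```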
We can note that 24,576 = 32 x 32 x 3 x 8 is exactly the number of bits needed to store each image. In the extreme case, we could assign a separate class to each of these 2^24576 different combinations (letting every slightly pinker shade of puce at position 16x23, and so on, mean something distinct from every other). This is many orders of magnitude higher than the estimated number of atoms in the universe (~10^80).
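To put that magnitude in perspective, a short sketch comparing 2^24576 against the ~10^80 atoms figure:

```python
import math

# Bits needed to store one CIFAR image: 32x32 pixels, 3 channels, 8 bits each
bits_per_image = 32 * 32 * 3 * 8
assert bits_per_image == 24_576

# Decimal magnitude of 2^24576: floor(n * log10(2)) + 1 digits
digits = math.floor(bits_per_image * math.log10(2)) + 1
print(digits)  # → 7399, i.e. 2^24576 is about 10^7398

# Dwarfs the estimated ~10^80 atoms in the observable universe
assert 2 ** bits_per_image > 10 ** 80
```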
This is an absurdity - but it does highlight how sparse and softly-defined semantic concepts are in images. We are able to draw, from a near-limitless space of potential codings, the implicit regularities which allow us to group like things together in a meaningful sense - to sift through the static of randomness to find order that we can map to physical ideas like 'cats', 'dogs', 'horses', or 'planes' - or even symbolic concepts like 'love', 'prosperity', or 'approval'.
These concepts jump out at us immediately. This is remarkable when you think about it. To use an analogy, it is as if we were floating in the deadest, sparsest ocean imaginable - a void of endless nothingness. And yet amongst this emptiness, we spot the brightly-coloured fish flashing briefly from the depths. Our eyes gravitate to meaning and order - we recognise the signal between the infinite possibilities of noise (not perfectly, and often seeing order where none truly exists - but much, much, much better than random chance).
Machine learning tries to do with mathematics what we are able to do by instinct - creating a sieve or a net that catches the meaningful features and regularities as they pass through the filter. Edge detection, colour coherence, invariance to rotations, flips, and distortions - these concepts come naturally to our eyes - but how strange it is that an unthinking machine can recognise them with a bit of statistics and a few lines of code! The most challenging datasets for computer vision eclipse the image sizes of CIFAR: the average ImageNet image is 469x367 pixels, Common Objects in Context (COCO) images are 640x480 pixels, and LVIS images are often resized to a short edge of 800 pixels. The sheer combinatorial complexity of those image sizes is stunning - the ability to navigate them is part of the magic of the field and the current state of computer vision algorithms.