I've decided to release a few projects that I've worked on over the last year and a half. Most of these are better described as 'assignments', but 'project' just sounds fancier. While none of these are particularly ground-breaking, they will hopefully serve as an introduction or inspiration to explore some fascinating fields of study.
Deep Reinforcement Learning for Partially-Observable Markov Decision Processes
This project served as my thesis. Partially-Observable Markov Decision Processes (POMDPs) are a broad class of decision problems in which the agent is missing some information about the environment. The definition is broad enough to cover many decision problems humans and businesses need to deal with, and POMDPs are notoriously difficult for artificial agents to solve, even in toy settings.
Deep Reinforcement Learning uses a neural network to approximate the value of taking actions given some observed state of the environment. Trial and error exploration reinforces rewarding actions and disincentivises negative ones - much like humans learn that eating a 1.5M Scoville Carolina Reaper chilli is a bad idea, while a chocolate éclair is a tasty one.
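As a rough illustration of the trial-and-error idea (not the code from the thesis), here is a minimal sketch that swaps the neural network for a plain lookup table and the environment for a two-snack choice with fixed rewards. The function name `learn_action_values`, the reward values, and the hyperparameters are all assumptions made up for this example.

```python
import random

def learn_action_values(rewards, episodes=500, epsilon=0.1, learning_rate=0.1, seed=0):
    """Estimate the value of each action by epsilon-greedy trial and error.

    `rewards` maps each action name to a fixed reward (an assumption for
    this toy example; real environments are noisy and have state).
    """
    rng = random.Random(seed)
    values = {action: 0.0 for action in rewards}
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(list(values))
        else:
            # Exploit: take the action currently believed to be best.
            action = max(values, key=values.get)
        reward = rewards[action]
        # Nudge the estimate towards the reward that was just observed.
        values[action] += learning_rate * (reward - values[action])
    return values

values = learn_action_values({"carolina_reaper": -1.0, "chocolate_eclair": 1.0})
best_snack = max(values, key=values.get)
```

After a few hundred tries the éclair ends up with the higher estimated value. A deep reinforcement learning agent follows the same loop, but replaces the table with a network that generalises across observed states.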
For this project I trialled a number of Deep Reinforcement Learning algorithms against toy problems. Performance was inconsistent (there's a tendency for the model to choose safe, risk-averse responses), but in some instances interesting and complex behaviour was learned (beating my own scores for simple problems!).
Reviewing the Vehicle Routing Problem with Stochastic Requests (VRPSR)
I completed this project with David Banh. The Vehicle Routing Problem involves planning efficient delivery paths for a vehicle - similar to the problem faced by food delivery companies such as Deliveroo or Uber Eats. Stochastic requests means that new potential waypoints can arise randomly. The goal is to service as many customers as possible during a limited amount of time. Once a customer has been accepted, they must be served in the tour. The question is whether the agent should accept distant customers (locking up travel time) or hope for better fares later on.
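To make the accept-or-reject decision concrete, here is a minimal sketch of one possible rule. It is not one of the solvers from our project, just an illustrative heuristic: insert the new request at the cheapest position in the current tour, and accept it only if the tour still fits in the time budget and the extra travel stays under a detour limit. The function names, the `max_detour` threshold, and the assumption that travel time equals straight-line distance are all hypothetical.

```python
import math

def tour_length(depot, stops):
    """Length of the closed tour: depot -> stops in order -> depot."""
    path = [depot, *stops, depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def cheapest_insertion(depot, stops, request):
    """Try every insertion position and return (new_stops, new_length) for the shortest."""
    best = None
    for i in range(len(stops) + 1):
        candidate = stops[:i] + [request] + stops[i:]
        length = tour_length(depot, candidate)
        if best is None or length < best[1]:
            best = (candidate, length)
    return best

def consider_request(depot, stops, request, time_budget, max_detour):
    """Return (stops, accepted). Accepted customers are committed to the tour."""
    current = tour_length(depot, stops)
    candidate, length = cheapest_insertion(depot, stops, request)
    feasible = length <= time_budget               # tour must still fit in the time limit
    cheap_enough = length - current <= max_detour  # hypothetical detour threshold
    if feasible and cheap_enough:
        return candidate, True
    return stops, False

depot = (0.0, 0.0)
tour, near_accepted = consider_request(depot, [(1.0, 0.0)], (2.0, 0.0), time_budget=10.0, max_detour=3.0)
tour, far_accepted = consider_request(depot, tour, (10.0, 0.0), time_budget=10.0, max_detour=3.0)
```

In this usage example the nearby customer is accepted and the distant one is turned away. A rule like this is deliberately short-sighted: it ignores where future requests are likely to appear, which is exactly the information a simulation-based approach tries to exploit.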
The two gifs below show the general idea. On the left we have a tour made up of stochastic requests. The potential tour starts off loose, with only a few customers to be serviced. As new requests are added to the potential tour, it becomes more constrained. On the right we have a heatmap of the customer requests as they are received, showing the 'hotspots' of customer activity. We investigated a number of heuristic and simulation-based solvers.
Algorithmic Music Composition
I completed this project with Taotao Pan and Jingye Liu. Music has long been regarded as one of the most human endeavours. It is considered an art, respected not only for the technical skill involved in composition but also for the ability to communicate with and emotionally move an audience. But music is also a physical signal: it is a wave of energy carried through the air, from the instrument to the listener’s ear. It can be decomposed through a Fourier transform into its constituent elements and converted to a form that a machine can electronically record, represent, and reproduce. Like speech, it is composed of semantic elements and regularities - rhythm, meter, patterns that recur and modify and transmit new meaning with each new variation. And because it can be represented with patterns, it presents a tantalising challenge for Artificial Intelligence and Deep Learning researchers: to see whether a machine, composed at its core of a series of electric switches shifting between one and zero, can reproduce an act thought to lie at the heart of how humanity sees itself - whether a machine can compose, and whether what is created can be said to be an artistic creation.
In this project we investigated a number of algorithmic models, including Long Short-Term Memory (LSTM) networks, Generative Adversarial Networks (GANs), and Transformer models. This was a very fun project and I highly recommend listening to the computer-generated pieces below.