Personal blog of Matthias M. Fischer

The mathematical biology of Advent of Code #6

Posted: 10th December 2021

Preface

This year, I am participating in the Advent of Code, which has been a very fun experience so far. In particular, really enjoyed solving the exercise of day 6, because it directly relates to my field of research: mathematically modelling the dynamics of populations.

In my opinion, the exercise is nicely suited to illustrate some basic ideas and principles of this kind of mathematical modelling, so I decided to make a quick blog post about it. I hope that this post might make some people become more interested in the field, or aid in understanding some of its general ideas and approaches.

The problem

Quoting from the exercise of day 6:

A massive school of glowing lanternfish swims past. [...] You should model their growth [...]

[E]ach lanternfish creates a new lanternfish once every 7 days. However, this process isn't necessarily synchronized between every lanternfish - one lanternfish might have 2 days left until it creates another lanternfish, while another might have 4. So, you can model each fish as a single number that represents the number of days until it creates a new lanternfish. Furthermore, you reason, a new lanternfish would surely need slightly longer before it's capable of producing more lanternfish: two more days for its first cycle.

A lanternfish that creates a new fish resets its timer to 6, not 7 (because 0 is included as a valid timer value). The new lanternfish starts with an internal timer of 8 and does not start counting down until the next day.

A naive solution: agent-based modelling

The exercise already suggests the first way of modelling the problem computationally. We represent every fish by an integer number which describes its internal timer state. Thus, the population of fish is given by a list of integer values. For every time step, we iterate through the complete list, reducing the timer value of every fish. A fish that has a time value of zero, however, is reseted to a timer value of 6, and another fish, with a timer value of 8, is appended to the list.

This approach of modelling is called agent-based modelling , because every modelled invidual (or "agent") is represented as an individual object in memory, which follows certain predefined rules.

The solution is implemented in the function propagate_naive() in this source file. The source file also allows you to measure the time the execution of the model takes (measured in processor ticks).

Deriving a more elegant solution -- iterated maps

Running the implementaion above, we will quickly notice that it tends to be rather time-consuming (for reasons we will discuss later). For this reason, let's try to come up with another approach that runs more efficiently.

First, note that all individuals of a given age-class behave exactly identically. Thus, let us group all individuals by their age and only keep track of the sums of individuals we have per age-class. In other words, the system can be completely describes with a list of 9 numbers. Let's call them [x₀, x₁, ..., x₈] .

Now we want to calculate the next system state based on the current one. With every time step, every individual's age counter is reduced by one. Hence, e.g. the new x₀ is exactly equal to the old x₁ , the new x₁ is exactly equal to the old x₂ , and so on and so forth. Also, for every individual in x₀ , we get one new individual in x₆ , and another one in x₈ . Thus, we have the following mapping from the old (right) to the new (left) state:

x₀ <- x₁
x₁ <- x₂
x₂ <- x₃
x₃ <- x₄
x₄ <- x₅
x₅ <- x₆
x₆ <- x₇ + x₀
x₇ <- x₈
x₈ <- x₀

Thus, we can write a function that just keeps calculating this mapping once for every time step in order to simulate the system behaviour. It's called propagate_map() in this source file.

I hope, by now my notation is suggestive enough to make you think of vectors ;-). In fact, we can write the mapping above way more tersely by calling the system state x instead of listing all the individual components [x₀, x₁, ..., x₈] every time. Let's denote the system state vector at time point t with x(t) .

Then the mapping from x(t) to x(t+1) can be conveniently written using a multiplication with a quadratic transition matrix A as follows:

x(t+1) = A x(t)

where

A is given by

[0,1,0,0,0,0,0,0,0]
[0,0,1,0,0,0,0,0,0]
[0,0,0,1,0,0,0,0,0]
[0,0,0,0,1,0,0,0,0]
[0,0,0,0,0,1,0,0,0]
[0,0,0,0,0,0,1,0,0]
[1,0,0,0,0,0,0,1,0]
[0,0,0,0,0,0,0,0,1]
[1,0,0,0,0,0,0,0,0].

For now, this is not important, but please keep this in mind -- we will return to this equation later in this post.

Comparing the two approaches

Time and space complexity

From a computational point of view, an algorithm's requirements for time and space are importants considerations. Let's begin by comparing the required time of the two approaches for one simulation step as a function of input length, i.e. the number of fish to simulate.

The naive approach needs to traverse the complete list once per simulation step, decrementing the elements one after each other. Afterwards, the newly generated fish are appended to the list. Thus, with more fish to simulate, the required time will linearly increase. In contrast, we expect the iterated map approach to run in constant time, independent of the number of fish to simulate, because the number of operations in every simulation step does not depend on the actual number of fish present.

The following diagram shows the clock ticks required by the two approaches when run on my computer as a function of input length. The source code, written in C, is available here. Note how these results follow exactly the theoretical predictions we just made before. Interestingly, the naive approach runs in approximately constant time for input lengths of one to one-hundred fish, and only then starts to really consume more time. I assume this might be related to the fact that the computer can store smaller lists in cache, which yields to significantly faster access times, and thus overall run time, however I am not entirely sure if this is the only reason.

Now, let's think about the amount of memory space we need, again as a function of the number of fish we want to simulate. It should be easy to see, that the amount of memory we need to store for the naive approach scales exactly linearly with the number of fish we want to simulate, because every fish is represented by one single integer value. In contrast, the matrix multiplication approach should take constant space, regardless of the actual counts of fish.

Mechanistic insight

Next, let's think about what the two different approaches can teach us -- or, in other words, the amount of mechanistic insight into the system we are able to get by using them. For the naive agent-based implementation, we basically run the simulation as a black box and study its emergent behaviour.

Let us thus simulate the system, starting with one single fish of age 0, keeping track of both the overall sum of fish in all age-classes, as well as the fractions of the nine different age-classes in the population. Such an exemplary simulation run of the system might look as follows:

For clarity, we zoom into the last part of the second plot:

With this observed numerical evidence, we may then make conjectures about the behaviour of the system. Reasonable conjectures based on observations might, for instance, be that the system tends to show exponential growth (first plot, note the log-scaled y-axis), or that it tends to converge to a stable composition of individuals of the different age classes (second and third plot).

It should, however, be noted that these observations are only conjectures based on numerical evidence which we have gathered by numerically simulating the system and observing its behaviour. Thus, these observation cannot be considered "proven" mathematically, and thus we cannot be entirely sure that they will always hold, regardless of how we e.g. initialise the system.

While this might sounds nitpick-y at first, it becomes an important consideration when working with more complicated systems, possibly consisting of multiple interacting species, multiple free parameters, non-linear effects, and so on. In such cases, numerical evidence might not accurately describe the model behaviour for all possible parametrisations and thus might turn out to be incomplete or even misleading .

A mathematically more rigorous approach to modelling can ameliorate these problems. If it is possible to describe our system with a set of equations (which admittedly is often not the case, see below), we can then subject these equations to a rigorous mathematical analysis and prove system properties instead of just conjecturing them. In this way, we gain a higher degree of mechanistical insight into the system we are trying to study, and our conclusions become more reliable.

Luckily for us, we have already derived a mathematically rigorous description of our system. We have seen earlier that the complete system can be described as an iterated map of the form

x(t+1) = A x(t)

where x is an 9-dimensional vector consisting of the system state (the numbers of fish in the nine differentage classes), and A is a matrix decribing the mapping from the system state x(t) at any point of time t to the system state x(t+1) at time t+1 .

Equations of these form have been the object of mathematical study since many years. Hence, we can draw from a rich set of mathematically proven results from the literate to analyse our fish model.

For instance, it is well-known that the set of eigenvalues and eigenvectors of the matrix A describe the long-term behaviour of the system. In particular, the eigenvalue of A with the largest real part is

λ₁ ≈ 1.09102

and its associated eigenvalue is

v₁ ≈ [1.09102, 1.19033, 1.29868, 1.4169, 1.54587, 1.68658, 1.8401, 0.91657, 1]^T.

Since λ₁ exceeds one, the system can be expected to be unstable, showing exponential growth, increasing by a factor of λ₁ at every time step. Doing so, it will converge to a stable composition of individuals of the different age classes with their proportions equal to the proportions of the values of v₁ .

Let us check these purely theoretical predictions using the numerical simulation we just did earlier. First, let's check if the system really tends to an overall growth with a factor of 1.09102 as predicted, by drawing this analytical prediction into our graph.

Fits like a glove! Now, let's move on to the second prediction. Let us check if the system indeed converges to a stable composition which is given by the elements of the vector v₁ . Drawing these fractions into the plot from above, we get:

Spot-on again! Thus, we are indeed able to use mathematical analysis to provably predict system behaviour, without having to rely on (potentially tedious, potentially incomplete, potentially misleading) numerical simulations

Ability to model more complicated assumptions

In the last sections we have pointed out a number of clear advantages of the "mathematically rigorous" approach to modelling. However, of course there are also disadvantages to it. Its biggest one being the fact that we can only mathematically analyse those systems, that we can put into a mathematical form in the first place . If our model e.g. relies on drawing random numbers which then influence the behaviour of the interating individuals, it very quickly becomes hard or even impossible to express the model in a "nice" equation that lends itself to a direct mathematical analysis.

Thus, while agent-based approaches do lack some of the advantages of mathematically rigorous approaches, they nonetheless do have its place and their use cases in computational biology, and are indeed a valuable tool in modern research.

Conclusion

We have seens the basics of the two "archetypical" modelling approaches of theoretical biology, which are agent-based simulations and mathematical analysis of equations. We have seen that both of these approaches have their specific advantages and disadvantages, and thus are both useful tools in the computational biologist's toolbox.

P.S.

I still need to figure out a way to show LaTeX formulas inline. For now, I have been converting them to unicode with this handy little tool, and then inserted the generated unicode into my HTML files. Not a very elegant solution, I fear...