The Fractal Dimension of the Lorenz Attractor
6th July 2022Introduction
In my last post on attractor reconstruction, I have talked about the Lorenz system and how to reconstruct its so-called strange attractor from partial observations by using delay coordinates. In the context of Taken's Theorem, I mentioned the box counting dimension, without going into anz more details. In this post, I want to expaned on that topic by first giving a short theoretical introduction to the topic, and afterwards by numerically calculating the box-counting dimension of the Lorenz attractor myself.
The Definition of the Box-Counting Dimension
The box counting dimension is one way (out of quite a few different ones) of quantifying the fractal dimension of a set \(S\), such as the attractor of a dynamical system. Paraphrasing Wikipedia, we can calculate this dimension for \(S\) by imagining that \(S\) lies on an evenly spaced grid with distances \(\varepsilon\), and by counting how many boxes are required to cover the set. Put differently, we count the number of boxes the set "touches." Then, the box-counting dimension is calculated by seeing how this number of required boxes changes as we make the grid finer, i.e. as \(\varepsilon \to 0\). Let \(N(\varepsilon)\) denote the number of boxes of side length \(\varepsilon\) required to cover \(S\). Then, the box-counting dimension is defined as:
\(D(S) := \lim\limits_{\varepsilon \to 0} \dfrac{\log N(\varepsilon)}{\log (1/\varepsilon)}\)
One may easily see that this is equivalent to calculating the (negative) slope in a plot of \(\log N(\varepsilon)\) over \(\log \varepsilon\), which is the numerical method we will use in this article:
\(D(S) := - \lim\limits_{\varepsilon \to 0} \dfrac{\log N(\varepsilon)}{\log(\varepsilon)}\)
Numerical Estimation of the Box-Counting Dimension of the Lorenz Attractor
Numerically Solving the System
First, we need a solution curve of the Lorenz system. Since there is no analytical solution available (owing to the non-linearity of the system), we need to rely on a numerical solution. I first tried using Python's scipy library for this, however I found the performance not satisfactory. Thus, I quickly hacked together a classical fourth order Runge-Kutta solver in plain C (see here for the source code). I opted for a fixed step size of \( \Delta t = 10^{-5}\), let the system "burn in" for 100 units of time (to avoid any transient behaviour) and then simulated the system for additional 1000 units of time. This generated approximately 4 GB of raw data, written into a simple CSV file (which I cannot share via github due to its size limitation :'D).
When plotted, the obtained solution after "burn-in" looks like this:
Counting Boxes
Next, we need to overlay a grid with systematically altered grid sizes \(\varepsilon\) over this solution curve and count how many of the grid boxes the solution curves "touches". To do this, we iterate through every entry of the system's numerical solution and mark the grid box it falls into as "touched". To uniquely identify the box a given point falls into, we first pick a shared reference point "smaller" than all solution points of the system (I used \((-50/-50/-50)\), and integer-divide the x-, y- and z-distance of the point in question by \(\varepsilon\). This gives us the coordinates of the box the given point falls into.
To keep track of all "visited" boxes, I used Python (see here for the source code) and its very space-efficient 'set' data type. (In fact, I first wanted to do the box-counting in C by just allocating an array of bools, where every array element represents one possible box, but I quickly ran out of memory, and I did not want to come up with something more clever myself.)
The Result
Let's plot the number of visited boxes \(N(\varepsilon)\) over the grid size \(\varepsilon\). We do so by using a log-log plot in order to take into account the varying magnitudes of the data, and, more importantly, because we are interested in the relationship between the log-transformed values anyway (see above). In the following plot, this is shown by the blue dots. We also calculate the local slope of the curve in orange, approximated via finite differences between adjacent points.
First, we observe that for sufficiently large grid sizes, the curve formed by the blue points slows down. This makes sense because the attractor has a finite size, and thus there exists a grid size at which the entire attractor will be contained in one single box. Luckily, this does not matter to us, because we are interested in the slope of the curve as the grid size tends to zero, i.e. we care only about the behaviour of the blue dots at the left-hand side of the plot.
Another effect, however, is at play there: As the box size tends to zero, the number of visited boxes tends to a constant -- namely, the exact number of simulated data points of the solution curve of the system: \(1000\) units of time divided by a step size of \(10^{-5}\), i.e. \(10^8\). This shows a numerical problem of our approach: Because our numerical solution of the system yields only a sequence of discrete points, instead of a continuous curve, there will always exist a critical grid size, where each of these points will start occupy its own grid box. In other words, \(N(\varepsilon)\) is bounded from above.
To solve this problem, we could implement a (possibly simple, linear) interpolation between every pair of subsequent points of our numerical solution. Alternatively, we could dynamically decrease the step size of our numerical solver whenever we decrease the grid size \(\varepsilon\). Finally, we can also accept the problem and work around it -- which is what I decided to do here.
In particular, I decided to simply plot all local slopes (orange dots and dashed line in the plots above), and to select the maximum value -- which here is \(D \approx 1.974\). In my last post, I claimed that the dimension of the Lorenz attractor should be \(D \approx 2.1\) based on something I read in some internet forum without checking the literature myself -- however, after some more reading I found this paper from 1983, which estimated \(D \approx 1.98 \pm 0.02\), which nicely agrees with my result.
Conclusions
Implementing the estimation myself was interesting and surprisingly educational for me, as it made me aware of some "real-life implementation issues" I didn't even think of while only reading the theoretical definition of the box-counting dimension. In particular, the problem of a discrete pointwise simulation of the system and how this bounds the counts of visited boxes (and thus causes problems in estimating the slope of the curve) did not occur to me in the first place. In addition, finding an implementation of every step of the procedure (numerical solution, actual box-counting) that shows acceptable time and space requirements was fun -- especially as it gave my an opportunity to play around in C again and to hack together a little Runge-Kutta solver :-).