Personal blog of Matthias M. Fischer


Yet Another Covid Dashboard, Part 2: Caching

Posted: 2nd January 2022

Preface

In the previous post, we wrote a web app in Python using the flask library. This app loads data on the Covid pandemic from different web sources, processes them, generates plots, and serves them as an HTML dashboard. I would love to deploy this app to the web; however, before doing so, an important point deserves some consideration:

Until now, every time the dashboard is accessed, all the data are fetched completely anew, which takes approximately two seconds on my machine. Generating the plots afterwards takes approximately another second. Thus, the loading time of the dashboard is suboptimal.

Additionally, we send out far more requests to the web sources than necessary, possibly risking getting blocked if the dashboard receives too many accesses. This might happen either because of genuine interest from sufficiently many people, or because someone with bad intent reloads the site too often.

An easy solution to these problems is caching, which simply means storing computed data in memory to be able to serve future requests faster. In our case, the data we fetch from the web is updated (depending on the source) daily or weekly. Thus, we can safely assume that keeping the data and generated plots in a cache for half an hour will be a good compromise between serving current data and limiting the number of queries and the computation time.
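To see what such a cache does under the hood, here is a minimal hand-rolled sketch of the idea (illustrative only; the class name and the injectable `timer` parameter are my own choices for the sake of the example):

```python
import time


class SimpleTTLCache:
    """A minimal, illustrative time-to-live cache (not production code)."""

    def __init__(self, ttl: float, timer=time.monotonic):
        self.ttl = ttl
        self._timer = timer
        self._store = {}  # key -> (stored_at, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        if self._timer() - stored_at > self.ttl:
            del self._store[key]  # entry expired: evict it and report a miss
            return default
        return value

    def put(self, key, value):
        self._store[key] = (self._timer(), value)
```

On a miss (no entry, or an expired one), the caller would recompute the value and `put()` it back, so each entry is refreshed at most once per TTL window.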

In this post, I'll share how to implement such a caching mechanism in Python, which turns out to be extremely easy to do by using an already existing library.

The 'cachetools' Library

Instead of implementing our own cache, we will make use of the cachetools library. It provides quite a lot of different cache types which are all worth checking out; here, however, we are only interested in the TTLCache. This type of cache associates a time-to-live (TTL) value with each cached entry, and returns cached elements as long as they are still "alive." Entries that have exceeded their lifetime, however, will be removed and fetched anew upon the next call. This is exactly what we need for our use case.
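The expiry behaviour can be observed directly on the cache object, which supports a dict-like interface. The `TTLCache` constructor accepts a `timer` argument, which the sketch below uses to inject a fake clock so we can fast-forward time instead of actually waiting half an hour:

```python
from cachetools import TTLCache

# A fake clock lets us fast-forward time instead of sleeping.
clock = [0.0]
cache = TTLCache(maxsize=1, ttl=1800, timer=lambda: clock[0])

cache["dashboard"] = "<html>…</html>"
print("dashboard" in cache)  # entry is still alive

clock[0] += 1801             # fast-forward past the 30-minute TTL
print("dashboard" in cache)  # entry has expired and is gone
```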

The library also provides a nice decorator, which greatly reduces the amount of code we have to write. In fact, we only have to add exactly two lines of code (one of them being the import statement) to make our assemble_dashboard() function cached. The decorator specifies the type of cache, its maximum size (here exactly one element, as we only want to keep exactly one copy at a time), and the entries' time to live in seconds. Here we go for 30*60=1800 seconds, as described above. Thus, we write:

from cachetools import cached, TTLCache

@cached(cache=TTLCache(maxsize=1, ttl=30 * 60))
def assemble_dashboard() -> str:
    # loading data from web
    ...
    # generating plots
    ...
    # converting plots
    ...
    # assembling HTML
    ...
    return html_string

In this way, the complete dashboard is generated at most once every half hour; otherwise, the previously generated HTML string is returned from memory, which leads to a quasi-instantaneous response on the order of microseconds:

(Results of three independent replicates on my machine. Note the log-scaling of the y-axis.)
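A timing comparison like this can be reproduced with a small stdlib-only sketch. I use functools.lru_cache here, which lacks a TTL but behaves identically for a single warm call; slow_assembly() is a hypothetical stand-in for the real fetching and plotting work:

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)  # no TTL, but enough to demonstrate the speedup
def slow_assembly() -> str:
    # hypothetical stand-in for fetching data and rendering plots
    time.sleep(0.2)
    return "<html>dashboard</html>"


def timed_call() -> float:
    start = time.perf_counter()
    slow_assembly()
    return time.perf_counter() - start


first = timed_call()   # pays the full assembly cost
second = timed_call()  # answered from the in-memory cache
print(f"first:  {first:.4f} s")
print(f"second: {second:.6f} s")
```

The second call skips the function body entirely, so its cost is just a dictionary lookup, which is where the microsecond-scale responses come from.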

Conclusions and Outlook

With the caching problem solved that easily, we can now think about deploying the dashboard to the web. I am currently examining different options for how exactly this might be done. At this point in time, I cannot say for sure how long the whole process will take, especially since this is the very first time I am attempting to do anything like this. Stay tuned!