The Ancient Rite of Passage: Writing your own Malloc

Posted: 6th March 2022

Introduction

Recently, I've gotten back into C programming more and more again. To understand more of what is happening "under the hood", I decided to try my hands at implementing my own memory manager -- that is, writing my own versions of malloc(), calloc(), realloc(), and free() from scratch. You can find my implementation in this github repository. Much has been said and written about this topic in the past, so instead of repeating all of the basics I will just link to this excellent tutorial which I have used as a starting point for my own version. If you are not really familiar with the topic, you might want to read this first :).

However, I think (!) I have been able to improve upon the tutorial version a bit, and I have also extensively annotated it with comments. For this reason, I think it is worth sharing. In this blog post, I will briefly describe my changes to the original versions and my rationale behind it.

A Double-Linked List for increased Performance

A very straight-forward change, directly from Data Structures 101: The tutorial implemented the global list of memory blocks as a single-linked list, where the metadata structure of every block keeps a pointer to the metadata of its successor. In this way, finding a block's successor can be done in constant time; however to find the predecessor of a block X, we must start from the beginning of the list and iterate through every block until we find the block with X as successor. Hence, finding a block's predecessor requires linear time if we use the version implemented in the tutorial.

Fixing this to also get constant time for jumping a block "backwards" is easy -- just add a pointer *prev to the metadata of every block that points "backwards" to the predecessor of the respective block. Thus, our new block metadata struct now looks as follows:

struct metadata {
  size_t size;
  int free;
  struct metadata* next;
  struct metadata* prev;
};

Simpifying the Program Logic by adding a Pointer to the Tail

In the tutorial, a global pointer HEAD points to the beginning of the global list of memory blocks; however we don't keep track of the list's last block. This makes the function struct block_meta *find_free_block(struct block_meta **last, size_t size) of the tutorial a bit complicated: It gets called with a NULL-initialised parameter last, which is a pointer to a pointer to a metadata struct. During the function's loop, this double-pointer is used to store and afterwards return a pointer to the end of the list. Notably, this is done anew with every new call of the function.

Admittedly, I needed a few minutes to completely understand what was going on there, and I found it a bit complicated and not really necessary. For this reason, I decided to add a second global pointer, TAIL, which always points to the last block of the global list. This should make the code more efficient and, at least equally as important, easier to understand.

Adding a Best-Fit Allocator

In the tutorial, whenever multiple different sufficiently-sized free blocks of memory can be "recycled" for a new allocation, the allocator will simply return the first block that's large enough. (Note that the while loop in find_free_block will instantly terminate as soon as a suitable block has been found.) This has a big potential of being wasteful with memory; thus I decided to implement an alternative allocator which instead returns the block that fits the best, i.e. the block whose size matches the amount of memory we want to allocate the best.

Thus, we now have a #define ALLOC_BEST_FIT switch which allows us to select which of the two allocators will be used. The best-fit allocator itself is pretty straight-forward:

struct metadata* find_best_free_block(size_t size) {
    // Starting at the HEAD of the list
    struct metadata* current = HEAD;

    // Best block found so far is NULL
    struct metadata* best = NULL;
    size_t best_sizediff = SIZE_MAX;

    // Iterate through the complete list
    while (current) {
        // Found a better suitable block?
        if (current->free && current->size >= size && current->size - size < best_sizediff) {
            best = current;
            best_sizediff = current->size - size;
        }
        current = current->next;
    }    
    return best;
}

We basically comb through the entire list of memory blocks, and whenever we find a sufficiently large free block, we check whether it's a better fit than our previous best fit. If so, we save it, and then we continue our search, until we reach the end of the list. Finally, we return the best fit we were able to find.

As a side note, this better use of memory comes with an increased amount of time required to iterate through the entire list, instead of terminating our search as soon as we find the first suitable block. Yet another instance of the famous space-time tradeoff of computer size.

Removing a non-required [?] sbrk(0)

In the tutorial, the function request_space is responsible for getting new space from the OS if the list of memory blocks does not contain any suitable free block of sufficient size. This is done as follows:

struct block_meta *request_space(struct block_meta* last, size_t size) {
  struct block_meta *block;
  block = sbrk(0);
  void *request = sbrk(size + META_SIZE);
  assert((void*)block == request); // Not thread safe.
  if (request == (void*) -1) {
    return NULL; // sbrk failed.
  }

  // Some more stuff...

  return block;
}

In other words, we first call sbrk(0), which gives us a pointer to the current border of the heap. We store this pointer in block. Next, we call sbrk again, this time with the size of memory we want to allocate as a parameter. This will move the border of the heap accordingly and will return a pointer to the previous heap border -- i.e. it will give us the exact same pointer again, which we now store in request. For reasons of thread safety, we then compare these two pointers; then, we check whether we even were able to obtain memory in the first place, and if so, we then proceed to set up the block and finally return it.

As far as I can tell (and please correct me if I am wrong here, because I may very well be missing the obvious), calling sbrk twice is not required, and we can entirely leave out the first call without any problems. Which is exactly what I did in my implementation -- which until now has been working without causing any problems.

Adding Block Splitting, Block Merging, and Overflow Checking

These things have been left as exercises to the reader by the tutorial:

Whenever we "recycle" a free memory block from the list that's "too large", we split it in order to not waste memory.
Whenever we free a block, we check if its predecessor and/or its successor are free. If so, we merge them together in order to not fragment our memory too much.
Upon allocating memory using calloc(), we check (using a simple integer division) whether the number of bytes can be stored without causing overflow.

The logic behind all of this is basically just a bunch of if statements and a bit of pointer wrangling -- if you are curious about that, just check out my implementation in this github repository.

Adding a Convenience Function for Printing the List

Finally, I also wrote a little helper function print_list, which prints out the current state of the global list of memory blocks. This was of course paticularly useful for debugging; however, I think it's also very helpful for "peeking under the hood" while playing around with the code and trying to understand it. The logic behind the function is pretty simple -- we iterate through all of the blocks of the list, and print the block's information to screen. Some exemplary output:

------------------------------------------------------------------------
Adress               Previous             Size    Free   Next                
------------------------------------------------------------------------
HEAD is 94811827032064
94811827032064       0                    300     1      94811827032396      
94811827032396       94811827032064       200     0      94811827032628      
94811827032628       94811827032396       20      0      94811827032680      
94811827032680       94811827032628       16      1      0                   
TAIL is 94811827032680
------------------------------------------------------------------------

Conclusions

This has been a lot of fun -- I definitely enjoyed writing some more stuff in C again, and it also increased my understanding of how exactly memory management works under the hood. I can definitely recommend undertaking this Ancient Rite of Passage yourself ;).

Personal blog of Matthias M. Fischer

Home

About

Portfolio

Contact