Advent of Code at Wolfpack: High Performance C#

22/12/2022

.NET and C# have evolved a lot over the past few years. .NET has become fully cross-platform and has received astonishing performance improvements. This article shows off new features in .NET that are used to write very performant C#.

As the holiday season approaches, many of us at Wolfpack are getting into the festive spirit by participating in the Advent of Code. For those who are not familiar, Advent of Code is an annual online programming challenge that consists of a series of small coding puzzles released (like a true advent calendar) every day from the 1st of December until the 25th.

Participating in Advent of Code is a great way for us at Wolfpack to practice our problem-solving, coding and math skills, and to have some fun at the same time. We compare our approaches, discuss solutions and motivate each other to participate. Some developers see how far they can get using nothing but Excel spreadsheets, and several project managers join us each year as well!

Not every Advent of Code solution needs to focus on performance; the goal is to learn and have fun. For day 6 this year, however, I did focus on performance. In this blog post I’d like to highlight three ways to implement the puzzle and explore their performance characteristics.

Day 06 of 2022

Day 6 of this year’s Advent of Code is about finding a “marker” in a string of characters. This marker is a substring of a given length in which every character is unique (no duplicates). In part one of the puzzle we need to find a marker of four distinct characters; part two requires a marker of fourteen distinct characters.

For example, in the following “message”:

zccnvctfwqggmdngplrshbvqnoimdavb

The marker of part one starts at index 3 (nvct) and the marker for part two starts at index 15 (gpl…imd).

If you’d like to solve this for yourself, do not read the solution below just yet!

The naive sliding window approach

The simplest approach is a sliding window: start at index 0, take a substring of the required marker length and check whether it contains duplicates. If it does, move the window one character to the right and repeat.

This can be implemented with a HashSet<char>: if the set built from a window has exactly markerLength elements, every character in that window is unique.

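public static int FindStartOfMarkerUsingHashSet(string input, int markerLength)
{
    // Slide the window over every start index, including the last possible one.
    for (int i = 0; i <= input.Length - markerLength; i++)
    {
        // The set has markerLength entries only if all characters are distinct.
        // Note that a new substring and a new HashSet are allocated for every window.
        if (new HashSet<char>(input.Substring(i, markerLength)).Count == markerLength)
            return i;
    }

    throw new InvalidOperationException("The input has no valid marker.");
}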

This snippet is short and gets the job done. However, for every window it checks, it allocates a new substring and a new HashSet, which makes it quite slow. Running this code for part two of my actual input results in 1.58 MB of allocations and an average time of 836.17 microseconds (1 millisecond is 1,000 microseconds).

Using Spans for no allocations

We can avoid allocating a string for every window we check by using Span<T> (here ReadOnlySpan<char>), which was introduced with C# 7.2 and .NET Core 2.1.

A span is little more than a reference to a memory location plus a length, so slicing a span creates a new view over the same memory without copying anything. In essence, the span is our sliding window: it tells us where in the string we are currently looking.
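To make the "view, not copy" idea concrete, the following small sketch slices the example message from the puzzle description and compares the slice with a Substring copy. It only uses standard APIs (AsSpan, Slice, and the MemoryExtensions methods Overlaps and SequenceEqual); the variable names are purely illustrative.

```csharp
using System;

string message = "zccnvctfwqggmdngplrshbvqnoimdavb";
ReadOnlySpan<char> span = message.AsSpan();

// Slicing only creates a new (start, length) view; no characters are copied.
ReadOnlySpan<char> window = span.Slice(3, 4);   // equivalent to span[3..7]

// Substring, by contrast, copies the characters into a new string on the heap.
string copy = message.Substring(3, 4);

Console.WriteLine(window.ToString());                    // nvct (ToString() allocates a new string)
Console.WriteLine(window.Overlaps(span));                // True: the slice points into the original string
Console.WriteLine(copy.AsSpan().Overlaps(span));         // False: the substring lives in its own memory
Console.WriteLine(window.SequenceEqual(copy.AsSpan()));  // True: same characters either way
```

Both windows hold the same characters, but only the slice shares memory with the original string, which is why checking windows through spans avoids the per-window allocations.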

A faster, allocation-free implementation of this puzzle looks like this:

public static int FindStartOfMarkerUsingSpan(string input, int markerLength)
{
    ReadOnlySpan<char> span = input.AsSpan();
    // Check every window, including the last one that ends at the final character.
    for (int index = 0; index <= input.Length - markerLength; index++)
    {
        // Slicing creates a view over the same memory; nothing is copied.
        ReadOnlySpan<char> subSpan = span[index..(index + markerLength)];

        if (!ContainsDuplicates(subSpan))
            return index;
    }

    throw new InvalidOperationException("The input has no valid marker.");

    // Pairwise comparison: at most 91 comparisons for a 14-character window.
    static bool ContainsDuplicates(ReadOnlySpan<char> input)
    {
        for (int i = 0; i < input.Length; i++)
            for (int j = i + 1; j < input.Length; j++)
                if (input[i] == input[j])
                    return true;

        return false;
    }
}

We can benchmark this approach as well. The benchmark confirms that our code no longer allocates any memory, and the time for my part two input drops to 19.26 microseconds, almost 43 times faster! If this were a so-called “hot path” of a program (code that runs very often), such savings could really add up and make processes run minutes, if not hours, faster.
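If you want a quick sanity check of the allocation difference without a full benchmark, .NET exposes a per-thread allocation counter, GC.GetAllocatedBytesForCurrentThread. The sketch below applies it to the example message, using local copies of the two methods above; the MeasureAllocatedBytes helper is hypothetical, and the exact byte counts depend on the runtime, so treat them as a rough indication rather than a benchmark.

```csharp
using System;
using System.Collections.Generic;

const string message = "zccnvctfwqggmdngplrshbvqnoimdavb";
const int markerLength = 14;

// Copy of the HashSet approach: a new substring and a new HashSet for every window.
static int UsingHashSet(string input, int length)
{
    for (int i = 0; i <= input.Length - length; i++)
        if (new HashSet<char>(input.Substring(i, length)).Count == length)
            return i;

    throw new InvalidOperationException("The input has no valid marker.");
}

// Copy of the Span approach: every window is a slice of the original string.
static int UsingSpan(string input, int length)
{
    ReadOnlySpan<char> span = input.AsSpan();
    for (int index = 0; index <= input.Length - length; index++)
        if (!ContainsDuplicates(span.Slice(index, length)))
            return index;

    throw new InvalidOperationException("The input has no valid marker.");

    static bool ContainsDuplicates(ReadOnlySpan<char> window)
    {
        for (int i = 0; i < window.Length; i++)
            for (int j = i + 1; j < window.Length; j++)
                if (window[i] == window[j])
                    return true;

        return false;
    }
}

// Hypothetical helper: bytes allocated on the current thread while `action` runs.
static long MeasureAllocatedBytes(Action action)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

// Warm up both methods so one-time JIT and initialization costs are not measured.
UsingHashSet(message, markerLength);
UsingSpan(message, markerLength);

long hashSetBytes = MeasureAllocatedBytes(() => UsingHashSet(message, markerLength));
long spanBytes = MeasureAllocatedBytes(() => UsingSpan(message, markerLength));

Console.WriteLine($"HashSet: {hashSetBytes} bytes, Span: {spanBytes} bytes");
```

The HashSet figure grows with the number of windows checked, while the span version should report 0 bytes once it has been JIT-compiled, in line with the benchmark's Allocated column.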

Using jumps and stackalloc for optimal performance

Of course, we are still not satisfied with this 43 times speedup: we can do even better.

The following solution takes a different approach to the sliding window. Instead of comparing every pair of characters in every window, we scan a window from its last character to its first and remember where we last saw each letter. If a letter turns up a second time inside the window, we slide the window so that it starts immediately after the earlier of the two occurrences, and scan again.

For example, with a 14-character window starting at index 0, suppose there is an h at index 8 and another h at index 11. The windows starting at indices 1 through 8 all still contain both h’s, so none of them can be a marker, and we can move the window start forward by nine positions, straight to index 9. The last-seen positions fit in an array of 26 integers, one per letter: index 0 holds the position of 'a', index 1 that of 'b', and so on.

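To keep the method allocation-free, we use the stackalloc keyword to allocate this array on the stack instead of the heap. Since C# 7.2, the result of stackalloc can be assigned directly to a Span<int>, so no unsafe code is required.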
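In isolation, the pattern looks like the following minimal sketch: a hypothetical CountDistinctLetters helper (not part of the puzzle solution) that keeps a 26-entry lookup table on the stack. Like the puzzle code, it assumes the text only contains the letters 'a' to 'z'.

```csharp
using System;

// Hypothetical helper: counts the distinct lowercase letters in `text` without any
// heap allocation. Assumes `text` only contains the letters 'a' to 'z'.
static int CountDistinctLetters(ReadOnlySpan<char> text)
{
    // stackalloc assigned to a Span<int> is safe code: no `unsafe` block is needed.
    Span<int> lastSeen = stackalloc int['z' - 'a' + 1];
    lastSeen.Fill(-1); // initialize explicitly; -1 marks a letter not seen yet

    int distinct = 0;
    for (int i = 0; i < text.Length; i++)
    {
        int slot = text[i] - 'a'; // 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
        if (lastSeen[slot] == -1)
            distinct++;
        lastSeen[slot] = i; // remember the most recent index of this letter
    }

    return distinct;
}

const string message = "zccnvctfwqggmdngplrshbvqnoimdavb";
Console.WriteLine(CountDistinctLetters(message.AsSpan(0, 4)));   // 3 ("zccn")
Console.WriteLine(CountDistinctLetters(message.AsSpan(3, 4)));   // 4 ("nvct")
Console.WriteLine(CountDistinctLetters(message.AsSpan(15, 14))); // 14 ("gplrshbvqnoimd")
```

Stack space is limited, so this pattern suits small, fixed-size buffers like this 26-entry table. When the size depends on the input, a common guard is to use stackalloc only below a small threshold and fall back to a regular array otherwise.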

Here is the full snippet for the jump & stackalloc implementation:

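public static int FindStartOfMarkerUsingSkipForward(string input, int markerLength)
{
    ReadOnlySpan<char> span = input.AsSpan();
    if (span.Length < markerLength)
        throw new InvalidOperationException("The input has no valid marker.");

    // Last scanned index of each letter 'a'..'z', kept on the stack (-1 = not seen yet).
    // Assumes the input only contains lowercase letters.
    Span<int> charOccurredAt = stackalloc int['z' - 'a' + 1];
    charOccurredAt.Fill(-1);

    int markerBase = 0;
    // Scan the current window from its last character back to its first.
    for (int i = markerLength - 1; i >= 0; i--)
    {
        int currCharIndex = markerBase + i;
        char currChar = span[currCharIndex];
        int currCharAlreadyOccurredAt = charOccurredAt[currChar - 'a'];

        // The same letter was already seen later in this window, so every window
        // starting at or before currCharIndex contains the duplicate: jump past it.
        if (currCharAlreadyOccurredAt >= markerBase &&
            currCharIndex < currCharAlreadyOccurredAt)
        {
            markerBase = currCharIndex + 1;
            if (markerBase + markerLength > span.Length)
                throw new InvalidOperationException("The input has no valid marker.");

            i = markerLength; // the loop's i-- restarts the scan at the new window's last character
            continue;
        }

        charOccurredAt[currChar - 'a'] = currCharIndex;
    }

    // The whole window was scanned without finding a duplicate.
    return markerBase;
}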

Benchmarking this implementation on my part two input shows an average of 1.85 microseconds, just over ten times faster than the span version and roughly 450 times faster than the HashSet approach! I’d say we have conquered today’s Advent of Code challenge.
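To see how few windows the skip-forward approach actually examines, here is a hypothetical, instrumented variant of the skip-forward method that records the start of every window it scans. It uses a List<int> for the trace, so unlike the original it does allocate, and for brevity it assumes a marker exists; it is only meant to illustrate the jumps on the example message.

```csharp
using System;
using System.Collections.Generic;

const string message = "zccnvctfwqggmdngplrshbvqnoimdavb";

// Hypothetical instrumented variant of the skip-forward search: same jumping logic,
// plus a list of every window start it scans.
// Assumes lowercase input and that a marker exists (no end-of-input check).
static int FindWithTrace(string input, int markerLength, List<int> windowStarts)
{
    Span<int> charOccurredAt = stackalloc int['z' - 'a' + 1];
    charOccurredAt.Fill(-1);

    int markerBase = 0;
    windowStarts.Add(markerBase);
    for (int i = markerLength - 1; i >= 0; i--)
    {
        int currCharIndex = markerBase + i;
        int slot = input[currCharIndex] - 'a';

        // This letter was already seen later in the same window, so every window
        // starting at or before currCharIndex contains the duplicate: jump past it.
        if (charOccurredAt[slot] >= markerBase && currCharIndex < charOccurredAt[slot])
        {
            markerBase = currCharIndex + 1;
            windowStarts.Add(markerBase);
            i = markerLength; // the loop's i-- restarts the scan at the new window's end
            continue;
        }

        charOccurredAt[slot] = currCharIndex;
    }

    return markerBase;
}

var starts = new List<int>();
int result = FindWithTrace(message, 14, starts);
Console.WriteLine($"Marker at {result}, windows scanned: {string.Join(", ", starts)}");
// prints: Marker at 15, windows scanned: 0, 11, 15
```

For part two of the example, only three windows are scanned (starting at 0, 11 and 15), whereas the sliding-window versions check all sixteen windows from index 0 to 15.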

Benchmark

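BenchmarkDotNet is a great NuGet package for benchmarking C# code. Running all three implementations with marker lengths of 4 and 14 yields the following results (Ratio is relative to UsingSpan, the baseline):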

Method	Marker length	Mean	Error	StdDev	Median	Ratio	Allocated
UsingHashSet	4	82.223 μs	1.6154 μs	1.6154 μs	81.112 μs	22.24	239360 B
UsingSpan	4	3.885 μs	0.0694 μs	0.0616 μs	3.890 μs	1.00	–
UsingJumps	4	1.784 μs	0.0342 μs	0.0320 μs	1.780 μs	0.46	–

UsingHashSet	14	836.168 μs	16.4871 μs	27.9964 μs	834.988 μs	42.80	1582833 B
UsingSpan	14	19.255 μs	0.2163 μs	0.2023 μs	19.252 μs	1.00	–
UsingJumps	14	1.845 μs	0.1820 μs	0.5365 μs	1.538 μs	0.08	–

Tbl 1. The results of the benchmark

The same results can also be shown as a chart. Note that the duration axis (in microseconds) uses a logarithmic scale.

Fig 1. A chart of the benchmark results showing duration and memory usage (created using ChartBenchmark.net)