Top 6 Performance Tips when dealing with strings in C# 12 and .NET 8

Small changes sometimes make a huge difference. Learn these 6 tips to improve the performance of your application just by handling strings correctly.

Sometimes, just a minor change makes a huge difference. Maybe you won't notice it when performing the same operation a few times. Still, the improvement is significant when repeating the operation thousands of times.

In this article, we will learn six simple tricks to improve the performance of your application when dealing with strings.

Note: this article is part of the C# Advent Calendar 2023, organized by Matthew D. Groves: it's perhaps the only Christmas tradition I like (yes, I'm kind of a Grinch 😂).

Benchmark structure, with dependencies

Before jumping to the benchmarks, I want to spend a few words on the tools I used for this article.

The project is a .NET 8 class library running on a laptop with an i5 processor.

Running benchmarks with BenchmarkDotNet

I'm using BenchmarkDotNet to create benchmarks for my code. BenchmarkDotNet is a library that runs your methods several times, captures some metrics, and generates a report of the executions. If you follow my blog, you might know I've used it several times - for example, in my old article "Enum.HasFlag performance with BenchmarkDotNet".

All the benchmarks I created follow the same structure:

[MemoryDiagnoser]
public class BenchmarkName()
{
    [Params(1, 5, 10)] // clearly, I won't use these values
    public int Size;

    public string[] AllStrings { get; set; }

    [IterationSetup]
    public void Setup()
    {
        AllStrings = StringArrayGenerator.Generate(Size, "hello!", "HELLO!");
    }

    [Benchmark(Baseline=true)]
    public void FirstMethod()
    {
        //omitted
    }

    [Benchmark]
    public void SecondMethod()
    {
        //omitted
    }
}

In short:

  • the class is marked with the [MemoryDiagnoser] attribute: the benchmark will retrieve info for both time and memory usage;
  • there is a property named Size with the attribute [Params]: this attribute lists the possible values for the Size property;
  • there is a method marked as [IterationSetup]: this method runs before every single execution, takes the value from the Size property, and initializes the AllStrings array;
  • the methods that are part of the benchmark are marked with the [Benchmark] attribute.
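
For completeness, here is a minimal entry point to launch one of these benchmarks (a sketch, assuming a console project that references BenchmarkDotNet):

using BenchmarkDotNet.Running;

public static class Program
{
    public static void Main()
    {
        // Runs every [Benchmark] method in the class, once per [Params] value,
        // and prints the summary table to the console.
        BenchmarkRunner.Run<BenchmarkName>();
    }
}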

Generating strings with Bogus

I relied on Bogus to create dummy values. This NuGet library allows you to generate realistic values for your objects with a great level of customization.
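
To give you a taste of the API (a quick sketch using standard Bogus facets; the actual values are random at every run):

using Bogus;

Faker faker = new Faker();

string word = faker.Lorem.Word();      // e.g. "voluptas"
string name = faker.Person.FullName;   // e.g. "Margie Kuhlman"
string email = faker.Internet.Email(); // e.g. "margie62@example.com"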

The string array generation strategy is shared across all the benchmarks, so I moved it to a static method:

public static class StringArrayGenerator
{
    public static string[] Generate(int size, params string[] additionalStrings)
    {
        string[] array = new string[size];
        Faker faker = new Faker();

        List<string> fixedValues = [
            string.Empty,
            "   ",
            "\n  \t",
            null
        ];

        if (additionalStrings != null)
            fixedValues.AddRange(additionalStrings);

        for (int i = 0; i < array.Length; i++)
        {
            if (Random.Shared.Next() % 4 == 0)
            {
                array[i] = Random.Shared.GetItems<string>(fixedValues.ToArray(), 1).First();
            }
            else
            {
                array[i] = faker.Lorem.Word();
            }
        }

        return array;
    }
}

Here I have a default set of predefined values ([string.Empty, "   ", "\n  \t", null]), which can be expanded with the values coming from the additionalStrings array. These values are then placed at random positions in the array.

In most cases, though, the value of the string is defined by Bogus.

Generating plots with chartbenchmark.net

To generate the plots you will see in this article, I relied on chartbenchmark.net, a fantastic tool that transforms the output generated by BenchmarkDotNet on the console into a dynamic, customizable plot. This tool, created by Carlos Villegas, is available on GitHub, and it surely deserves a star!

Please note that all the plots in this article have a Log10 scale: this scale allows me to show you the performance values of all the executions in the same plot. If I used the Linear scale, you would be able to see only the biggest values.

We are ready. It's time to run some benchmarks!

Tip #1: StringBuilder is (almost always) better than String Concatenation

Let's start with a simple trick: if you need to concatenate strings, using a StringBuilder is generally more efficient than concatenating them with the + operator.

[MemoryDiagnoser]
public class StringBuilderVsConcatenation()
{
    [Params(4, 100, 10_000, 100_000)]
    public int Size;

    public string[] AllStrings { get; set; }

    [IterationSetup]
    public void Setup()
    {
        AllStrings = StringArrayGenerator.Generate(Size, "hello!", "HELLO!");
    }

    [Benchmark]
    public void WithStringBuilder()
    {
        StringBuilder sb = new StringBuilder();

        foreach (string s in AllStrings)
        {
            sb.Append(s);
        }

        var finalString = sb.ToString();
    }

    [Benchmark]
    public void WithConcatenation()
    {
        string finalString = "";
        foreach (string s in AllStrings)
        {
            finalString += s;
        }
    }
}

Whenever you concatenate strings with the + sign, you create a new instance of a string. This operation takes some time and allocates memory for every operation.

By contrast, a StringBuilder appends the strings to an internal buffer and builds the final string only when you call ToString(), avoiding all the intermediate allocations.
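
One related micro-optimization, not measured in the benchmark above, so treat it as a hint rather than a result: if you can estimate the final length, pre-sizing the StringBuilder avoids repeated growth of its internal buffer. A sketch, reusing the AllStrings array from the benchmark (Sum comes from LINQ):

// Pre-sizing means the internal char buffer never has to be
// reallocated and copied while the content grows.
int estimatedLength = AllStrings.Sum(s => s?.Length ?? 0);
StringBuilder sb = new StringBuilder(estimatedLength);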

Here's the result table:

| Method | Size | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| WithStringBuilder | 4 | 4.891 us | 0.5568 us | 1.607 us | 4.750 us | 1.00 | 0.00 | 1016 B | 1.00 |
| WithConcatenation | 4 | 3.130 us | 0.4517 us | 1.318 us | 2.800 us | 0.72 | 0.39 | 776 B | 0.76 |
| WithStringBuilder | 100 | 7.649 us | 0.6596 us | 1.924 us | 7.650 us | 1.00 | 0.00 | 4376 B | 1.00 |
| WithConcatenation | 100 | 13.804 us | 1.1970 us | 3.473 us | 13.800 us | 1.96 | 0.82 | 51192 B | 11.70 |
| WithStringBuilder | 10000 | 113.091 us | 4.2106 us | 12.081 us | 111.000 us | 1.00 | 0.00 | 217200 B | 1.00 |
| WithConcatenation | 10000 | 74,512.259 us | 2,111.4213 us | 6,058.064 us | 72,593.050 us | 666.43 | 91.44 | 466990336 B | 2,150.05 |
| WithStringBuilder | 100000 | 1,037.523 us | 37.1009 us | 108.225 us | 1,012.350 us | 1.00 | 0.00 | 2052376 B | 1.00 |
| WithConcatenation | 100000 | 7,469,344.914 us | 69,720.9843 us | 61,805.837 us | 7,465,779.900 us | 7,335.08 | 787.44 | 46925872520 B | 22,864.17 |

Let's see it as a plot.

Beware of the scale in the diagram: it's a Log10 scale, so make sure to read the actual values displayed on the Y-axis.

[Figure: StringBuilder vs string concatenation in C#: performance benchmark]

As you can see, there is a considerable performance improvement.

There are some remarkable points:

  1. When there are just a few strings to concatenate, the + operator is more performant, both in timing and in allocated memory;
  2. When you need to concatenate 100,000 strings, the concatenation is ~7,000 times slower than the StringBuilder.

In conclusion, use a StringBuilder when concatenating more than 5 or 6 strings; use plain string concatenation for smaller operations.

Edit 2024-01-08: it turns out that string.Concat has an overload that accepts an array of strings. string.Concat(string[]) is actually faster than using the StringBuilder. Read more in this article by Robin Choffardet.
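
For reference, the overload mentioned in the edit looks like this in use (a minimal sketch):

// string.Concat can compute the final length up front and copy each
// input exactly once, which is why it can beat StringBuilder here.
string[] parts = ["Hello", ", ", "world", "!"];
string result = string.Concat(parts);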

Tip #2: EndsWith(string) vs EndsWith(char): pick the right overload

One simple improvement can be made if you use StartsWith or EndsWith, passing a single character.

There are two similar overloads: one that accepts a string, and one that accepts a char.

[MemoryDiagnoser]
public class EndsWithStringVsChar()
{
    [Params(100, 1000, 10_000, 100_000, 1_000_000)]
    public int Size;

    public string[] AllStrings { get; set; }

    [IterationSetup]
    public void Setup()
    {
        AllStrings = StringArrayGenerator.Generate(Size);
    }

    [Benchmark(Baseline = true)]
    public void EndsWithChar()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.EndsWith('e');
        }
    }

    [Benchmark]
    public void EndsWithString()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.EndsWith("e");
        }
    }
}

We have the following results:

| Method | Size | Mean | Error | StdDev | Median | Ratio |
|---|---|---|---|---|---|---|
| EndsWithChar | 100 | 2.189 us | 0.2334 us | 0.6771 us | 2.150 us | 1.00 |
| EndsWithString | 100 | 5.228 us | 0.4495 us | 1.2970 us | 5.050 us | 2.56 |
| EndsWithChar | 1000 | 12.796 us | 1.2006 us | 3.4831 us | 12.200 us | 1.00 |
| EndsWithString | 1000 | 30.434 us | 1.8783 us | 5.4492 us | 29.250 us | 2.52 |
| EndsWithChar | 10000 | 25.462 us | 2.0451 us | 5.9658 us | 23.950 us | 1.00 |
| EndsWithString | 10000 | 251.483 us | 18.8300 us | 55.2252 us | 262.300 us | 10.48 |
| EndsWithChar | 100000 | 209.776 us | 18.7782 us | 54.1793 us | 199.900 us | 1.00 |
| EndsWithString | 100000 | 826.090 us | 44.4127 us | 118.5465 us | 781.650 us | 4.14 |
| EndsWithChar | 1000000 | 2,199.463 us | 74.4067 us | 217.0480 us | 2,190.600 us | 1.00 |
| EndsWithString | 1000000 | 7,506.450 us | 190.7587 us | 562.4562 us | 7,356.250 us | 3.45 |

Again, let's generate the plot using the Log10 scale:

[Figure: EndsWith(char) vs EndsWith(string) in C#: performance benchmark]

They appear to be almost identical, but look closely: based on this benchmark, when the array has 10,000 items, EndsWith(string) is ~10x slower than EndsWith(char).

Also, the duration ratio on the 1,000,000-item array is ~3.5. At first, I thought there was an error in the benchmark, but the ratio did not change when I reran it.

It looks like you get the best improvement ratio when the array has ~10,000 items.
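
A plausible explanation (worth verifying against the runtime source): EndsWith(char) boils down to a simple ordinal check of the last character, while EndsWith(string) defaults to a culture-sensitive comparison. If you do have to pass a string, you can make the comparison explicit:

// EndsWith(char) compares the last character ordinally.
bool endsWithChar = "table".EndsWith('e');

// EndsWith(string) uses the current culture by default; passing
// StringComparison.Ordinal skips the culture-aware machinery.
bool endsWithString = "table".EndsWith("e", StringComparison.Ordinal);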

Tip #3: IsNullOrEmpty vs IsNullOrWhitespace vs IsNullOrEmpty + Trim

As you might know, string.IsNullOrWhiteSpace performs stricter checks than string.IsNullOrEmpty.

(If you didn't know, have a look at this quick explanation of the cases covered by these methods).
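
As a quick refresher, here's the difference in practice:

string whitespaceOnly = "\n  \t";

bool isEmpty = string.IsNullOrEmpty(whitespaceOnly);           // False: the string contains characters
bool isWhiteSpace = string.IsNullOrWhiteSpace(whitespaceOnly); // True: they are all whitespace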

Does it affect performance?

To demonstrate it, I have created three benchmarks: one for string.IsNullOrEmpty, one for string.IsNullOrWhiteSpace, and another one that lies in between: it first calls Trim() on the string and then calls string.IsNullOrEmpty.

[MemoryDiagnoser]
public class StringEmptyBenchmark
{
    [Params(100, 1000, 10_000, 100_000, 1_000_000)]
    public int Size;

    public string[] AllStrings { get; set; }

    [IterationSetup]
    public void Setup()
    {
        AllStrings = StringArrayGenerator.Generate(Size);
    }

    [Benchmark(Baseline = true)]
    public void StringIsNullOrEmpty()
    {
        foreach (string s in AllStrings)
        {
            _ = string.IsNullOrEmpty(s);
        }
    }

    [Benchmark]
    public void StringIsNullOrEmptyWithTrim()
    {
        foreach (string s in AllStrings)
        {
            _ = string.IsNullOrEmpty(s?.Trim());
        }
    }

    [Benchmark]
    public void StringIsNullOrWhitespace()
    {
        foreach (string s in AllStrings)
        {
            _ = string.IsNullOrWhiteSpace(s);
        }
    }
}

We have the following values:

| Method | Size | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|
| StringIsNullOrEmpty | 100 | 1.723 us | 0.2302 us | 0.6715 us | 1.00 |
| StringIsNullOrEmptyWithTrim | 100 | 2.394 us | 0.3525 us | 1.0282 us | 1.67 |
| StringIsNullOrWhitespace | 100 | 2.017 us | 0.2289 us | 0.6604 us | 1.45 |
| StringIsNullOrEmpty | 1000 | 10.885 us | 1.3980 us | 4.0781 us | 1.00 |
| StringIsNullOrEmptyWithTrim | 1000 | 20.450 us | 1.9966 us | 5.8240 us | 2.13 |
| StringIsNullOrWhitespace | 1000 | 13.160 us | 1.0851 us | 3.1482 us | 1.34 |
| StringIsNullOrEmpty | 10000 | 18.717 us | 1.1252 us | 3.2464 us | 1.00 |
| StringIsNullOrEmptyWithTrim | 10000 | 52.786 us | 1.2208 us | 3.5222 us | 2.90 |
| StringIsNullOrWhitespace | 10000 | 46.602 us | 1.2363 us | 3.4668 us | 2.54 |
| StringIsNullOrEmpty | 100000 | 168.232 us | 12.6948 us | 36.0129 us | 1.00 |
| StringIsNullOrEmptyWithTrim | 100000 | 439.744 us | 9.3648 us | 25.3182 us | 2.71 |
| StringIsNullOrWhitespace | 100000 | 394.310 us | 7.8976 us | 20.5270 us | 2.42 |
| StringIsNullOrEmpty | 1000000 | 2,074.234 us | 64.3964 us | 186.8257 us | 1.00 |
| StringIsNullOrEmptyWithTrim | 1000000 | 4,691.103 us | 112.2382 us | 327.4040 us | 2.28 |
| StringIsNullOrWhitespace | 1000000 | 4,198.809 us | 83.6526 us | 161.1702 us | 2.04 |

As you can see from the Log10 plot, the results are pretty similar:

[Figure: string.IsNullOrEmpty vs string.IsNullOrWhiteSpace vs Trim in C#: performance benchmark]

On average, StringIsNullOrWhitespace is ~2 times slower than StringIsNullOrEmpty.

So, what should we do? Here's my two cents:

  1. For all the data coming from the outside (passed as input to your system, received from an API call, read from the database), use string.IsNullOrWhiteSpace: this way, you can ensure that you are not receiving unexpected data;
  2. If you read data from an external API, customize your JSON deserializer to convert whitespace strings to empty values (a sketch follows this list);
  3. Needless to say, choose the proper method depending on the use case. If a string like "\n \n \t" is a valid value for you, use string.IsNullOrEmpty.
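
For point 2, a minimal sketch of such a converter with System.Text.Json (the converter name and setup are mine; adapt them to your model):

using System.Text.Json;
using System.Text.Json.Serialization;

// Normalizes whitespace-only strings to string.Empty during deserialization,
// so the rest of the code can rely on string.IsNullOrEmpty alone.
public sealed class WhitespaceToEmptyStringConverter : JsonConverter<string>
{
    public override string Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        string? value = reader.GetString();
        return string.IsNullOrWhiteSpace(value) ? string.Empty : value;
    }

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
        => writer.WriteStringValue(value);
}

// usage: add it to your serializer options
// var options = new JsonSerializerOptions();
// options.Converters.Add(new WhitespaceToEmptyStringConverter());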

Tip #4: ToUpper vs ToUpperInvariant vs ToLower vs ToLowerInvariant: they look similar, but they are not

Even though they look similar, there is a difference in terms of performance between these four methods.

[MemoryDiagnoser]
public class ToUpperVsToLower()
{
    [Params(100, 1000, 10_000, 100_000, 1_000_000)]
    public int Size;

    public string[] AllStrings { get; set; }

    [IterationSetup]
    public void Setup()
    {
        AllStrings = StringArrayGenerator.Generate(Size);
    }

    [Benchmark]
    public void WithToUpper()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToUpper();
        }
    }

    [Benchmark]
    public void WithToUpperInvariant()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToUpperInvariant();
        }
    }

    [Benchmark]
    public void WithToLower()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToLower();
        }
    }

    [Benchmark]
    public void WithToLowerInvariant()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToLowerInvariant();
        }
    }
}

What will this benchmark generate?

| Method | Size | Mean | Error | StdDev | Median | P95 | Ratio |
|---|---|---|---|---|---|---|---|
| WithToUpper | 100 | 9.153 us | 0.9720 us | 2.789 us | 8.200 us | 14.980 us | 1.57 |
| WithToUpperInvariant | 100 | 6.572 us | 0.5650 us | 1.639 us | 6.200 us | 9.400 us | 1.14 |
| WithToLower | 100 | 6.881 us | 0.5076 us | 1.489 us | 7.100 us | 9.220 us | 1.19 |
| WithToLowerInvariant | 100 | 6.143 us | 0.5212 us | 1.529 us | 6.100 us | 8.400 us | 1.00 |
| WithToUpper | 1000 | 69.776 us | 9.5416 us | 27.833 us | 68.650 us | 108.815 us | 2.60 |
| WithToUpperInvariant | 1000 | 51.284 us | 7.7945 us | 22.860 us | 38.700 us | 89.290 us | 1.85 |
| WithToLower | 1000 | 49.520 us | 5.6085 us | 16.449 us | 48.100 us | 79.110 us | 1.85 |
| WithToLowerInvariant | 1000 | 27.000 us | 0.7370 us | 2.103 us | 26.850 us | 30.375 us | 1.00 |
| WithToUpper | 10000 | 241.221 us | 4.0480 us | 3.588 us | 240.900 us | 246.560 us | 1.68 |
| WithToUpperInvariant | 10000 | 339.370 us | 42.4036 us | 125.028 us | 381.950 us | 594.760 us | 1.48 |
| WithToLower | 10000 | 246.861 us | 15.7924 us | 45.565 us | 257.250 us | 302.875 us | 1.12 |
| WithToLowerInvariant | 10000 | 143.529 us | 2.1542 us | 1.910 us | 143.500 us | 146.105 us | 1.00 |
| WithToUpper | 100000 | 2,165.838 us | 84.7013 us | 223.137 us | 2,118.900 us | 2,875.800 us | 1.66 |
| WithToUpperInvariant | 100000 | 1,885.329 us | 36.8408 us | 63.548 us | 1,894.500 us | 1,967.020 us | 1.41 |
| WithToLower | 100000 | 1,478.696 us | 23.7192 us | 50.547 us | 1,472.100 us | 1,571.330 us | 1.10 |
| WithToLowerInvariant | 100000 | 1,335.950 us | 18.2716 us | 35.203 us | 1,330.100 us | 1,404.175 us | 1.00 |
| WithToUpper | 1000000 | 20,936.247 us | 414.7538 us | 1,163.014 us | 20,905.150 us | 22,928.350 us | 1.64 |
| WithToUpperInvariant | 1000000 | 19,056.983 us | 368.7473 us | 287.894 us | 19,085.400 us | 19,422.880 us | 1.41 |
| WithToLower | 1000000 | 14,266.714 us | 204.2906 us | 181.098 us | 14,236.500 us | 14,593.035 us | 1.06 |
| WithToLowerInvariant | 1000000 | 13,464.127 us | 266.7547 us | 327.599 us | 13,511.450 us | 13,926.495 us | 1.00 |

Let's see it as the usual Log10 plot:

[Figure: ToUpper vs ToLower comparison in C#: performance benchmark]

We can notice a few points:

  1. The ToUpper family is generally slower than the ToLower family;
  2. The Invariant family is faster than the non-Invariant one; we will see why below.

So, if you have to normalize strings using the same casing, ToLowerInvariant is the best choice.
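
Why are the Invariant versions faster? They skip the lookup of the current culture's casing rules. The behavioral difference matters too: the classic example is the Turkish "I" (a quick sketch, assuming the tr-TR culture is available on your machine):

using System.Globalization;

// Culture-sensitive casing depends on the culture you pass (or the current one).
// In Turkish, ToLower maps 'I' to 'ı' (dotless i), not to 'i'.
string s = "I";
string turkish = s.ToLower(new CultureInfo("tr-TR")); // "ı"
string invariant = s.ToLowerInvariant();              // "i"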

Tip #5: OrdinalIgnoreCase vs InvariantCultureIgnoreCase: logically (almost) equivalent, but with different performance

Comparing strings is trivial: the string.Equals method (or string.Compare) is all you need.

There are several modes to compare strings: you can specify the comparison rules by setting the comparisonType parameter, which accepts a StringComparison value.

[MemoryDiagnoser]
public class StringCompareOrdinalVsInvariant()
{
    [Params(100, 1000, 10_000, 100_000, 1_000_000)]
    public int Size;

    public string[] AllStrings { get; set; }

    [IterationSetup]
    public void Setup()
    {
        AllStrings = StringArrayGenerator.Generate(Size, "hello!", "HELLO!");
    }

    [Benchmark(Baseline = true)]
    public void WithOrdinalIgnoreCase()
    {
        foreach (string s in AllStrings)
        {
            _ = string.Equals(s, "hello!", StringComparison.OrdinalIgnoreCase);
        }
    }

    [Benchmark]
    public void WithInvariantCultureIgnoreCase()
    {
        foreach (string s in AllStrings)
        {
            _ = string.Equals(s, "hello!", StringComparison.InvariantCultureIgnoreCase);
        }
    }
}

Let's see the results:

| Method | Size | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|
| WithOrdinalIgnoreCase | 100 | 2.380 us | 0.2856 us | 0.8420 us | 1.00 |
| WithInvariantCultureIgnoreCase | 100 | 7.974 us | 0.7817 us | 2.3049 us | 3.68 |
| WithOrdinalIgnoreCase | 1000 | 11.316 us | 0.9170 us | 2.6603 us | 1.00 |
| WithInvariantCultureIgnoreCase | 1000 | 35.265 us | 1.5455 us | 4.4591 us | 3.26 |
| WithOrdinalIgnoreCase | 10000 | 20.262 us | 1.1801 us | 3.3668 us | 1.00 |
| WithInvariantCultureIgnoreCase | 10000 | 225.892 us | 4.4945 us | 12.5289 us | 11.41 |
| WithOrdinalIgnoreCase | 100000 | 148.270 us | 11.3234 us | 32.8514 us | 1.00 |
| WithInvariantCultureIgnoreCase | 100000 | 1,811.144 us | 35.9101 us | 64.7533 us | 12.62 |
| WithOrdinalIgnoreCase | 1000000 | 2,050.894 us | 59.5966 us | 173.8460 us | 1.00 |
| WithInvariantCultureIgnoreCase | 1000000 | 18,138.063 us | 360.1967 us | 986.0327 us | 8.87 |

As you can see, there's a HUGE difference between Ordinal and Invariant.

When dealing with 100,000 items, StringComparison.InvariantCultureIgnoreCase is ~12 times slower than StringComparison.OrdinalIgnoreCase!

[Figure: Ordinal vs InvariantCulture comparison in C#: performance benchmark]

Why? Also, why should we use one instead of the other?

Have a look at this code snippet:

var s1 = "Aa";
var s2 = "A" + new string('\u0000', 3) + "a";

string.Equals(s1, s2, StringComparison.InvariantCultureIgnoreCase); //True
string.Equals(s1, s2, StringComparison.OrdinalIgnoreCase); //False

As you can see, s1 and s2 represent equivalent, but not equal, strings. We can then deduce that OrdinalIgnoreCase checks for the exact values of the characters, while InvariantCultureIgnoreCase checks the string's "meaning".

So, in most cases, you might want to use OrdinalIgnoreCase (as always, it depends on your use case!).

Tip #6: Newtonsoft vs System.Text.Json: it's a matter of memory allocation, not time

For the last benchmark, I created the exact same model used as an example in the official documentation.

This benchmark aims to see which JSON serialization library is faster: Newtonsoft or System.Text.Json?

[MemoryDiagnoser]
public class JsonSerializerComparison
{
    [Params(100, 10_000, 1_000_000)]
    public int Size;
    List<User?> Users { get; set; }

    [IterationSetup]
    public void Setup()
    {
        Users = UsersCreator.GenerateUsers(Size);
    }

    [Benchmark(Baseline = true)]
    public void WithJson()
    {
        foreach (User? user in Users)
        {
            var asString = System.Text.Json.JsonSerializer.Serialize(user);

            _ = System.Text.Json.JsonSerializer.Deserialize<User?>(asString);
        }
    }

    [Benchmark]
    public void WithNewtonsoft()
    {
        foreach (User? user in Users)
        {
            string asString = Newtonsoft.Json.JsonConvert.SerializeObject(user);
            _ = Newtonsoft.Json.JsonConvert.DeserializeObject<User?>(asString);
        }
    }
}

As you might know, the .NET team has added lots of performance improvements to the JSON serialization functionalities, and you can really see the difference!

| Method | Size | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WithJson | 100 | 2.063 ms | 0.1409 ms | 0.3927 ms | 1.924 ms | 1.00 | 0.00 | - | - | 292.87 KB | 1.00 |
| WithNewtonsoft | 100 | 4.452 ms | 0.1185 ms | 0.3243 ms | 4.391 ms | 2.21 | 0.39 | - | - | 882.71 KB | 3.01 |
| WithJson | 10000 | 44.237 ms | 0.8787 ms | 1.3936 ms | 43.873 ms | 1.00 | 0.00 | 4000.0000 | 1000.0000 | 29374.98 KB | 1.00 |
| WithNewtonsoft | 10000 | 78.661 ms | 1.3542 ms | 2.6090 ms | 78.865 ms | 1.77 | 0.08 | 14000.0000 | 1000.0000 | 88440.99 KB | 3.01 |
| WithJson | 1000000 | 4,233.583 ms | 82.5804 ms | 113.0369 ms | 4,202.359 ms | 1.00 | 0.00 | 484000.0000 | 1000.0000 | 2965741.56 KB | 1.00 |
| WithNewtonsoft | 1000000 | 5,260.680 ms | 101.6941 ms | 108.8116 ms | 5,219.955 ms | 1.24 | 0.04 | 1448000.0000 | 1000.0000 | 8872031.8 KB | 2.99 |

As you can see, Newtonsoft is up to ~2x slower than System.Text.Json, and it consistently allocates about 3x the memory.

So, well, if you don't use library-specific functionalities, I suggest you replace Newtonsoft with System.Text.Json.
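
If you do switch, one extra System.Text.Json habit worth adopting: reuse a single JsonSerializerOptions instance instead of creating a new one per call, because the options object caches type metadata. A minimal sketch (the class name is mine):

using System.Text.Json;

public static class JsonDefaults
{
    // JsonSerializerOptions caches serialization metadata per instance,
    // so creating a fresh one for every call throws that cache away.
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };
}

// usage: JsonSerializer.Serialize(user, JsonDefaults.Options);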

Wrapping up

In this article, we learned that even tiny changes can make a difference in the long run.

Let's recap some:

  1. Using StringBuilder is generally WAY faster than using string concatenation, unless you only need to concatenate 2 to 4 strings;
  2. Sometimes, the difference is not about execution time but memory usage;
  3. EndsWith and StartsWith perform better if you look for a char instead of a string. If you think about it, it totally makes sense!
  4. More often than not, string.IsNullOrWhiteSpace performs stricter checks than string.IsNullOrEmpty; however, it is also slower (~2x in these benchmarks), so you should pick the correct method depending on the usage;
  5. ToUpper and ToLower look similar; however, ToLower is quite a bit faster than ToUpper;
  6. Ordinal and Invariant comparison return the same value for almost every input, but Ordinal is faster than Invariant;
  7. Newtonsoft performs similarly to System.Text.Json, but it allocates way more memory.

This article first appeared on Code4IT 🐧

My suggestion is always the same: take your time to explore the possibilities! Toy with your code, try to break it, benchmark it. You'll find interesting takes!

I hope you enjoyed this article! Let's keep in touch on Twitter or LinkedIn! 🤜🤛

Happy coding!

๐Ÿง

ย