Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce AOT binary size #3091

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Reduce AOT binary size #3091

wants to merge 3 commits into from

Conversation

edwardneal
Copy link
Contributor

Contributes to #1942.

For win-x64, the pre-PR binary size is 21MB. Sizoscope indicates that these changes trim 0.9MB (~4.25%) from the final binary.

This has two small changes and one which requires explanation:

  1. Replace a Hashtable in the connection string parsing with a generic Dictionary.
  2. Remove a reference to Process.GetCurrentProcess, replacing it with the new Environment.ProcessId property.

In both cases, I'm just removing references to types which aren't used frequently. The Azure dependency drags the Process class in, but when that dependency's refactored out there'll be nothing else using it in the core library.

The third change removes the only reference to DataContractJsonSerializer, which is used to get the signing certificates for the attestation authority. I've built a standalone server running the Host Guardian Services role, and confirmed that the API endpoint returns the certificate data in text format as a JSON array. Example output from this API endpoint is thus the text string [ 128, 0, 35, 0, ... ].

JsonSerializer only returns a byte[] when passed a base64-encoded string, so I've returned a List instead.

Enable partial trimming of System.Diagnostics and one Hashtable
@Wraith2
Copy link
Contributor

Wraith2 commented Jan 1, 2025

Is System.Text.Json an unconditional dependency on all platforms already? Can it not deseiralize to a byte[] and thus requires the change from byte[] to List<byte> ?

@edwardneal
Copy link
Contributor Author

System.Text.Json is already required by the SqlJson type, although the Azure libraries also require it. I think it's also built in to newer versions of .NET, so we're not dragging a new dependency in for this.

I'm not completely happy with the switch to List either. S.T.J only successfully deserialises a base64 string to a byte array, a JSON array of numbers fails. I tested a couple of different approaches:

  1. Deserialising to an array of ints/shorts and copying to a new byte array.
  2. Write a custom JsonConverter.

A single certificate could be 2-4kb and there could be multiple of these, so I'm trying to avoid any data copies. This (along with the associated memory usage) rules approach 1 out. Approach 2 turned into something pretty similar: at the start of the JSON array we don't know how many elements it has, so we'd have to add to a List (and perform the associated copy in ToArray when returning the final value.)

@mdaigle mdaigle added this to the 7.0-preview1 milestone Jan 7, 2025
@mdaigle mdaigle requested a review from a team January 7, 2025 18:43
@Wraith2
Copy link
Contributor

Wraith2 commented Jan 7, 2025

Try this.

                Utf8JsonReader reader = new Utf8JsonReader(bytes);
                using (MemoryStream stream = new MemoryStream())
                {
                    bool writing = false;
                    while (reader.Read())
                    {
                        switch (reader.TokenType)
                        {
                            case JsonTokenType.StartArray:
                                writing = true;
                                break;
                            case JsonTokenType.EndArray:
                                writing = false;
                                break;
                            case JsonTokenType.Number:
                                if (reader.TryGetByte(out byte value))
                                {
                                    stream.WriteByte(value);
                                }
                                else
                                {
                                    throw new Exception();
                                }
                                break;
                            default:
                                break;
                        }
                    }

                    if (!writing)
                    {
                        stjBytes = stream.ToArray();
                    }
                }

@edwardneal
Copy link
Contributor Author

It'd definitely be more idiomatic, but the closing ToArray will always return a copy of the buffer, which is what I'm trying to avoid. I could call GetBuffer and convert the resultant ArraySegment to a Span, but this implies returning a MemoryStream (which is a little neater than returning a List, but still not ideal because we're tied to a single implementation of Stream. At a functional level, we're still returning a wrapper for a dynamically-sized array.)

IMO, part of the reason this feels messy is because MakeRequest isn't a natural boundary between the base class and the child implementation. What do you think about renaming it to GetSigningCertificateCore and having it return the X509Certificate2Collection instead? The base class is internal, so we've got some freedom of movement for that type of change.

This would make the child implementation responsible for converting a user-provided attestation URL into a "sensible" value, and for getting the set of signing certificates for a given attestation URL. The base class would just orchestrate those two implementations and cache the result.

@Wraith2
Copy link
Contributor

Wraith2 commented Jan 8, 2025

Does this happen often enough and waste enough memory that it's worth doing crazy things to avoid an array allocation of a few KB? As far as I can see the json parser affects a very uncommon scenario around login security so it isn't running frequently (in comparison to tds packet allocations for example) and it isn't something that people will hit multiple times per second (like reading from tds packets).

I'd say live with the relatively small and uncommon allocation, save your performance effort for places where it will affect the majority unless someone produces a specific use-case where this path is a problem.

@edwardneal
Copy link
Contributor Author

Thanks, and I agree. The performance comparison sparked a bit of curiosity though, so I benchmarked it.

  • Custom deserialization logic yields about a 12-15% reduction in execution time (when compared to the simple JsonSerializer method call) but triples memory usage. I imagine that's because we're reading the stream into memory first, and there's probably space to deal with that
  • Pre-sizing the MemoryStream or the List used in custom deserialization logic to the normal size of a certificate reduces execution time by a further 5%
  • None of the above matters. DataContractJsonSerializer allocates 20x more memory than JsonSerializer, and takes around 10x the execution time

DataContractJsonSerializer can take 1-2ms to execute, and is called directly after a network request. We can safely stick with JsonSerializer and perform the extra array copy, it'll still be faster and allocate less memory!

Copy link

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments in-file.

@edwardneal
Copy link
Contributor Author

Thanks @paulmedynski, I've removed the unused usings as noted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants