Migrated to MessagePack instead of BinaryFormatter #1202

Merged 2 commits on Feb 19, 2025

Conversation

grazy27 (Contributor) commented Jan 26, 2025

Overview

This PR replaces BinaryFormatter serialization with the MessagePack library.
I was unable to contribute to #1166, so I’ve opened a new PR.

Replacing BinaryFormatter addresses a range of issues, including:

  • .NET 9 compatibility: Upgrading to .NET 9 requires a separate NuGet package, but this package is incompatible with .NET Framework 4.8, which is also targeted by the project.
  • Dotnet.Interactive incompatibility: Running under Dotnet.Interactive results in an exception along the lines of BinarySerializationDisabled. The only current workaround is adding:
    AppContext.SetSwitch("System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization", true);
    to the beginning of the notebook. This does not work reliably, and Dotnet.Interactive is hard to troubleshoot because these exceptions keep recurring and prevent UDFs from executing.
  • Deprecation: BinaryFormatter has been abandoned and is officially deprecated.

Compatibility Considerations

To ensure better compatibility with existing .NET types and reduce coupling with MessagePack, I opted to retain the Serializable and NonSerialized attributes.

One limitation of MessagePack is its handling of types with constructors containing parameters. If parameter names do not match the corresponding property or field names (e.g., a property _value and a constructor argument value), the parameters will not be automatically matched.

In such cases a default, parameterless constructor is required. If no default constructor is found, MessagePack throws a descriptive exception.
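
The name-matching limitation can be illustrated with a small, self-contained sketch. It does not call MessagePack; the types `MismatchedNames` and `MatchingNames` and the helper `CtorMatchCheck.CanBindConstructor` are hypothetical and only approximate the rule described above (a constructor is usable when every parameter shares a name, ignoring case, with a field or property, or when the constructor takes no parameters). The real resolver's behavior, including whether private members are considered, depends on the MessagePack options in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical types for illustration; they are not part of the PR.
[Serializable]
public class MismatchedNames
{
    private string _value;                      // name differs from the ctor parameter "value"
    public MismatchedNames(string value) { _value = value; }
    public string Describe() => _value;
}

[Serializable]
public class MatchingNames
{
    private string value;                       // same name as the ctor parameter
    [NonSerialized] private int cachedLength;   // excluded from serialization
    public MatchingNames(string value) { this.value = value; cachedLength = value.Length; }
    public string Describe() => value + ":" + cachedLength;
}

public static class CtorMatchCheck
{
    // Approximation of the matching rule: returns true when some constructor
    // has only parameters whose names match a field or property (ignoring case).
    // A parameterless constructor trivially satisfies this.
    public static bool CanBindConstructor(Type type)
    {
        const BindingFlags Flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        var memberNames = new HashSet<string>(
            type.GetFields(Flags).Select(f => f.Name)
                .Concat(type.GetProperties(Flags).Select(p => p.Name)),
            StringComparer.OrdinalIgnoreCase);

        return type.GetConstructors(Flags)
            .Any(c => c.GetParameters().All(p => memberNames.Contains(p.Name)));
    }
}
```

Under this approximation, `MismatchedNames` has no bindable constructor, while `MatchingNames` does. Renaming the field or adding a parameterless constructor are the two fixes mentioned in the description.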

This PR fixes #1131.


Security Considerations

Using MessagePack improves serialization security by addressing common vulnerabilities associated with BinaryFormatter. Key points include:

  1. Safe Deserialization:

    • MessagePack does not support arbitrary executable code. It can only serialize and deserialize objects and their properties. Delegates are not supported by design.
    • Data exchange occurs exclusively between the Driver and Worker, ensuring that the input being deserialized is the same as what was serialized by the application.
  2. Typeless Mode:

    • Typeless mode is only used for deserializing custom objects. For primitives, an optimized serialization format is used, which adds a single byte for type information rather than serializing the entire System.Type.
  3. Restricted Type Resolution:

    • The binary payload includes only type descriptions, not definitions.
    • This ensures that only types known to the application domain or located in the deserializing side’s probing paths can be deserialized, preventing malicious derived types from executing custom logic.
  4. Mitigation of RCE (Remote Code Execution) Risks:

    • MessagePack Enhanced Security Mode is enabled to filter object types and disallow known vulnerable ones.
    • A custom filter is implemented to restrict deserialization to:
      • Simple types (primitives, collections, etc.)
      • Types explicitly marked with the [Serializable] attribute.
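
As a rough illustration of the allow-list idea in point 4, the sketch below checks a type using only reflection. It is not the PR's implementation: in the PR the check lives in an override of `ThrowIfDeserializingTypeIsDisallowed`, whereas `DeserializationAllowList`, `MarkedPayload`, and `UnmarkedPayload` here are hypothetical names, and the definition of a "simple type" (primitives, enums, `string`, `decimal`, arrays of allowed types) is an assumption made for the example.

```csharp
using System;
using System.Linq;

// Hypothetical payload types for illustration only.
[Serializable]
public class MarkedPayload { public int Id; }

public class UnmarkedPayload { public int Id; }

public static class DeserializationAllowList
{
    // Returns true for simple types and for types marked [Serializable].
    public static bool IsAllowed(Type type)
    {
        // Assumed definition of "simple" types for this sketch.
        if (type.IsPrimitive || type.IsEnum ||
            type == typeof(string) || type == typeof(decimal))
        {
            return true;
        }

        // Arrays are allowed when their element type is allowed.
        if (type.IsArray)
        {
            return IsAllowed(type.GetElementType());
        }

        // Everything else must carry the [Serializable] attribute.
        if (!type.IsDefined(typeof(SerializableAttribute), inherit: false))
        {
            return false;
        }

        // For constructed generic types, also require allowed type arguments.
        return !type.IsGenericType || type.GetGenericArguments().All(IsAllowed);
    }

    // Throws when the type is rejected, mirroring a "throw if disallowed" style API.
    public static void ThrowIfDisallowed(Type type)
    {
        if (!IsAllowed(type))
        {
            throw new InvalidOperationException(
                "Deserialization of type '" + type.FullName + "' is not allowed.");
        }
    }
}
```

A caller would invoke `DeserializationAllowList.ThrowIfDisallowed(type)` before materializing an object of that type; rejecting by default keeps unknown types out even if they are resolvable on the deserializing side.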

For reference, here’s an overview paper on potential RCE attack vectors with typeless MessagePack deserialization:
Generating Deserialization Payloads for MessagePack-C# Typeless Mode

@@ -9,11 +9,11 @@ namespace Microsoft.Spark.E2ETest.ExternalLibrary
     [Serializable]
     public class ExternalClass
     {
-        private string _s;
+        private string s;
Contributor Author

Otherwise MessagePack cannot instantiate ExternalClass, because it doesn't know which value to pass for the constructor argument.
We can either add a parameterless constructor or rename the field to match the constructor argument name.

@wudanzy wudanzy self-requested a review February 13, 2025 02:53
wudanzy (Collaborator) commented Feb 13, 2025

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).

wudanzy (Collaborator) commented Feb 13, 2025

Just came back from Chinese New Year.

Hi @grazy27, I see #1166 says not to merge that PR. #1166 (comment)

Could you please summarize what the blocker was and whether it has been resolved in this PR?

SparkSnail previously approved these changes Feb 14, 2025
@@ -30,6 +29,7 @@

   <ItemGroup>
     <PackageReference Include="Apache.Arrow" Version="14.0.2" />
+    <PackageReference Include="MessagePack" Version="3.0.214-rc.1" />
Collaborator

Why use the pre-release version?

Contributor Author

Thanks for noticing. I didn't see that the 'Include pre-release' checkbox was checked.

Contributor Author

Bumped to the latest release version.

Contributor Author

Aha, it seems the latest MessagePack version is not available in the Arcade NuGet sources, and the standard nuget.org feed is not enabled in nuget.config. That pre-release must have been the latest version available in those feeds.

I'll look into how to handle it: either pick a version that exists in those sources or add nuget.org to the package sources.

Contributor Author

Hello, @SparkSnail @wudanzy,

It turns out there is no release version of MessagePack 3 in the feeds recommended by the Arcade project.
This seems to be an internal Microsoft strategy, since none of the dotnet** projects use the nuget.org feed. Example of a commit removing it

The feeds contain only the pre-release 3.0.214 and the release 2.5.

There were breaking changes between 2.5 and 3.0: 2.5 has a different API and would require additional effort to make it work, and we would have to spend more effort migrating back up once a 3.* release appears.

I suggest using the 3.0 pre-release: since it is already in the feed, it must have passed some internal review. If you know who can be asked to include a release 3.* version in the dotnet-public feed, your help would be appreciated.

Collaborator

I see. If there are no security concerns, that's fine with me.

Collaborator

That's fine, we can upgrade it in the future.

@SparkSnail SparkSnail self-requested a review February 14, 2025 06:52
grazy27 (Contributor Author) commented Feb 15, 2025

> Just come back from Chinese New Year.
>
> Hi @grazy27, I see #1166 says do not merge this PR. #1166 (comment)
>
> Could you please summarize what was the blocker before and if that has been resolved in this PR?

Hello, @wudanzy, welcome back!

The comment not to merge was made for a few reasons:

1. PR #1166 is not fully completed and doesn't function properly

See my review comments inside.

2. A Microsoft dev pointed out that both BinaryFormatter and MessagePack.Typeless are considered RCE-vulnerable and require additional approval from a manager.

For this one, I performed threat modeling and spent extra effort restricting serialization. Even though we control both the serializing and deserializing sides, I ensured that it is not possible to serialize executable code or a custom class that the deserializing side doesn't recognize. Additionally, only classes marked as Serializable can be serialized, which limits the set of known types. The MessagePack blacklist of dangerous classes is explicitly enabled in this PR.

Since Microsoft released a new NuGet package after that PR was created, its author suggested skipping it entirely.

However, I later found that continuing with BinaryFormatter caused too many other issues (see the PR description). So I fixed the existing problems, added additional security measures, changed the required attributes from MessagePack ones to the standard .NET ones, and raised this PR.

SparkSnail (Collaborator) commented

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).

wudanzy (Collaborator) previously approved these changes Feb 18, 2025

Hi @grazy27, thanks for picking this up and moving it forward!

public override void ThrowIfDeserializingTypeIsDisallowed(Type type)
{
    // Check against predefined blacklist
    base.ThrowIfDeserializingTypeIsDisallowed(type);
Collaborator

How is this going to be used?

Contributor Author

If a type is not marked with the System.Serializable attribute, we don't allow it to be deserialized. This mimics the behavior of BinaryFormatter.

    base.ThrowIfDeserializingTypeIsDisallowed(type);

    // Check if MessagePack can handle this type safely
    var formatter = StandardResolver.Instance.GetFormatterDynamic(type);
Collaborator

Do we have a unit test to cover this?

grazy27 (Contributor Author), Feb 18, 2025

Yep.

We haven't tested MessagePack's internal rules, but those are covered within MessagePack itself.

@grazy27 grazy27 force-pushed the feature/message-pack-serialization branch from 86a662f to 60e66c3 Compare February 18, 2025 07:47
wudanzy (Collaborator) commented Feb 18, 2025

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).

@wudanzy wudanzy merged commit 3f4cd7c into dotnet:main Feb 19, 2025
54 checks passed
Development

Successfully merging this pull request may close these issues.

JVM IPC Deserialization uses BinaryFormatter, which is now Deprecated for OWASP CWE
3 participants