Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize for qualifiers #121

Open
CannedFish opened this issue Jun 6, 2023 · 5 comments
Open

Normalize for qualifiers #121

CannedFish opened this issue Jun 6, 2023 · 5 comments

Comments

@CannedFish
Copy link

Hello.

As the example of purl-spec, urls in qualifiers should be normalized as below:

pkg:maven/org.apache.xmlgraphics/[email protected]?repository_url=repo.spring.io%2Frelease

But in packageurl-python, when I set qualifiers to {"repository_url": "repo.spring.io/release"}, and the to_string will return as below:

pkg:maven/org.apache.xmlgraphics/[email protected]?repository_url=repo.spring.io/release

If I set qualifiers to {"repository_url": "repo.spring.io%2Frelease"}, the to_string will return as below:

pkg:maven/org.apache.xmlgraphics/[email protected]?repository_url=repo.spring.io%252Frelease

Which means the letter % is normalized to %25.

Is that a bug?

@matt-phylum
Copy link

According to the PURL test suite, the correct behavior is that / in a qualifier value should not be escaped (packageurl-python is correct). I've noticed the examples in the package types spec are sometimes improperly canonicalized or even wrong.

@mtichavsky
Copy link

@matt-phylum so are you saying that the library intentionally doesn't adhere to the package-url/purl-spec standard or am I missing something?

Because it says in the Rules for each component section that

qualifiers:

  • For each pair of key = value
    • A value must be a percent-encoded string

I want PURLs produced by this library to be processed by some other tooling, so adhering to some standard for data exchange is quite important to me, and it seems like percent encoded qualifiers are the more standard approach.

Either way, if I created a merge request that would just add an optional argument to to_string method that would encode those qualifiers, would you accept it?

@mtichavsky
Copy link

Looking into it, the URL encoding isn't specific only for qualifiers section, there's an open PR #123 that is trying to encode / in name and version sections of PURLs, but according to the package-url/purl-spec, these sections should be URL encoded as well (meaning not only / should be encoded).

@matt-phylum
Copy link

URL encoding, aka percent encoding, really only describes how to decode a string. It specifies how to encode individual characters, but it doesn't specify which characters are supposed to be encoded. The PURL spec specifies which characters are to be encoded, and / is not one of them: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#character-encoding

However, the PURL spec is probably not explicit enough. It seems like most readers interpret "percent-encoded" to mean some variant of x-www-formurlencoded, which is never mentioned once in the PURL spec. This results in incorrect canonicalization, and, more critically, it sometimes results in incorrect handling of spaces and plus signs.

Besides the section on character encoding, which could be much more explicit, the PURL spec has a test suite which contains the expected canonicalizations for some of the cases. This is one of the cases which is covered. / should not be escaped in qualifiers.

{
  "description": "MLflow model tracked in Azure Databricks (case insensitive)",
  "purl": "pkg:mlflow/CreditFraud@3?repository_url=https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow",
  "canonical_purl": "pkg:mlflow/creditfraud@3?repository_url=https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow",
  "type": "mlflow",
  "namespace": null,
  "name": "creditfraud",
  "version": "3",
  "qualifiers": {"repository_url": "https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow"},
  "subpath": null,
  "is_invalid": false
}

https://github.com/package-url/purl-spec/blob/346589846130317464b677bc4eab30bf5040183a/test-suite-data.json#L506-L517

@mtichavsky
Copy link

You're right, thanks a lot for detailed clarification! I assumed the percent encoding I know from web forms is some sort of a standard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants