support for maven-shade-plugin #472

Open
raboof opened this issue Mar 6, 2024 · 19 comments

Comments

@raboof

raboof commented Mar 6, 2024

When using maven-shade-plugin, the sbom should likely somehow encode which dependencies are 'embedded' in the jar, and which are 'regular' dependencies.

AFAIK there is no convention on how to express this difference yet in CycloneDX. Likely it would make sense to make use of the assembly concept?

I have a test/demo project for this at https://github.com/raboof/maven-shade-sbom/

@stevespringett
Member

Yes, using a CycloneDX assembly would be the correct way to represent this.

@prabhu

prabhu commented Apr 18, 2024

evidence.identity can also be used to describe the technique and confidence.

@ppkarwasz
Contributor

@stevespringett,

By assembly do you mean components.components or something else?

@stevespringett
Member

Correct @ppkarwasz

@raboof
Author

raboof commented Oct 26, 2024

> Yes, using a CycloneDX assembly would be the correct way to represent this.

This makes sense for embedded dependencies of dependencies, but I'm having trouble understanding how it works for embedded dependencies of the 'current' project. To take the example in https://github.com/raboof/maven-shade-sbom/: mvn install produces a maven-shaded-sbom-1.0-SNAPSHOT.pom that defines a dependency on org.apache.commons:commons-email:1.6.0, and a maven-shaded-sbom-1.0-SNAPSHOT.jar that contains the classes from org.apache.commons:commons-compress:1.26.0 (and some others).

The currently generated SBOM comes down to the following (some details removed):

{
  "bomFormat" : "CycloneDX",
  "specVersion" : "1.6",
  "serialNumber" : "urn:uuid:e458a584-6054-38cb-bf05-b4c8095ee27d",
  "version" : 1,
  "metadata" : {
    "component" : {
      "type" : "library",
      "bom-ref" : "pkg:maven/net.bzzt/maven-shaded-sbom@1.0-SNAPSHOT?type=jar",
    },
    ...
  },
  "components" : [
    {
        "type" : "library",
        "bom-ref" : "pkg:maven/org.apache.commons/commons-email@1.6.0?type=jar"
        ...
    },
    {
        "type" : "library",
        "bom-ref" : "pkg:maven/org.apache.commons/commons-compress@1.26.0?type=jar",
    },
    ...
  ],
  "dependencies" : ...

I can see how, if some project depended on maven-shaded-sbom, we could express the fact that commons-compress is embedded into maven-shaded-sbom in that project's SBOM like this:

{
  "bomFormat" : "CycloneDX",
  "specVersion" : "1.6",
  "serialNumber" : "urn:uuid:e458a584-6054-38cb-bf05-b4c8095ee27d",
  "version" : 1,
  "metadata" : {
    "component" : {
      "type" : "library",
      "bom-ref" : "pkg:maven/net.bzzt/maven-depending-sbom@...?type=jar",
    },
    ...
  },
  "components" : [
    {
        "type" : "library",
        "bom-ref" : "pkg:maven/net.bzzt/maven-shaded-sbom@1.0-SNAPSHOT?type=jar",
        "components" : [
          {
            "type" : "library",
            "bom-ref" : "pkg:maven/org.apache.commons/commons-compress@1.26.0?type=jar",
            ...
          },
          ...
        ]
    },
    {
        "type" : "library",
        "bom-ref" : "pkg:maven/org.apache.commons/commons-email@1.6.0?type=jar",
        ...
    },
    ...
  ],
  "dependencies" : ...

To be able to construct this SBOM we need to know that commons-compress is embedded into maven-shaded-sbom, rather than just listed as a dependency. That information seems to be missing from the maven-shaded-sbom SBOM above. How would we encode this information into the SBOM for maven-shaded-sbom? Or should we discover this some other way while generating the SBOM for maven-depending-sbom?
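
To make the question concrete, here is a minimal Python sketch of the restructuring step, assuming the "embedded in" relationship is already known. The function name nest_embedded and the embedded_in mapping are hypothetical; where that mapping would come from is exactly what this issue has not settled.

```python
import copy

SHADED = "pkg:maven/net.bzzt/maven-shaded-sbom@1.0-SNAPSHOT?type=jar"
COMPRESS = "pkg:maven/org.apache.commons/commons-compress@1.26.0?type=jar"
EMAIL = "pkg:maven/org.apache.commons/commons-email@1.6.0?type=jar"


def nest_embedded(components, embedded_in):
    """Return a new components list in which every embedded component is
    moved into the nested "components" array of the component containing it.

    embedded_in maps the bom-ref of an embedded component to the bom-ref of
    its container. This mapping is an assumed input, not something the
    plugin provides today.
    """
    # Work on copies so the caller's flat list is left untouched.
    by_ref = {c["bom-ref"]: copy.deepcopy(c) for c in components}
    top_level = []
    for ref, component in by_ref.items():
        parent_ref = embedded_in.get(ref)
        if parent_ref in by_ref:
            by_ref[parent_ref].setdefault("components", []).append(component)
        else:
            # Not embedded (or container unknown): stays a regular entry.
            top_level.append(component)
    return top_level


flat = [
    {"type": "library", "bom-ref": SHADED},
    {"type": "library", "bom-ref": COMPRESS},
    {"type": "library", "bom-ref": EMAIL},
]
nested = nest_embedded(flat, {COMPRESS: SHADED})
```

With this input, nested holds two top-level entries: maven-shaded-sbom (now carrying commons-compress in its own components array) and commons-email.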

@ppkarwasz
Contributor

We were just talking about this the other day with @hboutemy, and the solution might be to introduce a property that says whether a dependency is dynamically linked or statically linked, or whatever couple of adjectives makes sense for the ecosystem.

For the Java ecosystem:

  • dynamically linked dependencies would be the external dependencies of a JAR file. The precise version of that dependency is just a recommendation and will be resolved at runtime or when the JAR file is used in an application.
  • statically linked dependencies would be all those embedded in a WAR file, shaded JAR or Spring Boot executable JAR. Security scanners can do their analysis on those.

@hboutemy
Contributor

hboutemy commented Oct 27, 2024

"the solution": sadly it's not the full solution but only one key step, as we'll have other aspects to sort out too, like how to deal with a Maven module that produces both the "naked" artifact ("naked" => dynamic dependencies) AND the shaded one (=> a few static dependencies)

IMHO, these 2 topics are the key aspects to be solved before trying to implement the feature

@raboof
Author

raboof commented Oct 31, 2024

> the solution might be to introduce a property that says if a dependency is dynamically linked or statically linked, or whatever couple of adjectives makes sense for the ecosystem

OK. For the Maven ecosystem, the adjectives 'dynamically linked' and 'statically linked' seem a bit foreign. I think a boolean attribute named maven.embedded or maven.shaded would make sense?

> how to deal with a Maven module that produces both the "naked" artifact ("naked" => dynamic dependencies) AND the shaded one (=> a few static dependencies)

That might be a topic for a different issue, or do you think it interacts strongly with this one? I filed it as #574. So far I think each artifact having 'its own' SBOM sounds most convenient. That also seems to work well for this use case, as then you don't have issues with how to tag a component that is embedded in one of the artifacts and a 'regular' dependency in the other: we can tag it as 'maven.embedded=true' for the shaded jar and 'maven.embedded=false' for the regular one.
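
As a rough illustration of the per-artifact tagging, here is a Python sketch that attaches the flag through the generic CycloneDX properties array (a list of name/value string pairs). The property name maven.embedded is only the proposal from this thread, not an established convention, and the helper with_embedded_flag is hypothetical.

```python
COMPRESS = "pkg:maven/org.apache.commons/commons-compress@1.26.0?type=jar"


def with_embedded_flag(component, embedded):
    """Return a copy of component carrying a maven.embedded property.

    CycloneDX property values are strings, so the boolean is written as
    "true" or "false". The property name is the one proposed in this
    thread and is an assumption, not part of any registered taxonomy.
    """
    tagged = dict(component)
    # Drop a previous maven.embedded entry so the flag is never duplicated.
    props = [
        p for p in component.get("properties", [])
        if p["name"] != "maven.embedded"
    ]
    props.append({"name": "maven.embedded", "value": "true" if embedded else "false"})
    tagged["properties"] = props
    return tagged


compress = {"type": "library", "bom-ref": COMPRESS}

# Entry for the SBOM of the shaded jar, where the classes are copied in.
in_shaded_sbom = with_embedded_flag(compress, True)
# Entry for the SBOM of the regular jar, where it is an ordinary dependency.
in_regular_sbom = with_embedded_flag(compress, False)
```

The same component then appears in both SBOMs with the same bom-ref and differs only in the property value.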

@hboutemy
Contributor

hboutemy commented Oct 31, 2024

> For the Maven ecosystem, the adjectives 'dynamically linked' and 'statically linked' seem a bit foreign

I know people are not used to this "dynamically linked" term, but they'll have to learn: in a library, a dependency is not really a strict dependency. The consumer's conflict resolution will try to use the library's preferred version, but may ultimately select another.

Dependencies in Java, given classpath conflict resolution, are always (and implicitly) "dynamic": the effective version from a consumer's perspective won't necessarily be the requested version.

As for static vs embedded vs shaded: these all describe the case where the dependency is physically present, rather than "virtual", "soft referenced", or any of the other terms like "dynamically linked".
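
The "dynamic" behaviour can be illustrated with a deliberately simplified Python sketch of Maven's "nearest definition wins" mediation. It ignores scopes, exclusions and dependencyManagement, and all coordinates below are made up for the example.

```python
from collections import deque


def resolve_versions(root, graph):
    """Walk the dependency graph breadth-first; the first (nearest) version
    seen for each groupId:artifactId wins.

    graph maps a "group:artifact:version" coordinate to the coordinates it
    declares as dependencies. This is a simplification of Maven's real
    resolution, meant only to show why a requested version may not be the
    effective one.
    """
    selected = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            name, _, version = dep.rpartition(":")
            if name in selected:
                continue  # a nearer declaration already picked a version
            selected[name] = version
            queue.append(dep)
    return selected


# Hypothetical coordinates: lib-a asks for util 1.5, the app asks for 2.0.
graph = {
    "com.example:app:1.0": ["com.example:lib-a:1.0", "org.example:util:2.0"],
    "com.example:lib-a:1.0": ["org.example:util:1.5"],
}
effective = resolve_versions("com.example:app:1.0", graph)
```

Here lib-a declares util 1.5, but the application's own, nearer declaration of util 2.0 is the one selected, so the version in lib-a's SBOM would only be a preference.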

@raboof
Author

raboof commented Nov 1, 2024

OK. I think embedded describes the difference that is relevant for the SBOM best: shading also somewhat suggests that the package paths inside the pom were changed, which may or may not be the case, and static doesn't do justice to the fact that the classloader still has some freedom.

@hboutemy
Contributor

hboutemy commented Nov 1, 2024

In Java (or JVM languages), I suppose we can say the default is "not embedded" and the exception is "embedded".
In containers, the default is embedded (not sure "not embedded" has a meaning for containers).
I don't know about Python or JavaScript.

IMHO we need to clarify SBOM semantics more widely than what is implicit in the JVM, and define which CycloneDX field should hold the resulting info.
@stevespringett WDYT?

@stevespringett
Member

@hboutemy since this ticket is specifically about the maven-shade-plugin, I would recommend supporting that specifically and supporting component assemblies. That way we can represent that a shaded component includes other components.

If you want to support assemblies for other things or make it configurable, that enhancement should realistically be in another ticket and not conflict with the shade functionality described in this ticket.

@raboof
Author

raboof commented Nov 1, 2024

> since this ticket is specifically about the maven-shade-plugin, I would recommend supporting that specifically

That seems reasonable (though it can't hurt to 'think ahead' a little and choose an approach that will be easy to generalize later, avoiding conflict)

> and supporting component assemblies. That way we can represent that a shaded component includes other components.

I'm not entirely sure we're on the same page yet on how to encode this information. Given the example in #472 (comment), am I summarizing correctly that:

  • In the SBOM of maven-depending-sbom, we should use an assembly to represent that its dependency maven-shaded-sbom (I regret that name now 😆) embeds commons-compress.
  • In the SBOM of maven-shaded-sbom, we should use a maven.embedded=true property on the commons-compress component to represent the fact that the maven-shaded-sbom artifact embeds commons-compress.

@hboutemy
Contributor

hboutemy commented Nov 1, 2024

@stevespringett this answers neither how to encode the information nor how to differentiate ecosystem-specific default semantics.

@hboutemy
Contributor

hboutemy commented Nov 2, 2024

Created draft PR #576 to list all the cases (and associated plugins) where dependencies are copied to the output artifact,

and de facto started to describe the default for the JVM: Maven dependencies are just symbolic expectations, with a very loose description of version preference.

@stevespringett
Member

Unless I'm missing something, the information should be encoded in the following way:

  • If a component is embedded inside another component using the shade plugin, then using component[].components is recommended.
  • If a component is embedded and shaded inside another component, then component[].components along with component[].components[].pedigree should be used. With pedigree, you can capture any changes to class and package names along with any other modifications that were made during shading.
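
One possible shape for the second case, sketched in Python: the nested entry keeps its own identity and records the unmodified upstream artifact under pedigree.ancestors, with the relocations summarized in pedigree.notes. Using notes for relocations is an assumption made for illustration, and the helper shaded_component and the relocated package name are hypothetical.

```python
def shaded_component(original, relocations):
    """Build the nested entry for a dependency that was embedded and relocated.

    relocations maps original package prefixes to their shaded prefixes.
    Summarizing them in pedigree.notes is one option among several; the
    thread does not prescribe a specific pedigree field.
    """
    notes = "; ".join(
        f"relocated {src} -> {dst}" for src, dst in sorted(relocations.items())
    )
    # bom-ref values must be unique within a BOM, so the ancestor copy
    # drops it and is identified by its purl instead.
    ancestor = {k: v for k, v in original.items() if k != "bom-ref"}
    entry = dict(original)
    entry["pedigree"] = {"ancestors": [ancestor], "notes": notes}
    return entry


compress = {
    "type": "library",
    "name": "commons-compress",
    "version": "1.26.0",
    "purl": "pkg:maven/org.apache.commons/commons-compress@1.26.0?type=jar",
    "bom-ref": "pkg:maven/org.apache.commons/commons-compress@1.26.0?type=jar",
}

# The target package below is made up for the example.
nested = shaded_component(
    compress, {"org.apache.commons.compress": "net.bzzt.shaded.compress"}
)
```

The resulting nested dictionary would be placed in the components array of the component that embeds it.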

@raboof
Author

raboof commented Nov 3, 2024

> • If a component is embedded inside another component using the shade plugin, then using component[].components is recommended.

If A embeds B, how do we put that in the SBOM for A? In that case there is no component in component[] that B can be 'put under'?

@stevespringett
Member

stevespringett commented Nov 3, 2024

> If A embeds B, how do we put that in the SBOM for A

If A is the artifact that the SBOM describes, then use metadata.component.components to describe that the final deliverable (A) also includes B.
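
As one reading of this suggestion, applied to the maven-shaded-sbom example from earlier in the thread, here is a Python sketch of such an SBOM together with a small consumer-side helper that lists everything nested under a component. The helper embedded_refs is hypothetical, and the SBOM is trimmed to the fields relevant here.

```python
SHADED = "pkg:maven/net.bzzt/maven-shaded-sbom@1.0-SNAPSHOT?type=jar"
COMPRESS = "pkg:maven/org.apache.commons/commons-compress@1.26.0?type=jar"
EMAIL = "pkg:maven/org.apache.commons/commons-email@1.6.0?type=jar"


def embedded_refs(component):
    """Yield the bom-ref of every component nested, at any depth, under
    the given component via its "components" array."""
    for child in component.get("components", []):
        yield child["bom-ref"]
        yield from embedded_refs(child)


# SBOM for A (the shaded jar): B (commons-compress) sits under
# metadata.component.components, while the regular dependency
# commons-email stays in the top-level components list.
sbom_for_a = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "metadata": {
        "component": {
            "type": "library",
            "bom-ref": SHADED,
            "components": [{"type": "library", "bom-ref": COMPRESS}],
        }
    },
    "components": [{"type": "library", "bom-ref": EMAIL}],
}

embedded = list(embedded_refs(sbom_for_a["metadata"]["component"]))
```

A scanner reading this SBOM could treat the entries in embedded as physically present in the jar and the remaining top-level components as version preferences.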

@hboutemy
Contributor

hboutemy commented Nov 4, 2024

I doubt that metadata.component.components can represent embedded dependencies in a way that matches anyone's expectations. We'll need to share concrete, simple examples (I know a few huge ones, but I fear those are too complex to work through together).
