Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for componentjs #4138

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

NucleonGodX
Copy link

Fixes #4107

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁

Signed-off-by: NucleonGodX [email protected]

@NucleonGodX NucleonGodX changed the title Add support for componentjs #4107 Add support for componentjs Feb 9, 2025
Signed-off-by: NucleonGodX <[email protected]>
Signed-off-by: NucleonGodX <[email protected]>
Copy link
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NucleonGodX Thanks a lot for the PR, this is looking great as a start.
I have suggested some more improvements here for your considerations, please address these.
A couple general comments:

  • also implement package assembly
  • review and check if we are missing anything in the spec
  • we probably don't need all the staticmethods
  • make sure the tests pass, and make tests using expectation files

Apologies for the late review 😅

"""
datasource_id = "component_json_metadata"
path_patterns = ("*component.json",)
default_package_type = "library"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default_package_type = "library"
default_package_type = "generic"

return dependencies

@classmethod
def _extract_license_statement(cls, data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to process/normalize this separately, this is handled while creating PackageData generally, since this is common across ecosystems, see https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/models.py#L782

)

if namespace and name:
package_data['purl'] = PackageURL(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need to populate the purl field explicitly, this is populated based on the values while creating the PackageData object, in a more general way see https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/models.py#L302.

name=name,
namespace=namespace,
version=data.get('version'),
description=data.get('description', ''),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description=data.get('description', ''),
description=data.get('description'),

We want the defaults to be None (or whatever is defined at the model for this attribute) always if we don't have any value.

version=data.get('version'),
description=data.get('description', ''),
homepage_url=cls._extract_homepage(data),
keywords=data.get('keywords', []),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
keywords=data.get('keywords', []),
keywords=data.get('keywords'),

dependencies.append(
models.DependentPackage(
purl=purl,
scope='runtime',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
scope='runtime',
scope='dependencies',

Here this is specific to the manifest type

with open(location, "r", encoding="utf-8") as f:
data = json.load(f)

name = data.get('name') or data.get('repo', '').split('/')[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is name not a required attribute here? is repo always formatted like the following: "repo": "chaijs/chai"?
From https://github.com/componentjs/spec/blob/master/component.json/specifications.md#name seems like this is required. Same comment for namespace processing
Please go through the full spec carefully

, "testing"
, "chai"
]
, "main": "index.js"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add a assemble and assign_packages_to_resources_* functions to process these properly. These are used to process top level packages and assign files to these package objects (to resolve which files are part of a package). This is handled by default functions which are base implementations: https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/models.py#L1137 but whenever these are specific data we need to override these by explicit functions. This populates the for_packages attribute of resources and does a couple other things.

Then also add tests with directories and files to test this too.

See other examples of this in other datafilehandlers, like the simple assembly in https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/conda.py#L42
Please ask questions on this if you need help with this, as this can be more complex

test_file = self.get_test_loc('componentjs/jszip/component.json')
result_packages = list(componentjs.ComponentJSONMetadataHandler.parse(test_file))
expected_packages = [
models.PackageData(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test data which are like these should be in expectation JSON files, see

self.check_packages_data(packages, expected_loc, regen=REGEN_TEST_FIXTURES)
where we have check_packages_data to support this in an easy way, like full JSON scans. Otherwise we have to add/maintain this manually which isn't great :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is applicable for all the tests below

check_json_scan(expected_file, result_file, regen=REGEN_TEST_FIXTURES)

def test_parse_jszip_component_json(self):
test_file = self.get_test_loc('componentjs/jszip/component.json')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to add two tests, one with full scans with --package and then one here. See the comment below on how to test single manifests. Full scans are useful to test both package manifest parsing and package assembly together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Componentjs component.json
2 participants