Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Independent Finding Execution Behavior #1030

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

andrecsilva
Copy link
Contributor

Overview

Adds independent finding behavior for codemods

Description

Independent finding behavior means that codemods will execute once for each finding (i.e. remediation), as opposed to once for all findings (i.e. hardening). Hardening aims to produce a file with all the changes, while remediation will produce diffs for each finding. Remediation is the default behavior accessible with the codemodder command, while hardening is now accessible as codemodder_hardening.

Additional Details

  • Remediation won't write any files as the default behavior, this includes the dependencies.
  • While find and fix codemods will still be accessible over the remediation behavior, there is no difference over the hardening behavior. The only exception being codemods that use semgrep scans for findings.
  • Most tests with 2+ changes are now being tested for remediation behavior. I've found this gives us a healthy balance of testing for both behaviors while avoiding duplicating each test.

@andrecsilva andrecsilva marked this pull request as draft March 24, 2025 12:00
@andrecsilva andrecsilva marked this pull request as ready for review March 25, 2025 12:19
Copy link
Member

@drdavella drdavella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good. My comments boil down to two basic issues:

  • We should preserve the current hardening behavior as the default for now
  • I'm not totally convinced that locations are being handled correctly for the hardening case

@@ -135,6 +136,7 @@ def run(
sast_only: bool = False,
ai_client: bool = True,
log_matched_files: bool = False,
hardening: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hardening: bool = False,
hardening: bool = True,

@@ -38,6 +38,6 @@ jobs:
repository: pixee/pygoat
path: pygoat
- name: Run Codemodder
run: codemodder --dry-run --output output.codetf pygoat
run: codemodder_hardening --dry-run --output output.codetf pygoat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should preserve the existing behavior of the command line for now.

@@ -136,7 +136,7 @@ def test_fail_to_add(self, tmp_repo):
os.chmod(tmp_repo / self.requirements_file, 0o400)

command = [
"codemodder",
"codemodder_hardening",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the existing behavior for now.

@@ -25,7 +25,7 @@ def test_two_codemods(self, codemods, tmpdir):
shutil.copy(pathlib.Path(SAMPLES_DIR) / source_file_name, directory)

command = [
"codemodder",
"codemodder_hardening",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preserve existing behavior for now.

pyproject.toml Outdated
@@ -45,6 +45,7 @@ Repository = "https://github.com/pixee/codemodder-python"

[project.scripts]
codemodder = "codemodder.codemodder:main"
codemodder_hardening = "codemodder.codemodder:harden"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary. We expect per-finding mode to be accessed exclusively by client library calls for now and we should preserve the existing codemodder behavior as the default in the near term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do use the new command line script for integration tests.

@@ -231,7 +233,7 @@ def run(
return codetf, 0, token_usage


def _run_cli(original_args) -> int:
def _run_cli(original_args, hardening=False) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _run_cli(original_args, hardening=False) -> int:
def _run_cli(original_args, hardening=True) -> int:

)
return status


def harden():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary; see comments above.

contexts.extend([process_file(file) for file in files_to_analyze])
else:
# Do each result independently and outputs the diffs
if not hardening:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if/else feels like each branch should maybe be a separate function. It would probably enable the call to be a one-liner as well.

singleton = results.__class__()
singleton.add_result(result)
result_locations = self.get_files_to_analyze(context, singleton)
# We do an execution for each location in the result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. We only need to execute each fix per finding. A finding may report multiple locations, but will still result in only a single fix (which itself may touch multiple locations). I think the distinction is subtle but we still execute each codemod only once per finding.

Let me know whether there's something I'm missing or whether maybe the comment isn't quite right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things to note: our transformers are file-based, that is, they execute per file. ChangeSet objects only reports changes to a single file.

This particular piece of code mimics the behavior we had before. If you only had one result with multiple locations, you would run the codemod for each location and produce a changeset object for each location.

What changed is that now each changeset is only associated with a single finding.

Copy link
Contributor

@clavedeluna clavedeluna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few code comments that seem unnecessarily explaining code (LLM generated likely) that are worth removing

pyproject.toml Outdated
@@ -45,6 +45,7 @@ Repository = "https://github.com/pixee/codemodder-python"

[project.scripts]
codemodder = "codemodder.codemodder:main"
codemodder_hardening = "codemodder.codemodder:harden"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency with other commands, I suggest codemodder-hardening

@@ -72,9 +73,10 @@ def run_and_assert(

path_exclude = [f"{tmp_file_path}:{line}" for line in lines_to_exclude or []]

print(expected_diff_per_change)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(expected_diff_per_change)

@andrecsilva andrecsilva force-pushed the independent-execution branch from 042b18c to aceebc2 Compare April 3, 2025 12:29
Copy link

sonarqubecloud bot commented Apr 3, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants