Added Independent Finding Execution Behavior #1030
Overall this looks good. My comments boil down to two basic issues:
- We should preserve the current hardening behavior as the default for now
- I'm not totally convinced that locations are being handled correctly for the hardening case
src/codemodder/codemodder.py
Outdated
@@ -135,6 +136,7 @@ def run(
 sast_only: bool = False,
 ai_client: bool = True,
 log_matched_files: bool = False,
+hardening: bool = False,
-hardening: bool = False,
+hardening: bool = True,
.github/workflows/codemod_pygoat.yml
Outdated
@@ -38,6 +38,6 @@ jobs:
 repository: pixee/pygoat
 path: pygoat
 - name: Run Codemodder
-run: codemodder --dry-run --output output.codetf pygoat
+run: codemodder_hardening --dry-run --output output.codetf pygoat
I think we should preserve the existing behavior of the command line for now.
@@ -136,7 +136,7 @@ def test_fail_to_add(self, tmp_repo):
 os.chmod(tmp_repo / self.requirements_file, 0o400)

 command = [
-"codemodder",
+"codemodder_hardening",
Let's keep the existing behavior for now.
@@ -25,7 +25,7 @@ def test_two_codemods(self, codemods, tmpdir):
 shutil.copy(pathlib.Path(SAMPLES_DIR) / source_file_name, directory)

 command = [
-"codemodder",
+"codemodder_hardening",
Preserve existing behavior for now.
pyproject.toml
Outdated
@@ -45,6 +45,7 @@ Repository = "https://github.com/pixee/codemodder-python"

 [project.scripts]
 codemodder = "codemodder.codemodder:main"
+codemodder_hardening = "codemodder.codemodder:harden"
I don't think this is necessary. We expect per-finding mode to be accessed exclusively by client library calls for now and we should preserve the existing `codemodder` behavior as the default in the near term.
We do use the new command line script for integration tests.
src/codemodder/codemodder.py
Outdated
@@ -231,7 +233,7 @@ def run(
 return codetf, 0, token_usage


-def _run_cli(original_args) -> int:
+def _run_cli(original_args, hardening=False) -> int:
-def _run_cli(original_args, hardening=False) -> int:
+def _run_cli(original_args, hardening=True) -> int:
src/codemodder/codemodder.py
Outdated
)
return status


def harden():
I don't think this is necessary; see comments above.
contexts.extend([process_file(file) for file in files_to_analyze])
else:
# Do each result independently and outputs the diffs
if not hardening:
This if/else feels like each branch should maybe be a separate function. It would probably enable the call to be a one-liner as well.
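One possible reading of this suggestion, as a rough sketch only: each branch becomes its own helper, and the caller selects one in a single expression. The names `apply_hardening`, `apply_per_finding`, and `apply` are hypothetical, and the bodies are simplified stand-ins rather than codemodder's actual logic.

```python
# Hypothetical stand-ins: the real branches would process files and build contexts.
def apply_hardening(results: list[str]) -> list[list[str]]:
    """Hardening: a single pass that covers all results together."""
    return [list(results)]


def apply_per_finding(results: list[str]) -> list[list[str]]:
    """Per-finding: one independent pass for each result."""
    return [[result] for result in results]


def apply(results: list[str], hardening: bool) -> list[list[str]]:
    # The if/else collapses into a one-line dispatch between the two helpers.
    return (apply_hardening if hardening else apply_per_finding)(results)
```

With this shape each mode can be tested in isolation, and the call site no longer carries the branching logic.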
singleton = results.__class__()
singleton.add_result(result)
result_locations = self.get_files_to_analyze(context, singleton)
# We do an execution for each location in the result
I don't think this is correct. We only need to execute each fix per finding. A finding may report multiple locations, but will still result in only a single fix (which itself may touch multiple locations). I think the distinction is subtle but we still execute each codemod only once per finding.
Let me know whether there's something I'm missing or whether maybe the comment isn't quite right.
Two things to note: our transformers are file-based, that is, they execute per file. ChangeSet objects only report changes to a single file.
This particular piece of code mimics the behavior we had before. If you only had one result with multiple locations, you would run the codemod for each location and produce a changeset object for each location.
What changed is that now each changeset is only associated with a single finding.
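To make the relationship described in this reply concrete, here is a minimal sketch. It assumes a finding can be modeled as an ID plus the files containing its reported locations; `Finding`, `FileChange`, and `changes_for` are hypothetical names for illustration, not codemodder's real classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical model: one finding may report locations in several files."""
    finding_id: str
    files: tuple[str, ...]


@dataclass(frozen=True)
class FileChange:
    """Hypothetical stand-in for a per-file changeset tied to one finding."""
    path: str
    finding_id: str


def changes_for(findings: list[Finding]) -> list[FileChange]:
    changes = []
    for finding in findings:
        # One execution per reported file, as in the earlier behavior...
        for path in finding.files:
            # ...but each resulting change references exactly one finding.
            changes.append(FileChange(path=path, finding_id=finding.finding_id))
    return changes
```

In this model, two findings that touch the same file yield two separate changes for that file rather than one merged change.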
A few code comments seem to explain the code unnecessarily (likely LLM-generated) and are worth removing.
pyproject.toml
Outdated
@@ -45,6 +45,7 @@ Repository = "https://github.com/pixee/codemodder-python"

 [project.scripts]
 codemodder = "codemodder.codemodder:main"
+codemodder_hardening = "codemodder.codemodder:harden"
For consistency with other commands, I suggest `codemodder-hardening`.
@@ -72,9 +73,10 @@ def run_and_assert(

 path_exclude = [f"{tmp_file_path}:{line}" for line in lines_to_exclude or []]

+print(expected_diff_per_change)
-print(expected_diff_per_change)
Overview
Adds independent finding behavior for codemods
Description
Independent finding behavior means that codemods will execute once for each finding (i.e. remediation), as opposed to once for all findings (i.e. hardening). Hardening aims to produce a file with all the changes, while remediation will produce diffs for each finding. Remediation is the default behavior accessible with the `codemodder` command, while hardening is now accessible as `codemodder_hardening`.
Additional Details
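As a toy illustration of the two modes in the description, the sketch below treats each finding's fix as a plain text replacement. That is an assumption made purely for demonstration: `apply_all`, `diff_per_finding`, and the sample data are hypothetical and do not reflect how codemodder transforms code.

```python
import difflib


def unified(before: str, after: str, path: str) -> str:
    """Render a unified diff between two versions of one file."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=path,
            tofile=path,
        )
    )


def apply_all(source: str, fixes: list[tuple[str, str]]) -> str:
    """Hardening-style: apply every fix and return the single resulting file."""
    for old, new in fixes:
        source = source.replace(old, new)
    return source


def diff_per_finding(source: str, fixes: list[tuple[str, str]], path: str) -> list[str]:
    """Remediation-style: one diff per fix, each computed against the original."""
    return [unified(source, source.replace(old, new), path) for old, new in fixes]


SOURCE = "a = 1\nb = 2\n"
FIXES = [("a = 1", "a = 10"), ("b = 2", "b = 20")]

hardened = apply_all(SOURCE, FIXES)
diffs = diff_per_finding(SOURCE, FIXES, "example.py")
```

Here `hardened` holds one file containing both changes, while `diffs` holds two independent diffs, each touching only the line tied to its own finding.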