Submodule recursion #1058

Draft
wants to merge 4 commits into
base: main

@yudjinn yudjinn commented Feb 26, 2025

Description

Motivation and Context

How Has This Been Tested?

Screenshots / Logs (if applicable)

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (no code change)
  • Refactor (refactoring production code)
  • Other

Checklist:

  • My code follows the code style of this project.
  • I have updated the documentation accordingly.
  • I have formatted the code with rustfmt.
  • I checked the lints with clippy.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@yudjinn yudjinn requested a review from orhun as a code owner February 26, 2025 01:34

welcome bot commented Feb 26, 2025

Thanks for opening this pull request! Please check out our contributing guidelines! ⛰️

Owner

@orhun orhun left a comment

Good stuff! I left some comments that I think will help move this forward.

I tested it quickly but didn't see it in action yet. Can you also check again?

@@ -250,6 +252,29 @@ impl Commit<'_> {
}
}

/// Returns wether the commit changes the SHA of a submodule
Owner

Suggested change
- /// Returns wether the commit changes the SHA of a submodule
+ /// Returns whether the commit changes the SHA of a submodule

}
}
}
parse_commits(args, &tags, &mut commit_range, repository)?;
Owner

Why has this been extracted to a new function?

Author

Interim derp: I thought I might reuse it, but I don't need to 😓

@@ -303,6 +279,36 @@ fn process_repository<'a>(
releases.push(Release::default());
}
}
if recurse_submodules {
Owner

I guess we are updating the releases vector here based on the commits in the submodules.

It would be nice to extract this to a function that takes &mut releases maybe.

if recurse_submodules {
for (_submodule, _commits) in submodule_commits.values() {
log::trace!("Recursing {:?}.", _submodule.path().to_str());
let _range = format!(
Owner

This doesn't seem to be used.

_commits.last().unwrap().id
);
let _repo_path = _submodule.path().to_path_buf();
let _repo = Repository::init(_repo_path.clone())?;
Owner

You can get rid of the compiler error by doing this for now:

let repo = Box::leak(Box::new(Repository::init(repo_path.clone())?));

It's a bit bad, but this is also what I'm doing for the other repositories 💀
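
For readers unfamiliar with the trick, here is a minimal, std-only sketch of why leaking the repository silences the borrow checker. `Repo`, `CommitRef`, `open_leaked`, and `collect_commits` are hypothetical stand-ins for illustration, not the types or functions used in this PR.

```rust
/// Hypothetical stand-in for a repository handle; not the real type.
#[derive(Debug)]
pub struct Repo {
    pub path: String,
}

/// Hypothetical commit view that borrows from the repository it came from.
#[derive(Debug)]
pub struct CommitRef<'a> {
    pub repo: &'a Repo,
    pub id: &'a str,
}

/// Allocates a (fake) repository on the heap and leaks it, so the returned
/// reference is `'static` and values borrowed from it can outlive this call.
/// The allocation is never freed.
pub fn open_leaked(path: &str) -> &'static mut Repo {
    Box::leak(Box::new(Repo {
        path: path.to_string(),
    }))
}

/// Collects commits from several submodule paths into one vector.
/// Without the leak, each `Repo` would be dropped at the end of its loop
/// iteration and the returned references would not compile.
pub fn collect_commits(paths: &[&str]) -> Vec<CommitRef<'static>> {
    let mut commits = Vec::new();
    for &path in paths {
        let repo: &'static Repo = open_leaked(path);
        commits.push(CommitRef { repo, id: "HEAD" });
    }
    commits
}
```

The trade-off is that every leaked repository stays allocated until the process exits, which is usually tolerable for a short-lived CLI run but adds up with many submodules.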

@lehmanju

Oh yes, great, that's a feature I am very interested in. I volunteer for testing; we use this at work for changelog generation with about 200 submodules. Right now we use the repository flag to generate a merged changelog for each submodule.
However, version 2.8 did somehow break it so I am looking forward to true submodule support.

There is one particular issue I ran into in CI environments. If you don't specify fetch-depth: 0 for the checkout step in GitHub Actions, the submodule history is completely missing and the checkout only contains the last commit. Maybe there is room for further improvement to fetch the necessary submodule commits on demand without fetching the complete history/refs.

@orhun
Owner

orhun commented Feb 28, 2025

It's great to hear there is some interest in this feature :)

> However, version 2.8 did somehow break it so I am looking forward to true submodule support.

Ah, it's probably related to the monorepo-related changes. I would love to take a look if you have time to submit an issue about this. But I agree that proper submodule support would indeed be better here.

> Maybe there is room for further improvement to fetch the necessary submodule commits on demand without fetching the complete history/refs.

Interesting idea...

@lehmanju

lehmanju commented Mar 3, 2025

Couldn't help myself and started tinkering with this code ;). Here are some thoughts:

  • The idea seems to be:
    • iterate over all commits and save the commits that change a submodule
    • gather all submodule commits for these top-level commits and append them to a release
  • This does not work because:
    • Only top-level commit IDs are gathered, but iterating through submodule commits requires knowing the submodule commit IDs. This requires reading the diff content, maybe commit_changed_submodules_no_cache.
    • Instead of doing these two steps separately, the recursive commit explosion can be done directly while iterating through the top-level commits. Submodule commits are then just appended to the same release tag/repository with an additional (optional) submodule value.
    • A template can then process a release, grouping/filtering commits based on submodule.
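
The single-pass idea in the list above could look roughly like the following std-only sketch. Everything here is an assumption made for illustration: `Entry`, `TopLevel`, `explode`, and `group_by_submodule` are hypothetical names, commit IDs are plain strings, and the submodule commits are assumed to be already resolved for each pointer change.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified commit record. The optional `submodule` field is
/// the extra value suggested in the list above.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: String,
    pub message: String,
    pub submodule: Option<String>,
}

/// A top-level commit plus the submodule pointer changes it contains, given
/// as (submodule path, [(commit id, message)]) pairs.
pub struct TopLevel {
    pub id: String,
    pub message: String,
    pub bumps: Vec<(String, Vec<(String, String)>)>,
}

/// Walks the top-level commits once and appends the submodule commits right
/// after the commit that moved the submodule pointer.
pub fn explode(commits: &[TopLevel]) -> Vec<Entry> {
    let mut out = Vec::new();
    for commit in commits {
        out.push(Entry {
            id: commit.id.clone(),
            message: commit.message.clone(),
            submodule: None,
        });
        for (path, sub_commits) in &commit.bumps {
            for (id, message) in sub_commits {
                out.push(Entry {
                    id: id.clone(),
                    message: message.clone(),
                    submodule: Some(path.clone()),
                });
            }
        }
    }
    out
}

/// Groups the entries of one release by submodule, as a template could do.
/// The `None` key holds the commits of the top-level repository.
pub fn group_by_submodule(entries: &[Entry]) -> BTreeMap<Option<String>, Vec<&Entry>> {
    let mut groups: BTreeMap<Option<String>, Vec<&Entry>> = BTreeMap::new();
    for entry in entries {
        groups.entry(entry.submodule.clone()).or_default().push(entry);
    }
    groups
}
```

Keeping the submodule as an optional field on the commit means existing templates that ignore it keep working, while new templates can group or filter on it.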

@orhun
Owner

orhun commented Mar 3, 2025

Interesting... I didn't take a deeper look into this code, but your take makes sense. I think we can wait for @yudjinn to incorporate some changes into this PR, or feel free to put up a new draft based on those.

@yudjinn
Author

yudjinn commented Mar 3, 2025

I poked at it some more. In short, @lehmanju is correct that we need to get the submodule changes as well. I am currently planning to do that as a Vec based on the new/old filename that the commit sees, and then just diff releases based on the first and last entries in that Vec. If you have a better idea for doing that explosion earlier in one shot, I'd love to see it!

@lehmanju

lehmanju commented Mar 4, 2025

I'd probably explode the commits between submodule before and after states somewhere in:

pub fn commits(
    &self,
    range: Option<&str>,
    include_path: Option<Vec<Pattern>>,
    exclude_path: Option<Vec<Pattern>>,
) -> Result<Vec<Commit>> {
    let mut revwalk = self.inner.revwalk()?;
    revwalk.set_sorting(Sort::TOPOLOGICAL)?;
    Self::set_commit_range(&mut revwalk, range).map_err(|e| {
        Error::SetCommitRangeError(
            range.map(String::from).unwrap_or_else(|| "?".to_string()),
            e,
        )
    })?;
    let mut commits: Vec<Commit> = revwalk
        .filter_map(|id| id.ok())
        .filter_map(|id| self.inner.find_commit(id).ok())
        .collect();
    if include_path.is_some() || exclude_path.is_some() {
        let include_patterns = include_path.map(|patterns| {
            patterns.into_iter().map(Self::normalize_pattern).collect()
        });
        let exclude_patterns = exclude_path.map(|patterns| {
            patterns.into_iter().map(Self::normalize_pattern).collect()
        });
        commits.retain(|commit| {
            self.should_retain_commit(
                commit,
                &include_patterns,
                &exclude_patterns,
            )
        });
    }
    Ok(commits)
}

if let Ok(diff) = self.inner.diff_tree_to_tree(
    commit.tree().ok().as_ref(),
    prev_commit.tree().ok().as_ref(),
    None,
) {
    changed_files.extend(
        diff.deltas().filter_map(|delta| {
            delta.new_file().path().map(PathBuf::from)
        }),
    );
}

And since this line already iterates through the diff, I'd extend it to look into the file contents to get the submodule commit id.

However, I'm stuck at how to ask libgit2 to get me the file contents if I have a diff?
DiffFile doesn't have a getter to get the contents.

@orhun
Owner

orhun commented Mar 5, 2025

> And since this line already iterates through the diff, I'd extend it to look into the file contents to get the submodule commit id.
> However, I'm stuck at how to ask libgit2 to get me the file contents if I have a diff?

I'm not sure how to get the submodule information from a Diff, but it is possible to get the file contents by calling Diff::print. There you can also filter by DiffDelta (and DiffFile).

Here is an example from git2: https://github.com/rust-lang/git2-rs/blob/d1ae3b6c2d1200e7d82468af447fa66259225ecf/examples/log.rs#L201-L209
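
If the printed patch is used, one way to recover the submodule commit IDs is to scan its lines. The sketch below is std-only and rests on an assumption: that the patch for a submodule entry contains the `Subproject commit <id>` lines that git emits for submodule pointer changes. The (origin, content) pairs stand in for what a print callback would receive after converting the line bytes to text; `SubmoduleBump` and `parse_submodule_bump` are hypothetical names.

```rust
/// Old and new commit IDs of a submodule pointer, as found in a printed patch.
#[derive(Debug, Default, PartialEq)]
pub struct SubmoduleBump {
    pub old: Option<String>,
    pub new: Option<String>,
}

/// Scans patch lines given as (origin, content) pairs, where the origin is
/// '-' for a removed line and '+' for an added one, and picks out the
/// `Subproject commit <id>` lines. All other lines are ignored.
pub fn parse_submodule_bump<'a, I>(lines: I) -> SubmoduleBump
where
    I: IntoIterator<Item = (char, &'a str)>,
{
    let mut bump = SubmoduleBump::default();
    for (origin, content) in lines {
        if let Some(id) = content.trim().strip_prefix("Subproject commit ") {
            let id = id.trim().to_string();
            match origin {
                '-' => bump.old = Some(id),
                '+' => bump.new = Some(id),
                _ => {}
            }
        }
    }
    bump
}
```

A newly added submodule would yield only `new`, and a removed one only `old`, so callers should treat both fields as optional.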

@@ -273,10 +281,34 @@ fn process_repository<'a>(
let mut previous_release = Release::default();
let mut first_processed_tag = None;
let repository_path = repository.path()?.to_string_lossy().into_owned();
let mut submodule_commits: HashMap<PathBuf, (&mut Repository, Vec<Commit>)> =
Owner

Why do we have this complex type? 🤔

Author

So we don't have to reinit the repo every time we want to parse commits. But tbh I put some of that together after a very full day, so I plan on redoing it a bit. The PathBuf being the key is just to avoid having to impl comparisons for submodules.

Owner

Got it. It would be nice to refactor/simplify this a bit for sure :)

Author

I think I might just swap it to be a tuple and replace the end commit instead. The plan is that, for any given submodule, the releases we include should be inclusive and in aggregate. So if a single release has multiple commits bumping the submodule underneath it, we just aggregate all of those changes.
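
The tuple approach described here could be sketched as follows, std-only. `Bump`, `aggregate`, and `to_range` are hypothetical names, commit IDs are plain strings, and the bumps are assumed to arrive ordered oldest to newest. Note that a git range `a..b` excludes `a` itself, which fits here because the old pointer was already covered before the first bump.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// One submodule pointer change seen in a top-level commit (hypothetical type).
pub struct Bump {
    pub path: PathBuf,
    pub old_id: String,
    pub new_id: String,
}

/// Folds the bumps of one release, ordered oldest to newest, into a single
/// (start, end) tuple per submodule: the start is kept from the first bump
/// and the end is replaced by every later one.
pub fn aggregate(bumps: &[Bump]) -> HashMap<PathBuf, (String, String)> {
    let mut ranges: HashMap<PathBuf, (String, String)> = HashMap::new();
    for bump in bumps {
        ranges
            .entry(bump.path.clone())
            .and_modify(|(_, end)| *end = bump.new_id.clone())
            .or_insert_with(|| (bump.old_id.clone(), bump.new_id.clone()));
    }
    ranges
}

/// Formats a tuple as a `start..end` revision range.
pub fn to_range(range: &(String, String)) -> String {
    format!("{}..{}", range.0, range.1)
}
```

Keying the map by path, as in the PR, avoids needing any comparison on the submodule handle itself.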

@orhun orhun changed the title from "DRAFT: submodule recursion" to "Support submodule recursion" Mar 5, 2025
@orhun orhun marked this pull request as draft March 5, 2025 14:27
@orhun orhun changed the title from "Support submodule recursion" to "Submodule recursion" Mar 5, 2025
@yudjinn
Author

yudjinn commented Mar 5, 2025

> I'd probably explode the commits between submodule before and after states somewhere in: [...]
>
> And since this line already iterates through the diff, I'd extend it to look into the file contents to get the submodule commit id.
> However, I'm stuck at how to ask libgit2 to get me the file contents if I have a diff? DiffFile doesn't have a getter to get the contents.

The only way I was able to do it was by checking the old_path and new_path against the paths from repo.submodules().
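
A minimal, std-only sketch of that path check, assuming the submodule paths have already been collected into a set. `touches_submodule` is a hypothetical helper, and the two `Option<&Path>` arguments stand in for the old and new file paths of a diff delta, either of which may be absent.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Returns true when either side of a changed-file entry is the path of a
/// known submodule. One side is `None` for added or deleted entries.
pub fn touches_submodule(
    old_path: Option<&Path>,
    new_path: Option<&Path>,
    submodule_paths: &HashSet<PathBuf>,
) -> bool {
    old_path
        .into_iter()
        .chain(new_path)
        .any(|path| submodule_paths.contains(path))
}
```

This only tells you that a submodule entry changed, not which commits it moved between, so the old and new IDs still have to come from somewhere else.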

3 participants