Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ///-style doc comments (and possibly also others) #233

Open
ruifengx opened this issue Jan 23, 2024 · 10 comments
Open

Support ///-style doc comments (and possibly also others) #233

ruifengx opened this issue Jan 23, 2024 · 10 comments
Labels
enhancement parser Related to the Clang glue layer

Comments

@ruifengx
Copy link

First, thanks for your work!

I believe clang already has support for this, i.e., it already combines consecutive single-line doc-comments together. Therefore, it should be a matter of stripping /// prefixes in hawkmoth.docstring. And FYI in the slides here (page 9-ish) there are some examples for which comment gets attached to which entity:

/// comment
void f();             int a; ///< comment
/** comment */
void g();             int b; /**< comment */
/*! comment */
void h();             int c; /*!< comment */
@jnikula jnikula added enhancement parser Related to the Clang glue layer labels Jan 23, 2024
@jnikula
Copy link
Owner

jnikula commented Jan 23, 2024

Thanks for the report!

I'm afraid it isn't as trivial as you might think. Clang can indeed do a bunch of comment parsing automatically, but it does not really support documenting macros. So either you get all the fancy things it does, without macro support, or you use its tokenization to identify and parse comments yourself, with support for documenting macros, but without all the fancy things. For me, being able to document macros was always must have, so I chose the latter for Hawkmoth.

This means having to match comments from the tokenized output with the Clang cursors from the AST. In theory, it's possibly to do all the things Clang does with the comments... but here opinions kick in a bit. Personally, I was never really fond of the multitude of comment styles supported by Doxygen, while I liked the simplicity of Javadoc, /** ... */ and that's it. I just didn't bother with the implementation complexity of things I personally didn't like.

I think there were also some issues with being able to add generic documentation comments (which clang unhelpfully combined with other comments) as well as whitespace (which reStructuredText is fussy about).

I'm not immediately rejecting this, but quite frankly I also don't see myself putting a lot of effort into this either. I might consider merging a decent pull request.

@BrunoMSantos
Copy link
Collaborator

Just a quick note on what you should expect if you do venture into this thing, allowing the documentation comment to come after the thing being documented is a very different thing from simply supporting additional forward documentation markers. docstring.is_doc would need to report back what flavour it is (forwards or backwards) and, more importantly, parser._comment_extract needs to get a significant update.

Note that there are cursor types that must be skipped and others that break the association of comments to (following) cursors and the logic operates on a look ahead basis only at the moment. Not only _comment_extract would need to deal with all that in both directions, it would now also have to make good sense of things like this of course:

/**< this is messed up */
int bar;
/**< is this allowed? */
/** foo */
int foo; /**< how about this? */
/**< is this still foo */
/** some unrelated docstring */
/**< what about now? */

@ruifengx
Copy link
Author

Thanks for your detailed explanations, now I see it is indeed complicated. It is very unfortunate that clang does not attach comments to macros, otherwise I found it very convenient to use the cursor API, with Cursor.brief_comment or Cursor.raw_comment. I think I won't be able to roll out a PR very soon, especially if wee also support attaching trailing comments with <, but I do still want to do it eventually at least for the single-line comment style (///).

@llbit
Copy link

llbit commented Oct 11, 2024

The diff below allowed me to be able to use //!< comments for struct fields. It should be fairly easy to adapt this to work with other styles of comments and declarations as well:

diff --git a/src/hawkmoth/docstring.py b/src/hawkmoth/docstring.py
index f0eddfd..5ffddf9 100644
--- a/src/hawkmoth/docstring.py
+++ b/src/hawkmoth/docstring.py
@@ -91,6 +91,10 @@ class Docstring():
         """Test if comment is a C documentation comment."""
         return comment.startswith('/**') and comment != '/**/'

+    @staticmethod
+    def is_doxygen_trailing(comment):
+        return comment.startswith('//!<')
+
     def _get_header_lines(self):
         name = self._get_decl_name()
         domain = self._domain
@@ -100,7 +104,10 @@ class Docstring():
         return header.splitlines()

     def _get_comment_lines(self):
-        return statemachine.string2lines(self._text, 8, convert_whitespace=True)
+        if self._text.startswith('//!<'):
+            return [self._text[5:]]
+        else:
+            return statemachine.string2lines(self._text, 8, convert_whitespace=True)

     @staticmethod
     def _remove_comment_markers(lines):
diff --git a/src/hawkmoth/parser.py b/src/hawkmoth/parser.py
index e8f86dd..9c3cd63 100644
--- a/src/hawkmoth/parser.py
+++ b/src/hawkmoth/parser.py
@@ -148,12 +148,18 @@ def _comment_extract(tu):
     top_level_comments = []
     comments = {}
     current_comment = None
+    prev_field = None

     def is_doc(cursor): return cursor and docstring.Docstring.is_doc(cursor.spelling)

     for token in tu.get_tokens(extent=tu.cursor.extent):
         # Handle all comments we come across.
         if token.kind == TokenKind.COMMENT:
+            if prev_field and docstring.Docstring.is_doxygen_trailing(token.spelling):
+                comments[prev_field.hash] = token
+                prev_field = None
+                current_comment = None
+                continue
             # If we already have a comment, it wasn't related to another cursor.
             if is_doc(current_comment):
                 top_level_comments.append(current_comment)
@@ -187,6 +193,9 @@ def _comment_extract(tu):
         # level comment.
         if is_doc(current_comment):
             comments[token_cursor.hash] = current_comment
+        elif token_cursor.kind is CursorKind.FIELD_DECL:
+            prev_field = token_cursor
+
         current_comment = None

     # Comment at the end of file.

Not the most elegant solution perhaps, but I only spent about 30 minutes on this.

Edit: If there is a stray //!< comment after the end of a struct that will get attached to the last field of the previous struct if it was uncommented. Should be easy to fix this by resetting prev_field in the right place.

@jnikula
Copy link
Owner

jnikula commented Oct 11, 2024

Basically this boils down to two things I've mentioned before.

First:

Personally, I was never really fond of the multitude of comment styles supported by Doxygen, while I liked the simplicity of Javadoc, /** ... */ and that's it. I just didn't bother with the implementation complexity of things I personally didn't like.

Originally, I never aimed at being a drop-in replacement for Doxygen. The goal was to see how you could import documentation comments from Clang to Sphinx with no legacy baggage. Specifically, the legacy baggage was the Linux kernel kernel-doc Perl script and its comment format (https://docs.kernel.org/doc-guide/kernel-doc.html). So you can imagine not being enthusiastic about adding support for Doxygen legacy baggage.

Contrast https://jnikula.github.io/hawkmoth/dev/syntax.html with https://www.doxygen.nl/manual/docblocks.html. Doxygen has a huge document describing the various ways you can document the code.

Second:

I'm not immediately rejecting this, but quite frankly I also don't see myself putting a lot of effort into this either. I might consider merging a decent pull request.

So there's no confusion, I'm not going to implement this.

I might consider merging the feature, but I truly fear that the complications and corner cases of Doxygen comment style would lead to an ugly and bloated implementation. For example, with // comments you also need to save and join the consecutive comments, for both before/after comments. The feature must be accompanied with tests and documentation, of course.

Doxygen source code is on the order 1000x more lines of code than Hawkmoth. Doxygen has 300+ configuration options, Hawkmoth has maybe ten. We're definitely not competing on features. Simplicity and native Sphinx integration are the main benefits of Hawkmoth. I want to balance that carefully.

@llbit
Copy link

llbit commented Oct 12, 2024

I truly fear that the complications and corner cases of Doxygen comment style would lead to an ugly and bloated implementation.

Amen brother, I know too well the pain of ever expanding software complexity. When the requirements just keep on piling up and your elegant beautiful code baby grows into an ugly mess of necessary complexity before your very eyes, not even solely by others doing but often by your own hand. It is truly the most awful experience of life as a programmer.

The simplest code is truly the most beautiful code and the simplest code is the code that does nothing. Therefore we should all code in Haskell. Unfortunately I live in the real world where evil people keep putting these constraints on me and causing me to have to use code that was documented in Doxygen even though everyone knows that uniformity is the root of simplicity and the only uniform documentation system is JavaDoc.

Complex code can have beauty too however, I suppose there is beauty in solving a complex problem in an elegant way. I therefore embarked upon a fork in the path called Dewmoth to solve the complex problem of joining directly adjoining //!< comments into one. I wish you good luck on your journey and may you find happiness in the simplicity you have constrained yourself to, or a beautiful merge request full of tests if that is what you truly wish for.

@jnikula
Copy link
Owner

jnikula commented Oct 13, 2024

@llbit It looks like my choices on definining the scope and quality of what's essentially my hobby frustrate you.

My promise is to take on maintainership of the stuff I merge, and keep it working for users in the real world, with fairly high standards. I don't think that's an insignificant commitment. That's why I require tests for new features.

I won't promise to take on extra voluntary work, I don't owe that to anyone, and I don't need to be apologetic about it.

I didn't much care for C++ support myself either, I didn't put much effort into it, but the contributions were of good quality, and came with tests and documentation, so I merged it. And now I support it. Obviously a much bigger change than the comment format.

I sincerely think it would be much easier to write tests and documentation than to maintain a fork in the long term. But you do you. And I'll be open to your pull requests if you have a change of heart.

@llbit
Copy link

llbit commented Oct 13, 2024

@jnikula I was just trying to be funny in my previous reply, sorry for any confusion it caused. I sincerely have no intention to try to convince you to add anything new to Hawkmoth. It seems to be a great tool and you should do whatever you see fit with it.

In the spirit of open-source altruism I just wanted to share my diff here in the hopes that it would be useful to anyone else trying to use Doxygen comments with Hawkmoth. We've been using Doxygen previously at my company and are in the process of switching to Sphinx and Doxygen support was needed so I briefly looked into this.

@jnikula
Copy link
Owner

jnikula commented Oct 14, 2024

Originally, I never aimed at being a drop-in replacement for Doxygen.

I guess it's a fair question whether Hawkmoth should become a drop-in replacement for Doxygen. What would be the minimal requirements for that? The low-hanging fruit, the 20% effort that covers 80% of the use cases. Is it worth it to try to attract more users and hopefully more contributors to look after it?

Previously I've also thought about using Sphinx extensions for supporting different comment styles i.e. allow user extensions for identifying and removing comment markers instead of embedding everything in Hawkmoth. Sure, adds complexity but offloads some of that to extensions.

@BrunoMSantos
Copy link
Collaborator

I guess it's a fair question whether Hawkmoth should become a drop-in replacement for Doxygen.

Sphinx and Doxygen are fundamentally different approaches to documentation in my mind, even if hawkmoth in particular and Doxygen share some similarities at surface level. My motivation here was never the different syntax (though that's quite an improvement as well). In the same line of thought, I think an equal feature set is not only impossible, it's undesirable as the core goal.

A plugin approach is probably the only sensible way to get compatibility and I think you've already delivered on that: 1st define what compatibility looks like (what to do with @ingroup, etc.?) and then write a (very quirky) plugin. We currently provide one that probably already covers nearly as much as we can without dictating things that go well beyond in-source documentation.

That said, comment markers are a bit different and I'm less inclined to disregard such proposals, even if I would be unlikely to ever use different / post markers myself. But not for compatibility with Doxygen's sake:

  1. Supporting a // lead would likely make sense (especially with C++ being supported), but probably not all kinds that Doxygen supports. I'm biased towards /// as a counterpoint to repeating the * in /**. Doing so through a plugin or supporting additional styles through plugins would be ok and in line with letting users do their own thing.

  2. Post line comments are often abused for trivialities, but they arguably allow for neat documentation some times. Unlike the previous one, this cannot be done through a simple pre-processing plugin. We'd need to support it in comment extraction. Unless part of / the whole parser becomes a plugin, which may be interesting for other reasons too, but not too appealing in the near term I should think.

And that's what justifies keeping this issue open in my opinion, but it also doesn't change the fact that tests are a good requirement, especially in this sort of features as there are so many edge cases to consider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement parser Related to the Clang glue layer
Projects
None yet
Development

No branches or pull requests

4 participants