fix(parse/html): handle unclosed elements more gracefully #5063
Open
dyc3 wants to merge 1 commit into main from html-unclosed-2 (+60 −35)
```diff
@@ -27,7 +27,7 @@ HtmlRoot {
 r_angle_token: [email protected] ">" [] [],
 },
 children: HtmlElementList [],
-closing_element: missing (required),
+closing_element: missing (optional),
 },
 ],
 eof_token: [email protected] "" [Newline("\n")] [],
```
```diff
@@ -31,7 +31,7 @@ HtmlRoot {
 value_token: [email protected] "foo" [] [],
 },
 ],
-closing_element: missing (required),
+closing_element: missing (optional),
 },
 ],
 eof_token: [email protected] "" [Newline("\n")] [],
```
```diff
@@ -25,7 +25,7 @@ HtmlRoot {
 r_angle_token: missing (required),
 },
 children: HtmlElementList [],
-closing_element: missing (required),
+closing_element: missing (optional),
 },
 ],
 eof_token: [email protected] "" [Newline("\n")] [],
```
This isn't correct, though: by spec, a closing element should always be there. If we need to handle some error case, we should use bogus nodes instead.
The formatter needs to be able to add closing tags to unclosed elements. See Prettier's output: https://biomejs.dev/playground/?lintRules=all&files.main.html=PABkAGkAdgA%2BAA%3D%3D

The reason I did it like this is that bogus nodes are not structured like normal nodes, so it would be harder to extract the tag name in order to add the closing tag. `HTML_BOGUS_ELEMENT` doesn't necessarily have an `HTML_OPENING_ELEMENT` when it occurs.

(Also, fun fact: the HTML spec does allow some elements to omit their closing tag, like `<tr>` and `<td>`. Prettier's parser doesn't handle that, though. See the code examples here: https://html.spec.whatwg.org/multipage/tables.html#the-table-element)

I'll look into it some more and add some tests.
I suppose this comes from the fact that browsers can "fix" HTML, which means that something like the following snippet is valid in the browser.

However, the W3C validator emits errors if the closing element is missing 🤔

I know that Astro parses HTML as the browser would, so it patches it during compilation. I suppose that makes sense for a compiler, but I'm not sure it makes sense for a formatter.

I am torn about the change. It's also worth noting that Prettier uses a fork of the Angular HTML parser, so we should expect that parser to have been built for Angular in the first place.

Maybe we could evaluate some options for HTML parsing (a strict one, where closing elements are mandatory, and a loose one; we can discuss it later). If you want to move forward with this change, that's fine. However, we then need to change the parsing logic so that it doesn't emit a diagnostic when the closing element is missing.
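One way the strict/loose idea could look, as a sketch only: a parse mode that decides whether a missing closing tag is reported, while the resulting node has no closing element in both modes. `HtmlParseMode`, `Diagnostic`, and `missing_closing_diagnostic` are hypothetical names; nothing in this thread confirms that such an option exists.

```rust
/// Hypothetical parse modes following the "strict" and "loose" idea above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HtmlParseMode {
    /// Every element must have a closing tag; a missing one is reported.
    Strict,
    /// Missing closing tags are tolerated (the formatter may add them back).
    Loose,
}

/// Hypothetical, minimal diagnostic type.
#[derive(Debug, PartialEq, Eq)]
struct Diagnostic {
    message: String,
}

/// Decides whether an element whose closing tag is missing produces a
/// diagnostic. The tree shape is the same in both modes; only reporting differs.
fn missing_closing_diagnostic(mode: HtmlParseMode, tag_name: &str) -> Option<Diagnostic> {
    match mode {
        HtmlParseMode::Strict => Some(Diagnostic {
            message: format!("Expected a closing tag for `<{tag_name}>`"),
        }),
        HtmlParseMode::Loose => None,
    }
}
```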
Actually, as far as I'm aware, the "auto fixing" that browsers do is just something browsers do, not something specified in the HTML spec. The behavior I was talking about is the one where, if you give it `<td> foo <td> bar`, it actually results in `<td> foo </td> <td> bar </td>` (two sibling tags) and not `<td> foo <td> bar </td> </td>` (where the first is the parent of the second). But I digress.

Would it make sense to have a new node defined like this?

I attempted this in another branch and ran into some difficulties with the parser assigning the closing tag to the wrong element in cases like this:

It would assign the `</div>` to the `<span>` instead of the `<div>`, so the `<div>` became the unclosed element in the AST rather than the `<span>`.
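The mis-assignment described above happens when a closing tag always closes the innermost open element. A common alternative, loosely modeled on how the WHATWG tree-construction algorithm handles end tags, is to match each closing tag by name against the stack of open elements. The sketch below assumes input along the lines of `<div><span></div>`, handles start and end tags only (no text, no bogus nodes), and uses hypothetical `Token`, `Node`, and `build` names.

```rust
#[derive(Debug)]
enum Token<'a> {
    Open(&'a str),
    Close(&'a str),
}

#[derive(Debug)]
struct Node {
    name: String,
    children: Vec<Node>,
    closed: bool,
}

/// Builds a tree by matching each closing tag against the open-element stack
/// by name, instead of always closing the innermost element.
fn build(tokens: &[Token]) -> Vec<Node> {
    let mut roots: Vec<Node> = Vec::new();
    let mut stack: Vec<Node> = Vec::new();

    // Moves a finished node into its parent, or into the roots if it has none.
    fn attach(node: Node, stack: &mut Vec<Node>, roots: &mut Vec<Node>) {
        match stack.last_mut() {
            Some(parent) => parent.children.push(node),
            None => roots.push(node),
        }
    }

    for token in tokens {
        match token {
            Token::Open(name) => stack.push(Node {
                name: name.to_string(),
                children: Vec::new(),
                closed: false,
            }),
            Token::Close(name) => {
                // Search from the innermost open element outwards.
                if let Some(pos) = stack.iter().rposition(|n| n.name == *name) {
                    // Elements opened after the match never got their own
                    // closing tag: they stay unclosed.
                    while stack.len() > pos + 1 {
                        let unclosed = stack.pop().unwrap();
                        attach(unclosed, &mut stack, &mut roots);
                    }
                    let mut matched = stack.pop().unwrap();
                    matched.closed = true;
                    attach(matched, &mut stack, &mut roots);
                }
                // A closing tag with no matching open element is ignored here;
                // a real parser would report it or wrap it in a bogus node.
            }
        }
    }
    // Anything still open at the end of input is unclosed.
    while let Some(unclosed) = stack.pop() {
        attach(unclosed, &mut stack, &mut roots);
    }
    roots
}
```

With this approach, `</div>` closes the `div` and the `span` is the element left without a closing tag.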
That's how it originated, but I thought even these behaviors were standardized in HTML5.

Given that the behavior is allowed for some nodes, I think the `closing_element: HtmlClosingElement?` fix is a valid approach. I actually think adding this behavior to the formatter makes sense too. It doesn't change semantics, and it formats the same nodes in a more intuitive manner.
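The `<td> foo <td> bar` behavior discussed in this thread can be modeled with implied end tags: a new `td` or `th` start tag implicitly closes an open `td` or `th`, which yields siblings rather than nesting. The sketch applies only that one rule (the spec has more, for example for `li`, `p`, and `tr`), ignores text content, and uses a hypothetical `build_with_implied_end_tags` function over a list of start tag names.

```rust
/// Simplified rule: only `td`/`th` are considered here.
fn implies_end_of(current: &str, next: &str) -> bool {
    matches!(current, "td" | "th") && matches!(next, "td" | "th")
}

#[derive(Debug)]
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Builds a tree from start tags only, closing the innermost element
/// implicitly when the next start tag implies its end.
fn build_with_implied_end_tags(start_tags: &[&str]) -> Vec<Node> {
    let mut roots = Vec::new();
    let mut stack: Vec<Node> = Vec::new();
    for &tag in start_tags {
        if stack.last().map_or(false, |open| implies_end_of(&open.name, tag)) {
            // Implicitly close the current cell before opening the next one.
            let closed = stack.pop().unwrap();
            match stack.last_mut() {
                Some(parent) => parent.children.push(closed),
                None => roots.push(closed),
            }
        }
        stack.push(Node { name: tag.to_string(), children: Vec::new() });
    }
    // Close everything still open at the end of input.
    while let Some(node) = stack.pop() {
        match stack.last_mut() {
            Some(parent) => parent.children.push(node),
            None => roots.push(node),
        }
    }
    roots
}
```

For `["tr", "td", "td"]` this produces one `tr` with two sibling `td` children, while `["div", "div"]` still nests, since `div` has no optional end tag.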
Then we should update the parser and remove the diagnostic for the cases where a closing tag is allowed to be omitted.
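A minimal sketch of the diagnostic check suggested here, assuming a hypothetical `should_report_missing_closing_tag` helper. The list follows the elements named in the WHATWG "Optional tags" section, but the spec only permits omission under element-specific conditions (for example, a `p` end tag may be omitted only before certain elements or at the end of its parent), and this sketch does not check those conditions.

```rust
/// Elements whose end tag the WHATWG "Optional tags" section allows to be
/// omitted under some condition. The conditions are NOT checked here.
const OPTIONAL_END_TAGS: &[&str] = &[
    "html", "head", "body", "li", "dt", "dd", "p", "rt", "rp", "optgroup",
    "option", "colgroup", "caption", "thead", "tbody", "tfoot", "tr", "td", "th",
];

/// Hypothetical helper: whether a missing closing tag on `tag_name` should
/// produce a diagnostic. HTML tag names are case-insensitive, so compare
/// in lowercase.
fn should_report_missing_closing_tag(tag_name: &str) -> bool {
    let name = tag_name.to_ascii_lowercase();
    !OPTIONAL_END_TAGS.contains(&name.as_str())
}
```

Void elements such as `<br>` or `<img>` never have a closing tag at all, so a real parser would handle them before reaching a check like this.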