Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parse/html): handle unclosed elements more gracefully #5063

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 28 additions & 10 deletions crates/biome_html_factory/src/generated/node_factory.rs

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

31 changes: 19 additions & 12 deletions crates/biome_html_formatter/src/html/auxiliary/element.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,6 @@ impl FormatNodeRule<HtmlElement> for FormatHtmlElement {
closing_element,
} = node.as_fields();

let closing_element = closing_element?;
let opening_element = opening_element?;
let tag_name = opening_element.name()?;
let tag_name = tag_name
Expand Down Expand Up @@ -62,9 +61,13 @@ impl FormatNodeRule<HtmlElement> for FormatHtmlElement {
.last_token()
.is_some_and(|tok| tok.has_trailing_whitespace())
|| closing_element
.l_angle_token()
.ok()
.is_some_and(|tok| tok.has_leading_whitespace_or_newline());
.as_ref()
.map(|e| {
e.l_angle_token()
.ok()
.is_some_and(|tok| tok.has_leading_whitespace_or_newline())
})
.unwrap_or_default();

// "Borrowing" in this context refers to tokens in nodes that would normally be
// formatted by that node's formatter, but are instead formatted by a sibling
Expand Down Expand Up @@ -98,7 +101,7 @@ impl FormatNodeRule<HtmlElement> for FormatHtmlElement {
None
};
let borrowed_closing_tag = if should_borrow_closing_tag {
Some(closing_element.clone())
closing_element.clone()
} else {
None
};
Expand Down Expand Up @@ -147,13 +150,17 @@ impl FormatNodeRule<HtmlElement> for FormatHtmlElement {
}
}
}
FormatNodeRule::fmt(
&FormatHtmlClosingElement::default().with_options(FormatHtmlClosingElementOptions {
tag_borrowed: should_borrow_closing_tag,
}),
&closing_element,
f,
)?;
if let Some(closing_element) = closing_element {
FormatNodeRule::fmt(
&FormatHtmlClosingElement::default().with_options(
FormatHtmlClosingElementOptions {
tag_borrowed: should_borrow_closing_tag,
},
),
&closing_element,
f,
)?;
}

Ok(())
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ HtmlRoot {
r_angle_token: [email protected] ">" [] [],
},
children: HtmlElementList [],
closing_element: missing (required),
closing_element: missing (optional),
},
],
eof_token: [email protected] "" [Newline("\n")] [],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ HtmlRoot {
value_token: [email protected] "foo" [] [],
},
],
closing_element: missing (required),
closing_element: missing (optional),
},
],
eof_token: [email protected] "" [Newline("\n")] [],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ HtmlRoot {
r_angle_token: missing (required),
},
children: HtmlElementList [],
closing_element: missing (required),
closing_element: missing (optional),
},
],
eof_token: [email protected] "" [Newline("\n")] [],
Expand Down
8 changes: 4 additions & 4 deletions crates/biome_html_syntax/src/generated/nodes.rs

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 5 additions & 5 deletions crates/biome_html_syntax/src/generated/nodes_mut.rs

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion xtask/codegen/html.ungram
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ HtmlSelfClosingElement =
HtmlElement =
opening_element: HtmlOpeningElement
children: HtmlElementList
closing_element: HtmlClosingElement
closing_element: HtmlClosingElement?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct though, by spec a closing element should always be there. If we require handling some error case, we should use bogus nodes instead.

Copy link
Contributor Author

@dyc3 dyc3 Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatter needs to be able to add closing tags to unclosed elements. See prettier's output: https://biomejs.dev/playground/?lintRules=all&files.main.html=PABkAGkAdgA%2BAA%3D%3D

The reason I did it like this is that that bogus nodes are not structured like normal nodes, and so it would be harder to extract the tag name so that the closing tag can be added. HTML_BOGUS_ELEMENT doesn't necessarily have a HTML_OPENING_ELEMENT when it occurs.

(Also, fun fact, the HTML spec does allow some elements to omit their closing tag, like <tr> and <td>. Prettier's parser doesn't handle that though. See the code examples here: https://html.spec.whatwg.org/multipage/tables.html#the-table-element)

I'll look into it some more and add some tests.

Copy link
Member

@ematipico ematipico Feb 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this comes from the fact that browsers can "fix" the HTML, this means that having something like the following snippet in valid in the browser.

<div><div><div><div><div><span><div><td><tr>

However, the w3c validator emits errors if the closing element is missing 🤔

I know that Astro parses the HTML as it was the browser, so it patches it during the compilation. I suppose it makes sense for a compiler, but I'm not sure it makes sense for a formatter.

I am torn about the change. There's also to note that Prettier uses a fork of the angular HTML parser, so we should expect that the parser is made for angular in the first place.

Maybe we could evaluate some options for HTML parsing (a strict one, where opening elements are mandatory, and a loose one; we can discuss it later). If you want to move forward with this change, that's fine. However, we need to change the parsing logic and not emit a diagnostic if the closing element is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as far as I'm aware, the "auto fixing" that browsers do is just a thing browsers do and not actually specified in the html spec. The behavior I was talking about is the one that where if you give it <td> foo <td> bar it will actually result in <td> foo </td> <td> bar </td> (2 sibling tags) and not <td> foo <td> bar </td> </td> (where the first is the parent of the second). But I digress.

Would it make sense to have a new node defined like this?

HtmlElementUnclosed = 
	opening_element: HtmlOpeningElement
	children: HtmlElementList

I attempted this in another branch and encountered some difficulties with the parser assigning the closing tag to the wrong element in cases like this:

<div>
   <span>
</div>

Where it would assign the </div> to the <span> instead of the div, resulting in div becoming the unclosed element in the AST rather than the span.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "auto fixing" that browsers do is just a thing browsers do and not actually specified in the html spec.

That’s how it originated, but I thought even these behaviors became standardized in HTML5.

Given that the behavior is allowed for some nodes, I think the closing_element: HtmlClosingElement? fix is a valid approach.

I suppose it makes sense for a compiler, but I'm not sure it makes sense for a formatter.

I actually think adding this behavior in the formatter too makes sense. It doesn’t change semantics and it formats the same nodes in a more intuitive manner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the behavior is allowed for some nodes, I think the closing_element: HtmlClosingElement? fix is a valid approach.

Then we should update the parser, and remove the diagnostic for those cases where it's allowed to not have a closing tag



// <a href="">
Expand Down
Loading