Add a space before the closing bracket in the parser #1855
levensta wants to merge 2 commits into htmlhint:main
Code Review
This pull request updates the HTMLParser's tag regex to allow whitespace before the closing bracket. A review comment identifies that the current implementation captures this trailing whitespace within the self-closing slash group, which may cause issues with exact string comparisons in other parts of the system. A suggestion was provided to move the whitespace matching outside the capture group.
  const regTag =
    // eslint-disable-next-line no-control-regex
-   /<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?))>/g
+   /<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?\s*))>/g
The addition of \s* inside the capture group for the self-closing slash will cause the close property of the tag event to include trailing whitespace (e.g., "/ " instead of "/"). This is likely to break existing rules, such as tag-self-close or tag-pair, which often perform exact string comparisons against "/" to identify self-closing tags. The whitespace should be matched outside the capture group to ensure the close property remains clean while still allowing the flexible syntax.
Suggested change:
-   /<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?\s*))>/g
+   /<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?)\s*)>/g
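To make the difference concrete, here is a minimal sketch comparing what the sixth capture group (the self-closing slash) holds under each variant. The two regexes are copied from the diff; the `g` flag is dropped only so that `exec()` keeps no state between calls. The names `regTagInsideGroup`, `regTagOutsideGroup`, and `closeGroup` are hypothetical and exist only for this illustration; this is not how HTMLParser itself consumes the match.

```typescript
// Variant from this pull request: the trailing \s* sits INSIDE capture group 6.
// eslint-disable-next-line no-control-regex
const regTagInsideGroup = /<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?\s*))>/;

// Suggested variant: the trailing \s* sits OUTSIDE capture group 6.
// eslint-disable-next-line no-control-regex
const regTagOutsideGroup = /<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?)\s*)>/;

// Hypothetical helper: returns capture group 6, or null when the regex
// does not match (or matched a branch that leaves group 6 unset).
function closeGroup(re: RegExp, html: string): string | null {
  return re.exec(html)?.[6] ?? null;
}

for (const html of ["<br/>", "<br />", "<br / >"]) {
  console.log(
    html,
    JSON.stringify(closeGroup(regTagInsideGroup, html)),
    JSON.stringify(closeGroup(regTagOutsideGroup, html)),
  );
}
// <br/> "/" "/"
// <br /> "/" "/"
// <br / > "/ " "/"
```

Only the last input differs: with the whitespace inside the group, an exact comparison against "/" would no longer hold for `<br / >`.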
Short description of what this resolves
From time to time, users of code editors make a mistake in the syntax of self-closing tags and add an extra space before the closing bracket. Syntax highlighting does not necessarily make the error obvious to the user. For example, in monaco-editor, there is no difference in highlighting between <tag /> and <tag / >. I have also seen in practice that users can exploit this flaw to intentionally bypass validation rules.
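A minimal sketch of the underlying behavior, assuming the tag regex as it stands before this change (copied from the diff, with the `g` flag dropped so that `exec()` keeps no state). The helper `isSingleTag` is hypothetical and only checks whether the regex consumes the whole input as one tag; it is not part of HTMLHint.

```typescript
// Tag regex before this pull request.
// eslint-disable-next-line no-control-regex
const regTagBefore = /<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?))>/;

// Hypothetical helper: true when the entire input is matched as a single tag.
function isSingleTag(html: string): boolean {
  const m = regTagBefore.exec(html);
  return m !== null && m.index === 0 && m[0].length === html.length;
}

console.log(isSingleTag("<tag />"));  // true
console.log(isSingleTag("<tag / >")); // false
```

Because the second form is not matched as a tag at all, it presumably never reaches the rules that inspect tags, which is the gap described above.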
Proposed changes
The regular expression has been updated to include \s* before the closing bracket.