Finding all HTML tags in a project that are not self-closed

html, vue, regex, cli

02 Apr 2024

I am currently upgrading an existing Vue project from version 2 to 3, which involves quite a few breaking changes. I don’t want to go into the details, but at one point it was useful to find all usages of a certain Vue component that were not self-closed. In this specific case, it was about a base-input component. The following cases were of interest to me:

<base-input value="Some text"></base-input>
<base-input disabled>Some text in a slot</base-input>

However, the following were not:

<base-input value="Some text" />
<base-input disabled />

The component occurred in many places across the project, so just searching for base-input was not going to cut it. Instead, I decided to use a regular expression (regex) with ripgrep. Once installed, ripgrep provides the rg command line tool.

The following solution worked for my use case:

rg --multiline '<base-input[^>]*[^/]>'

Let’s break it down:

The --multiline flag allows the pattern to match across multiple lines, i.e. the match can contain line breaks.
The <base-input part is searched for literally, i.e. as this exact character sequence.
With [^>]*, an arbitrary number (that’s what * stands for, including zero) of characters other than > is matched.
After that, [^/] requires exactly one character that is not a /, since a / right before the > would indicate a self-closing tag.
Finally, the > finishes the tag.
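To double-check the breakdown, here is a minimal sketch that runs the same pattern through Python's re module instead of ripgrep. This is an assumption-laden stand-in: Python's regex engine is not the one ripgrep uses, but for this simple pattern both behave the same, and a negated character class such as [^>] matches line breaks in Python, which mirrors what --multiline allows. The samples dictionary and the multi-line example at its end are made up for illustration.

```python
import re

# Same pattern as passed to rg. In Python, [^>] and [^/] also match
# newlines, which corresponds to running rg with --multiline.
PATTERN = re.compile(r"<base-input[^>]*[^/]>")

# Sample tag -> whether we expect the pattern to find it.
samples = {
    '<base-input value="Some text"></base-input>': True,
    '<base-input disabled>Some text in a slot</base-input>': True,
    '<base-input value="Some text" />': False,
    '<base-input disabled />': False,
    # Hypothetical multi-line usage, not taken from the real project.
    '<base-input\n  value="Some text"\n></base-input>': True,
}

results = {html: PATTERN.search(html) is not None for html in samples}

for html, expected in samples.items():
    print(f"{results[html]!s:5} (expected {expected!s:5}) {html!r}")
```

Every sample should report the expected value, including the tag that is spread over several lines.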

Although this works for the above examples, it is not a universal solution to the problem. For instance, it does not handle the following cases correctly:

<base-input></base-input>
<base-input value="Some > text" />

The first line is not matched, because [^/] has to consume one character between the <base-input literal and the closing >, and here there is none. Fortunately, that was not a problem for me: using that component without attributes does not make any sense, so I could ignore that case.

The second line matches although it shouldn’t, because the pattern treats the > within the quotes as the end of the tag. This results in a false positive, but that was also fine for me since this did not occur often in the code base.
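Both limitations can be reproduced with a small sketch, again using Python's re module as a stand-in for ripgrep (an assumption, as is every variable name below):

```python
import re

PATTERN = re.compile(r"<base-input[^>]*[^/]>")

no_attributes = "<base-input></base-input>"
quoted_gt = '<base-input value="Some > text" />'

# Missed case: [^/] has no character to consume before the closing >.
missed = PATTERN.search(no_attributes)

# False positive: the match ends at the > inside the attribute value.
false_positive = PATTERN.search(quoted_gt)

print(missed)                   # None
print(false_positive.group(0))  # <base-input value="Some >
```

The second result shows where the match stops: right at the > inside the quoted value, long before the real end of the tag.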

Unfortunately, it is not possible to write a full HTML parser using regular expressions, even though so many people ask about this on Stack Overflow that it has become part of the site’s regular expressions FAQ. But that should not stop you from using regular expressions for quick one-off tasks, such as finding certain occurrences in a big code base, as long as you know the limitations and how they might affect the results.