HTML Tag Matcher in PY
Match paired HTML tags and capture the tag name and inner content. A back-reference ensures the closing tag has the same name as the opening tag.
Pattern
<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1> (flags: g)
Python (re) code
import re
pattern = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")
input_text = "<p>Hello world</p>"
# finditer yields every non-overlapping match, which stands in for the g flag
for m in pattern.finditer(input_text):
    print(m.group(0))
Stdlib `re` module, no third-party dependency. Works on Python 3.6+.
How the pattern works
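Group 1, ([a-zA-Z][a-zA-Z0-9]*), captures the tag name, and \b ends the name at a word boundary so the group cannot stop partway through it. [^>]* matches any attributes up to the > that closes the opening tag. Group 2, ([\s\S]*?), lazily captures the inner content, including newlines. \1 back-references the opening tag name to ensure the closing tag matches.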
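As a minimal sketch of how the two groups and the back-reference behave, the snippet below reads group 1 and group 2 from each match and shows that mismatched closing tags produce no match. The names TAG_PATTERN and extract_pairs are hypothetical helpers introduced for illustration; they are not part of the original snippet.

```python
import re

# Same pattern as above; TAG_PATTERN is a name chosen for this sketch.
TAG_PATTERN = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")


def extract_pairs(html):
    """Return (tag_name, inner_content) for each matched pair of tags."""
    return [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(html)]


# Group 1 is the tag name, group 2 is the inner content.
print(extract_pairs('<div class="box">content</div>'))  # [('div', 'content')]

# The back-reference \1 rejects a closing tag with a different name.
print(extract_pairs("<b>bold</i>"))  # []

# \b keeps group 1 from shrinking to "s" so that </s> could close <span>.
print(extract_pairs("<span>x</s>"))  # []
```

Returning plain tuples keeps the sketch short; code that also needs match positions could keep the match objects and call m.span() instead.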
Examples
Input
<p>Hello world</p>
Matches
<p>Hello world</p>
Input
<div class="box">content</div>
Matches
<div class="box">content</div>
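The examples above can be checked with a short script. This sketch also probes two cases the examples do not cover, chosen here as assumptions about what a reader may want to verify: content that spans several lines, and nested tags with the same name. With nesting, the lazy quantifier stops at the first closing tag, so the match ends before the outer tag closes.

```python
import re

# Same pattern as above; TAG_PATTERN is a name chosen for this sketch.
TAG_PATTERN = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")

# The two inputs from the examples: each one matches in full.
samples = ["<p>Hello world</p>", '<div class="box">content</div>']
full_matches = [TAG_PATTERN.search(s).group(0) for s in samples]
print(full_matches == samples)  # True

# Inner content may span lines because [\s\S] also matches newlines.
multiline = TAG_PATTERN.search("<p>line one\nline two</p>")
print(repr(multiline.group(2)))  # 'line one\nline two'

# Nested same-name tags: the match ends at the first closing tag.
nested = TAG_PATTERN.search("<div><div>a</div></div>")
print(nested.group(0))  # <div><div>a</div>
print(nested.group(2))  # <div>a
```

The nested case suggests the pattern is best suited to flat or simple markup, where paired tags of the same name do not contain each other.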