Python (re)

HTML Tag Matcher in PY

Match paired HTML tags and capture the tag name and inner content using a back-reference.

Pattern

<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>   (no flags needed: Python's re has no g flag; use finditer() or findall() to get every match)

Python (re) code

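import re

# Group 1 is the tag name, group 2 the inner content.
pattern = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")
input_text = "<p>Hello world</p>"

# finditer() yields every non-overlapping match, left to right.
for m in pattern.finditer(input_text):
    print(m.group(0))  # the full matched element, opening tag through closing tag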

Uses only the standard-library `re` module, so there is no third-party dependency. Works on Python 3.6+.

How the pattern works

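Group 1 captures the tag name: one letter followed by any number of letters or digits. \b ends the name at a word boundary, and [^>]* then consumes any attributes up to the closing >. Group 2, [\s\S]*?, lazily captures the inner content, newlines included. \1 back-references group 1, so the match ends only at a closing tag with the same name as the opening tag. Because group 2 is lazy, the match stops at the first such closing tag, so nested elements with the same tag name are cut short.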
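
To make the two capture groups concrete, here is a minimal sketch that returns the tag name and inner content for each match. The names `TAG_RE` and `extract_tags` are illustrative and not part of the original page.

```python
import re

# Same pattern as above; the escaped slash (\/) is optional in Python but harmless.
TAG_RE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")


def extract_tags(html):
    """Return (tag_name, inner_content) pairs for each paired tag found.

    Hypothetical helper for illustration: group(1) is the tag name,
    group(2) is the lazily matched inner content.
    """
    return [(m.group(1), m.group(2)) for m in TAG_RE.finditer(html)]


pairs = extract_tags('<div class="box">content</div><p>Hello world</p>')
for name, inner in pairs:
    print(name, "->", inner)
```

Attributes are consumed by [^>]* and do not appear in either group, so the first pair here is the tag name div with the inner content "content".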

Examples

Input

<p>Hello world</p>

Matches

  • <p>Hello world</p>

Input

<div class="box">content</div>

Matches

  • <div class="box">content</div>
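
The following sketch probes two edge cases that follow from the back-reference and the lazy quantifier: a mismatched closing tag, and nested tags with the same name. `TAG_RE` and `first_match` are illustrative names, not part of the original page.

```python
import re

TAG_RE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")


def first_match(html):
    """Return the first full match as a string, or None if nothing matches."""
    m = TAG_RE.search(html)
    return m.group(0) if m else None


# Matching pair: the whole element is returned.
print(first_match("<p>Hello world</p>"))

# Mismatched closing tag: \1 requires </b>, so there is no match.
print(first_match("<b>bold</i>"))

# Nested same-name tags: the lazy group stops at the first </div>,
# so the outer element is cut short.
print(first_match("<div><div>inner</div></div>"))
```

For nested or malformed markup, a real HTML parser such as the standard-library `html.parser` module is usually a safer choice than a single regular expression.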
