Python (re)

HTML Tag Matcher in Python

Match paired HTML tags and capture the tag name and inner content using a back-reference.

Pattern


<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>   (flags: none; Python's re has no g flag, so use finditer() or findall() to get every match)

Python (re) code


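import re

# Group 1 = tag name, group 2 = inner content; \1 forces the closing tag to reuse the name.
pattern = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")
input_text = "<p>Hello world</p>"
for m in pattern.finditer(input_text):
    print(m.group(0))              # full match: <p>Hello world</p>
    print(m.group(1), m.group(2))  # tag name and inner content: p Hello world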

Uses only the standard-library `re` module, with no third-party dependency. Works on Python 3.6+.

How the pattern works

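Group 1 captures the tag name, and \b makes the name end at a word boundary, so the engine cannot backtrack into a shorter name. [^>]* matches any attributes up to the closing >. [\s\S]*? lazily captures the inner content as group 2, including newlines, without needing re.DOTALL. \1 back-references the opening tag name, so the closing tag must use the same name.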
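The group roles described above can be checked directly. This minimal sketch reuses the page's pattern; `TAG` and `NO_BOUNDARY` are illustrative names, and `NO_BOUNDARY` is a hypothetical variant with `\b` removed, included only to show what the word boundary prevents.

```python
import re

# The pattern from this page.
TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")

m = TAG.search('<div class="box">content</div>')
print(m.group(1))  # tag name: div
print(m.group(2))  # inner content: content

# Hypothetical variant WITHOUT \b: the engine may backtrack and shorten the
# tag name, so "<pa>x</p>" matches with group 1 = "p" and "a" eaten by [^>]*.
NO_BOUNDARY = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)[^>]*>([\s\S]*?)<\/\1>")
print(NO_BOUNDARY.search("<pa>x</p>").group(0))  # <pa>x</p>

# With \b, "pa" cannot be split into "p" + "a", so there is no match.
print(TAG.search("<pa>x</p>"))  # None
```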

Examples

Input

<p>Hello world</p>

Matches

<p>Hello world</p>

Input

<div class="box">content</div>

Matches

<div class="box">content</div>
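A short sketch that runs both example inputs through `findall`, which returns one `(tag_name, inner_content)` tuple per match because the pattern has two groups. The last lines show a consequence of the lazy `[\s\S]*?`: with nested tags of the same name, the match ends at the first closing tag, so the outer element is not captured whole. The `nested` string is a made-up input, not one of the page's examples.

```python
import re

TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>")

# The two inputs from the Examples section.
for text in ["<p>Hello world</p>", '<div class="box">content</div>']:
    # findall returns (group 1, group 2) tuples when the pattern has two groups
    print(TAG.findall(text))
# [('p', 'Hello world')]
# [('div', 'content')]

# Hypothetical nested input: the lazy quantifier stops at the FIRST </div>.
nested = "<div><div>inner</div></div>"
print(TAG.search(nested).group(0))  # <div><div>inner</div>
```

If nested same-name elements matter, a real HTML parser (for example the standard-library `html.parser`) is usually a better fit than a single regex.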

Same pattern, other engines

JavaScript / ECMAScript

Supported

JavaScript's built-in RegExp supports back-references such as \1, so the pattern works there as written.

Go (RE2)

Needs a workaround

Go's `regexp` package (RE2 engine) does not support back-references such as \1, so this pattern cannot be used there as written.
