HTML Entity in Python (re)

Match HTML entities in named (`&amp;`), numeric (`&#123;`), or hex (`&#x1F4A9;`) form.

Pattern

&(?:[a-zA-Z][a-zA-Z0-9]+|#\d+|#x[0-9a-fA-F]+);   (no flags: Python's `re` has no `g` flag, so use `finditer()` or `findall()` to get every match)

Python (re) code

import re

pattern = re.compile(r"&(?:[a-zA-Z][a-zA-Z0-9]+|#\d+|#x[0-9a-fA-F]+);")
input_text = "Tom &amp; Jerry &lt;3"
# finditer() yields every non-overlapping match, left to right
for m in pattern.finditer(input_text):
    print(m.group(0))  # the whole entity, e.g. "&amp;"

Stdlib `re` module — no third-party dependency. Works on Python 3.6+.
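Matching is often the step before decoding. The following is a minimal sketch, assuming you want to replace each matched entity with its character. It passes each match to the stdlib `html.unescape()`. The helper name `decode_entities` is hypothetical. Unknown names such as `&bogus;` still match the pattern, but `html.unescape()` returns them unchanged.

```python
import html
import re

# Same pattern as above: named, decimal, or hex entity.
ENTITY = re.compile(r"&(?:[a-zA-Z][a-zA-Z0-9]+|#\d+|#x[0-9a-fA-F]+);")


def decode_entities(text: str) -> str:
    """Replace every matched entity with its decoded character(s).

    Hypothetical helper: html.unescape() does the actual decoding,
    and names it does not recognize come back unchanged.
    """
    return ENTITY.sub(lambda m: html.unescape(m.group(0)), text)


print(decode_entities("Tom &amp; Jerry &lt;3"))  # Tom & Jerry <3
print(decode_entities("Hex: &#x1F600;"))         # Hex: 😀
```

You could call `html.unescape()` on the whole string instead. That approach also decodes forms this pattern skips, such as `&#X41;` or legacy names without a semicolon like `&amp`. Routing the call through the pattern limits decoding to exactly what it matches.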

How the pattern works

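The leading `&` and trailing `;` bracket the entity. The middle alternation matches one of three forms:

- a named entity (`[a-zA-Z][a-zA-Z0-9]+`: a letter followed by one or more letters or digits, like `amp`, `lt`, `nbsp`);
- a decimal entity (`#\d+`, like `#160`);
- a hex entity (`#x[0-9a-fA-F]+`, like `#xA0` or `#x1F600` for emoji).

The hex branch accepts only a lowercase `x`. HTML also allows the uppercase form `&#X41;`, which this pattern does not match.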
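To make the three branches explicit, here is the same alternation rewritten with `re.VERBOSE` and named groups. This is a sketch: the group names `named`, `dec`, `hex` and the helper `classify` are illustrative, not part of the original pattern. In verbose mode an unescaped `#` starts a comment, so the literal `#` must be written `\#`. Because the named branch needs a letter plus at least one more character, single-letter names like `&a;` do not match.

```python
import re

# Same alternation as the one-line pattern, split across lines with comments.
ENTITY_VERBOSE = re.compile(
    r"""
    &                                    # opening ampersand
    (?:
        (?P<named>[a-zA-Z][a-zA-Z0-9]+)  # named: a letter, then 1+ letters/digits
      | \#(?P<dec>\d+)                   # decimal: '#' then digits
      | \#x(?P<hex>[0-9a-fA-F]+)         # hex: '#x' then hex digits
    )
    ;                                    # closing semicolon
    """,
    re.VERBOSE,
)


def classify(entity: str) -> str:
    """Return which branch matched: 'named', 'dec', or 'hex' (hypothetical helper)."""
    m = ENTITY_VERBOSE.fullmatch(entity)
    if m is None:
        raise ValueError(f"not an entity: {entity!r}")
    # Only one named group can participate, so lastgroup names the branch.
    return m.lastgroup


for entity in ("&amp;", "&#160;", "&#x1F600;"):
    print(entity, classify(entity))  # named, dec, hex respectively
```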

Examples

Input

Tom &amp; Jerry &lt;3

Matches

&amp;
&lt;

Input

Numeric: &#160; Hex: &#x1F600;

Matches

&#160;
&#x1F600;

Input

no entities here

No match
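The three examples, with their entities written in source form, can be checked in a few lines. This is a minimal sketch. It uses `findall()`, which returns whole matches here because the pattern has only a non-capturing group.

```python
import re

PATTERN = re.compile(r"&(?:[a-zA-Z][a-zA-Z0-9]+|#\d+|#x[0-9a-fA-F]+);")

# (input, expected matches) for each example, entities in source form.
EXAMPLES = [
    ("Tom &amp; Jerry &lt;3", ["&amp;", "&lt;"]),
    ("Numeric: &#160; Hex: &#x1F600;", ["&#160;", "&#x1F600;"]),
    ("no entities here", []),
]

for text, expected in EXAMPLES:
    found = PATTERN.findall(text)  # list of full-match strings, left to right
    assert found == expected, (text, found)
    print(f"{text!r} -> {found}")
```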

—

Same pattern, other engines

JavaScript / ECMAScript

Supported

The pattern compiles unchanged with JavaScript's built-in `RegExp`.

Go (RE2)

Supported

The pattern compiles unchanged with Go's `regexp` package (RE2 engine).
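One behavioral difference is worth checking when porting. For `str` patterns, Python's `\d` matches any Unicode decimal digit, while `\d` in JavaScript and RE2 is ASCII-only. The following is a minimal sketch, assuming you want the ASCII-only behavior in Python. It compiles the same source with `re.ASCII`; writing `[0-9]` instead of `\d` would work too. The sample string `odd` is an illustrative input built from Arabic-Indic digits.

```python
import re

SOURCE = r"&(?:[a-zA-Z][a-zA-Z0-9]+|#\d+|#x[0-9a-fA-F]+);"

unicode_digits = re.compile(SOURCE)        # default for str: \d matches any Unicode digit
ascii_only = re.compile(SOURCE, re.ASCII)  # \d limited to [0-9], as in JS and RE2

# "&#" + ARABIC-INDIC DIGITS ONE, SIX, ZERO + ";"
odd = "&#\u0661\u0666\u0660;"

print(bool(unicode_digits.fullmatch(odd)))  # True
print(bool(ascii_only.fullmatch(odd)))      # False
```

HTML numeric references use ASCII digits, so the `re.ASCII` form is the closer match to the other engines for this use.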
