JavaScript / ECMAScript

Non-ASCII Character in JS

Match runs of non-ASCII characters (anything outside U+0000–U+007F).

Pattern

[^\x00-\x7F]+   (flags: g)

JavaScript / ECMAScript code

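// Match runs of characters outside the ASCII range (U+0000–U+007F).
const re = /[^\x00-\x7F]+/g;
const input = "Hello, café!";
// matchAll requires the g flag and returns an iterator of match objects.
const matches = [...input.matchAll(re)];
console.log(matches.map((m) => m[0])); // [ 'é' ]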

The example uses `String.prototype.matchAll` to iterate over every match. It requires the `g` flag and is available in Node 12+ and all modern browsers.

How the pattern works

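[^\x00-\x7F] is a negated character class: it matches any character NOT in the ASCII range 0x00–0x7F. The trailing + groups consecutive non-ASCII characters into a single match, so in `café` the match is `é` and in `naïve` it is `ï`, while a run such as `日本語` comes back as one match. Without the `u` flag the class is tested against UTF-16 code units, so a character outside the Basic Multilingual Plane such as 🎉 is two surrogate code units, which the + still joins into one match. Useful for finding accented characters, emoji, CJK, and other Unicode in otherwise-ASCII source.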
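To see what the trailing + contributes, the short sketch below runs the class with and without it on the same string. The sample text and variable names are illustrative assumptions, not part of the original page.

```javascript
// Sketch: compare the character class with and without the trailing "+".
const single = /[^\x00-\x7F]/g;  // one UTF-16 code unit per match
const runs = /[^\x00-\x7F]+/g;   // consecutive code units grouped into one match

// Illustrative sample: three CJK characters, ASCII text, one emoji.
const sample = "日本語 ok 🎉";

const singleMatches = sample.match(single);
const runMatches = sample.match(runs);

// Three CJK characters plus the two surrogate halves of the emoji.
console.log(singleMatches.length); // 5
// With "+", each run is reported once.
console.log(runMatches); // [ '日本語', '🎉' ]
```

Without the +, the emoji is split into two lone surrogates, which is rarely what you want when reporting matches, so the grouped form is usually the safer default.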

Examples

Input

Hello, café!

Matches

é

Input

naïve résumé 🎉

Matches

ï
é
é
🎉

Input

plain ascii here

Matches

(no match)
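The three examples above can be reproduced with a small helper. This is a minimal sketch: `findNonAscii` is a hypothetical name introduced here for illustration, not an existing API.

```javascript
// Hypothetical helper (not part of any library):
// return every run of non-ASCII characters found in `text`.
function findNonAscii(text) {
  // The regex literal is created inside the function,
  // so each call starts from a clean state.
  const re = /[^\x00-\x7F]+/g;
  return [...text.matchAll(re)].map((m) => m[0]);
}

console.log(findNonAscii("Hello, café!"));     // [ 'é' ]
console.log(findNonAscii("naïve résumé 🎉"));  // [ 'ï', 'é', 'é', '🎉' ]
console.log(findNonAscii("plain ascii here")); // []
```

An empty array corresponds to the "no match" case, which makes the helper convenient as a quick check that a string is pure ASCII.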

Same pattern, other engines

Python (re)

Supported

See how this pattern looks (and behaves) in Python's stdlib `re` module.

Go (RE2)

Supported

See how this pattern looks (and behaves) in Go's `regexp` package (RE2 engine).
