What is the meaning of [^] in JavaScript regexps?

We know that [^a] means any character other than a, but what does [^] (with no following characters) mean? Just as - loses its meaning of character range in cases such as [-], I assumed that [^] would match a literal caret. I spent way too long debugging this problem, only to find out that, at least in Chrome 19, it appears to match anything; in other words, it seems to be equivalent to the dot (.). Is there a spec applicable here, or what is the expected behavior?

Yes, I'm aware that I can and probably should use [\^]. This question is more in the nature of morbid curiosity.

Answers:

Answer

According to the JavaScript specification (ES3 and ES5), [^] matches any single code unit, the same as [\s\S], [\0-\uffff], (.|\s) (don't use that; unlike the others, it relies on backtracking), etc. The difference from . is that the dot doesn't match the four newline code points (\r, \n, \u2028, and \u2029).
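To make the difference from the dot concrete, here is a minimal sketch. It assumes an engine that follows the spec for empty character classes (any current browser or Node.js should); the variable names are only for illustration.

```javascript
// Sketch: compare [^] with . on the four line-terminator characters.
// Assumes a spec-compliant engine; variable names are illustrative only.
const lineTerminators = ["\r", "\n", "\u2028", "\u2029"];

const results = lineTerminators.map((ch) => ({
  codeUnit: "0x" + ch.charCodeAt(0).toString(16),
  caretClass: /^[^]$/.test(ch), // negated empty class: matches any code unit
  dot: /^.$/.test(ch),          // the dot skips line terminators
}));
console.log(results);

// [^] does match "^", but only because it matches everything.
const matchesCaret = /^[^]$/.test("^");
const matchesLetter = /^[^]$/.test("a");

// The non-negated empty class [] can never match.
const emptyClassMatches = /[]/.test("abc");

// Without the u flag, "any single code unit" means half of a surrogate pair.
const astral = "\u{1F600}"; // one code point, two UTF-16 code units
const oneUnit = /^[^]$/.test(astral);
const twoUnits = /^[^][^]$/.test(astral);
console.log({ matchesCaret, matchesLetter, emptyClassMatches, oneUnit, twoUnits });
```

On a compliant engine, caretClass should be true and dot false for all four characters.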

I don't recommend using [^] or [], because they don't work consistently cross-browser, and they prevent your regexes from working in other programming languages. IE <= 8 and older versions of Safari use the traditional (non-JavaScript) regex behavior for empty character classes. Older versions of Opera reverse the correct JavaScript behavior, so that [] matches any code unit and [^] never matches. The traditional regex behavior is that a leading, unescaped ] within a character class is treated as a literal character and does not end the character class.
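If you want the same effect without relying on empty classes, something like the following should behave the same way across engines. This is only a sketch; the sample string and variable names are made up for illustration.

```javascript
// Sketch: portable spellings that avoid the empty character class.
// The sample string and variable names are illustrative only.
const anyCodeUnit = /[\s\S]/g; // whitespace or non-whitespace, i.e. anything
const literalCaret = /[\^]/;   // an escaped caret is always a literal caret

const sample = "a\n^";

// Every code unit matches, including the newline.
const allUnits = sample.match(anyCodeUnit);

// Only the caret itself matches the escaped form.
const caretIndex = sample.search(literalCaret);
const caretMatchesLetter = literalCaret.test("a");

console.log(allUnits.length, caretIndex, caretMatchesLetter);
```

Both patterns also carry over to most other regex flavors, which is not true of [^].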

If you use the XRegExp library, [] and [^] work correctly and consistently cross-browser. XRegExp also adds the s (aka dotall or singleline) flag that makes a dot match any code unit (the same as [^] in a browser that correctly follows the JavaScript spec).
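As an aside that is not part of the original answer: engines implementing ES2018 or later also have a native s (dotAll) flag, independent of XRegExp. A hedged sketch that falls back to [\s\S] when the flag is unavailable might look like this; makeAnyCharRegex is a hypothetical helper name, not a library function.

```javascript
// Sketch: prefer the native s (dotAll) flag, fall back to [\s\S] otherwise.
// makeAnyCharRegex is a hypothetical helper, not part of any library.
function makeAnyCharRegex() {
  try {
    // The constructor throws a SyntaxError on engines without the s flag.
    return new RegExp("^.$", "s");
  } catch (e) {
    return /^[\s\S]$/;
  }
}

const anyChar = makeAnyCharRegex();
const matchesNewline = anyChar.test("\n");
const matchesLineSeparator = anyChar.test("\u2028");
const matchesTwoChars = anyChar.test("ab"); // anchored to exactly one code unit

console.log(matchesNewline, matchesLineSeparator, matchesTwoChars);
```

The constructor is used instead of a regex literal so that an unsupported flag surfaces as a catchable error rather than a parse failure of the whole script.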

Answer

The caret ^ has many meanings - as with most characters in the regular expression syntax. Furthermore, all characters heavily depend on their context. To complicate things further, some characters and syntax depend on the underlying engine (Perl, Java).

Let's break apart [^]:

[] is a character class.

[^ at the start of a character class negates it, so the class matches any character not listed in it.

You didn't list any characters in the character class, so there is nothing to negate. The negation of an empty set is everything, and therefore it matches anything.

Answer

The meaning is the negation of what follows. Nothing follows here, therefore:

anything except nothing = everything

However, most other regex engines throw an error on this expression:

  • ereg(): REG_EBRACK
  • preg_match(): Compilation failed: missing terminating ]
