Regular expressions and special characters

Regular expressions in JavaScript have a special short (literal) form and use a syntax close to standard PCRE.

They work through a special object, RegExp.

In addition, strings have their own methods search, match and replace, but to understand them we will first look at RegExp.

An object of type RegExp, or, more briefly, a regular expression, can be created in two ways:

/pattern/flags
new RegExp("pattern"[, flags])

pattern is the regular expression to search for (replacement is covered later), and flags is a string made of any combination of the characters g (global search), i (case-insensitive search) and m (multiline search).

The first form is used often, the second only sometimes. For example, these two calls are equivalent:

var reg = /ab+c/i
var reg = new RegExp("ab+c", "i")

In the second form the regular expression is inside a quoted string, so every \ must be doubled:

// are equivalent
re = new RegExp("\\w+")
re = /\w+/
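
A minimal sketch of this equivalence follows. The variable names and sample strings are illustrative and not part of the original text.

```javascript
// The same pattern and flag, written as a literal and through the constructor.
var literalRe = /ab+c/i;
var constructedRe = new RegExp("ab+c", "i");
console.log(literalRe.source === constructedRe.source); // true
console.log(constructedRe.test("xABBBCx"));             // true

// A backslash must be doubled inside a quoted string.
var wordRe = new RegExp("\\w+");
console.log(wordRe.source);                 // \w+
console.log(wordRe.exec("hello world")[0]); // hello

// With a single backslash the string "\w+" is just "w+",
// so the pattern looks for the letter w instead of a word character.
var brokenRe = new RegExp("\w+");
console.log(brokenRe.source);      // w+
console.log(brokenRe.test("abc")); // false
```

The last case is the usual mistake: the string literal silently drops the backslash before the pattern ever reaches RegExp.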

When searching, you can use most of the features of modern PCRE syntax.

Symbol      Meaning
\           For ordinary characters, makes them special. For example, the expression /s/ looks for just the character 's', but with \ in front of the s, /\s/ denotes a whitespace character. Conversely, if the character is special, for example *, then \ turns it into an ordinary "asterisk" character. For example, /a*/ searches for 0 or more consecutive 'a' characters. To find 'a*' (an a followed by an asterisk), put \ before the special character: /a\*/.
^           Matches the beginning of the input. If the multiline flag ("m") is set, it also matches at the start of each line. For example, /^A/ does not find 'A' in "an A", but finds the first 'A' in "An A".
$           Matches the end of the input. If the multiline flag is set, it also matches at the end of each line. For example, /t$/ does not find 't' in "eater", but finds it in "eat".
*           Repeats the preceding element 0 or more times. For example, /bo*/ finds 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but finds nothing in "A goat grunted".
+           Repeats the preceding element 1 or more times. Equivalent to {1,}. For example, /a+/ finds 'a' in "candy" and all the 'a' characters in "caaaaaaaandy".
?           Indicates that the element may or may not be present. For example, /e?le?/ finds 'el' in "angel" and 'le' in "angle". If used immediately after one of the quantifiers *, +, ? or {}, it makes the search "non-greedy" (the repetition matches the minimum possible number of times, up to the nearest next pattern element), as opposed to the default "greedy" mode, in which the number of repetitions is the maximum possible, even if the next pattern element also fits. In addition, ? is used in the lookahead and grouping constructs described in this table under (?=), (?!) and (?:).
.           (Dot) Matches any character except the line terminators \n, \r, \u2028 and \u2029. (Use [\s\S] to match any character, including line breaks.) For example, /.n/ finds 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
(x)         Finds x and remembers it. These are called capturing ("memory") parentheses. For example, /(foo)/ finds and remembers 'foo' in "foo bar". The found substring is stored in the result array or in the predefined properties of the RegExp object: $1, ..., $9. In addition, parentheses group their contents into a single pattern element. For example, (abc)* is abc repeated 0 or more times.
(?:x)       Finds x, but does not remember what it found. These are called non-capturing parentheses. The found substring is not stored in the result array or in the RegExp properties. Like all parentheses, they group their contents into a single subpattern.
x(?=y)      Finds x only if x is followed by y. For example, /Jack(?=Sprat)/ finds 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ finds 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is included in the search result.
x(?!y)      Finds x only if x is not followed by y. For example, /\d+(?!\.)/ finds a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec("3.141") finds 141, but not 3.
x|y         Finds x or y. For example, /green|red/ finds 'green' in "green apple" and 'red' in "red apple".
{n}         Where n is a non-negative integer. Finds exactly n repetitions of the preceding element. For example, /a{2}/ does not find 'a' in "candy", but finds both 'a' characters in "caandy" and the first two in "caaandy".
{n,}        Where n is a non-negative integer. Finds n or more repetitions of the preceding element. For example, /a{2,}/ does not find 'a' in "candy", but finds all the 'a' characters in "caandy" and in "caaaaaaandy".
{n,m}       Where n and m are non-negative integers. Finds from n to m repetitions of the preceding element.
[xyz]       Character set. Finds any one of the listed characters. You can specify a range using a hyphen. For example, [abcd] is the same as [a-d]. It finds 'b' in "brisket", as well as 'a' and 'c' in "ache".
[^xyz]      Any character other than those listed in the set. You can also specify a range. For example, [^abc] is the same as [^a-c]. It finds 'r' in "brisket" and 'h' in "chop".
[\b]        Finds the backspace character. (Not to be confused with \b.)
\b          Matches a word boundary (for Latin letters), for example the position between a letter and a space. (Not to be confused with [\b].) For example, /\bn\w/ finds 'no' in "noonday"; /\wy\b/ finds 'ly' in "possibly yesterday".
\B          Matches a position that is not a word boundary. For example, /\w\Bn/ finds 'on' in "noonday", and /y\B\w/ finds 'ye' in "possibly yesterday".
\cX         Where X is a letter from A to Z. Matches a control character in the string. For example, /\cM/ matches the character Ctrl-M.
\d          Finds a digit. Equivalent to [0-9]. For example, /\d/ or /[0-9]/ finds '2' in "B2 is the suite number".
\D          Finds a non-digit character. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ finds 'B' in "B2 is the suite number".
\f, \r, \n  The corresponding special characters: form feed, carriage return, line feed.
\s          Finds any whitespace character, including a space, a tab, line breaks and other Unicode whitespace characters. For example, /\s\w*/ finds ' bar' in "foo bar".
\S          Finds any character except whitespace. For example, /\S\w*/ finds 'foo' in "foo bar".
\t          The tab character.
\v          The vertical tab character.
\w          Finds any word character of the Latin alphabet: letters, digits and the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ finds 'a' in "apple", '5' in "$5.28" and '3' in "3D".
\W          Finds any character that is not a Latin word character. Equivalent to [^A-Za-z0-9_]. For example, /\W/ and /[^A-Za-z0-9_]/ both find '%' in "50%".
\n          Where n is a positive integer. A back reference to the nth captured substring. For example, /apple(,)\sorange\1/ finds 'apple, orange,' in "apple, orange, cherry, peach".
\0          Finds the NUL character. Do not follow it with another digit.
\xhh        Finds the character with the code hh (2 hexadecimal digits).
\uhhhh      Finds the character with the code hhhh (4 hexadecimal digits).
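
A few rows of the table can be checked directly. The sketch below is only an illustration: the HTML-like string and the two-line string are assumptions chosen for the demonstration, while the Jack and apple examples reuse the strings from the table.

```javascript
// Greedy vs. non-greedy: a ? after + makes it match as little as possible.
var sample = "<b>bold</b>";
var greedyMatch = sample.match(/<.+>/)[0];  // "<b>bold</b>"
var lazyMatch = sample.match(/<.+?>/)[0];   // "<b>"

// Lookahead: Jack is found only when Sprat or Frost follows,
// and the text inside (?=...) is not part of the result.
var lookahead = /Jack(?=Sprat|Frost)/.exec("JackFrost")[0]; // "Jack"

// Back reference: \1 must repeat what group 1 captured (here a comma).
var backref = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach.")[0];
// "apple, orange,"

// Multiline flag: with m, ^ also matches right after a line break.
var withoutM = /^A/.test("an\nA");  // false
var withM = /^A/m.test("an\nA");    // true

console.log(greedyMatch, lazyMatch, lookahead, backref, withoutM, withM);
```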

To simply test whether a string matches a regular expression, use the test method:

if (/\s/.test("line")) {
  // ... there is whitespace in the string ...
}
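
As a small illustration (the sample strings are assumptions), test returns a plain boolean. One caveat worth knowing: a regular expression with the g flag keeps its position in the lastIndex property between calls, so repeated test calls on the same object can give different answers.

```javascript
console.log(/\s/.test("two words")); // true
console.log(/\s/.test("oneword"));   // false

// With the g flag the search resumes from lastIndex on the next call.
var globalRe = /a/g;
var firstCall = globalRe.test("a");   // true, lastIndex becomes 1
var secondCall = globalRe.test("a");  // false, the search started at index 1
var indexAfter = globalRe.lastIndex;  // 0, reset after the failed search
console.log(firstCall, secondCall, indexAfter);
```

For a simple yes/no check it is therefore safer to use a regular expression without the g flag.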

The exec method returns an array and sets properties of the regular expression.
If there are no matches, null is returned.

For example,

// Find one d, followed by 1 or more b, followed by one d
// Remember the found b characters and the d that follows them
// Case-insensitive search
var myRe = /d(b+)(d)/ig;
var myArray = myRe.exec("cdbBdbsbz");

Running the script gives the following results:

Object   Property / Index  Description                                                             Example
myArray                    Contents of myArray.                                                    ["dbBd", "bB", "d"]
         index             Index of the match (from 0).                                            1
         input             The source string.                                                      cdbBdbsbz
         [0]               The last matched characters.                                            dbBd
         [1], ... [n]      Matches of the nested parentheses, if any; their number is unlimited.   [1] = bB, [2] = d
myRe     lastIndex         The index from which to start the next search.                          5
         ignoreCase        Whether case-insensitive search (the "i" flag) is enabled.              true
         global            Whether global search for all matches (the "g" flag) is enabled.        true
         multiline         Whether multiline search (the "m" flag) is enabled.                     false
         source            The pattern text.                                                       d(b+)(d)
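
The values in the table can be reproduced with a short script. This is a sketch that repeats the example above under different variable names (execRe and execResult are illustrative) and prints each property.

```javascript
var execRe = /d(b+)(d)/ig;
var execResult = execRe.exec("cdbBdbsbz");

// Properties of the returned array
console.log(execResult[0]);     // dbBd (the whole match)
console.log(execResult[1]);     // bB   (first parenthesized group)
console.log(execResult[2]);     // d    (second parenthesized group)
console.log(execResult.index);  // 1
console.log(execResult.input);  // cdbBdbsbz

// Properties of the regular expression object
console.log(execRe.lastIndex);  // 5
console.log(execRe.ignoreCase); // true
console.log(execRe.global);     // true
console.log(execRe.multiline);  // false
console.log(execRe.source);     // d(b+)(d)
```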

If the regular expression has the "g" flag, you can call the exec method repeatedly to find successive matches in the same string. Each call then starts searching the string str from the index stored in lastIndex. For example, here is a script:

var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null) {
  var msg = "Found " + myArray[0] + ". ";
  msg += "Next match starts at " + myRe.lastIndex;
  console.log(msg);
}

This script will display the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 9
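
The same loop can be wrapped into a reusable function. The helper below, findAll, is a hypothetical name introduced here for illustration. It assumes the regular expression has the g flag and guards against zero-length matches, which would otherwise never advance lastIndex.

```javascript
// Hypothetical helper: collects every match together with its position.
function findAll(re, str) {
  if (!re.global) {
    // Without g, lastIndex is not advanced and the loop would never end.
    throw new Error("findAll expects a regular expression with the g flag");
  }
  re.lastIndex = 0; // start from the beginning of the string
  var found = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    found.push({ text: m[0], index: m.index });
    // An empty match leaves lastIndex unchanged, so step forward manually.
    if (m[0] === "") re.lastIndex++;
  }
  return found;
}

var allMatches = findAll(/ab*/g, "abbcdefabh");
console.log(allMatches.length);                        // 2
console.log(allMatches[0].text, allMatches[0].index);  // abb 0
console.log(allMatches[1].text, allMatches[1].index);  // ab 7
```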

In the following example, the function takes the first word of the input as a first name and then loops through the array to count how many registered names are the same.

It is assumed that all registered names are in the array A:

var A = ["Frank", "Emily", "Jane", "Harry", "Nick", "Beth", "Rick",
         "Terrence", "Carol", "Ann", "Terry", "Frank", "Alice", "Rick",
         "Bill", "Tom", "Fiona", "Jane", "William", "Joan", "Beth"];

function lookup(input)
{
  // Take the first word of the input as the first name
  var firstName = /\w+/i.exec(input);
  if (!firstName)
  {
    console.log(input + " is not a name!");
    return;
  }

  // Count the registered names equal to it, ignoring case
  var count = 0;
  for (var i = 0; i < A.length; i++)
  {
    if (firstName[0].toLowerCase() == A[i].toLowerCase())
      count++;
  }
  var midstring = (count == 1) ? " other has " : " others have ";
  console.log("Thanks, " + count + midstring + "the same name!");
}

The following string methods work with regular expressions.

All of these methods, except replace, can be called both with RegExp objects as arguments and with strings, which are automatically converted to RegExp objects.

So these calls are equivalent:

var i = str.search(/\s/)
var i = str.search("\\s")

When using quotes you have to double every \ and cannot specify flags, so sometimes the full form is convenient too:

var i = str.search(new RegExp("\\s", "g"))

The search method returns the index of the first match of the regular expression in the string, or -1 if there is no match.
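
The sketch below shows what search returns for the different ways of passing the pattern. The sample string and variable names are assumptions made for illustration.

```javascript
var text = "hello world";

var byLiteral = text.search(/\s/);  // 5, index of the space
var byString = text.search("\\s");  // 5, the string is converted to a RegExp
var notFound = text.search(/\d/);   // -1, there are no digits

// Flags cannot be given with a plain string, only with a literal
// or with the RegExp constructor.
var withFlag = text.search(new RegExp("WORLD", "i")); // 6
var noFlag = text.search("WORLD");                    // -1

console.log(byLiteral, byString, notFound, withFlag, noFlag);
```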

If you only want to know whether a string matches a regular expression, use the search method (similar to the RegExp test method). To get more information, use the slower match method (similar to the RegExp exec method).

This example displays a message that depends on whether the string matches the regular expression.

function testinput(re, str) {
  var midstring;
  if (str.search(re) != -1)
    midstring = " contains ";
  else
    midstring = " does not contain ";
  document.write(str + midstring + re.source);
}

If regexp does not have the g flag, the match method returns the same result as regexp.exec(string).

If regexp has the g flag, match returns an array with all the matches.

To simply find out whether a string matches the regular expression regexp, use regexp.test(string).

If you only want the first result, use regexp.exec(string).
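
The difference between the two modes of match can be seen in a short sketch. The sample sentence and variable names are assumptions for illustration.

```javascript
var sentence = "cat bat rat";

// Without g: the same result as exec, including groups and index.
var firstOnly = sentence.match(/(\w)at/);
console.log(firstOnly[0], firstOnly[1], firstOnly.index); // cat c 0

// With g: all whole matches, but no groups and no index.
var allAt = sentence.match(/(\w)at/g);
console.log(allAt); // [ 'cat', 'bat', 'rat' ]

// No match at all gives null in both modes.
var nothing = sentence.match(/dog/g);
console.log(nothing); // null
```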

In the following example, match is used to find "Chapter", followed by one or more digits, followed by zero or more groups of a period and a digit. The regular expression has the i flag, so case is ignored.

str = "For more information, see Chapter 3.4.5.1";
re = /chapter (\d+(\.\d)*)/i;
found = str.match(re);
alert(found);

The script returns an array of matches:

  • Chapter 3.4.5.1 - the fully matched string
  • 3.4.5.1 - the first parenthesized group
  • .1 - the inner parenthesized group

The following example demonstrates the global and case-insensitive flags with match. All letters from A to E and from a to e are found, each in a separate element of the array.

var str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
var regexp = /[A-E]/gi;
var matches = str.match(regexp);
document.write(matches);
// matches = ["A", "B", "C", "D", "E", "a", "b", "c", "d", "e"]

The replace method can replace occurrences of a regular expression not only with a string, but also with the result of a function. Its full syntax is:

var newString = str.replace(regexp|substr, newSubStr|function)

regexp
    A RegExp object. Its matches are replaced by the value produced by the second argument.
substr
    A string that will be replaced by newSubStr.
newSubStr
    The string that replaces the substring from the first argument.
function
    A function that is called to generate the new substring (to put in place of the substring found by the first argument).

The replace method does not change the string on which it is called; it simply returns a new, modified string.

To implement a global replacement, include the "g" flag in the regular expression.

If the first argument is a string, it is not converted to a regular expression, so, for example,

var ab = "a b".replace("\\s", "..") // = "a b"

The call to replace left the string unchanged, because it searched not for the regular expression \s but for the literal string "\s".
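
A short sketch of these rules follows. The sample strings and variable names are assumptions for illustration.

```javascript
// A string first argument is searched for literally.
var unchanged = "a b".replace("\\s", "..");    // "a b", there is no \s text in it
var viaRegexp = "a b".replace(/\s/, "..");     // "a..b"
var literalHit = "a\\sb".replace("\\s", ".."); // "a..b", the source really contains \s

// Without the g flag (or with a string) only the first occurrence is replaced.
var firstOccurrence = "a-b-c".replace("-", "+"); // "a+b-c"
var everyOccurrence = "a-b-c".replace(/-/g, "+"); // "a+b+c"

// The original string is never modified.
var original = "a b";
original.replace(/\s/, "..");
console.log(original); // a b
```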

The replacement string can contain the following special patterns:

Pattern     Inserts
$$          Inserts "$".
$&          Inserts the matched substring.
$`          Inserts the portion of the string that precedes the match.
$'          Inserts the portion of the string that follows the match.
$n or $nn   Where n or nn are decimal digits, inserts the substring captured by the nth parenthesized group, provided the first argument is a RegExp object.
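
Each pattern from the table can be tried on a small string. The sample strings and variable names below are assumptions for illustration.

```javascript
// $1, $2: swap two captured words.
var swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1"); // "Smith, John"

// $&: the matched substring itself.
var wrapped = "abc".replace(/b/, "[$&]"); // "a[b]c"

// $`: the text before the match; $': the text after the match.
var beforeMatch = "abc".replace(/b/, "$`"); // "aac"
var afterMatch = "abc".replace(/b/, "$'");  // "acc"

// $$: a literal dollar sign.
var dollar = "cost: 5".replace(/5/, "$$5"); // "cost: $5"

console.log(swapped, wrapped, beforeMatch, afterMatch, dollar);
```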

If you specify a function as the second parameter, it is called for every match.

In the function you can dynamically generate and return the replacement string.

The first parameter of the function is the matched substring. If the first argument to replace is a RegExp object, the next n parameters contain the matches of the parenthesized groups. The last two parameters are the position in the string at which the match occurred and the string itself.

For example, the following call to replace returns "XXzzzz - XX, zzzz".

function replacer(str, p1, p2, offset, s)
{
  return str + " - " + p1 + ", " + p2;
}
var newString = "XXzzzz".replace(/(X*)(z*)/, replacer)

As you can see, there are two pairs of parentheses in the regular expression, and therefore the function has two parameters, p1 and p2.
If there were three pairs of parentheses, the function would need a third parameter, p3.
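
Two more sketches of a replacement function, under assumed sample data: the first computes the replacement from the match, the second only records the parameters it receives (groups, offset and the whole string) and returns the match unchanged.

```javascript
// Compute the replacement: double every number in the string.
var doubled = "3 apples and 10 pears".replace(/\d+/g, function (match) {
  return String(Number(match) * 2);
});
console.log(doubled); // 6 apples and 20 pears

// Inspect the parameters: two groups, then the offset and the whole string.
var seen = [];
"k1=v1;k2=v2".replace(/(\w+)=(\w+)/g, function (match, key, value, offset, whole) {
  seen.push(key + ":" + value + "@" + offset);
  return match; // leave the text as it was
});
console.log(seen); // [ 'k1:v1@0', 'k2:v2@6' ]
```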

The following function converts a name such as borderTop to border-top:

function styleHyphenFormat(propertyName)
{
  function upperToHyphenLower(match)
  {
    return '-' + match.toLowerCase();
  }
  // The g flag converts every capital letter, not only the first one
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}
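
A usage sketch follows. It restates the function with the g flag so that every capital letter is converted, and adds the reverse conversion; styleCamelFormat is a hypothetical name introduced here, not something from the original text.

```javascript
function styleHyphenFormat(propertyName) {
  // Every capital letter becomes a hyphen plus its lowercase form.
  return propertyName.replace(/[A-Z]/g, function (match) {
    return "-" + match.toLowerCase();
  });
}

// Hypothetical reverse helper: border-top-width -> borderTopWidth
function styleCamelFormat(cssName) {
  // The captured letter after each hyphen is returned in upper case.
  return cssName.replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

console.log(styleHyphenFormat("borderTop"));        // border-top
console.log(styleHyphenFormat("borderTopWidth"));   // border-top-width
console.log(styleCamelFormat("border-top-width"));  // borderTopWidth
```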

For a general understanding of regular expressions, you can read the article in Wikipedia.

They are described in more detail in the book Beginning Regular Expressions.