
Regular expressions and special characters

Regular expressions in JavaScript have a special short (literal) form and use a Perl-style syntax close to standard PCRE.

They work through a special RegExp object.

In addition, strings have their own methods search, match, and replace, but to understand them we will look at RegExp first.

An object of type RegExp, or regular expression for short, can be created in two ways:

/pattern/flags
new RegExp("pattern"[, flags])

Here pattern is the regular expression to search with (replacement is covered later), and flags is a string made of any combination of the characters g (global search), i (case-insensitive search), and m (multi-line search).

The first form is used most often, the second occasionally. For example, these two calls are equivalent:

var reg = /ab+c/i
var reg = new RegExp("ab+c", "i")

In the second form the pattern is a quoted string, so every \ must be doubled:

// equivalent
re = new RegExp("\\w+")
re = /\w+/
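
As a minimal sketch of the equivalence described above, the two forms can be compared through their source property. The variable names and the sample word are illustrative only and are not part of the original text.

```javascript
// Literal form: each backslash is written once.
var literal = /\w+/;

// Constructor form: the pattern is a string, so each backslash is doubled.
var built = new RegExp("\\w+");

// Both objects hold the same pattern text.
var sameSource = literal.source === built.source; // true

// The constructor is convenient when part of the pattern is only known at run time.
var word = "cat"; // hypothetical input
var dynamic = new RegExp("\\b" + word + "\\b", "i");
var found = dynamic.test("A Cat sat."); // true
```

The literal form is usually easier to read; the constructor form is mainly useful when the pattern has to be assembled from other strings.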

When searching, you can use most of the features of modern PCRE syntax.

Symbol  Meaning
\  For ordinary characters, makes them special. For example, /s/ simply searches for the character 's', but with \ in front of the s, /\s/ denotes a whitespace character. Conversely, if the character is special, such as *, then \ turns it into an ordinary asterisk. For example, /a*/ searches for 0 or more consecutive 'a' characters; to find the literal text 'a*', put \ in front of the special character: /a\*/.
^  Matches the beginning of the input. If the multi-line flag ("m") is set, it also matches at the start of each line. For example, /^A/ will not find the 'A' in "an A", but will find the first 'A' in "An A."
$  Matches the end of the input. If the multi-line flag is set, it also matches at the end of each line. For example, /t$/ will not find the 't' in "eater", but will find it in "eat".
*  Repeats the preceding item 0 or more times. For example, /bo*/ will find 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted".
+  Repeats the preceding item 1 or more times. Equivalent to {1,}. For example, /a+/ will find the 'a' in "candy" and all the 'a' characters in "caaaaaaandy".
?  Makes the preceding item optional. For example, /e?le?/ will find 'el' in "angel" and 'le' in "angle." When used immediately after one of the quantifiers *, +, ?, or {}, it makes that quantifier non-greedy (repeat as few times as possible before the next element of the pattern can match), as opposed to the default greedy mode, in which the number of repetitions is as large as possible even if the next element of the pattern would also match. In addition, ? is used in the lookahead and grouping constructs described in this table under (?=), (?!), and (?:).
.  (Dot) matches any character except the line terminators \n, \r, \u2028, and \u2029. (Use [\s\S] to match any character, including line terminators.) For example, /.n/ will find 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
(x)  Finds x and remembers the match. These are called capturing parentheses. For example, /(foo)/ will find and remember 'foo' in "foo bar." The captured substring is stored in the result array or in the predefined properties of the RegExp object: $1, ..., $9. In addition, parentheses group their contents into a single element of the pattern. For example, (abc)* repeats abc 0 or more times.
(?:x)  Finds x but does not remember the match. These are called non-capturing parentheses. The matched substring is not stored in the result array or in the RegExp properties. Like all parentheses, they group their contents into a single subpattern.
x(?=y)  Finds x only if x is followed by y. For example, /Jack(?=Sprat)/ will find 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ will find 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is included in the match.
x(?!y)  Finds x only if x is not followed by y. For example, /\d+(?!\.)/ will find a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec("3.141") will find 141, but not 3.141.
x|y  Finds x or y. For example, /green|red/ will find 'green' in "green apple" and 'red' in "red apple."
{n}  Where n is a positive integer. Finds exactly n repetitions of the preceding item. For example, /a{2}/ will not find the 'a' in "candy," but will find both a's in "caandy," and the first two a's in "caaandy."
{n,}  Where n is a positive integer. Finds n or more repetitions of the preceding item. For example, /a{2,}/ will not find the 'a' in "candy", but will find all the a's in "caandy" and in "caaaaaaandy."
{n,m}  Where n and m are positive integers. Finds from n to m repetitions of the preceding item.
[xyz]  Character set. Matches any one of the listed characters. A range can be given with a hyphen. For example, [abcd] is the same as [a-d]. It finds the 'b' in "brisket", and the 'a' and 'c' in "ache".
[^xyz]  Any character other than those listed in the set. A range can be given here as well. For example, [^abc] is the same as [^a-c]. It finds the 'r' in "brisket" and the 'h' in "chop."
[\b]  Matches the backspace character. (Not to be confused with \b.)
\b  Matches a word boundary: a position between a word character (Latin letter, digit, or underscore) and a non-word character such as a space, or the start or end of the input. (Not to be confused with [\b].) For example, /\bn\w/ will find 'no' in "noonday"; /\wy\b/ will find 'ly' in "possibly yesterday."
\B  Matches a position that is not a word boundary. For example, /\w\Bn/ finds 'on' in "noonday", and /y\B\w/ finds 'ye' in "possibly yesterday."
\cX  Where X is a letter from A to Z. Matches a control character in the string. For example, /\cM/ matches the character Ctrl-M.
\d  Matches a digit. In JavaScript this is equivalent to [0-9]; digits from other scripts are not matched. For example, /\d/ or /[0-9]/ will find '2' in "B2 is the suite number."
\D  Matches any non-digit character. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ will find 'B' in "B2 is the suite number."
\f, \r, \n  The form feed, carriage return, and line feed characters, respectively.
\s  Matches any whitespace character, including space, tab, line terminators, and other Unicode whitespace characters. For example, /\s\w*/ will find ' bar' in "foo bar."
\S  Matches any character except whitespace. For example, /\S\w*/ will find 'foo' in "foo bar."
\t  Tab character.
\v  Vertical tab character.
\w  Matches any word character (Latin alphabet), that is, a letter, a digit, or the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ will find 'a' in "apple," '5' in "$5.28," and '3' in "3D."
\W  Matches any non-word character (Latin alphabet). Equivalent to [^A-Za-z0-9_]. For example, /\W/ and /[^A-Za-z0-9_]/ will both find '%' in "50%."
\n  Where n is a positive integer. A backreference to the substring matched by the nth parenthesized group. For example, /apple(,)\sorange\1/ finds 'apple, orange,' in "apple, orange, cherry, peach." A more complete example follows the table.
\0  Matches the NUL character. Do not follow it with other digits.
\xhh  Matches the character with the code hh (2 hexadecimal digits).
\uhhhh  Matches the character with the code hhhh (4 hexadecimal digits).
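
A few of the table entries are easier to follow side by side. The following is only a sketch with made-up sample strings; it tries greedy and non-greedy quantifiers, lookahead, capturing and non-capturing parentheses, a backreference, and the m flag.

```javascript
// Greedy vs. non-greedy: *? stops at the first possible closing quote.
var text = 'say "one" and "two"';
var greedy = text.match(/".*"/)[0]; // '"one" and "two"'
var lazy = text.match(/".*?"/)[0];  // '"one"'

// Lookahead: 'Sprat' is required after 'Jack' but is not part of the match.
var jack = /Jack(?=Sprat)/.exec("JackSprat")[0]; // 'Jack'

// Capturing vs. non-capturing parentheses.
var captured = /(foo)bar/.exec("foobar");     // ['foobar', 'foo']
var uncaptured = /(?:foo)bar/.exec("foobar"); // ['foobar']

// Backreference: \1 must repeat whatever the first group matched.
var doubledLetter = /(\w)\1/.exec("hello")[0]; // 'll'

// Multi-line flag: with m, ^ also matches right after a line break.
var twoLines = "first\nsecond";
var withoutM = /^second/.test(twoLines); // false
var withM = /^second/m.test(twoLines);   // true
```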

To simply check whether a string matches a regular expression, use the test method:

if (/\s/.test("line ")) {
  // ... there is whitespace in the string ...
}
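
Because test returns only true or false, it fits naturally into small validation helpers. The sketch below assumes a hypothetical helper named isAllDigits, which is not part of the original text.

```javascript
// Hypothetical helper: true when the string consists only of the digits 0-9.
function isAllDigits(s) {
  return /^[0-9]+$/.test(s);
}

var allDigits = isAllDigits("2024");  // true
var mixed = isAllDigits("20a4");      // false
var hasSpace = /\s/.test("line ");    // true: the trailing space matches \s
```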

The exec method returns an array and sets the properties of the regular expression.
If there are no matches, null is returned.

For example,

// Find one d, followed by one or more b, followed by one d.
// Remember the matched b's and the d that follows them.
// The search is case-insensitive.
var myRe = /d(b+)(d)/ig;
var myArray = myRe.exec("cdbBdbsbz");

As a result of the script, the values are as follows:

Object   Property/Index   Description                                                     Example
myArray                   The contents of myArray.                                        ["dbBd", "bB", "d"]
         index            Index of the match in the string (counted from 0).              1
         input            The source string.                                              cdbBdbsbz
         [0]              The last matched characters.                                    dbBd
         [1], ... [n]     Matches in parentheses, if any. Their number is not limited.    [1] = bB, [2] = d
myRe     lastIndex        The index at which to start the next search.                    5
         ignoreCase       Whether the "i" flag (case-insensitive search) was set.         true
         global           Whether the "g" flag (search for all matches) was set.          true
         multiline        Whether the "m" flag (multi-line search) was set.               false
         source           The text of the pattern.                                        d(b+)(d)
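
The values listed above can be checked directly. The following sketch simply collects them into one object; the name summary is an illustrative choice.

```javascript
var myRe = /d(b+)(d)/ig;
var myArray = myRe.exec("cdbBdbsbz");

// Gather the properties of the result array and of the regular expression.
var summary = {
  match: myArray[0],          // "dbBd"
  group1: myArray[1],         // "bB"
  group2: myArray[2],         // "d"
  index: myArray.index,       // 1
  input: myArray.input,       // "cdbBdbsbz"
  lastIndex: myRe.lastIndex,  // 5
  ignoreCase: myRe.ignoreCase, // true
  global: myRe.global,        // true
  multiline: myRe.multiline,  // false
  source: myRe.source         // "d(b+)(d)"
};
```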

If the "g" flag is set in a regular expression, you can call the exec method repeatedly to find successive matches in the same string. Each search then starts in the string str at the index stored in the lastIndex property. For example, take a script like this:

var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
// exec returns null once no further match is found.
while ((myArray = myRe.exec(str)) != null) {
  var msg = "Found " + myArray[0] + ". ";
  msg += "Next match starts at " + myRe.lastIndex;
  console.log(msg);
}

This script will output the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 9
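
The same loop can be wrapped in a reusable function. This is only a sketch: findAll is a hypothetical helper name, and the guards for a missing g flag and for empty matches are additions that the original script does not have.

```javascript
// Hypothetical helper: collect every match of re in str with its position.
function findAll(re, str) {
  var results = [];
  var m;
  if (!re.global) {
    // Without the g flag exec always returns the first match, so do not loop.
    m = re.exec(str);
    return m ? [{ text: m[0], index: m.index }] : [];
  }
  re.lastIndex = 0; // always start from the beginning of the string
  while ((m = re.exec(str)) !== null) {
    results.push({ text: m[0], index: m.index });
    // An empty match does not advance lastIndex; move it on by hand.
    if (m[0] === "") re.lastIndex++;
  }
  return results;
}

var hits = findAll(/ab*/g, "abbcdefabh");
// hits holds "abb" at index 0 and "ab" at index 7
```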

In the following example, the function extracts a name from its input and then loops through the array to count how many registered people have the same name.

It is assumed that all registered names are in the array A:

var A = ["Frank", "Emily", "Jane", "Harry", "Nick", "Beth", "Rick",
         "Terrence", "Carol", "Ann", "Terry", "Frank", "Alice", "Rick",
         "Bill", "Tom", "Fiona", "Jane", "William", "Joan", "Beth"];

function lookup(input)
{
  // Take the first run of word characters as the name.
  var firstName = /\w+/i.exec(input);
  if (!firstName)
  {
    console.log(input + " isn't a name!");
    return;
  }

  // Count the registered names equal to it, ignoring case.
  var count = 0;
  for (var i = 0; i < A.length; i++)
  {
    if (firstName[0].toLowerCase() == A[i].toLowerCase())
      count++;
  }
  var midstring = (count == 1) ? " other has " : " others have ";
  console.log("Thanks, " + count + midstring + "the same name!");
}

The following string methods work with regular expressions.

All of them except replace accept either a RegExp object or a string, which is automatically converted to a RegExp object.

So these calls are equivalent:

var i = str.search(/\s/)
var i = str.search("\\s")

With the quoted form you have to double each \ and cannot specify flags, so the full form is sometimes convenient too:

var i = str.search(new RegExp("\\s", "g"))

The search method returns the index of the first match of the regular expression in the string, or -1 if there is no match.

If you only need to know whether a string matches a regular expression, use the search method (similar to the test method of RegExp). To get more information, use the slower match method (similar to the exec method of RegExp).

This example displays a message that depends on whether the string matches the regular expression.

function testinput(re, str) {
  var midstring;
  if (str.search(re) != -1)
    midstring = " contains ";
  else
    midstring = " does not contain ";
  document.write(str + midstring + re.source);
}
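
For reference, here is a short sketch of the values search returns for a sample string. The string and variable names are illustrative.

```javascript
var str = "hello world";

var pos = str.search(/\s/);        // 5: index of the first whitespace character
var none = str.search(/\d/);       // -1: the string contains no digit
var viaString = str.search("\\s"); // 5: the string is converted to a RegExp

// search always reports the first match, with or without the g flag.
var withG = str.search(new RegExp("o", "g")); // 4
```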

If regexp does not have the g flag, the match method returns the same result as regexp.exec(string).

If regexp has the g flag, match returns an array of all matches.

To simply find out whether a string matches the regular expression regexp, use regexp.test(string).

If you only want the first match, use regexp.exec(string).
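
The difference the g flag makes to match can be seen in a small sketch like this one (the sample string is made up):

```javascript
var s = "cat bat rat";

// Without g: same shape as exec, with the capture group and the index property.
var first = s.match(/(\w)at/);  // ["cat", "c"], first.index is 0

// With g: every full match, but no capture groups.
var all = s.match(/(\w)at/g);   // ["cat", "bat", "rat"]

// No match at all gives null in both cases.
var nothing = s.match(/dog/g);  // null
```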

In the following example, match is used to find "Chapter" followed by one or more digits and then any number of groups consisting of a period and a digit. The regular expression has the i flag, so case is ignored.

var str = "For more information, see Chapter 3.4.5.1";
var re = /chapter (\d+(\.\d)*)/i;
var found = str.match(re);
alert(found);

The script will return an array of matches:

  • Chapter 3.4.5.1 - the complete match
  • 3.4.5.1 - the first parenthesized group
  • .1 - the inner parenthesized group

The following example demonstrates the use of global and case-insensitive search flags with match . All letters from A to E and from a to e will be found, each in a separate element of the array.

var str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
var regexp = /[A-E]/gi;
var matches = str.match(regexp);
document.write(matches);
// matches = ['A', 'B', 'C', 'D', 'E', 'a', 'b', 'c', 'd', 'e']

The replace method can replace matches of a regular expression not only with a string, but also with the result of a function call. Its full syntax is:

var newString = str.replace(regexp|substr, newSubStr|function)

regexp
    A RegExp object. Its matches are replaced by the value that the second argument supplies.
substr
    The string to be replaced by newSubStr.
newSubStr
    The string that replaces the substring given by the first argument.
function
    A function that is called to generate the new substring (which is substituted for the substring given by the first argument).

The replace method does not change the string on which it is called; it simply returns a new, modified string.

To perform a global replacement, include the "g" flag in the regular expression.

If the first argument is a string, it is not converted to a regular expression, so, for example,

var ab = "a b".replace("\\s", "..") // = "a b"

The replace call left the string unchanged because it searched not for the regular expression \s but for the literal string "\s".
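
The following sketch contrasts a string first argument with a regular expression, with and without the g flag. The sample string is an assumption chosen for illustration.

```javascript
var s = "a.b.c";

// String argument: taken literally, and only the first occurrence is replaced.
var literalFirst = s.replace(".", "-"); // "a-b.c"

// Regular expression without g: also only the first match.
var regexFirst = s.replace(/\./, "-");  // "a-b.c"

// Regular expression with g: every match.
var regexAll = s.replace(/\./g, "-");   // "a-b-c"

// An unescaped dot in a regular expression matches any character.
var anyChar = s.replace(/./g, "-");     // "-----"

// The original string is never modified.
var untouched = (s === "a.b.c");        // true
```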

The replacement string may contain the following special patterns:

Pattern  Inserts
$$  Inserts "$".
$&  Inserts the matched substring.
$`  Inserts the portion of the string that precedes the matched substring.
$'  Inserts the portion of the string that follows the matched substring.
$n or $nn  Where n or nn are decimal digits, inserts the substring captured by the nth parenthesized group, provided the first argument is a RegExp object.
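
Each pattern from the table can be tried on a short string. This is a sketch with invented sample data:

```javascript
// $1 and $2: swap two captured words.
var swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1"); // "Smith, John"

// $&: wrap the matched substring in brackets.
var wrapped = "abc".replace(/b/, "[$&]"); // "a[b]c"

// $`: insert the text that precedes the match.
var before = "abc".replace(/b/, "$`");    // "aac"

// $': insert the text that follows the match.
var after = "abc".replace(/b/, "$'");     // "acc"

// $$: insert a literal dollar sign.
var dollar = "cost: 5".replace(/5/, "$$5"); // "cost: $5"
```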

If you specify a function as the second parameter, then it is executed at each match.

In a function, you can dynamically generate and return a substitution string.

The first parameter of the function is the matched substring. If the first argument to replace is a RegExp object, the next n parameters hold the matches of its n parenthesized groups. The last two parameters are the position in the string at which the match occurred and the string itself.

For example, the following call to replace will return "XXzzzz - XX, zzzz".

function replacer(str, p1, p2, offset, s)
{
  // str is the whole match; p1 and p2 are the two parenthesized groups.
  return str + " - " + p1 + ", " + p2;
}
var newString = "XXzzzz".replace(/(X*)(z*)/, replacer);

As you can see, there are two parenthesized groups in the regular expression, and therefore the function has two parameters, p1 and p2.
If there were three groups, a parameter p3 would have to be added to the function.
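
As a sketch of the three-group case, the callback below receives three captured values before the offset and the whole string. The function names and the date format are hypothetical and serve only as an illustration.

```javascript
// Three groups: year, month, and day arrive as separate parameters.
function reorderDate(match, year, month, day, offset, whole) {
  return day + "." + month + "." + year;
}
var date = "2009-07-15".replace(/(\d{4})-(\d{2})-(\d{2})/, reorderDate);
// "15.07.2009"

// No groups and the g flag: the function is called once per match.
function doubleNumber(match) {
  return String(Number(match) * 2);
}
var doubled = "3 apples and 12 pears".replace(/\d+/g, doubleNumber);
// "6 apples and 24 pears"
```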

The following function converts camel-case names such as borderTop to the hyphenated form border-top:

function styleHyphenFormat(propertyName)
{
  function upperToHyphenLower(match)
  {
    return '-' + match.toLowerCase();
  }
  // The g flag converts every capital letter, not only the first one.
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}
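
A usage sketch follows. It repeats the function in compact form so that the block runs on its own, and adds a hypothetical inverse, styleCamelFormat, which is not part of the original text.

```javascript
function styleHyphenFormat(propertyName) {
  return propertyName.replace(/[A-Z]/g, function (match) {
    return "-" + match.toLowerCase();
  });
}

// Hypothetical inverse: border-top-width becomes borderTopWidth.
function styleCamelFormat(cssName) {
  return cssName.replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

var hyphenated = styleHyphenFormat("borderTopWidth"); // "border-top-width"
var camel = styleCamelFormat("border-top-width");     // "borderTopWidth"
```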

For a general understanding of regular expressions, see the Wikipedia article on the subject.

They are described in more detail in the book Beginning Regular Expressions (in English).