
Regular expressions and special characters

Regular expressions in JavaScript have a special short (literal) form and use a syntax close to standard PCRE.

They work through a special RegExp object.

In addition, strings have their own methods search, match, and replace, but to understand them, let us first look at RegExp.

An object of type RegExp, or a regular expression for short, can be created in two ways:

/pattern/flags
new RegExp("pattern"[, flags])

Here pattern is the regular expression to search for (replacement is covered later), and flags is a string containing any combination of the characters g (global search), i (case-insensitive search), and m (multiline search).

The first form is used most often, the second only occasionally. For example, these two calls are equivalent:

var reg = /ab+c/i
var reg = new RegExp("ab+c", "i")

In the second form the pattern is written inside a quoted string, so every \ must be doubled:

// equivalent
re = new RegExp("\\w+")
re = /\w+/
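
As a minimal sketch of the two forms side by side, the snippet below checks that they produce the same pattern and shows one case where the constructor form is convenient: building a pattern from a value known only at run time. The variable names and the sample word are hypothetical, chosen only for illustration.

```javascript
// Both forms build the same pattern: one or more word characters.
var literal = /\w+/;
var constructed = new RegExp("\\w+");
console.log(literal.source === constructed.source); // true

// The constructor form helps when part of the pattern is only known at run time.
// `word` is a hypothetical value used for illustration.
var word = "cat";
var dynamic = new RegExp("\\b" + word + "\\b", "i");
console.log(dynamic.test("The Cat sat")); // true
console.log(dynamic.test("concatenate")); // false
```

Note that a run-time value containing special characters would need escaping before being placed into the pattern; the sketch assumes a plain word.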

When searching, you can use most of the features of modern PCRE syntax.


Symbol    Meaning
\    For ordinary characters, makes them special. For example, /s/ matches just the character 's', but with \ before the s, /\s/ matches a whitespace character. Conversely, for a special character such as *, \ makes it an ordinary character. For example, /a*/ matches 0 or more consecutive 'a' characters; to match 'a*' literally, put \ before the special character: /a\*/.
^    Matches the beginning of the input. If the multiline flag ("m") is set, it also matches at the start of each line. For example, /^A/ does not match the 'A' in "an A", but matches the first 'A' in "An A."
$    Matches the end of the input. If the multiline flag is set, it also matches at the end of each line. For example, /t$/ does not match the 't' in "eater", but matches it in "eat".
*    Matches the preceding item 0 or more times. For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but matches nothing in "A goat grunted".
+    Matches the preceding item 1 or more times. Equivalent to {1,}. For example, /a+/ matches the 'a' in "candy" and all the 'a' characters in "caaaaaaandy".
?    Matches the preceding item 0 or 1 times. For example, /e?le?/ matches 'el' in "angel" and 'le' in "angle." If used immediately after one of the quantifiers *, +, ?, or {}, it makes the quantifier non-greedy (repeating the minimum possible number of times, up to the next element of the pattern), as opposed to the default greedy mode, where the number of repetitions is the maximum possible even if the next element of the pattern would also match. ? is also used in lookahead assertions and non-capturing parentheses, described in this table under (?=), (?!), and (?:).
.    (The dot) matches any single character except the line terminators \n, \r, \u2028, and \u2029. (Use [\s\S] to match any character, including line breaks.) For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
(x)    Matches x and remembers the match. These are called capturing parentheses. For example, /(foo)/ matches and remembers 'foo' in "foo bar." The matched substring is stored in the result array or in the predefined properties of the RegExp object: $1, ..., $9. Parentheses also group their contents into a single element of the pattern. For example, (abc)* matches abc repeated 0 or more times.
(?:x)    Matches x but does not remember the match. These are called non-capturing parentheses. The matched substring is not stored in the result array or in the RegExp properties. Like all parentheses, they group their contents into a single subpattern.
x(?=y)    Matches x only if x is followed by y. For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match result.
x(?!y)    Matches x only if x is not followed by y. For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec("3.141") matches 141, but not 3.141.
x|y    Matches x or y. For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple."
{n}    Where n is a positive integer. Matches exactly n repetitions of the preceding item. For example, /a{2}/ does not match the 'a' in "candy," but matches both 'a' characters in "caandy" and the first two 'a' characters in "caaandy."
{n,}    Where n is a positive integer. Matches n or more repetitions of the preceding item. For example, /a{2,}/ does not match the 'a' in "candy", but matches all the 'a' characters in "caandy" and in "caaaaaaandy."
{n,m}    Where n and m are positive integers. Matches from n to m repetitions of the preceding item.
[xyz]    A character set. Matches any one of the listed characters. You can specify a range with a hyphen. For example, [abcd] is the same as [a-d]. It matches the 'b' in "brisket" and the 'a' and 'c' in "ache".
[^xyz]    A negated character set. Matches any character not listed in the set. A range can be used here too. For example, [^abc] is the same as [^a-c]. It matches the 'r' in "brisket" and the 'h' in "chop".
[\b]    Matches a backspace character. (Not to be confused with \b.)
\b    Matches a word boundary (for Latin-alphabet words), such as the position between a letter and a space. (Not to be confused with [\b].) For example, /\bn\w/ matches 'no' in "noonday"; /\wy\b/ matches 'ly' in "possibly yesterday."
\B    Matches a position that is not a word boundary. For example, /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday."
\cX    Where X is a letter from A to Z. Matches a control character in the string. For example, /\cM/ matches the character Ctrl-M.
\d    Matches a digit. In JavaScript this is equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."
\D    Matches any character that is not a digit. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."
\f, \r, \n    The corresponding special characters: form feed, carriage return, line feed.
\s    Matches any whitespace character, including space, tab, line breaks, and other Unicode whitespace characters. For example, /\s\w*/ matches ' bar' in "foo bar."
\S    Matches any character other than whitespace. For example, /\S\w*/ matches 'foo' in "foo bar."
\t    Tab character.
\v    Vertical tab character.
\w    Matches any word character (Latin alphabet), including letters, digits, and underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
\W    Matches any non-word character (Latin alphabet). Equivalent to [^A-Za-z0-9_]. For example, /\W/ and /[^A-Za-z0-9_]/ both match '%' in "50%."
\n    Where n is a positive integer. A backreference to the substring remembered by the n-th capturing parentheses. For example, /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach." A more complete example follows the table.
\0    Matches the null character. Do not follow it with another digit.
\xhh    Matches the character with code hh (2 hexadecimal digits).
\uhhhh    Matches the character with code hhhh (4 hexadecimal digits).
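
A few of the table entries are easier to see in action. The sketch below illustrates greedy versus non-greedy quantifiers, lookahead, a backreference, and the multiline flag. The sample strings and variable names are made up for illustration, apart from the Jack and apple patterns taken from the table.

```javascript
// Greedy vs. non-greedy: * takes as much as possible, *? as little as possible.
var html = "<b>one</b> and <b>two</b>";
var greedy = html.match(/<b>.*<\/b>/)[0];
var lazy = html.match(/<b>.*?<\/b>/)[0];
console.log(greedy); // <b>one</b> and <b>two</b>
console.log(lazy);   // <b>one</b>

// Lookahead: 'Frost' must follow, but it is not part of the match.
var jack = /Jack(?=Sprat|Frost)/.exec("JackFrost")[0];
console.log(jack); // Jack

// Backreference: \1 repeats whatever the first parentheses captured (a comma here).
var fruit = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach.")[0];
console.log(fruit); // apple, orange,

// Multiline flag: with m, ^ also matches right after a line break.
var lines = "first\nsecond";
var withoutM = /^second/.test(lines);
var withM = /^second/m.test(lines);
console.log(withoutM, withM); // false true
```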

To simply check whether a string matches a regular expression, use the test method:

if (/\s/.test("string")) {
  // ... the string contains whitespace ...
}
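
As a small illustration, test returns a plain boolean, so it can be used directly in conditions. The sample strings below are hypothetical.

```javascript
var whitespace = /\s/;
var noSpaces = whitespace.test("string");
var hasSpaces = whitespace.test("two words");
console.log(noSpaces);  // false
console.log(hasSpaces); // true
```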

The exec method returns an array and updates the properties of the regular expression object.
If there is no match, it returns null.

For example,

// Find one d, followed by 1 or more b's, followed by one d
// Remember the b's found and the d that follows
// Case-insensitive search
var myRe = /d(b+)(d)/ig;
var myArray = myRe.exec("cdbBdbsbz");

The script produces the following results:

Object | Property / Index | Description | Example
myArray | | Contents of myArray. | ["dbBd", "bB", "d"]
myArray | index | Index of the match (from 0). | 1
myArray | input | Source string. | cdbBdbsbz
myArray | [0] | Last matched characters. | dbBd
myArray | [1], ... [n] | Matches in nested brackets, if any. The number of nested brackets is unlimited. | [1] = bB, [2] = d
myRe | lastIndex | The index from which to start the next search. | 5
myRe | ignoreCase | Indicates whether case-insensitive search was enabled with the "i" flag. | true
myRe | global | Indicates whether global search was enabled with the "g" flag. | true
myRe | multiline | Indicates whether multiline search was enabled with the "m" flag. | false
myRe | source | The text of the pattern. | d(b+)(d)
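
The values in the table can be checked with a short script. This is only a sketch that logs each property; it also shows that a second exec call on the same string finds nothing more, returns null, and resets lastIndex to 0.

```javascript
var myRe = /d(b+)(d)/ig;
var myArray = myRe.exec("cdbBdbsbz");

console.log(myArray[0], myArray[1], myArray[2]); // dbBd bB d
console.log(myArray.index);  // 1
console.log(myArray.input);  // cdbBdbsbz
console.log(myRe.lastIndex); // 5
console.log(myRe.ignoreCase, myRe.global, myRe.multiline); // true true false
console.log(myRe.source);    // d(b+)(d)

// Remember lastIndex before searching again.
var lastIndexAfterFirst = myRe.lastIndex;

// The second call continues from lastIndex and finds no further match.
var second = myRe.exec("cdbBdbsbz");
console.log(second, myRe.lastIndex); // null 0
```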

If the g flag is set in a regular expression, you can call the exec method repeatedly to find successive matches in the same string. Each search starts in str at the index given by lastIndex. For example, consider this script:

var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null) {
  var msg = "Found " + myArray[0] + ". ";
  msg += "Next match starts at " + myRe.lastIndex;
  console.log(msg);
}

This script will output the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 9

In the following example, the function runs exec on its input to extract a name, then loops through the array to count how many registered people have the same name.

It is assumed that all registered names are in array A:

var A = ["Frank", "Emily", "Jane", "Harry", "Nick", "Beth", "Rick",
"Terrence", "Carol", "Ann", "Terry", "Frank", "Alice", "Rick",
"Bill", "Tom", "Fiona", "Jane", "William", "Joan", "Beth"];

function lookup(input)
{
  // Take the first run of word characters as the name
  var firstName = /\w+/i.exec(input);
  if (!firstName)
  {
    console.log(input + " isn't a name!");
    return;
  }

  // Count case-insensitive occurrences of the name in A
  var count = 0;
  for (var i = 0; i < A.length; i++)
  {
    if (firstName[0].toLowerCase() == A[i].toLowerCase())
      count++;
  }
  var midstring = (count == 1) ? " other has " : " others have ";
  console.log("Thanks, " + count + midstring + "the same name!");
}

The following string methods work with regular expressions.

All of them except replace accept either a RegExp object or a string, which is automatically converted into a RegExp object.

So these calls are equivalent:

var i = str.search(/\s/)
var i = str.search("\\s")

When the pattern is given as a quoted string, every \ must be doubled and there is no way to specify flags, so the full form is sometimes convenient:

var i = str.search(new RegExp("\\s", "g"))

The search method returns the index of the first match of the regular expression in the string, or -1 if there is no match.

If you only need to know whether a string matches a regular expression, use the search method (similar to the RegExp test method). To get more information, use the slower match method (similar to the RegExp exec method).
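
A brief sketch of search, assuming a made-up sample string: the regexp and string forms give the same index, and a pattern that does not occur gives -1.

```javascript
var str = "hello world";
var viaLiteral = str.search(/\s/);  // RegExp object argument
var viaString = str.search("\\s");  // string argument, converted to a RegExp
var missing = str.search(/\d/);     // no digits in the string
console.log(viaLiteral, viaString, missing); // 5 5 -1
```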

This example displays a message that depends on whether the string matches the regular expression.

function testinput(re, str) {
  var midstring;
  if (str.search(re) != -1)
    midstring = " contains ";
  else
    midstring = " does not contain ";
  document.write(str + midstring + re.source);
}

The match method behaves differently depending on the g flag. If regexp has no g flag, match returns the same result as regexp.exec(string).

If regexp has the g flag, match returns an array of all matches, or null if there are none.

To simply find out whether the string matches the regular expression regexp, use regexp.test(string).

If you only want the first result, use regexp.exec(string).
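
The difference between match with and without the g flag can be sketched as follows. The sample text and patterns are hypothetical.

```javascript
var text = "cat bat rat";

// Without g: same shape as exec, with captured groups and an index.
var first = text.match(/(\w)at/);
console.log(first[0], first[1], first.index); // cat c 0

// With g: a flat array of every full match, without groups.
var all = text.match(/(\w)at/g);
console.log(all.join(",")); // cat,bat,rat

// No match at all gives null rather than an empty array.
var none = text.match(/dog/g);
console.log(none); // null
```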

In the following example, match is used to find "Chapter" followed by 1 or more digits, followed by 0 or more groups consisting of a dot and a digit. The regular expression has the i flag, so case is ignored.

str = "For more information, see Chapter 3.4.5.1";
re = /chapter (\d+(\.\d)*)/i;
found = str.match(re);
alert(found);

The script returns an array of matches:

  • Chapter 3.4.5.1 - the full match
  • 3.4.5.1 - the first parentheses
  • .1 - the inner parentheses

The following example demonstrates the use of global and case-insensitive search flags with match . All letters from A to E and from a to e will be found, each in a separate element of the array.

var str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
var regexp = /[A-E]/gi;
var matches = str.match(regexp);
document.write(matches);
// matches = ['A', 'B', 'C', 'D', 'E', 'a', 'b', 'c', 'd', 'e']

The replace method can replace matches of a regular expression not only with a string, but also with the result of a function call. Its full syntax is:

var newString = str.replace(regexp/substr, newSubStr/function)

regexp
    A RegExp object. Its matches are replaced by the value supplied by the second argument.
substr
    The string to be replaced by newSubStr.
newSubStr
    The string that replaces the substring matched by the first argument.
function
    A function called to generate the new substring (which replaces the substring matched by the first argument).

The replace method does not change the string on which it was called; it returns a new, modified string.

To make a global replacement, include the flag "g" in the regular expression.
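
A short sketch of both points, using a made-up sample string: without g only the first match is replaced, with g every match is replaced, and the original string is left untouched.

```javascript
var original = "a-b-c";
var firstOnly = original.replace(/-/, "+");
var every = original.replace(/-/g, "+");
console.log(firstOnly); // a+b-c
console.log(every);     // a+b+c
console.log(original);  // a-b-c
```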

If the first argument is a string, it is not converted to a regular expression. So, for example,

var ab = "a b".replace("\\s", "..") // = "a b"

This call to replace left the string unchanged, because it searched not for the regular expression \s but for the literal string "\s".

The replacement string can contain the following special patterns:

Pattern    Inserts
$$    Inserts "$".
$&    Inserts the matched substring.
$`    Inserts the part of the string that precedes the matched substring.
$'    Inserts the part of the string that follows the matched substring.
$n or $nn    Where n or nn are decimal digits, inserts the substring remembered by the n-th capturing parentheses, provided the first argument is a RegExp object.
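
Each of these patterns can be tried on a small input. The strings below are hypothetical and chosen so that the effect of every pattern is easy to see.

```javascript
// $1 and $2 refer to the first and second parentheses.
var swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped); // Smith, John

// $& is the whole matched substring.
var bracketed = "price 5".replace(/\d+/, "[$&]");
console.log(bracketed); // price [5]

// $$ inserts a literal dollar sign; here it is followed by $&.
var dollars = "cost: 10".replace(/\d+/, "$$$&");
console.log(dollars); // cost: $10

// $` is the text before the match, $' the text after it.
var before = "abc".replace(/b/, "$`");
var after = "abc".replace(/b/, "$'");
console.log(before, after); // aac acc
```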

If you specify a function as the second parameter, then it is executed with every match.

In a function, you can dynamically generate and return a substitution string.

The first parameter of the function is the matched substring. If the first argument of replace is a RegExp object, the next n parameters contain the matches from the capturing parentheses. The last two parameters are the position in the string at which the match occurred and the string itself.

For example, the following call to replace returns "XXzzzz - XX, zzzz".

function replacer(str, p1, p2, offset, s)
{
  return str + " - " + p1 + ", " + p2;
}
var newString = "XXzzzz".replace(/(X*)(z*)/, replacer)

As you can see, the regular expression has two pairs of parentheses, so the function has two parameters, p1 and p2.
If there were three pairs of parentheses, a p3 parameter would have to be added to the function.
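
Two further sketches of a function used as the second argument, with hypothetical function names and inputs: the first computes the replacement from the matched text, the second uses a captured group together with the match position.

```javascript
// Called once per match because of the g flag; only the match itself is used.
function doubleNumber(match)
{
  return String(Number(match) * 2);
}
var doubled = "3 apples and 10 pears".replace(/\d+/g, doubleNumber);
console.log(doubled); // 6 apples and 20 pears

// One pair of parentheses, so the parameters are: match, p1, offset, string.
function tagWithOffset(match, p1, offset)
{
  return p1.toUpperCase() + "@" + offset;
}
var tagged = "ab cd".replace(/(\w)\w/g, tagWithOffset);
console.log(tagged); // A@0 C@3
```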

The following function replaces words like borderTop with border-top :

function styleHyphenFormat(propertyName)
{
  function upperToHyphenLower(match)
  {
    return '-' + match.toLowerCase();
  }
  // The g flag also handles names with several capitals, such as borderTopWidth
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}

For a general understanding of regular expressions, see the Wikipedia article on the subject.

They are described in more detail in the book Beginning Regular Expressions (in English).