Regular expressions and special characters

Regular expressions in JavaScript have a special literal (short) form and use a syntax close to standard PCRE.

They work through a special RegExp object.

In addition, strings have their own methods search, match, and replace, but to understand them we first need to look at RegExp.

An object of type RegExp, or more briefly a regular expression, can be created in two ways:

/pattern/flags
new RegExp("pattern"[, flags])

Here pattern is the regular expression to search for (replacement is covered later), and flags is a string containing any combination of the characters g (global search), i (case-insensitive search), and m (multiline search).
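As a small sketch of what the three flags change, the snippet below runs the same kind of search with different flag combinations. The sample string and the variable name text are illustrative and not part of the original article.

```javascript
var text = "first line\nSecond line";

// Without flags: case-sensitive, ^ matches only at the start of the string.
console.log(/^second/.test(text));   // false

// i alone ignores case, but ^ still matches only at the very start.
console.log(/^second/i.test(text));  // false

// m lets ^ match at the start of each line as well.
console.log(/^second/im.test(text)); // true

// g makes match return every occurrence instead of only the first.
console.log(text.match(/line/g).length); // 2
```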

The first form is used most often, the second only occasionally. For example, these two calls are equivalent:

var reg = /ab+c/i
var reg = new RegExp("ab+c", "i")

In the second form the pattern is written inside a quoted string, so every \ must be doubled:

// these are equivalent
re = new RegExp("\\w+")
re = /\w+/
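The following sketch checks that the two forms really produce the same pattern, and shows the one case where only the constructor will do: building a pattern from a value known at run time. The names literal, built, word, and dynamic are illustrative assumptions.

```javascript
// The same pattern written as a literal and through the constructor.
var literal = /\w+/;
var built = new RegExp("\\w+"); // backslash doubled inside the string

console.log(literal.source === built.source); // true
console.log(built.exec("hello world")[0]);    // "hello"

// Only the constructor can assemble a pattern from a runtime value.
var word = "cat"; // hypothetical runtime value
var dynamic = new RegExp("\\b" + word + "\\b", "i");
console.log(dynamic.test("The Cat sat")); // true
```

If the runtime value may contain special characters such as * or ., they would need to be escaped first; that step is left out of this sketch.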

When searching, you can use most of the features of modern PCRE syntax.


Symbol  Meaning
\  For ordinary characters, makes them special. For example, /s/ looks for just the character 's', but with \ in front of s, /\s/ denotes a whitespace character. Conversely, if the character is special, such as *, then \ turns it into an ordinary asterisk. For example, /a*/ searches for 0 or more consecutive 'a' characters. To find 'a*' literally, put \ before the special character: /a\*/.
^  Indicates the beginning of the input. If the multiline flag ("m") is set, it also matches at the start of each new line. For example, /^A/ does not find 'A' in "an A", but finds the first 'A' in "An A."
$  Indicates the end of the input. If the multiline flag is set, it also matches at the end of each line. For example, /t$/ does not find 't' in "eater", but finds it in "eat".
*  Indicates repetition 0 or more times. For example, /bo*/ finds 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but finds nothing in "A goat grunted".
+  Indicates repetition 1 or more times. Equivalent to {1,}. For example, /a+/ finds 'a' in "candy" and all the a's in "caaaaaaandy".
?  Indicates that the element may or may not be present. For example, /e?le?/ finds 'el' in "angel" and 'le' in "angle." If used immediately after one of the quantifiers *, +, ?, or {}, it makes the search non-greedy (the minimum possible number of repetitions, up to the nearest match of the next pattern element), as opposed to the default greedy mode, in which the number of repetitions is maximal even if the next pattern element also fits. In addition, ? is used in the lookahead and non-capturing constructs described in this table under (?=), (?!), and (?:).
.  (Period) matches any character except a line terminator: \n, \r, \u2028 or \u2029. (Use [\s\S] to match any character, including line breaks.) For example, /.n/ finds 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
(x)  Finds x and remembers the match. These are called capturing parentheses. For example, /(foo)/ finds and remembers 'foo' in "foo bar." The matched substring is stored in the search result array or in the predefined properties of the RegExp object: $1, ..., $9. In addition, parentheses group their contents into a single pattern element. For example, (abc)* is abc repeated 0 or more times.
(?:x)  Finds x but does not remember the match. These are called non-capturing parentheses. The matched substring is not stored in the result array or in the RegExp properties. Like all parentheses, they group their contents into a single subpattern.
x(?=y)  Finds x only if x is followed by y. For example, /Jack(?=Sprat)/ finds 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ finds 'Jack' only if it is followed by 'Sprat' or 'Frost'. Neither 'Sprat' nor 'Frost' is included in the search result.
x(?!y)  Finds x only if x is not followed by y. For example, /\d+(?!\.)/ finds a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec("3.141") finds 141, but not 3.141.
x|y  Finds x or y. For example, /green|red/ finds 'green' in "green apple" and 'red' in "red apple."
{n}  Where n is a non-negative integer. Finds exactly n repetitions of the preceding element. For example, /a{2}/ does not find 'a' in "candy," but finds both a's in "caandy," and the first two a's in "caaandy."
{n,}  Where n is a non-negative integer. Finds n or more repetitions of the element. For example, /a{2,}/ does not find 'a' in "candy", but finds all the a's in "caandy" and in "caaaaaaandy."
{n,m}  Where n and m are non-negative integers. Finds from n to m repetitions of the element.
[xyz]  Character set. Finds any one of the listed characters. You can specify a range using a hyphen. For example, [abcd] is the same as [a-d]. It finds 'b' in "brisket", as well as 'a' and 'c' in "ache".
[^xyz]  Any character other than those listed in the set. You can also specify a range. For example, [^abc] is the same as [^a-c]. It finds 'r' in "brisket" and 'h' in "chop."
[\b]  Finds the backspace character. (Not to be confused with \b.)
\b  Finds a word boundary (Latin letters only), for example the position between a letter and a space. (Not to be confused with [\b].) For example, /\bn\w/ finds 'no' in "noonday"; /\wy\b/ finds 'ly' in "possibly yesterday."
\B  Denotes a position that is not a word boundary. For example, /\w\Bn/ finds 'on' in "noonday", and /y\B\w/ finds 'ye' in "possibly yesterday."
\cX  Where X is a letter from A to Z. Denotes a control character in the string. For example, /\cM/ denotes the character Ctrl-M.
\d  Finds a digit. In JavaScript this is equivalent to [0-9]. For example, /\d/ or /[0-9]/ finds '2' in "B2 is the suite number."
\D  Finds a non-digit character. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ finds 'B' in "B2 is the suite number."
\f, \r, \n  The corresponding special characters: form feed, carriage return, line feed.
\s  Finds any whitespace character, including a space, tab, line breaks, and other Unicode whitespace characters. For example, /\s\w*/ finds ' bar' (including the leading space) in "foo bar."
\S  Finds any character except whitespace. For example, /\S\w*/ finds 'foo' in "foo bar."
\t  A tab character.
\v  A vertical tab character.
\w  Finds any word character (Latin alphabet): letters, digits, and the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ finds 'a' in "apple," '5' in "$5.28," and '3' in "3D."
\W  Finds any non-word character. Equivalent to [^A-Za-z0-9_]. For example, /\W/ and /[^A-Za-z0-9_]/ both find '%' in "50%."
\n (n is a positive integer)  A backreference to the nth captured substring. For example, /apple(,)\sorange\1/ finds 'apple, orange,' in "apple, orange, cherry, peach." A more complete example follows the table.
\0  Finds the NUL character. Do not follow it with another digit.
\xhh  Finds the character with the code hh (2 hexadecimal digits).
\uhhhh  Finds the character with the code hhhh (4 hexadecimal digits).
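Several rows of the table are easier to see in action. The sketch below exercises greedy and non-greedy quantifiers, lookahead, a backreference, and capturing versus non-capturing parentheses. The sample strings and the variable name html are illustrative.

```javascript
// Greedy vs. non-greedy quantifiers on the same input.
var html = "<b>one</b><b>two</b>";
console.log(html.match(/<b>.*<\/b>/)[0]);  // "<b>one</b><b>two</b>" (greedy)
console.log(html.match(/<b>.*?<\/b>/)[0]); // "<b>one</b>" (non-greedy)

// Lookahead: the text after 'Jack' is checked but not included.
console.log(/Jack(?=Sprat)/.exec("JackSprat")[0]); // "Jack"
console.log(/Jack(?!Sprat)/.test("JackSprat"));    // false

// Backreference: \1 must repeat whatever group 1 captured.
console.log(/(\w)\1/.exec("hello")[0]); // "ll"

// Capturing parentheses add an element to the result, (?:...) does not.
console.log(/(foo)bar/.exec("foobar").length);   // 2
console.log(/(?:foo)bar/.exec("foobar").length); // 1
```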

To simply check if the string matches the regular expression, use the test method:

if (/\s/.test("line")) {
  // ...there are whitespace characters in the string...
}
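A short sketch of test in use follows. It also shows one detail worth knowing: on a regular expression with the g flag, test advances lastIndex, so repeated calls on the same string can alternate between true and false. The variable names hasSpace and globalRe are illustrative.

```javascript
var hasSpace = /\s/;
console.log(hasSpace.test("no_spaces")); // false
console.log(hasSpace.test("two words")); // true

// With the g flag, test remembers where the previous match ended.
var globalRe = /a/g;
console.log(globalRe.test("a")); // true, lastIndex is now 1
console.log(globalRe.test("a")); // false, the search started at index 1
console.log(globalRe.lastIndex); // 0, reset after the failed search
```

For a plain yes/no check it is usually simpler to leave the g flag off.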

The exec method returns an array and sets the properties of the regular expression.
If there are no matches, null is returned.

For example,

// Find one d, followed by 1 or more b, followed by one d
// Remember the matched b's and the d that follows them
// Case-insensitive search
var myRe = /d(b+)(d)/ig;
var myArray = myRe.exec("cdbBdbsbz");

After the script runs, the results are as follows:

Object  Property/Index  Description  Example
myArray  (array)  The contents of myArray.  ["dbBd", "bB", "d"]
myArray  index  The index of the match (from 0).  1
myArray  input  The source string.  cdbBdbsbz
myArray  [0]  The last matched characters.  dbBd
myArray  [1], ... [n]  Matches for the parenthesized groups, if any. The number of groups is unlimited.  [1] = bB, [2] = d
myRe  lastIndex  The index from which to start the next search.  5
myRe  ignoreCase  Indicates that case-insensitive search is enabled (the "i" flag).  true
myRe  global  Indicates that the "g" flag (search for all matches) is enabled.  true
myRe  multiline  Indicates whether the multiline flag "m" is enabled.  false
myRe  source  The pattern text.  d(b+)(d)
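The values in the table can be checked directly. The sketch below repeats the exec call and prints each property; it assumes an environment with console.log.

```javascript
var myRe = /d(b+)(d)/ig;
var myArray = myRe.exec("cdbBdbsbz");

// Properties of the result array.
console.log(myArray[0]);    // "dbBd"
console.log(myArray[1]);    // "bB"
console.log(myArray[2]);    // "d"
console.log(myArray.index); // 1
console.log(myArray.input); // "cdbBdbsbz"

// Properties of the regular expression itself.
console.log(myRe.lastIndex);  // 5
console.log(myRe.source);     // "d(b+)(d)"
console.log(myRe.ignoreCase); // true
console.log(myRe.global);     // true
console.log(myRe.multiline);  // false
```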

If the "g" flag is set on the regular expression, you can call the exec method repeatedly to find successive matches in the same string. Each call starts searching the string str at the index stored in lastIndex. For example, here is a script:

var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null) {
  var msg = "Found " + myArray[0] + ". ";
  msg += "Next match starts at " + myRe.lastIndex;
  console.log(msg);
}

This script will display the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 9
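The same loop can be wrapped in a reusable function. The sketch below is one possible way to do it; findAll is a hypothetical helper name, not a built-in method. It guards against two common pitfalls: a regular expression without the g flag (exec would return the first match forever) and empty matches (which do not advance lastIndex).

```javascript
// Hypothetical helper: collect every match together with its position.
function findAll(re, str) {
  if (!re.global) {
    throw new Error("findAll needs a regular expression with the g flag");
  }
  var results = [];
  var m;
  re.lastIndex = 0; // always start from the beginning of the string
  while ((m = re.exec(str)) !== null) {
    results.push({ text: m[0], index: m.index });
    // An empty match leaves lastIndex unchanged; step forward manually
    // so the loop cannot run forever.
    if (m[0] === "") re.lastIndex++;
  }
  return results;
}

var found = findAll(/ab*/g, "abbcdefabh");
console.log(found.length);                  // 2
console.log(found[0].text, found[0].index); // "abb" 0
console.log(found[1].text, found[1].index); // "ab" 7
```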

In the following example, the function uses exec to extract a first name from the input, then loops through an array to count how many registered names match it.

It is assumed that all registered names are in array A:

var A = ["Frank", "Emily", "Jane", "Harry", "Nick", "Beth", "Rick",
  "Terrence", "Carol", "Ann", "Terry", "Frank", "Alice", "Rick",
  "Bill", "Tom", "Fiona", "Jane", "William", "Joan", "Beth"];

function lookup(input)
{
  // Take the first run of word characters as the name
  var firstName = /\w+/i.exec(input);
  if (!firstName)
  {
    console.log(input + " is not a name!");
    return;
  }

  // Count case-insensitive matches among the registered names
  var count = 0;
  for (var i = 0; i < A.length; i++)
  {
    if (firstName[0].toLowerCase() == A[i].toLowerCase())
      count++;
  }
  var midstring = (count == 1) ? " other has " : " others have ";
  console.log("Thanks, " + count + midstring + "the same name!");
}

The following String methods work with regular expressions.

All of these methods except replace accept either a RegExp object or a string as the argument; a string is automatically converted to a RegExp object.

So these calls are equivalent:

var i = str.search(/\s/)
var i = str.search("\\s")

When the pattern is given in quotes, every \ must be doubled and there is no way to specify flags, so the full form is sometimes convenient as well:

var i = str.search(new RegExp("\\s", "g"))

The search method returns the index of the first match of the regular expression in the string, or -1 if there is no match.

If you only want to know whether a string matches a regular expression, use the search method (similar to the RegExp test method). To get more information, use the slower match method (similar to the RegExp exec method).

This example outputs a message, depending on whether the string matches the regular expression.

function testinput(re, str) {
  var midstring; // declared locally to avoid creating a global
  if (str.search(re) != -1)
    midstring = " contains ";
  else
    midstring = " does not contain ";
  document.write(str + midstring + re.source);
}
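The different ways of calling search can be compared side by side. This is a minimal sketch with an illustrative sample string; it uses console.log rather than document.write so that it also runs outside a browser.

```javascript
var str = "hello world";

console.log(str.search(/\s/));                   // 5
console.log(str.search("\\s"));                  // 5, string converted to a RegExp
console.log(str.search(new RegExp("\\s", "g"))); // 5, search always starts at 0
console.log("helloworld".search(/\s/));          // -1, no match

// Two equivalent yes/no checks.
console.log(str.search(/\s/) != -1); // true
console.log(/\s/.test(str));         // true
```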

If regexp does not have the g flag, match returns the same result as regexp.exec(string).

If regexp has the g flag, match returns an array of all matches.

To just find out whether the string matches the regular expression regexp, use regexp.test(string).

If you only need the first match, use regexp.exec(string).

In the following example, match is used to find "Chapter" followed by a space, then 1 or more digits, then zero or more groups consisting of a period and a digit. The regular expression has the i flag, so case is ignored.

var str = "For more information, see Chapter 3.4.5.1";
var re = /chapter (\d+(\.\d)*)/i;
var found = str.match(re);
alert(found);

The script will return an array of matches:

  • Chapter 3.4.5.1 - the fully matched string
  • 3.4.5.1 - the first (outer) parenthesized group
  • .1 - the inner parenthesized group
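The three elements can be inspected individually. The sketch below repeats the call with console.log instead of alert, and also prints the index property that match provides when the g flag is absent.

```javascript
var str = "For more information, see Chapter 3.4.5.1";
var found = str.match(/chapter (\d+(\.\d)*)/i);

console.log(found[0]);    // "Chapter 3.4.5.1"
console.log(found[1]);    // "3.4.5.1"
console.log(found[2]);    // ".1" (a repeated group keeps its last repetition)
console.log(found.index); // 26
```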

The following example demonstrates the global and case-insensitive flags with match. It finds all letters from A to E and from a to e, each as a separate element of the array.

var str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
var regexp = /[A-E]/gi;
var matches = str.match(regexp);
document.write(matches)
// matches = ['A', 'B', 'C', 'D', 'E', 'a', 'b', 'c', 'd', 'e']
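One thing to keep in mind with the g flag is the no-match case: match returns null, not an empty array. The sketch below shows a common guard; the variable names none and count are illustrative.

```javascript
var str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

var matches = str.match(/[A-E]/gi);
console.log(matches.join("")); // "ABCDEabcde"

// No match gives null, so reading .length directly would throw.
var none = str.match(/\d/g);
console.log(none); // null

// Falling back to an empty array makes counting safe.
var count = (str.match(/\d/g) || []).length;
console.log(count); // 0
```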

The replace method can replace matches of a regular expression not only with a string, but also with the result of a function. Its full syntax is:

var newString = str.replace(regexp|substr, newSubStr|function)
regexp
A RegExp object. Its matches are replaced by the value produced by the second argument.
substr
A string to be replaced with newSubStr.
newSubStr
The string that replaces the substring specified by the first argument.
function
A function called to generate the new substring (which is substituted for the substring matched by the first argument).

The replace method does not change the string on which it is called; it returns a new, modified string.

To implement a global replacement, include the "g" flag in the regular expression.

If the first argument is a string, it is not converted to a regular expression. For example,

var ab = "a b".replace("\\s", "..") // = "a b"

The call to replace left the string unchanged, because it searched not for the regular expression \s but for the literal string "\s".
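The difference between a string and a regular expression as the first argument, and the effect of the g flag, can be seen in a few lines. The sample string and the variable name text are illustrative.

```javascript
var text = "a b c";

console.log(text.replace("\\s", "..")); // "a b c", the literal "\s" is not found
console.log(text.replace(" ", ".."));   // "a..b c", a string replaces the first occurrence
console.log(text.replace(/\s/, ".."));  // "a..b c", without g only the first match
console.log(text.replace(/\s/g, "..")); // "a..b..c", with g every match

console.log(text); // "a b c", the original string is untouched
```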

The replacement string may contain the following special patterns:

Pattern  Inserts
$$  Inserts "$".
$&  Inserts the matched substring.
$`  Inserts the portion of the string that precedes the match.
$'  Inserts the portion of the string that follows the match.
$n or $nn  Where n or nn are decimal digits, inserts the substring captured by the nth parenthesized group, provided the first argument is a RegExp object.
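Each pattern from the table is shown once in the sketch below. The sample strings are illustrative.

```javascript
// $1 and $2 refer to the first and second parenthesized groups.
console.log("John Smith".replace(/(\w+)\s(\w+)/, "$2, $1")); // "Smith, John"

// $$ is a literal dollar sign, $& is the matched text.
console.log("price: 5".replace(/\d/, "$$$&")); // "price: $5"

// $` is the text before the match, $' the text after it.
console.log("abc".replace(/b/, "[$`|$&|$']")); // "a[a|b|c]c"
```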

If you specify a function as the second parameter, then it is executed with each match.

In the function, you can dynamically generate and return a substitution string.

The first parameter of the function is the matched substring. If the first argument to replace is a RegExp object, the next n parameters contain the matches for the parenthesized groups. The last two parameters are the position in the string at which the match occurred and the string itself.

For example, the following call to replace returns XXzzzz - XX, zzzz.

function replacer(str, p1, p2, offset, s)
{
  return str + " - " + p1 + ", " + p2;
}
var newString = "XXzzzz".replace(/(X*)(z*)/, replacer)

As you can see, there are two pairs of parentheses in the regular expression, and therefore the function has two parameters, p1 and p2.
If there were three pairs of parentheses, the function would need a third parameter, p3.
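To illustrate how the parameter list follows the number of groups, the sketch below uses one replacer with three groups and one with none. The function names swapDate and mark and the sample strings are hypothetical.

```javascript
// Three groups: p1, p2, p3 come before offset and the whole string.
function swapDate(match, p1, p2, p3, offset, whole) {
  return p3 + "." + p2 + "." + p1;
}
var released = "Released 2020-01-31".replace(/(\d{4})-(\d{2})-(\d{2})/, swapDate);
console.log(released); // "Released 31.01.2020"

// No groups: the second parameter is already the offset.
function mark(match, offset) {
  return match + "@" + offset;
}
var marked = "a-b-c".replace(/[a-c]/g, mark);
console.log(marked); // "a@0-b@2-c@4"
```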

The following function converts names such as borderTop to border-top:

function styleHyphenFormat(propertyName)
{
  function upperToHyphenLower(match)
  {
    return '-' + match.toLowerCase();
  }
  // The g flag converts every uppercase letter, not just the first
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}
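The reverse conversion can be written with the same technique, this time using a parenthesized group to pick out the letter after each hyphen. This is only a sketch; hyphenToCamel is a hypothetical helper name and assumes lowercase Latin property names.

```javascript
// Hypothetical inverse helper: "border-top-width" -> "borderTopWidth".
function hyphenToCamel(propertyName) {
  // The group captures the letter that follows each hyphen.
  return propertyName.replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

console.log(hyphenToCamel("border-top"));       // "borderTop"
console.log(hyphenToCamel("border-top-width")); // "borderTopWidth"
console.log(hyphenToCamel("color"));            // "color"
```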

For a general understanding of regular expressions, see the Wikipedia article on the subject.

They are described in more detail in the book Beginning Regular Expressions.