A few examples of regular expressions

Check whether the string is a number of 1 to 77 digits:

  if (preg_match('/^[0-9]{1,77}$/', $string)) echo "yes"; else echo "no";

Check whether the string consists only of lowercase letters, digits and "_", and is 5 to 20 characters long:

  if (preg_match('/^[a-z0-9_]{5,20}$/', $string)) echo "yes"; else echo "no";

Check whether the string contains any character other than the allowed ones. Lowercase letters, digits and "_" are considered valid. The length cannot be checked this way, except by adding a separate strlen($string) condition. Do not confuse this with the previous example: although the result is similar, the method is different ("by contradiction").

  if (!preg_match('/[^a-z0-9_]/', $string))
      echo "no foreign characters (OK)";
  else
      echo "a foreign character is present (FALSE)";
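A minimal sketch contrasting the two checks, with hypothetical function names and lowercase Latin letters only. It shows one practical difference: the "by contradiction" check accepts an empty string, because an empty string contains no forbidden character. The D modifier in the positive check is an extra assumption; without it, $ also matches before a trailing newline, so "user_42\n" would pass.

```php
<?php
// Positive check: the whole string must consist of allowed characters
// and be 5 to 20 characters long. D makes $ match only at the very end.
function is_valid_positive(string $s): bool {
    return preg_match('/^[a-z0-9_]{5,20}$/D', $s) === 1;
}

// Negative check ("by contradiction"): search for any character that is
// NOT allowed. The length is not checked at all.
function has_only_allowed(string $s): bool {
    return preg_match('/[^a-z0-9_]/', $s) === 0;
}

var_dump(is_valid_positive("user_42"), has_only_allowed("user_42"));
var_dump(is_valid_positive(""), has_only_allowed(""));
```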

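For a case-insensitive comparison, add the i modifier after the closing delimiter, e.g. '/^[a-z0-9_]{5,20}$/i' (with the old ereg functions, the equivalent was eregi()).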
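A small sketch of the difference the i modifier makes, using the length-checking pattern from above on a string that starts with a capital letter:

```php
<?php
// Without a modifier, the class [a-z] matches lowercase letters only.
$caseSensitive = preg_match('/^[a-z0-9_]{5,20}$/', "User_42");
// The i modifier after the closing delimiter makes the match case-insensitive.
$caseInsensitive = preg_match('/^[a-z0-9_]{5,20}$/i', "User_42");

echo $caseSensitive, " ", $caseInsensitive, "\n"; // 0 1
```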

Check whether the string contains identical characters repeated at least 3 times in a row (such as "abcDDDef", but not "AAbbAAbb"):

  if (preg_match('/(.)\1\1/', $string)) echo "yes"; else echo "no";
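A sketch that goes one step further and lists every such run. find_triples() is a hypothetical name; \1{2,} repeats the captured character two or more further times. It assumes byte-oriented matching: without the u modifier, . matches a single byte, so multibyte UTF-8 letters are not treated as single characters.

```php
<?php
// Return every run of three or more identical consecutive characters.
function find_triples(string $s): array {
    preg_match_all('/(.)\1{2,}/', $s, $m);
    return $m[0]; // full matches, e.g. "DDD", "gggg"
}

print_r(find_triples("abcDDDefggggh"));
print_r(find_triples("AAbbAAbb"));
```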

Replace every occurrence of LINE1 with LINE2 in the text (this task needs no regular expressions at all):

  $string = str_replace("LINE1", "LINE2", $string);

Convert broken line-break codes into normal ones: all you need to do is delete "\r".

Normal line breaks are "\n" or "\r\n" (both are valid, but they differ!).

There are also glitches such as "\r\r\n".

  $string = str_replace("\r", "", $string);
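Deleting "\r" also deletes a lone "\r" (the old Mac OS line break) instead of turning it into a line break. A minimal sketch of an alternative, assuming you want every kind of line break converted to "\n"; normalize_eol() is a hypothetical name:

```php
<?php
// Turn every line break into "\n":
//   "\r*\n" covers "\n", "\r\n" and glitches like "\r\r\n";
//   a lone "\r" becomes "\n" instead of being deleted.
function normalize_eol(string $s): string {
    return preg_replace('/\r*\n|\r/', "\n", $s);
}

var_dump(normalize_eol("one\r\ntwo\nthree\r\r\nfour\rfive"));
```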

Replace every run of spaces with a single space. Do not try to use str_replace() here: it is a good function, but not for this task.

  $string = preg_replace('/ {2,}/', ' ', $string); // two or more spaces become one
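Why not str_replace()? It makes a single left-to-right pass, so replacing two spaces with one turns four spaces into two, not one. A sketch with hypothetical function names; the second variant, an assumption beyond the original task, also folds tabs and line breaks into one space:

```php
<?php
// " {2,}" matches a run of two or more spaces; each run becomes one space.
function collapse_spaces(string $s): string {
    return preg_replace('/ {2,}/', ' ', $s);
}

// Variant: "\s+" also folds tabs and line breaks into a single space.
function collapse_whitespace(string $s): string {
    return preg_replace('/\s+/', ' ', $s);
}

echo collapse_spaces("a   b  c d"), "\n";
echo collapse_whitespace("a \t\n b"), "\n";
```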

Suppose the text contains some words, say "WORD" and "LYALYAL" (and so on), which must all be replaced in the same way: by the word itself with additions around it.

The words may be missing or may occur many times, in any letter case.

That is, if there was "word" or "WORD" (or any other casing), you need to replace it with "<b>word</b>" or "<b>WORD</b>" respectively (depending on how it was written).

In other words, you need to find a list of words in any letter case and insert fixed strings ("<b>" and "</b>") around each word found.

  $string = preg_replace("/(word1|word2|lyalyal|word99)/si", "<b>\\1</b>", $string);
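The pattern above also matches inside longer words (word1 inside sword1, for example). A sketch of a stricter variant that builds the pattern from an array; highlight_words() is a hypothetical name, and the words are assumed to start and end with letters or digits, because \b needs a word character at each edge:

```php
<?php
// Wrap every listed word in <b>...</b>, keeping the case it had in the text.
function highlight_words(string $text, array $words): string {
    if (count($words) === 0) {
        return $text; // an empty alternation would match everywhere
    }
    // preg_quote() escapes regex metacharacters inside the words
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    // \b ... \b: whole words only; i: any letter case
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    // $0 is the whole matched word, so its original case is preserved
    return preg_replace($pattern, '<b>$0</b>', $text);
}

echo highlight_words("A word, a WORD, a password.", ["word", "lyalyal"]), "\n";
```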

Find the text enclosed in a tag, for example <TITLE>...</TITLE>, in an HTML file ($string is the source text):

  if (preg_match("!<title>(.*?)</title>!si", $string, $ok))
      echo "Tag found, text: $ok[1]";
  else
      echo "Tag not found";
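A sketch wrapping this in a function that returns the title or null. get_title() is hypothetical, it assumes PHP 7.1 or later for the ?string return type, and the \b[^>]* part is an addition that also accepts attributes such as <title lang="en">:

```php
<?php
// Return the text inside the first <title>...</title>, or null if absent.
// s lets "." cross line breaks, i ignores tag case,
// and the lazy (.*?) stops at the first closing tag.
function get_title(string $html): ?string {
    if (preg_match('!<title\b[^>]*>(.*?)</title>!si', $html, $ok)) {
        return trim($ok[1]);
    }
    return null;
}

var_dump(get_title("<html><TITLE>\nMy page\n</TITLE></html>"));
```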

Find the text enclosed in a tag and replace the tag with another one, for example turn <TITLE>...</TITLE> into <MY_TAG>...</MY_TAG> in an HTML file:

  $string = preg_replace("!<title>(.*?)</title>!si", "<MY_TAG>\\1</MY_TAG>", $string);
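A sketch generalizing the replacement to any pair of tag names. rename_tag() is a hypothetical helper, and the tag names are passed through preg_quote() so they cannot break the pattern:

```php
<?php
// Replace <$from>...</$from> with <$to>...</$to>, keeping the content.
function rename_tag(string $html, string $from, string $to): string {
    $name = preg_quote($from, '!');       // "!" is the pattern delimiter
    $pattern = '!<' . $name . '>(.*?)</' . $name . '>!si';
    return preg_replace($pattern, '<' . $to . '>$1</' . $to . '>', $html);
}

echo rename_tag("<TITLE>Hello</TITLE>", "title", "MY_TAG"), "\n";
```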

Highlighting PHP code in messages

Suppose you have a forum like vBulletin, where code is highlighted if you mark it specially: [PHP] any code [/PHP].

When the message is viewed, you then get nicely colored PHP code.

So if you want every fragment between [PHP]...[/PHP] and <?...?> to be treated as code and colored, this is quite easy to do.

The program:

 <?php

 // Original message:
 // ------------------------------------------------------
 $str = '
 Help, it does not work! Here is an example:
 [php]
 // comment
 # comment
 phpinfo();
 [/php]

 La la la la la la

 [php]
 for ($i = 0; $i < 100; $i++) {
 ping("-f", "www.ru");
 }
 [/php]
 <?
 echo "<a href=http://shram.kiev.ua/> click here! </a>";
 phpinfo();
 ?>
 ';
 // ------------------------------------------------------

 // suppress warnings (highlight_string() has glitches)
 error_reporting(0);

 // highlight one piece of text: wrap it in PHP tags so that
 // highlight_string() treats it as code, and return the resulting HTML
 function _my_($s) {
   return highlight_string("<?php " . $s . "?>", true);
 }

 // find every fragment between <? (or <?php) and its closing tag,
 // or between [php] and [/php]; preg_replace_callback() replaces
 // the /e modifier, which was removed in PHP 7
 $str = preg_replace_callback(
   '!(\[php\]|<\?(?:php)?)(.*?)(\[/php\]|\?>)!is',
   function ($m) { return _my_($m[2]); },
   $str
 );

 echo $str;

 ?>

After running this program, the screen shows:

Help, it does not work! Here is an example: <?php
// comment
# comment
phpinfo();
?> La la la la la la <?php
for ($i = 0; $i < 100; $i++) {
ping("-f", "www.ru");
}
?> <?php
echo "<a href=http://shram.kiev.ua/> click here! </a>";
phpinfo();
?>

As you can see, everything between the special markers was highlighted, and the surrounding text was not changed at all. If you are going to use this on a forum, think about how line breaks are handled.

If the whole message is code, call highlight_string() on it directly, without searching for <?...?> fragments.
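A minimal sketch of that direct approach, assuming the message already starts with <?php (otherwise prepend an opening tag first). The second argument true makes highlight_string() return the HTML instead of printing it:

```php
<?php
// The whole message is PHP code, so highlight it in one call.
$message = "<?php\necho 'hello';\n";
$html = highlight_string($message, true);
echo $html;
```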

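Checking a URL for correctness

This function is taken from the source code of a chat script.

It accepts the usual URL components: an optional protocol (http, https, ftp or telnet), an optional user name and password, a host name or an IPv4 address, a path with a query string, and a fragment. Ports (":8080") are not accepted by this pattern.

Remember that you should not only check the URL but also use the new value returned by the function, because it prepends "http://" when the protocol is missing.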

  // helper function: remove dangerous (control) characters
 function pregtrim($str) {
   return preg_replace('/[^\x20-\xFF]/', "", @strval($str));
 }
 //
 // check the URL and return:
 // * 1 if the URL is empty
 //     if (checkurl($url) === 1) echo "empty";
 // * -1 if the URL is not empty but contains errors
 //     if (checkurl($url) === -1) echo "error";
 // * a string (the new URL) if the URL is present and was parsed
 //     if (is_string(checkurl($url))) echo "all ok";
 //
 // If the URL had no protocol, "http://" is added.
 //
 function checkurl($url) {
   // remove control characters and leading/trailing whitespace
   $url = trim(pregtrim($url));
   // if empty - exit
   if (strlen($url) == 0) return 1;
   // check the URL for correctness
   if (!preg_match("~^(?:(?:https?|ftp|telnet)://(?:[a-z0-9_-]{1,32}".
   "(?::[a-z0-9_-]{1,32})?@)?)?(?:(?:[a-z0-9-]{1,128}\.)+(?:com|net|".
   "org|mil|edu|arpa|gov|biz|info|aero|inc|name|[a-z]{2})|(?!0)(?:(?".
   "!0[^.]|255)[0-9]{1,3}\.){3}(?!0|255)[0-9]{1,3})(?:/[a-z0-9.,_@%&".
   "?+=\~/-]*)?(?:#[^'\"&<>]*)?$~i", $url, $ok))
   return -1; // if not correct - exit
   // if there is no protocol - add it
   if (!strstr($url, "://")) $url = "http://" . $url;
   // convert the protocol to lower case: hTtP -> http
   $url = preg_replace_callback("~^[a-z]+~i",
     function ($m) { return strtolower($m[0]); }, $url);
   return $url;
 }

So to validate a URL, use something like this:

  $url = checkurl($url); // overwrite the URL with the returned value
  if (!is_string($url)) exit("Invalid URL");
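The normalization step at the end of checkurl() can be tried on its own. A sketch with the hypothetical name normalize_scheme(); it uses preg_replace_callback() because the /e modifier of preg_replace() was removed in PHP 7:

```php
<?php
// Add "http://" when no protocol is present, then lowercase the protocol.
function normalize_scheme(string $url): string {
    if (strpos($url, "://") === false) {
        $url = "http://" . $url;
    }
    // ^[a-z]+ with i matches the protocol name at the start, in any case
    return preg_replace_callback('~^[a-z]+~i', function ($m) {
        return strtolower($m[0]);
    }, $url);
}

echo normalize_scheme("HtTp://Example.com/Path"), "\n";
echo normalize_scheme("www.example.com"), "\n";
```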

E-mail validation

An e-mail address is checked the same way as the URL in the previous example.

  //
 // checks the e-mail address and returns
 // * 1 if the address is empty
 // * -1 if it is not empty but invalid
 // * the address (a string) if it is valid
 //

 function checkmail($mail) {
   // remove control characters and leading/trailing whitespace
   $mail = trim(pregtrim($mail)); // take pregtrim() from the example above
   // if empty - exit
   if (strlen($mail) == 0) return 1;
   if (!preg_match("/^[a-z0-9_-]{1,20}@(([a-z0-9-]+\.)+(com|net|org|mil|".
   "edu|gov|arpa|info|biz|inc|name|[a-z]{2})|[0-9]{1,3}\.[0-9]{1,3}\.[0-".
   "9]{1,3}\.[0-9]{1,3})$/is", $mail))
   return -1;
   return $mail;
 }
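To see what the address pattern accepts, here is a sketch that pulls it out into a hypothetical helper, mail_pattern_matches(). Note that the part before "@" allows only letters, digits, "_" and "-", so an address such as john.smith@example.com is rejected:

```php
<?php
// The same pattern as in checkmail(), without the trimming step.
function mail_pattern_matches(string $mail): bool {
    $pattern = '/^[a-z0-9_-]{1,20}@(([a-z0-9-]+\.)+(com|net|org|mil|edu|gov|'
             . 'arpa|info|biz|inc|name|[a-z]{2})|[0-9]{1,3}\.[0-9]{1,3}\.'
             . '[0-9]{1,3}\.[0-9]{1,3})$/is';
    return preg_match($pattern, $mail) === 1;
}

var_dump(mail_pattern_matches("user_1@example.com"));
var_dump(mail_pattern_matches("john.smith@example.com"));
```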

Extracting URLs from text and malformed HTML

Sometimes you need to extract links (URLs or e-mail addresses) from HTML text.

If the HTML contains no malformed markup, this is a very simple task for a regular expression like this one:

  <a[^>]+href=([^>]+)[^>]*>(.*?)</a>
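A minimal sketch of this simple approach on well-formed input, using the pattern <a[^>]+href=([^>]+)[^>]*>(.*?)</a>. With a quoted value such as href="x" the quotes end up in the captured URL, and with href=x target=y the capture runs on to the ">", which is why the program below uses more careful patterns:

```php
<?php
// $ok[1] receives the href values, $ok[2] the link texts.
$html = '<a href=http://example.com/>Example</a> and <a href=/about>About us</a>';
preg_match_all('!<a[^>]+href=([^>]+)[^>]*>(.*?)</a>!is', $html, $ok);

for ($i = 0; $i < count($ok[1]); $i++) {
    echo $ok[1][$i], " - ", $ok[2][$i], "\n";
}
```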

But links come in many forms, and how your program handles them is up to you.

You can accept only links that are 100% well-formed, but then some malformed links will be skipped (even though they are valid links too).

You can accept everything, but then some links will not be extracted quite correctly.

The program:

  <?php
 $str = "
 <a href=url1>name1</a>
 <a href=url2>name2</a>
 <a href='url3'>name3</a>
 <a href=url4><brackets></a>
 <a href=\"url5\"><b>bold</b></a>
 <a href=url6>\"quotes\"</a>
 <a target=\"<an attempt to outwit the program>hahaha\" href=url7>77777</a>
 <a href=url8 target=\"<an attempt to outwit the program>hahaha\">88888</a>";
 echo "<pre>Source code:" . htmlspecialchars($str) . "</pre>";

 echo "--------------- Option 1 ---------------<ul>";
 preg_match_all("!<a.*?href=\"?'?([^ \"'>]+)\"?'?.*?>(.*?)</a>!is", $str, $ok);
 for ($i = 0; $i < count($ok[1]); $i++) {
   echo "<li>" . $ok[1][$i] . " - " . $ok[2][$i];
 }

 echo "</ul>--------------- Option 2 ---------------<ul>";
 preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>(.*?)</a>!is", $str, $ok);
 for ($i = 0; $i < count($ok[1]); $i++) {
   echo "<li>" . $ok[1][$i] . " - " . $ok[2][$i];
 }

 echo "</ul>--------------- Option 3 ---------------<ul>";
 preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>([^<>]*?)</a>!is", $str, $ok);
 for ($i = 0; $i < count($ok[1]); $i++) {
   echo "<li>" . $ok[1][$i] . " - " . $ok[2][$i];
 }
 echo "</ul>";
 ?>

Example output:

Source code:
<a href=url1>name1</a>
<a href=url2>name2</a>
<a href='url3'>name3</a>
<a href=url4><brackets></a>
<a href="url5"><b>bold</b></a>
<a href=url6>"quotes"</a>
<a target="<an attempt to outwit the program>hahaha" href=url7>77777</a>
<a href=url8 target="<an attempt to outwit the program>hahaha">88888</a>
--------------- Option 1 ---------------
  • url1 - name1
  • url2 - name2
  • url3 - name3
  • url4 - <brackets>
  • url5 - bold
  • url6 - "quotes"
  • url7 - 77777
  • url8 - hahaha">88888
--------------- Option 2 ---------------
  • url1 - name1
  • url2 - name2
  • url3 - name3
  • url4 - <brackets>
  • url5 - bold
  • url6 - "quotes"
  • url8 - hahaha">88888
--------------- Option 3 ---------------
  • url1 - name1
  • url2 - name2
  • url3 - name3
  • url6 - "quotes"