
How to find out where the visitors came from

After putting counters on my site, I suddenly discovered that none of the services that provide them lets me see which search query in Yandex (or any other search engine) brought a person to my site. At least, nobody offers this for free.

I took offense at that, because it meant I had to write a small logger myself. The task is about as simple as they come, so it did not take long. Yet the owners of all those counter sites have the nerve to charge money for this, convincing people that the money is well spent. Well, let's dash their hopes of ripping us off!

The page you came from

So, getting the page from which the user arrived at this one is easier than easy. By default its address is passed in the user's HTTP request, in the "Referer:" header line (the misspelling is part of the standard). You can get its value in PHP with the following call:

  getenv("HTTP_REFERER")

So, in principle, you can simply write a separate function like this:

  $H = getenv("HTTP_REFERER");   // get the URL the visitor came from
  $f = fopen("mylog.log", 'a');  // open the log file for appending
  flock($f, LOCK_EX);            // lock it until it is closed (if two
                                 // scripts try to write to the file at the
                                 // same time, one of them has to wait)
  fwrite($f, "$H\n");            // write the URL obtained above to the file
  fclose($f);                    // close the file

This function can then be called at the beginning of every PHP script that serves the site's pages.
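One way to package the snippet above as a reusable function is sketched below. The name `log_referer` and both of its parameters are made up for illustration, and the sketch assumes you would rather skip the write than fail when the log file cannot be opened.

```php
<?php
// Hypothetical helper: appends the referring URL to a log file.
// The name log_referer() and its parameters are illustrative only.
function log_referer($logFile = "mylog.log", $referer = null)
{
    if ($referer === null) {
        // getenv() returns false when the browser sent no Referer header
        $referer = getenv("HTTP_REFERER");
    }
    if ($referer === false || $referer === "") {
        $referer = "-";              // placeholder meaning "no referer"
    }
    $f = fopen($logFile, 'a');       // open the log file for appending
    if ($f === false) {
        return false;                // the log could not be opened
    }
    flock($f, LOCK_EX);              // wait for an exclusive lock
    fwrite($f, $referer . "\n");
    flock($f, LOCK_UN);              // release the lock explicitly
    fclose($f);
    return true;
}
```

With this in place, a page only needs a single `log_referer();` call near the top.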

How to find out the IP, your destiny and the exact time

But since we are here anyway, we can tweak the function slightly to also record the visitor's IP (which lets you roughly estimate where they are physically located), their browser (out of curiosity) and the page they actually requested. That last one sounds a bit odd, since we already know they came to our page, but imagine that the function is called from a header block that is included identically in all of your pages, which is probably how you will do it. It would also be nice to record the time of each visit, so that you can analyze user activity and so on.

So, the function will take the following form:

  $er_time = date("H:i:s d M Y");      // format the current time as a string
  $U = getenv("HTTP_USER_AGENT");      // get the software the visitor is using
  $H = getenv("HTTP_REFERER");         // get the URL the visitor came from
  $R = getenv("REMOTE_ADDR");          // get the visitor's IP
  $W = getenv("REQUEST_URI");          // get the relative address of the page
                                       // the visitor requested
  $f = fopen("logs/visits.log", 'a');  // the rest is straightforward: write it all to a file
  flock($f, LOCK_EX);
  fwrite($f, "$er_time\n Br: $U\n Rf: $H\n IP: $R\n Rq: $W\n");
  fclose($f);

But this function is far from perfect! You will not see Russian characters in these URLs: each one is replaced by its hexadecimal code, preceded by a "%" sign. So it would be nice if the script converted them to a readable form by itself, turning

  http://www.yandex.ru/yandsearch?text=%E9%EE%E6%FB%E3+%F4%F2%F3%EC%E0%ED%E5&stype=www

into

  http://www.yandex.ru/yandsearch?text=йожыг+фтумане&stype=www

We will do this with the following piece of code, which uses regular expressions:

  // ereg() and ereg_replace() were removed in PHP 7, so preg_match() is used here
  while (preg_match('/%([0-9A-F]{2})/', $H, $m)) {
      // loop while $H still contains at least one "%" sign followed by
      // two characters from the ranges 0-9 and A-F (hexadecimal digits)
      $val = $m[1];
      // $val now holds the two characters that followed
      // the percent sign in the original string
      $newval = chr(hexdec($val));
      // convert the hexadecimal number in $val to a "normal" one
      // and get the character with that code
      $H = str_replace('%' . $val, $newval, $H);
      // standard string substitution function: replaces every substring
      // made of a percent sign and the characters from $val with
      // the character those two hexadecimal digits encode
  }  // end of loop :)
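PHP also ships with built-in functions that do this decoding in one call: `rawurldecode()` converts every `%XX` sequence to the corresponding byte, and `urldecode()` additionally turns `+` into a space. The sketch below is one possible shortcut, not part of the original script. It assumes the referrer bytes are in Windows-1251 (which is what the Yandex example above decodes to) and that the `iconv` extension is available to convert them to UTF-8. The function name `readable_referer` is made up for illustration.

```php
<?php
// Hypothetical helper: decode %XX sequences and convert the result to UTF-8.
// Assumes the source bytes are Windows-1251; pass another charset if they are not.
function readable_referer($url, $charset = "Windows-1251")
{
    $decoded = rawurldecode($url);                // "%E9" becomes byte 0xE9, "+" stays as is
    $utf8 = iconv($charset, "UTF-8", $decoded);   // returns false if conversion fails
    return $utf8 === false ? $decoded : $utf8;    // fall back to the raw decoded bytes
}
```

Calling `readable_referer(getenv("HTTP_REFERER"))` before writing the log entry would then replace the loop. Whether the conversion to UTF-8 is wanted depends on the encoding you read the log in.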

Putting it all together:

  $er_time = date("H:i:s d M Y");      // format the current time as a string
  $U = getenv("HTTP_USER_AGENT");      // get the software the visitor is using
  $H = getenv("HTTP_REFERER");         // get the URL the visitor came from
  $R = getenv("REMOTE_ADDR");          // get the visitor's IP
  $W = getenv("REQUEST_URI");          // get the relative address of the page
                                       // the visitor requested
  // ereg() and ereg_replace() were removed in PHP 7, so preg_match() is used here
  while (preg_match('/%([0-9A-F]{2})/', $H, $m)) {
      // loop while $H still contains at least one "%" sign followed by
      // two characters from the ranges 0-9 and A-F (hexadecimal digits)
      $val = $m[1];
      // $val now holds the two characters that followed
      // the percent sign in the original string
      $newval = chr(hexdec($val));
      // convert the hexadecimal number in $val to a "normal" one
      // and get the character with that code
      $H = str_replace('%' . $val, $newval, $H);
      // replace every substring made of a percent sign and the characters
      // from $val with the character those two hexadecimal digits encode
  }  // end of loop :)
  $f = fopen("logs/visits.log", 'a');  // the rest is straightforward: write it all to a file
  flock($f, LOCK_EX);
  fwrite($f, "$er_time\n Br: $U\n Rf: $H\n IP: $R\n Rq: $W\n");
  fclose($f);

From the logs of this script alone you can already tell where a visitor came from, which pages they went through and on which page they left the site. Yes, of course, you could do this with sessions as well, but we are too lazy for that. This way it takes 2 minutes, and you can enjoy! :)
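As a rough illustration of reading a visitor's route out of the log, the sketch below groups the requested pages by IP address in the order they were logged. It assumes the entry layout written by the script above (a time line followed by "Br:", "Rf:", "IP:" and "Rq:" lines) and treats one IP as one visitor, which is only an approximation. The function name `pages_by_ip` is made up for illustration.

```php
<?php
// Hypothetical helper: groups requested pages by visitor IP, in log order.
// Assumes each entry has an "IP:" line followed by an "Rq:" line.
function pages_by_ip($logFile)
{
    $paths = array();
    $ip = null;
    $lines = file($logFile, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return $paths;                        // log missing or unreadable
    }
    foreach ($lines as $line) {
        $line = ltrim($line);                 // field lines may be indented
        if (strncmp($line, "IP: ", 4) === 0) {
            $ip = substr($line, 4);           // remember the IP of this entry
        } elseif (strncmp($line, "Rq: ", 4) === 0 && $ip !== null) {
            $paths[$ip][] = substr($line, 4); // append the page to that IP's route
            $ip = null;                       // entry finished
        }
    }
    return $paths;
}
```

The last element of each list is then the page on which that visitor left the site, at least as far as the log knows.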

You will recognize her from a thousand

That is the foundation. From here you can do a lot of things: for example, take the lines containing "http://www.yandex.ru/yandsearch", cut out the part that actually holds the query, and write it to a file such as "yandex.log". In general, you can cook up whatever your imagination allows!
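A minimal sketch of that idea follows. It assumes the search text travels in the `text` parameter, as in the example URL earlier, and that the function receives the raw, still-encoded referrer. The function name `yandex_query` is made up for illustration.

```php
<?php
// Hypothetical helper: returns the search text from a Yandex referrer,
// or null if the URL is not a Yandex search URL.
// Assumes the query is carried in the "text" parameter.
function yandex_query($referer)
{
    if (strpos($referer, "http://www.yandex.ru/yandsearch") !== 0) {
        return null;                               // not a Yandex search page
    }
    $query = parse_url($referer, PHP_URL_QUERY);   // the part after "?"
    if (!is_string($query)) {
        return null;                               // no query string at all
    }
    parse_str($query, $params);                    // also decodes %XX and "+"
    if (isset($params["text"]) && is_string($params["text"])) {
        return $params["text"];
    }
    return null;
}
```

The returned string can then be appended to "yandex.log" in the same way as the other log entries. Its bytes stay in whatever encoding the search engine used, so they may need converting before you read them.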