Discussion:
idle curiosity
Nick Wedd
2011-01-19 20:07:34 UTC
I have some web pages whose visitors I try to keep track of, using a
Perl script which notes the $ENV{"REMOTE_ADDR"} value (the visitor's
IP address) for every visitor.
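
For anyone wanting to do the same, here is a minimal sketch of that kind of logging, written in Python rather than Perl and assuming a CGI environment where the web server sets REMOTE_ADDR. The log format, the file path argument, and the use of REQUEST_URI (set by Apache, not part of the CGI standard) are my own assumptions, not Nick's script.

```python
import os
import time


def format_visit(environ, when=None):
    """Build one tab-separated log line: UTC time, client IP, page, User-Agent."""
    when = time.time() if when is None else when
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(when))
    ip = environ.get("REMOTE_ADDR", "-")  # set by the web server for CGI scripts
    # REQUEST_URI is an Apache extension; fall back to the standard SCRIPT_NAME
    page = environ.get("REQUEST_URI") or environ.get("SCRIPT_NAME", "-")
    agent = environ.get("HTTP_USER_AGENT", "-")
    return f"{stamp}\t{ip}\t{page}\t{agent}\n"


def log_visitor(log_path):
    """Append the current request's details; call once per CGI request."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(format_visit(os.environ))
```

Logging the User-Agent alongside the IP costs nothing and often tells you more about who is visiting than the address alone.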

I have recently noticed an IP address, 95.108.158.235, belonging to
someone/something which visits my pages exactly once a day. I found that
this is Yandex, the biggest Russian-language search engine. Fair
enough, Yandex is checking out my pages. But why don't I see similar
visits from Google, or any other search engine? I know that Google is
aware of these pages.
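
One common way to confirm who owns such an address is a forward-confirmed reverse DNS check: look up the hostname for the IP, check its domain, then resolve that hostname and make sure it maps back to the same IP. Below is a minimal Python sketch under stated assumptions: the domain suffix list is my own guess at what each operator uses and should be checked against their documentation, and only IPv4 is handled. Note that this only shows the address belongs to Yandex; it cannot tell which Yandex service made the request.

```python
import socket

# ASSUMPTION: hostname suffixes each operator uses for its crawlers; check the
# operators' own documentation before relying on this list.
KNOWN_DOMAINS = {
    "Yandex": (".yandex.ru", ".yandex.net", ".yandex.com"),
    "Google": (".googlebot.com", ".google.com"),
}


def _reverse(ip):
    # PTR lookup: IP address -> hostname (raises OSError if there is none)
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # A-record lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(host)[2]


def identify_ip(ip, reverse=_reverse, forward=_forward):
    """Forward-confirmed reverse DNS check.

    Returns (operator, hostname) when the IP's PTR name ends in a known domain
    AND that name resolves back to the same IP; otherwise (None, hostname),
    with hostname None if the reverse lookup failed.
    """
    try:
        host = reverse(ip)
    except OSError:
        return None, None
    name = host.lower().rstrip(".")
    for operator, suffixes in KNOWN_DOMAINS.items():
        if name.endswith(suffixes):
            try:
                confirmed = ip in forward(host)
            except OSError:
                confirmed = False
            # a PTR record alone can be faked by whoever controls the IP block
            return (operator if confirmed else None), host
    return None, host
```

Calling identify_ip("95.108.158.235") does live DNS lookups; the reverse and forward parameters exist so the logic can be exercised offline with stand-in resolvers.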

Nick
--
Nick Wedd ***@maproom.co.uk
www.1-script.com
2011-01-21 15:11:07 UTC
responding to
http://www.1-script.com/forums/idle-curiosity-article57877--1.htm
> Fair enough, Yandex is checking out my pages. But why don't I see similar
> visits from Google, or any other search engine? I know that Google is
> aware of these pages.
Google does not have such a visit pattern (checking one page exactly once
a day). In fact, I would guess that Yandex does not have such a visit
pattern either. Most likely it is someone (a competitor, perhaps) using
Yandex's translation tools as a free Web proxy to check on your page,
probably from a cron job running wget or a Perl or PHP script. Since they
evidently don't want to be seen coming from their own IP, I think it's
safe to assume it's a competitor.
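
To check whether a visitor really has the "once a day" pattern of a scheduled job, you can group your log by IP and look at the gaps between consecutive hits. A minimal Python sketch, assuming you have already parsed your log into (datetime, ip) pairs; the 30-minute tolerance and the five-visit minimum are arbitrary thresholds of mine, not anything Yandex or cron defines.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_regulars(visits, tolerance=timedelta(minutes=30), min_visits=5):
    """Return {ip: mean_gap} for IPs whose hits recur roughly every 24 hours.

    visits: iterable of (datetime, ip) pairs parsed from your own access log.
    An IP qualifies if it has at least min_visits hits and every gap between
    consecutive hits is within `tolerance` of exactly one day.
    """
    by_ip = defaultdict(list)
    for when, ip in visits:
        by_ip[ip].append(when)

    day = timedelta(days=1)
    result = {}
    for ip, times in by_ip.items():
        if len(times) < min_visits:
            continue
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if all(abs(gap - day) <= tolerance for gap in gaps):
            # mean interval between visits for this IP
            result[ip] = sum(gaps, timedelta()) / len(gaps)
    return result
```

A cron job fires at the same clock time every day, so its gaps cluster tightly around 24 hours; a crawler's revisit schedule usually drifts much more.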

To find a remedy, you'll have to consider what sort of page it is. If it's
a dynamically built page (such as an account registration page, the usual
vector of attack), you may want to check for your site's cookie before
showing the page. If it's a static page, I don't believe you have a real
remedy, although a few things can be done, e.g. JavaScript encoding of the
sections of the page you don't want shown to bots.
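
Both ideas can be sketched in a few lines of Python, assuming a CGI-style environ dict. The cookie name site_visit is hypothetical (use whatever your own pages set for real visitors), and the JavaScript trick assumes the hidden fragment is plain ASCII, since atob() returns raw bytes as characters.

```python
import base64
from http.cookies import CookieError, SimpleCookie

# HYPOTHETICAL cookie name: whatever your own pages set for real visitors.
SITE_COOKIE = "site_visit"


def has_site_cookie(environ):
    """True if the request carries the cookie our other pages set earlier."""
    jar = SimpleCookie()
    try:
        jar.load(environ.get("HTTP_COOKIE", ""))
    except CookieError:
        return False
    return SITE_COOKIE in jar


def registration_response(environ):
    """Dynamic page: serve the real form only to visitors who came through the site."""
    if not has_site_cookie(environ):
        return "403 Forbidden", "Please start from the home page."
    return "200 OK", "<form method=\"post\">registration fields</form>"  # placeholder body


def js_encoded(fragment):
    """Static page: hide an ASCII HTML fragment from bots that don't run JavaScript.

    The browser decodes the base64 text with atob() and writes it into the page.
    """
    b64 = base64.b64encode(fragment.encode("ascii")).decode("ascii")
    return f'<script>document.write(atob("{b64}"));</script>'
```

Both are speed bumps rather than locks: a script that keeps cookies between requests, or a bot that executes JavaScript, will get through.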

Although it won't help with this particular attack (if it is in fact an
attack), you may want to use the "noarchive" meta tag, so that people
cannot get all the information they need by browsing the search engines'
cache without ever hitting your site (and leaving a trace). I believe
Yandex, as well as Google, honors that meta tag:
<meta name="robots" content="noarchive">
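
If you add the tag to a batch of static pages, a small check helps confirm none were missed. A minimal Python sketch that parses a page's HTML (it does not fetch anything) and reports whether a robots meta tag includes "noarchive"; the tag only asks compliant search engines not to show a cached copy, it doesn't enforce anything.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, noarchive"
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html_text):
    """True if the page asks search engines not to keep a cached copy."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return "noarchive" in finder.directives
```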

A side note: kudos for picking the most appropriate message subject! :)
Chasing *one* IP that hits you *once* a day qualifies exactly as
"idle curiosity" :)

Good luck!

--
Cheers,
Dmitri
http://www.1-script.com/