Prevent search engines from indexing Magento test sites
While we recommend developing Magento sites in such a way that the development site is not publicly accessible, it is sometimes necessary to put a Magento test site online right away. Here are some tricks to make sure search engines like Google Search do not start indexing this test site.
You might have a great robots.txt for your live site that tells search engines what to index and what not. For testing or development sites, such fine-grained rules are beside the point, because nothing should be indexed at all. A robots.txt that tells search engines to move along and not index anything looks like this:
User-agent: *
Disallow: /
Sending the X-Robots-Tag header
Another way is to add the X-Robots-Tag header to all of your content. With the robots.txt file, you rely on search engines to read and interpret this file. Search engines also cache it, so changes might not take effect immediately. With the X-Robots-Tag header the effect is immediate, because the HTTP header is added to every response sent from your server to the search engine.
You can add the following to your .htaccess file (assuming you are running Apache with mod_headers enabled):
Header set X-Robots-Tag "noindex, follow"
Using a PHP line for adding X-Robots-Tag
If you are running another webserver like Nginx, or if you have a setup where .htaccess files are simply ignored, you might be forced to add the statement to the Virtual Host configuration of your webserver. An alternative is to simply add the following PHP code to the top of your index.php file in the root of the Magento filesystem:
header('X-Robots-Tag: noindex, follow');
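A hard-coded header in index.php is easy to carry over to the live site by accident. One way to reduce that risk is to send the header only for known test hostnames. The following is a minimal sketch under assumptions, not something Magento ships: the helper name isNonProductionHost and the hostnames staging.example.com and dev.example.com are placeholders invented for illustration.

```php
<?php
/**
 * Return true when the given host name is on the list of test hosts.
 * Hypothetical helper for illustration; it is not a Magento function.
 */
function isNonProductionHost($host, array $testHosts)
{
    // Strip an optional port (e.g. "dev.example.com:8080") and normalize case.
    $host = strtolower(preg_replace('/:\d+$/', '', (string) $host));

    return in_array($host, $testHosts, true);
}

// Placeholder host names (assumption); replace them with your own test domains.
$testHosts = array('staging.example.com', 'dev.example.com');
$currentHost = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';

// Send the header only on test hosts, and only if output has not started yet.
if (isNonProductionHost($currentHost, $testHosts) && !headers_sent()) {
    header('X-Robots-Tag: noindex, follow');
}
```

When pasting this into an existing index.php, leave out the opening PHP tag, since the file already starts with one. The list of hostnames has to be kept in sync with your environments by hand, so the webserver-level header remains the more robust option where it is available.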
Which method to use?
The three methods above each instruct search engines to skip your site, which raises the question of which method is best. Personally I think you should simply implement all three, to make sure that a change of environment does not cause your non-production site to be indexed all of a sudden. The index.php change perhaps classifies as a core hack, though I personally see the index.php file as part of the modifiable configuration. It also contains switches for debugging and maintenance that I modify most of the time.
Blocking access by IP
Probably the safest way of hooking your site to the web, without any chance of search engines indexing anything, is to block access. The strategy here is to allow your own IP addresses and block all other IPs. This can be done by adding the following to the top of your .htaccess file:
Order deny,allow
Deny from all
Allow from 127.0.0.1
Allow from 127.0.0.2
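In this case, only the IPs 127.0.0.1 and 127.0.0.2 are allowed access. You probably need to add your own IP to this as well. This example uses the .htaccess file, which only works under Apache. Note that on Apache 2.4 this older syntax depends on mod_access_compat, and the native equivalent is the Require ip directive. For other webservers like Nginx, you will need to add allow and deny rules directly to your Nginx configuration files.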
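If changing the webserver configuration is not an option, a rough application-level fallback is to check the visitor's IP at the top of index.php. This is only a sketch under assumptions: isAllowedIp is a hypothetical helper, the addresses simply mirror the example above, and REMOTE_ADDR holds the address of the proxy rather than the visitor when the site sits behind a reverse proxy or load balancer.

```php
<?php
/**
 * Return true when the given IP address is on the allow list.
 * Hypothetical helper for illustration; exact string match only, no ranges.
 */
function isAllowedIp($ip, array $allowedIps)
{
    return in_array($ip, $allowedIps, true);
}

// Example addresses taken from the .htaccess snippet; add your own IP here.
$allowedIps = array('127.0.0.1', '127.0.0.2');
$remoteIp = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';

// Enforce the block for web requests only, so command-line scripts keep working.
if (PHP_SAPI !== 'cli' && !isAllowedIp($remoteIp, $allowedIps)) {
    header('HTTP/1.1 403 Forbidden');
    exit('Access denied');
}
```

Unlike the webserver rules, this check only covers requests that pass through index.php. Static files such as images, CSS and JavaScript are served by the webserver directly and stay reachable, so blocking at the webserver level is still the safer choice.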