# robots.txt for http://www.example.com/
# Allow indexing (reading and recording in the database) of all content.
User-agent: *
Disallow:
The example above tells search engine robots (spiders) that every user-agent is allowed to index all pages and follow their links. This corresponds to the familiar "index, follow" setting, which is the recommended default.
Adding a slash after Disallow: (that is, Disallow: /) gives the exact opposite, the equivalent of the standard "noindex, nofollow", which is covered further below.
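As a quick way to see the difference between the two forms, the sketch below feeds both rule sets to Python's standard urllib.robotparser module and asks whether a page may be fetched. The bot name "AnyBot", the helper name can_fetch and the page URL are placeholders chosen for illustration, not names from any real crawler.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:"    # empty value: nothing is off limits
BLOCK_ALL = "User-agent: *\nDisallow: /"  # slash: the whole site is off limits

# "AnyBot" and the page URL are hypothetical placeholders.
page = "http://www.example.com/products/index.html"
print(can_fetch(ALLOW_ALL, "AnyBot", page))  # True
print(can_fetch(BLOCK_ALL, "AnyBot", page))  # False
```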
The next example allows all robots to index the site, except that the AltaVista robot (scooter) is barred from reading the files in the "temp" directory.
User-agent: *
Disallow:

User-agent: scooter
Disallow: /temp/
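To check how such a file is read, the following sketch parses the same rules with Python's standard urllib.robotparser module. "OtherBot" and the page URLs are illustrative placeholders; only the scooter user-agent and the /temp/ path come from the example above.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow:

User-agent: scooter
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

temp_page = "http://www.example.com/temp/draft.html"  # hypothetical URL
home_page = "http://www.example.com/index.html"       # hypothetical URL

scooter_temp = parser.can_fetch("scooter", temp_page)  # blocked for scooter
scooter_home = parser.can_fetch("scooter", home_page)  # rest of the site is open
other_temp = parser.can_fetch("OtherBot", temp_page)   # other robots are unaffected
print(scooter_temp, scooter_home, other_temp)          # False True True
```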
You can also add an exception so that the robot is still allowed to read one specific file inside the "temp" directory.
User-agent: scooter
Disallow: /temp/
Allow: /temp/example2.html
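Robots do not all resolve a conflict between Allow and Disallow in the same way. The sketch below assumes the "most specific rule wins" behavior described in RFC 9309, where the longest matching path decides and Allow wins a tie. The function name and the rule list format are invented for illustration, and wildcards are not handled. Parsers that apply rules in file order, such as Python's urllib.robotparser, would instead need the Allow line placed before the Disallow line.

```python
from typing import List, Tuple

# Each rule is (directive, path); this list mirrors the scooter record above.
SCOOTER_RULES: List[Tuple[str, str]] = [
    ("disallow", "/temp/"),
    ("allow", "/temp/example2.html"),
]


def is_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Longest-match evaluation: the longest rule path that prefixes `path` wins."""
    best_len = -1
    allowed = True  # no matching rule means the path may be fetched
    for directive, rule_path in rules:
        if not rule_path or not path.startswith(rule_path):
            continue  # an empty rule path matches nothing
        is_allow = directive == "allow"
        # A longer match replaces the current one; on a tie, Allow wins.
        if len(rule_path) > best_len or (len(rule_path) == best_len and is_allow):
            best_len = len(rule_path)
            allowed = is_allow
    return allowed


print(is_allowed(SCOOTER_RULES, "/temp/example2.html"))  # True
print(is_allowed(SCOOTER_RULES, "/temp/other.html"))     # False
print(is_allowed(SCOOTER_RULES, "/index.html"))          # True
```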
Exceptions:
Prohibit all robots from indexing the pages and following the links.
User-agent: *
Disallow: /
The slash immediately after Disallow: indicates that all robots are prohibited from indexing the pages and following their links. (The equivalent robots meta tag value in an HTML page is "noindex,nofollow".)
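Please note: in the HTML page, put the URL of your robots.txt in the content of the robots meta tag and do NOT leave it set to "index, follow", since that could cause many spiders to skip your robots.txt file, with all the consequences that entails! (Be aware that the documented values for this tag are directives such as index, noindex, follow and nofollow, and that robots request robots.txt from the root of the domain on their own.)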
The robots meta tag in the head section will look similar to the following:
<meta name="robots" content="http://www.yourdomain.com/robots.txt">
Some restrictions (bans) can also be enforced by password-protecting the pages or by setting CHMOD access rights through your FTP software, depending on how the page needs to be restricted.
To stop a robot from following a particular link in an HTML page, you can use the rel attribute, filled in as follows:
<a href="http://www.example.com/" rel="nofollow">This link will not be taken into consideration</a>.
Please note that rel="nofollow" is recognized only by Google and Yahoo. Other spiders do not officially recognize it.
When a robot encounters this attribute, it ignores that particular link: it neither follows nor indexes it, and the page the link comes from is not penalized if the link leads to sites or pages of little value or to ones that use illicit ranking techniques. Attention! If a page linked with rel="nofollow" is part of your own domain and is not linked from any other internal or external page, it will first lose PageRank and then drop out of Google's index entirely.
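From the robot's side, honoring the attribute amounts to skipping any link whose rel value contains the nofollow token. The sketch below shows this with Python's standard html.parser module. The class name and the sample markup are made up for illustration and do not reflect how any particular search engine implements it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class FollowableLinks(HTMLParser):
    """Collect href values of <a> tags, skipping those marked rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.links.append(href)


SAMPLE = (
    '<a href="http://www.example.com/" rel="nofollow">ignored</a>'
    '<a href="http://www.example.com/about.html">followed</a>'
)

collector = FollowableLinks()
collector.feed(SAMPLE)
print(collector.links)  # ['http://www.example.com/about.html']
```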
Once you have written the file and uploaded it to the root of your site, check straight away that it behaves as intended and contains no errors. To publish robots.txt, write the text as in the examples above, name the file robots.txt (the name must be all lower case), and then upload it like any other document to the root of your domain. The address of your robots.txt will be: http://www.yourdomain.com/robots.txt
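A basic check of the file can also be scripted before uploading it. The sketch below is a minimal, hypothetical checker: it builds the expected address from a site URL and flags lines that are not in "field: value" form, unknown fields, and rules that appear before any User-agent line. The function names are invented here, and the list of known fields is an assumption covering the directives used in this article plus two common extensions, so it does not replace an online tester.

```python
from typing import List
from urllib.parse import urljoin

# Assumed set of accepted fields; extend it to suit the robots you target.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def robots_url(site: str) -> str:
    """Return the address of robots.txt in the root of `site`."""
    return urljoin(site, "/robots.txt")


def check_robots_txt(text: str) -> List[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems: List[str] = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: expected 'field: value'")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_agent = True
        elif field in {"allow", "disallow"} and not seen_agent:
            problems.append(f"line {number}: rule before any User-agent line")
    return problems


print(robots_url("http://www.yourdomain.com/some/page.html"))
print(check_robots_txt("User-agent: *\nDisallow: /temp/"))
print(check_robots_txt("Disallow: /temp/\nUser agent *"))
```

An empty list means the checker found nothing to report; the last call returns one message for each of the two faulty lines.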
You can test the file at this address: Test online robots.txt
The official website, where you can find more information, is: www.robotstxt.org
More information about Googlebot is available here: GoogleBot Info Site