Robots.txt – instructions for SEO
Internal website optimization starts with writing robots.txt. If you don't know what that is, be sure to read on; otherwise you risk exposing your personal data and folders to the whole web.
Robots.txt is a simple text document, but a powerful one. Every SEO specialist needs to know how to write its most common directives. The file must be saved in UTF-8 encoding and uploaded to the server (for example, over FTP); robots either do not recognize other character sets or interpret them incorrectly. It is through this document that search robots learn what may be crawled and what must be skipped, which in turn improves ranking in search results. Keep in mind that the instructions in robots.txt apply only to the host where the file is located. We will cover the rules for creating it below.
Principle of operation
Before moving on to the instructions themselves, let's look at how this file helps your site. To do that, consider how search engines work.
Search engine crawlers do two things:
- discover new information by traversing the web;
- index the collected information so that search results can be produced quickly.
Thanks to domain names, search engines can visit every internet resource (and that is a huge number of links). Immediately on arrival, the bot looks for robots.txt and only after reading it proceeds to explore the website, following the rules it prescribes. If the file exists, the crawler knows which pages it may process and which to skip. Otherwise, it simply scans everything.
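As a sketch of that first step: the robots.txt file always lives at the root of the host, so a crawler can derive its location from any page URL it lands on. Here is a minimal illustration using Python's standard library (example.com and the page path are placeholders):

```python
from urllib.parse import urljoin

# A crawler derives the robots.txt location from the host root,
# no matter which page the bot arrived at.
page = "https://example.com/blog/some-article"  # hypothetical entry page
robots_url = urljoin(page, "/robots.txt")       # replace the path with /robots.txt
print(robots_url)  # https://example.com/robots.txt
```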
- Use only lowercase for the filename: "robots.txt", no capital letters!
- Please note that some robots may ignore the file entirely, so you should not rely on it to hide highly sensitive data.
- The file size should not exceed 500 KB.
The most common question from SEO newcomers is "What should I hide?". The answer is simple: anything! In practice, SEO specialists usually hide non-unique texts, links to third-party resources, and so on. Of course, it is better to use only unique content, but that is not always possible (regulations, legal documentation, etc.). If such pages get indexed, the ranking can drop significantly, so this is exactly the case where you need to hide them.
How to create?
The creation process is completely straightforward: it is just a text document placed in the root directory of the site. It is created with any plain file manager or editor the developer uses to work with the website. The main difficulty lies not in creating the document but in filling it out. Three basic kinds of instructions are used:
- Disallow – forbids crawling of the specified path (Disallow: / bans the whole site);
- Allow – explicitly permits crawling of the specified path;
- partial access – a combination in which only specific files or folders are prohibited. There can be any number of such rules; the main thing is to start each one on a new line.
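A minimal sketch of the three cases (the folder names are hypothetical, and each variant is a separate robots.txt, not one combined file):

```txt
# Variant 1 – complete ban on crawling:
User-agent: *
Disallow: /

# Variant 2 – everything may be crawled (empty Disallow):
User-agent: *
Disallow:

# Variant 3 – partial access: only specific folders are banned,
# one rule per line:
User-agent: *
Disallow: /admin/
Disallow: /docs/legal/
```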
Other directives are also used, among them:
- User-agent – indicates the specific bot the following rules apply to;
- # – used to write a comment on a line. Everything after the # symbol is ignored;
- Host – indicates the main mirror of the site (deprecated; it no longer has to be specified);
- Crawl-delay – limits how fast your website is crawled. It helps when the resource has very high traffic and robots create unnecessary load that slows the system down;
- Sitemap – indicates the location of the sitemap;
- Clean-param – tells the robot to ignore the listed URL parameters, which helps fight duplication of texts, photos and videos;
- * – a wildcard meaning "any": any robot, or any sequence of characters in a path;
- $ – marks the end of a URL.
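Putting these directives together, a robots.txt might look like the following sketch. The domain, paths and parameter names are illustrative; note that Crawl-delay and Clean-param are Yandex-specific directives that Google ignores:

```txt
User-agent: Yandex
Disallow: /search/       # hide internal search results
Disallow: /*.pdf$        # any URL ending in .pdf
Clean-param: utm_source&utm_medium /catalog/
Crawl-delay: 2           # no more than one request every 2 seconds

# All other robots get a shorter rule set
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```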
Blocks of User-agent/Disallow rules are separated from each other by a blank line, but there must be no blank line between a User-agent line and the directives that follow it. Note that paths are case-sensitive: "name", "Name" and "NAME" are treated as three different directories.
Remember that the ban applies only to the search engine that is listed in this block.
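This per-bot scoping can be checked with Python's standard urllib.robotparser module. A small sketch with made-up rules: /private/ is banned for Googlebot only, so a differently named bot is still allowed in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the ban applies only to the Googlebot block.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/private/page"))     # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/private/page"))  # True
```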
Which is preferable: noindex or robots.txt?
There is no single answer, since they are used in different cases. For example, if you want a particular page not to be indexed, use noindex. That is, in the <head> section of the page, write the meta tag:
<meta name="robots" content="noindex, follow">
Now you do not need to remove the page manually through Webmaster: it will drop out of the index on its own during the next crawl.
Robots.txt, in turn, is more reliable for hiding the admin panel, internal search results and sections with personal data (registration, password and login recovery) from the index.
Is it worth checking for errors?
Definitely yes! You could easily have made a mistake and accidentally banned the wrong group of pages, which would lead to unpleasant consequences.
Therefore, immediately after writing the file, make sure there are no typos. Use:
- Google Webmasters. Requires authorization and confirmation of site ownership. Allows you to:
- instantly detect all existing errors;
- correct the file on the spot, re-check it and then transfer it to your resource;
- check whether you have correctly closed/opened all the necessary directives.
- Yandex Webmaster. Does not require authorization or confirmation of ownership of the web resource. This saves time and lets you check all pages at once rather than one by one; it also verifies that Yandex will understand all the instructions. Otherwise it is similar to the previous tool.
Why doesn't it work?
It happens that even after checking and following all the recommendations, robots.txt does not work. What could be the reason?
- Check whether you accidentally blocked an extra folder or even the entire web resource (this also happens through carelessness);
- Check through Webmaster whether the search engines have simply not reindexed the resource yet;
- Check whether there are external links pointing to the banned page. If there are, no ban will help.
- Take your time, and before the first launch, do all the manipulations described above to save your time in the future.
- Remember that no protection can guarantee absolute success; misfires happen everywhere.
And finally, if you are afraid of doing something wrong yourself, turn to professionals right away. They can quickly solve any of your problems, or even prevent them. Read our blog; you are likely to find plenty in it that is new and informative. We will be happy to answer any other questions personally. Related article: 301 redirect – the most comprehensive guide.