Writing robots.txt: recommendations for setting up the robots.txt file

First, I’ll tell you what robots.txt is.

Robots.txt is a file located in the root folder of the site that contains special instructions for search robots. These instructions tell a robot entering the site which pages or sections to skip; in other words, they close those pages from indexing.

Why do you need robots.txt?

The robots.txt file is considered a key requirement for the SEO of practically any website. Without it, robots may put extra load on the site and indexing may slow down; moreover, the site may not be indexed completely. Accordingly, users will not be able to reach some pages through Yandex and Google.

How does robots.txt affect search engines?

Search engines (especially Google) will index the site even if there is no robots.txt file but, as I said, possibly not all of its pages. If the file exists, robots are guided by the rules specified in it. There are several types of search robots, and while some take a given rule into account, others ignore it. In particular, GoogleBot does not take into account the Host and Crawl-delay directives, the YandexNews robot has recently stopped taking into account the Crawl-delay directive, and the YandexDirect and YandexVideoParser robots ignore the general directives in robots.txt (but follow those written specifically for them).

The site is loaded most heavily by robots that download content from it. Accordingly, if we tell the robot which pages to index and which to ignore, as well as at what intervals to download content from the pages (this matters more for large sites with over 100,000 pages in the search engine index), it becomes much easier for the robot to index and download content from the site.

Files that search engines do not need include files that belong to the CMS, for example /wp-admin/ in WordPress, as well as the AJAX and JSON scripts responsible for pop-up forms, banners, captcha output, and so on.

For most robots, I also recommend blocking all JavaScript and CSS files from indexing. For GoogleBot and Yandex, however, it is better to leave such files open, since these search engines use them to analyze the usability of the site and to rank it.

What is a robots.txt directive?

Directives are the rules for search robots. The first standard for writing robots.txt, and accordingly the first directives, appeared in 1994, and the extended standard in 1996. However, as you already know, not all robots support every directive. Therefore, below I describe what the main robots are guided by when indexing website pages.

What does User-agent mean?

This is the most important directive: it determines which search robots the rules that follow apply to.

For all robots:

User-agent: *

For a specific bot:

User-agent: Googlebot

The case of the robot name in robots.txt is not important: you can write either Googlebot or googlebot.
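To see how User-agent groups are selected, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and paths (/private/, /drafts/) and the robot name SomeOtherBot are made up for illustration, and this parser is only one interpretation of the rules; real search robots may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for every robot, one for Googlebot only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = make_parser(ROBOTS_TXT)

# The robot name is matched without regard to case.
print(parser.can_fetch("Googlebot", "/drafts/post.html"))     # False
print(parser.can_fetch("googlebot", "/drafts/post.html"))     # False
# A robot without its own group falls back to the "*" group.
print(parser.can_fetch("SomeOtherBot", "/private/page.html")) # False
# Googlebot has its own group, so the "*" rules are not applied to it.
print(parser.can_fetch("Googlebot", "/private/page.html"))    # True
```

Note that in this sketch a robot with its own group reads only that group, which is why the full rule set usually has to be repeated for each named robot.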

Google search robots

Yandex search robots

Yandex's main indexing robot

Used in the Yandex.Images service

Used in the Yandex.Video service

Multimedia data

Blog search

A search robot that accesses a page when it is added through the “Add URL” form

Robot that indexes website icons (favicons)

Used in the Yandex.Catalog service

Used in the Yandex.News service

Mobile services search robot

Bing, Yahoo, Mail.ru and Rambler search robots

Disallow and Allow directives

Disallow closes sections and pages of your site from indexing. Allow, on the contrary, opens them.

There are some peculiarities.

First, there are the additional operators *, $ and #. What are they used for?

“*” stands for any sequence of characters, including an empty one. It is implied at the end of every rule by default, so there is no point in adding it there again.

“$” marks the end of the URL: the character before it must be the last one.

“#” starts a comment; the robot ignores everything after this symbol up to the end of the line.

Examples of using Disallow:

Disallow: *?s=

Disallow: /category/

Accordingly, pages like these will be closed to the search robot:

But pages like this will be open for indexing:
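As a rough way to check which URLs such rules cover, the sketch below translates a rule with the * and $ operators into a regular expression. The function names and the sample URLs are hypothetical, and the matching is a simplified model of how wildcard-aware robots read the rules, not any search engine's actual code.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end of
    the URL. Without "$", the rule matches any URL starting with the pattern.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches the path (query included)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

RULES = ["*?s=", "/category/"]  # the two Disallow lines from the example

# Hypothetical URLs, chosen only to exercise the rules.
for url in ["/?s=shoes", "/blog/?s=shoes", "/category/", "/category/boots/",
            "/category-archive/", "/about/"]:
    print(url, "closed" if is_blocked(url, RULES) else "open")
```

In this model, the first four URLs are closed and the last two stay open; adding $ to the second rule (/category/$) would close only the /category/ page itself.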

Now you need to understand how nested rules are applied. For Google and Yandex, the order in which Allow and Disallow directives are written within a group is not important. Which rule applies is determined by the directories specified: the rule with the longer, more specific path takes precedence. So if we want to block a page or document from indexing, it is enough to write a directive for it. Let's look at an example.

This is our robots.txt file

Disallow: /template/
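Google and Yandex document that when both an Allow and a Disallow rule match a URL, the rule with the longer path wins. The sketch below models that behaviour under simplifying assumptions; the Allow line for /template/css/ and the function names are hypothetical additions, used only to show a conflict being resolved.

```python
import re

def matches(pattern: str, path: str) -> bool:
    """Check a robots.txt pattern ("*" wildcard, optional trailing "$")."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match("^" + regex + ("$" if anchored else ""), path) is not None

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Resolve Allow/Disallow conflicts by the longest matching pattern.

    rules is a list of (directive, pattern) pairs in any order. On a tie
    in length, Allow wins; with no matching rule the path stays open.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern or not matches(pattern, path):
            continue  # empty patterns and non-matching rules are skipped
        candidate = (len(pattern), directive.lower() == "allow")
        if best is None or candidate > best:
            best = candidate
    return True if best is None else best[1]

# Hypothetical rule set built around the Disallow line from the example.
rules = [("Disallow", "/template/"), ("Allow", "/template/css/")]
reordered = list(reversed(rules))

for path in ["/template/index.php", "/template/css/main.css", "/about/"]:
    # The verdict is the same whichever order the rules are written in.
    print(path, is_allowed(path, rules), is_allowed(path, reordered))
```

Not every parser works this way: some simpler ones apply the first matching rule, so it is still worth testing the file against the robots you care about.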

The Sitemap directive can be specified anywhere in the file, and several sitemap files can be listed.
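As a small illustration, Python's standard urllib.robotparser collects Sitemap lines wherever they appear in the file. The sitemap URLs below are placeholders, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: Sitemap lines may sit before, inside or after a group.
ROBOTS_TXT = """\
Sitemap: https://website.ru/sitemap.xml
User-agent: *
Disallow: /admin/
Sitemap: https://website.ru/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed URLs in file order, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```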

Host directive in robots.txt

This directive is needed to indicate the main mirror of the site (most often the version with or without www). Please note that for a site on http:// the Host directive is written without the protocol, while for a site on https:// the protocol is included. The directive is taken into account only by the Yandex and Mail.ru search robots; other robots, including GoogleBot, ignore it. Host should be specified only once in the robots.txt file.

Example with http://

Host: website.ru

Example with https://

Host: https://website.ru
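The protocol rule can be captured in a few lines. This is only a sketch of the convention described in this section; host_directive is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlsplit

def host_directive(main_mirror_url: str) -> str:
    """Build the Host line for the main mirror of a site."""
    parts = urlsplit(main_mirror_url)
    if parts.scheme == "https":
        # HTTPS mirrors keep the protocol prefix.
        return f"Host: https://{parts.netloc}"
    # Plain HTTP mirrors are written as a bare host name.
    return f"Host: {parts.netloc}"

print(host_directive("http://website.ru/"))       # Host: website.ru
print(host_directive("https://www.website.ru/"))  # Host: https://www.website.ru
```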

Crawl-delay directive

Sets the minimum time interval between a search robot's requests for site pages. The value is given in seconds and may be fractional (for example, 0.25).

It is used mostly on large online stores, information sites, and portals where traffic starts at 5,000 visits per day. It makes the search robot wait a set period of time between indexing requests. If this directive is not specified, robots can create a serious load on the server.

The optimal crawl-delay value is different for each site. For the Mail.ru, Bing and Yahoo search engines, the value can be set as low as 0.25 or 0.3, since their robots may crawl your site only once every month or two (very rarely). For Yandex, it is better to set a higher value.

If the load on your site is minimal, then there is no point in specifying this directive.
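From the crawler's side, honoring the directive might look like the sketch below. The delay values are placeholders rather than recommendations, and polite_delay and crawl are hypothetical helpers. Python's standard parser reads only whole-number Crawl-delay values, so fractional delays such as 0.25 would need custom parsing.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the numbers are placeholders, not recommendations.
ROBOTS_TXT = """\
User-agent: Yandex
Crawl-delay: 2

User-agent: *
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def polite_delay(user_agent: str, default: float = 0.0) -> float:
    """Seconds a crawler named user_agent should wait between requests."""
    delay = parser.crawl_delay(user_agent)
    return default if delay is None else float(delay)

def crawl(paths, user_agent, fetch, sleep=time.sleep):
    """Call fetch(path) for each path, pausing between requests."""
    delay = polite_delay(user_agent)
    for index, path in enumerate(paths):
        if index and delay:
            sleep(delay)  # wait before every request except the first
        fetch(path)

print(polite_delay("Yandex"))   # 2.0
print(polite_delay("Bingbot"))  # 1.0
```

The sleep function is passed in as a parameter so that the pacing can be checked without actually waiting.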

Clean-param directive

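The rule is interesting because it tells the crawler that certain URL parameters do not change the page content, so pages that differ only in those parameters do not need to be indexed separately. Two arguments are specified: the parameter (several can be joined with &) and, optionally, the path of the pages the rule applies to. This directive is supported by the Yandex search engine.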
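To make the effect concrete, the sketch below strips the parameters named in a Clean-param value from a URL, which is roughly how two addresses end up being treated as the same page. It is a simplified model, not Yandex's implementation: apply_clean_param is a hypothetical helper, the sample URLs are invented, and wildcards in the path argument are not handled.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def apply_clean_param(url: str, directive_value: str) -> str:
    """Drop the parameters named in a Clean-param value from a URL.

    directive_value is the text after "Clean-param:", for example
    "utm_source&utm_medium&utm_campaign" or "ref /catalog/".
    """
    fields = directive_value.split()
    ignored = set(fields[0].split("&"))
    path_prefix = fields[1] if len(fields) > 1 else "/"

    parts = urlsplit(url)
    if not parts.path.startswith(path_prefix):
        return url  # the directive does not cover this path
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in ignored]
    return urlunsplit(parts._replace(query=urlencode(kept)))

value = "utm_source&utm_medium&utm_campaign"
print(apply_clean_param("https://website.ru/shoes/?utm_source=mail&page=2", value))
print(apply_clean_param("https://website.ru/shoes/?page=2&utm_medium=cpc", value))
```

Both calls produce https://website.ru/shoes/?page=2, so the two addresses collapse into one page.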


User-agent: *
Disallow: /admin/
Disallow: /plugins/
Disallow: /search/
Disallow: /cart/
Disallow: *sort=
Disallow: *view=

User-agent: GoogleBot
Disallow: /admin/
Disallow: /plugins/
Disallow: /search/
Disallow: /cart/
Disallow: *sort=
Disallow: *view=
Allow: /plugins/*.css
Allow: /plugins/*.js
Allow: /plugins/*.png
Allow: /plugins/*.jpg
Allow: /plugins/*.gif

User-agent: Yandex
Disallow: /admin/
Disallow: /plugins/
Disallow: /search/
Disallow: /cart/
Disallow: *sort=
Disallow: *view=
Allow: /plugins/*.css
Allow: /plugins/*.js
Allow: /plugins/*.png
Allow: /plugins/*.jpg
Allow: /plugins/*.gif
Clean-Param: utm_source&utm_medium&utm_campaign

In the example, we wrote down the rules for 3 different bots.
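Because the three groups repeat the same Disallow lines, one option is to generate the file from a shared list so that the groups cannot drift apart. The sketch below assumes the first group is meant for all robots (User-agent: *); build_group and build_robots_txt are hypothetical helper names.

```python
COMMON_DISALLOW = ["/admin/", "/plugins/", "/search/", "/cart/", "*sort=", "*view="]
STATIC_ALLOW = [f"/plugins/*.{ext}" for ext in ("css", "js", "png", "jpg", "gif")]

def build_group(user_agent, disallow, allow=(), extra=()):
    """Return one User-agent group as a list of lines."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    lines += list(extra)
    return lines

def build_robots_txt() -> str:
    """Assemble the three groups from the example into one file."""
    groups = [
        # Assumption: the first group applies to all robots.
        build_group("*", COMMON_DISALLOW),
        build_group("GoogleBot", COMMON_DISALLOW, STATIC_ALLOW),
        build_group("Yandex", COMMON_DISALLOW, STATIC_ALLOW,
                    extra=["Clean-Param: utm_source&utm_medium&utm_campaign"]),
    ]
    # Lines inside a group are contiguous; groups are separated by a blank line.
    return "\n\n".join("\n".join(group) for group in groups) + "\n"

print(build_robots_txt())
```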

Where to add robots.txt?

The file is added to the root folder of the site, so that it opens at a link of the form website.ru/robots.txt.

How to check robots.txt?

Yandex Webmaster

On the Tools tab, select Robots.txt Analysis and then click Check.

Google Search Console

On the Crawl tab, select the robots.txt Tester tool and then click Test.


The robots.txt file must be present on every website being promoted, and only its correct configuration will allow you to obtain the necessary indexing.

And finally, if you have any questions, ask them in the comments under the article. I'm also curious: how do you write robots.txt?

