The robots.txt file is a small but powerful tool for search engine optimization (SEO) and for webmasters alike. But what exactly is behind this term, and why should every website have one? In this article you will learn everything you need to know.
As always, here is the short version:
TL;DR – The robots.txt
- A text file in the root directory of your website that tells search engine crawlers which areas they may crawl.
- Controls crawler access to your site, improves crawling efficiency and keeps sensitive content out of the index.
- Create it in a text editor, save it as “robots.txt”, add your instructions and upload it to the root directory.
- Syntax for complete blocking: User-agent: * combined with Disallow: / blocks access to the entire website.
- Avoid syntax errors, update the file regularly, use specific instructions and add a sitemap.
- Each subdomain requires its own robots.txt file, different instructions per subdomain are possible.
What is the robots.txt file?
The robots.txt file is a simple text file located in the root directory of your website. It contains instructions for search engine robots (also known as crawlers) about which pages or areas of your website they may and may not crawl.
These instructions help to guide the crawlers and ensure that only the desired content is indexed.
Why is it important for your website?
A well-configured robots.txt file can improve the efficiency of search engine crawlers and prevent unnecessary or sensitive content from being crawled and indexed. This has several advantages:
- Control over website access: You decide which parts of your website should be crawled by search engines and which should not.
- Improved crawling efficiency: By excluding unimportant pages, the crawlers can make better use of their resources and concentrate on the important content.
- Protection of sensitive data: You can prevent private or security-relevant information from entering the search engine index.
The robots.txt file is therefore an essential tool for controlling the visibility and security of your website.
In the following sections, you will learn everything you need to know about creating, managing and optimizing your own robots.txt file.
Where can I find the robots.txt file?
Where exactly is this file located and how can you manage it in WordPress?
Access to the robots.txt file in WordPress
The robots.txt file is easily accessible on WordPress-based websites. By default, it is stored in the root directory of your website, i.e. directly under the main domain (e.g. https://deinewebsite.com/robots.txt).
However, if you want to create a custom robots.txt file or edit the existing one, there are several ways to do this.
Typically, you can access it via the Control Panel (e.g. cPanel) or via an FTP program such as FileZilla.
Use of plugins to manage the robots.txt file
WordPress offers numerous plugins that make it easier for you to manage the robots.txt file. Here are some popular options:
- Yoast SEO: This widely used SEO plugin allows you to edit the robots.txt file directly from the WordPress dashboard. You can find the setting under “SEO” > “Tools” > “File Editor”.
- All in One SEO Pack: Similar to Yoast, this plugin also offers a user-friendly way to edit the robots.txt file. To do this, go to “Feature Manager” in the settings and activate the “robots.txt editor”.
- WP robots.txt: A simple, specialized plugin that is used exclusively to manage the robots.txt file. It offers an intuitive user interface and allows you to customize the file according to your needs.
By using these plugins, you can quickly and easily create, edit and optimize the robots.txt file of your WordPress website. This not only saves time, but also ensures that you don’t overlook any important settings.
In the next sections you will learn how to create a robots.txt file, how to prohibit everything via robots.txt and how to read the file correctly.
How do you create a robots.txt file?
The robots.txt file is a simple text file that contains instructions for search engine crawlers. It consists of a series of directives that define which areas of your website may and may not be crawled. The basic directives are:
- User-agent: Specifies which crawler the instructions apply to (e.g. Googlebot, Bingbot).
- Disallow: Specifies which pages or directories the crawler is not allowed to crawl.
- Allow: Permits access to certain pages or directories, even if they lie within an otherwise blocked area.
A simple example of a robots.txt file could look like this:
User-agent: *
Disallow: /private/
Allow: /private/allowed-page.html
In this example, all crawlers (“*”) are denied access to the directory “/private/”, but the page “/private/allowed-page.html” is explicitly allowed.
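If you want to double-check how a crawler reads such rules before uploading them, Python’s standard library ships a small robots.txt parser. The following is a minimal sketch that parses the example above and tests two URLs; the domain deinewebsite.com is only a placeholder. (urllib.robotparser resolves Allow/Disallow conflicts in file order rather than with Google’s longest-match logic, so the Allow exception is deliberately left out of the check.)

from urllib.robotparser import RobotFileParser

# The example rules from above, passed in as individual lines
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /private/allowed-page.html",
]

parser = RobotFileParser()
parser.parse(rules)

# Everything under /private/ is blocked for all crawlers ("*") ...
print(parser.can_fetch("*", "https://deinewebsite.com/private/secret.html"))  # False
# ... while pages outside the blocked directory remain crawlable
print(parser.can_fetch("*", "https://deinewebsite.com/contact.html"))  # True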
Step-by-step instructions for creating a robots.txt file
1. Create the file:
– Open a text editor of your choice (e.g. Notepad, TextEdit).
– Create a new file and save it as “robots.txt”.
2. Add instructions:
– Add the desired instructions for the crawlers.
3. Upload the file:
– Save the file and upload it to the root directory of your website. You can do this via an FTP program such as FileZilla or via the file management system of your hosting provider.
4. Review:
– Make sure that the file has been uploaded correctly by navigating to “https://deinewebsite.com/robots.txt” in your browser. You should see the instructions that you have added.
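Instead of the browser, you can also verify the upload with a short script. This sketch simply downloads the file and prints it; replace deinewebsite.com with your own domain.

import urllib.request

# Placeholder domain - replace with your own
url = "https://deinewebsite.com/robots.txt"

with urllib.request.urlopen(url) as response:
    print("HTTP status:", response.status)  # should be 200
    print(response.read().decode("utf-8"))  # should show the instructions you added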
How can I prohibit everything via robots.txt?
There are situations in which it may make sense to prevent search engine crawlers from accessing your entire website. This may be the case, for example, if your website is still under development or if you want to hide certain content completely from search engines for various reasons.
To prohibit all search engine crawlers from accessing your website, you only need a few lines in your robots.txt file. The basic syntax for this is:
User-agent: *
Disallow: /
- User-agent: The asterisk (*) stands for all crawlers. This instruction therefore applies to all search engine robots.
- Disallow: The slash (/) means that the entire website directory is blocked.
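If you want to convince yourself that these two lines really lock down the whole site, you can again run them through Python’s robots.txt parser; a minimal sketch:

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# With "Disallow: /" no path is allowed, no matter how deep it is
for path in ["/", "/blog/", "/products/item-123.html"]:
    print(path, parser.can_fetch("*", "https://deinewebsite.com" + path))  # always False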
Important notes
- Remember: The robots.txt file only prevents crawlers from crawling the specified areas. It does not offer any real protection against unwanted access. Sensitive data should always be additionally secured by other measures (e.g. password protection).
- Use with care: Check carefully whether complete blocking really makes sense. In most cases, it is better to block only certain areas so that search engines can still index the relevant content on your website.
How does Bing deal with robots.txt?
Search engines often treat the instructions in the robots.txt file differently, and Bing is no exception. To ensure that your website is optimally crawled by Bing, it is important to understand how Bing interprets and implements the robots.txt file.
- Strict adherence to instructions:
Bing follows the instructions in the robots.txt file very closely. So if you block certain pages or directories in the file, you can be fairly sure that Bing will not crawl them.
- Consideration of the Crawl-delay directive:
Bing supports the “Crawl-delay” directive in the robots.txt file, which specifies how many seconds the crawler should wait between requests. This is useful for reducing the server load.
Example:
User-agent: Bingbot
Crawl-delay: 10
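To see which delay a parser actually reads out of such a file, urllib.robotparser exposes the value directly; a small sketch based on the example above:

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: Bingbot", "Crawl-delay: 10"])

# Returns the delay in seconds for the named crawler, or None if not set
print(parser.crawl_delay("Bingbot"))  # 10
print(parser.crawl_delay("Googlebot"))  # None - no entry applies to this crawler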
Specific guidelines and recommendations from Bing
- Using Bing Webmaster Tools:
Register your website with Bing Webmaster Tools. There you can test the robots.txt file directly and analyze how Bing crawls your website.
- Creating a Bing-specific sitemap:
In addition to the robots.txt file, you can also create a sitemap specifically for Bing and reference it in the robots.txt file. This helps Bing to crawl your website more efficiently.
Sitemap: https://deinewebsite.com/bing-sitemap.xml
Handling of subdomains
Search engines treat each subdomain as a separate host, so each one also needs its own robots.txt file.
- Separate files for each subdomain:
Each subdomain should have its own robots.txt file in its root directory. For example:
For the main domain: https://deinewebsite.com/robots.txt
For a subdomain: https://blog.deinewebsite.com/robots.txt
- Specific instructions per subdomain:
You can define different instructions for each subdomain, depending on which content should be crawled and which should not (see the example below).
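As an illustration, the robots.txt file of a blog subdomain could look like this; the /drafts/ directory is purely a placeholder:

User-agent: *
Disallow: /drafts/
Sitemap: https://blog.deinewebsite.com/sitemap.xml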
Example of a standard setup
A standard setup for a robots.txt file allows search engine crawlers to access the entire website, while certain sensitive or unimportant areas are blocked.
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Sitemap: https://deinewebsite.com/sitemap.xml
Example of an e-commerce setup
User-agent: *
Disallow: /checkout/
Disallow: /cart/
Disallow: /user-profile/
Allow: /products/
Sitemap: https://shop.deinewebsite.com/sitemap.xml


