How to include or exclude links from being indexed by search engines


This KB article describes using Robots.txt to include or exclude links from being indexed by search engines.

NOTE: Robots.txt is not an Ektron file. The /robots.txt convention is a de facto standard and is not owned by any standards body. See http://www.robotstxt.org/robotstxt.html for more information.

What is Robots.txt?

Robots.txt is a file that you can use to include or exclude links from being indexed by search engines such as Google, Bing, and Yahoo. Website owners and administrators use this text file to give instructions about their site to robots (also known as Web Wanderers, Crawlers, or Spiders).

Before accessing a website, a robot first looks at the robots.txt file to see what it can and cannot access. If you do not want certain content to be accessed or indexed by a search engine like Google, make sure it is listed on a Disallow line. Each item specified should be on its own line.
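The check a well-behaved robot performs can be sketched with Python's standard urllib.robotparser module. This is a minimal illustration, not how any particular search engine is implemented; the robots.txt content, the /private/ and /drafts/ paths, and the helper name is_allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, one rule per line.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(ROBOTS_TXT, "http://www.test.com/private/report.aspx"))  # False
print(is_allowed(ROBOTS_TXT, "http://www.test.com/products.aspx"))        # True
```

Note that robots.txt is advisory: it only affects robots that choose to honor it.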

Where is Robots.txt located?

Robots.txt should always be placed in the top-level directory of your Web server so that robots can find it. For example, if your base domain is www.test.com, the robots.txt file should be located in the top-level directory so that it can be found at www.test.com/robots.txt. If the file is not located there, robots cannot read it and your instructions are ignored.
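To make the location rule concrete, the following sketch derives the robots.txt address from any page URL on a site by keeping the scheme and host and replacing the rest with /robots.txt. The function name robots_txt_url and the sample page URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the page path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.test.com/products/list.aspx?id=138"))
# http://www.test.com/robots.txt
```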

What does Ektron recommend regarding Robots.txt?

Ektron recommends disallowing access to /workarea/ and /widgets/. The rest is up to you. 
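A file that follows this recommendation might look like the text generated below. This is a sketch that assumes the rules should apply to all robots (User-agent: *), which the recommendation does not state explicitly; the helper name build_robots_txt and the sample URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt text with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"

# The two paths named in the recommendation above.
robots_txt = build_robots_txt(["/workarea/", "/widgets/"])
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "http://www.test.com/workarea/login.aspx"))  # False
print(parser.can_fetch("*", "http://www.test.com/default.aspx"))         # True
```

Path matching in robots.txt is case-sensitive, so the Disallow entries should use the same casing as the folders appear in your site's URLs.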

Other Considerations:

With regard to eSync, text files (including robots.txt) are already excluded from synchronization, so you do not need to worry about versions of this file being moved between environments.

The following file types are not synchronized by default:

  • .config files
  • .txt files
  • .sln files

You may add these file types to a Workarea or Template profile via the Include/Exclude Files field.

To exclude specific content items, add them to the robots.txt file in a format similar to the following:

Disallow: /templatename?id=138

*Info adapted from Stack Overflow
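If several content items need to be excluded, the Disallow lines can be generated rather than typed by hand. The sketch below assumes items are addressed as /templatename?id=N, as in the example above; the function name disallow_lines and the extra ID 212 are hypothetical.

```python
def disallow_lines(template_name, content_ids):
    """Return one Disallow line per content ID for the given template."""
    return [f"Disallow: /{template_name}?id={content_id}"
            for content_id in content_ids]

for line in disallow_lines("templatename", [138, 212]):
    print(line)
# Disallow: /templatename?id=138
# Disallow: /templatename?id=212
```

Because robots generally match Disallow values as URL prefixes, a rule for ?id=138 may also cover longer IDs that begin with the same digits, such as ?id=1380.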

You can also create a robots metadata tag, which prevents a page from being indexed and puts this control in the hands of content authors. For example:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">

*Info from http://www.robotstxt.org/meta.html
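To confirm that a published page actually carries the tag, the page's HTML can be scanned for it. The following is a minimal sketch using Python's standard html.parser module; the class name RobotsMetaParser, the helper robots_directives, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values keep their case.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())

def robots_directives(html: str) -> set:
    """Return the lowercased robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = ('<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'
        '</head><body></body></html>')
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```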