It’s always interesting when I discuss with clients which pages of their website they don’t want Google to index. Eyebrows are raised; puzzled looks abound; questions are asked: “why wouldn’t we want Google to index a page?” and “how do we prevent Google indexing a page?”

The facility to prevent Google indexing a Web page has long been available, in the form of the index/noindex and follow/nofollow attributes of the ‘robots’ meta tag. You place this in the head section of the HTML; it looks like this:

<meta name="robots" content="index, follow">

With index/noindex, you instruct Google to index the content of that page, or not; with follow/nofollow, you instruct Google to follow the links from that page, or not.

(Note that the example above isn’t really required, since indexing a page and following its links is Google’s default behaviour.)
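To make the four combinations concrete, here is a minimal sketch in Python of how a site template might build the tag from two yes/no settings. The function name `robots_meta_tag` and its parameters are my own invention for illustration, not part of any particular CMS.

```python
def robots_meta_tag(index: bool = True, follow: bool = True) -> str:
    """Return a robots meta tag for the given indexing and link-following choices."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


# Default behaviour: index the page and follow its links.
print(robots_meta_tag())
# Keep the page out of the index, but still follow its links.
print(robots_meta_tag(index=False, follow=True))
# Keep the page out of the index and ignore its links.
print(robots_meta_tag(index=False, follow=False))
```

Defaulting both settings to true mirrors Google's default behaviour, so a page only gets a restrictive tag when someone has made a deliberate choice.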

So when wouldn’t you want Google to index a page or follow the links from it?

A good example of not wanting a page to be indexed and perhaps not following the links from that page is the ‘legal notices’ page of a website. There’s really no value in such a page being indexed – indeed, sometimes there are several pages like this (legal notices, terms and conditions, privacy policy and so on).

Another good example is a page which needs to exist, but isn’t necessarily one that users might visit. Perhaps on a dedicated news website, all of the news stories are kept in a folder, /news – that folder needs to exist in case visitors ‘peel back’ to that page, but it’s not always essential that Google indexes it. (You’re unlikely to make the same decision with the normal news section of a company website, though – indexing the page helps Google better understand the website’s structure.)

Finally, there are some pages which are useful to help Google/users find content but which can look like Google spam. A prime example of this is a page where you are listing lots of links – perhaps it’s a directory website and the page lists every company in the directory; there may be several such pages, organising the companies in different ways (perhaps one by location and one by type of services offered).

In this example, we definitely want Google to index the companies linked to from the page, but we don’t want Google to think that the page is spam. Indeed, we possibly don’t even want people landing on that page from a search engine search – perhaps because the page is there as a last resort for users to find content.

In this case, ‘noindex, follow’ will do the trick. Google will follow the links from the page, but shouldn’t see the page as a honeypot/spam page.

Sculpting how Google sees your website by controlling which pages it doesn’t index is almost as important as controlling which pages it does index.

To be fully sure that the pages aren’t indexed, it’s also a good idea to ensure that such pages don’t appear in the Google site map and HTML site map, if you have one.

(We build the facility to control whether a page is indexed and/or in the site maps right into our content-managed websites as standard for this reason.)
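Keeping noindex pages out of the site map can be sketched roughly as follows. This is an illustration only, not our actual CMS code: the page records, their `url` and `index` keys, and the example.com addresses are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build site map XML containing only the pages flagged as indexable.

    `pages` is a list of dicts with hypothetical keys "url" and "index".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if not page["index"]:
            continue  # noindex pages are left out of the site map
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
    return ET.tostring(urlset, encoding="unicode")


pages = [
    {"url": "https://www.example.com/", "index": True},
    {"url": "https://www.example.com/legal-notices", "index": False},
]
print(build_sitemap(pages))
```

Driving both the meta tag and the site map from the same per-page setting means the two can't drift out of step.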

While Google typically respects the noindex/nofollow attributes of the robots tag, there is an exception. Google has stated that if enough pages link to a page marked for noindex, then it takes the view that users find the content relevant and will index it anyway. I’m not sure I agree with this policy; I think if the tag’s there it should always be respected, but there you go.

It’s worth reviewing your website to see which pages you might not want Google to index – after all, search engine results are all about delivering relevant content.
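If you want a starting point for that review, a short script can report what each page currently tells Google. The sketch below uses only Python's standard library and works on HTML you have already saved or fetched; the class and function names are hypothetical, and it assumes that a page with no robots tag falls back to the default of index, follow.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any robots meta tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def robots_directives(html):
    """Return the robots directives found in `html`, or the defaults if none are set."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives or ["index", "follow"]


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(robots_directives(page))
```

Running this over every page of the site gives you a simple list to check against the pages you actually want indexed.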