Duplicate content caused by URL parameters

Posted on September 12th 2007 in Google, SEO & SEM

Operating a dynamic website? This might be interesting. The duplicate content issue is a well-known problem when it comes to getting your pages indexed and ranked in the search engines. There are many solutions, such as URL rewriting (mostly on Unix/Linux, although there are ways to make it work on Windows), sessions and cookies, and URL parsing, which is pretty much a variation of URL rewriting.

There is a post about this on Google Webmaster Central. The thing that caught my eye in the post is the following:

1. When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.
2. We select what we think is the “best” URL to represent the cluster in search results.
3. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.

Item (2) is interesting. Unless you actually show different content when your page receives a different value for a parameter, in which case there is no duplicate content to begin with, why would Google need to select what it thinks is the “best” representative page for a particular URL cluster? Yahoo allows webmasters to tell it which URL they want as the representative one. By relying on Google to choose the “best” URL, you actually give up the control that Google just gave you with these two suggestions:

1. Removing unnecessary URL parameters — keep the URL as clean as possible.
2. Submitting a Sitemap with the canonical (i.e. representative) version of each URL. While we can’t guarantee that our algorithms will display the Sitemap’s URL in search results, it’s helpful to indicate the canonical preference.
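To make the first suggestion concrete, here is a minimal sketch of stripping unnecessary parameters before a URL goes into a Sitemap. The parameter names in `IGNORED_PARAMS` and the example URL are my own assumptions for illustration; which parameters are safe to drop depends entirely on your site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change what the page shows.
# Adjust this set for your own site; it is an assumption, not a standard.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}


def canonical_url(url: str) -> str:
    """Return the URL without ignored parameters, with the rest sorted."""
    parts = urlsplit(url)
    # Keep only the parameters that actually select different content.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Sort so that ?a=1&b=2 and ?b=2&a=1 end up as the same URL.
    kept.sort()
    # Lower-case the host and drop any fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


print(canonical_url("http://Example.com/product.php?id=7&sessionid=abc&sort=asc"))
```

The idea is to run every internal link and every Sitemap entry through the same function, so the site itself never advertises more than one variation of a page.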

So if I am not guaranteed to have my URL choice shown in search results, what’s the point of having the option to explicitly state that URL in the submitted Sitemap? Kinda reminds me of the door dilemma - “damned if you enter, damned if you don’t enter”.

The whole story revolves around having your incoming link juice distributed over multiple variations of what’s essentially the same URL. So 100 links that should all count towards one “clean” URL end up spread across all the URL parameter variations, which lowers the incoming link value of each one. That is not what you want, especially after all the hard work associated with link building.
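A rough sketch of the arithmetic, assuming every parameter variation shows the same content and that a cluster is simply "same scheme, host and path". The link counts are made-up numbers, and this is only my illustration of the consolidation idea, not how Google actually does it.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def consolidate_links(link_counts: dict) -> dict:
    """Sum incoming link counts over URLs that differ only in their query string."""
    totals = defaultdict(int)
    for url, count in link_counts.items():
        parts = urlsplit(url)
        # Assumption: the query string never changes the content,
        # so scheme + host + path identifies the cluster.
        cluster_key = f"{parts.scheme}://{parts.netloc}{parts.path}"
        totals[cluster_key] += count
    return dict(totals)


# Made-up example: 100 links split over three variations of one page.
split_links = {
    "http://example.com/page": 60,
    "http://example.com/page?ref=a": 25,
    "http://example.com/page?ref=b": 15,
}
print(consolidate_links(split_links))
```

Without consolidation the strongest variation here has only 60 of the 100 links behind it, which is the loss the paragraph above is about.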

In the end, it all comes down to this: try to use URL rewriting, even if it takes a change of hosting solution. Everything else is an unreliable fix that complicates site development and will produce unwanted results later.
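For anyone who has not seen URL rewriting in action, the idea can be sketched in a few lines. Normally the web server does this for you through its rewrite rules; the Python below only illustrates the mapping. The paths, the `product.php` script name and the single rule are hypothetical.

```python
import re

# Hypothetical rule: the clean public URL on the left is mapped to the
# parameterised script URL that the site uses internally.
REWRITE_RULES = [
    (re.compile(r"^/products/(\d+)/?$"), "/product.php?id={0}"),
]


def rewrite(path: str) -> str:
    """Translate a clean path into its internal form, or return it unchanged."""
    for pattern, target in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            # Fill the captured groups into the internal URL template.
            return target.format(*match.groups())
    return path


print(rewrite("/products/42"))
```

Visitors and search engines only ever see `/products/42`, so there is just one URL per page to link to and no parameter variations to cluster.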
