The Blog That Is No More
This Blog has moved to http://www.success.grownupgeek.com
Monday, April 30, 2007
  Drupal: Help Avoid Duplicate Content

Anybody who knows anything about SEO will tell you over and over to avoid duplicate content. Usually this means don't copy and paste other people's work onto your website, and don't buy turnkey sites that display feeds from article submissions, etc.

But if you use Drupal, there is another way you can get hit with a duplicate content penalty without even knowing it: improper use of the dreaded "/". That's right, the / (forward slash) character. As it turns out, Drupal will serve the same page at both www.yoursite.com/a-page and www.yoursite.com/a-page/, and search engines treat those as two different URLs. This means that if a search-engine bot comes to your site via a link with a trailing "/", it could index several duplicate pages or, at worst, a duplicate of your entire site.

I had never heard of this potential issue with Drupal, so big kudos to my new friend Alex of pitumbo.com, whom I met at the April WEMUG meeting. Alex pointed me to an article at blamcast.net that explains it better than I ever could (please take a minute to read it and give it a Digg).

Basically, the trick here is to use a 301 redirect to remove the trailing slash from the end of all your URLs. According to blamcast.net, the code to put in your .htaccess file would look something like this:

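#remove trailing slashes
# Only act on requests for your own domain, with or without www (note the escaped dots)
RewriteCond %{HTTP_HOST} ^(www\.)?yourdomain\.com$ [NC]
# Permanently (301) redirect any path ending in "/" to the same path without it
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]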


I quickly tossed this code into my .htaccess file and loaded a few test pages and, to my surprise, it did not work. I think the www-to-non-www redirect I'm using to remove www from all my URLs was interfering with it, so after some tinkering I ended up with this version, which seems to work:
#remove trailing slashes
# Note: no real hostname starts with a dot, so this negated condition is always true
# and the rule below ends up applying to every host
RewriteCond %{HTTP_HOST} !^\.grownupgeek\.com$ [NC]
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
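
For anyone else running both redirects, here is a minimal sketch of how the two might sit together in one .htaccess file. It is not the blamcast.net code or the exact setup used on this site: the generic www pattern and the directory check are assumptions added for illustration. It also assumes mod_rewrite is enabled and that these rules are placed above Drupal's own rule that sends requests to index.php.

```apache
RewriteEngine On

# Assumed www -> non-www redirect: %1 is the host name with the leading "www." stripped
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# Remove trailing slashes, but leave real directories alone (assumption: avoids
# fighting Apache, which adds the slash back when a directory is requested)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
```

With rules like these, a request for www.yoursite.com/a-page/ should be redirected first to yoursite.com/a-page/ and then to yoursite.com/a-page. It is worth loading a few test URLs, with and without the slash, to confirm each one ends up at a single address.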


If you're using Drupal and are looking to squeeze every bit of SEO from it, I recommend this simple change. Thanks to blamcast.net for putting it out there.


 
Comments:
Thanks for the kudos ;-)

I had the same problem you did with the code, since I too had set up the www --> no www redirect before dealing with the slashy issue. If you (or your readers) have any websites using the WordPress CMS, it too has issues with the trailing slash leading to duplicate content.
 
Very interesting. I use Drupal on my site and I did not know about this issue. Thank you for sharing.
 
This is a great little tweak. Thanks for sharing!

I just used it on my site and it worked well!
 