We recently rebuilt the Zengenuity website, but it was only this week that I finally got around to setting it up in our SEOmoz account. After the initial site crawl, I was surprised to discover this:
Duplicate content in Drupal is often a problem. The URL alias system causes content to show up at multiple URLs by default. I’ve posted about this problem (and its solutions) before. But I thought we were doing everything right, so I was surprised to see so many duplicate content errors.
The Problem: The Site Map Module’s Default Settings
After chasing down the issue, I found that the Site Map module was the culprit. Site Map creates a simple, human-readable list of your website content using your menu items and taxonomy terms. It’s a good way to make sure that both real users and search engines know about all the content on your site. However, the module's default settings for taxonomy links can cause duplicate content to show up. Here’s how:
Drupal has a built-in page for each taxonomy term on your site. The URL for this page looks something like this:
If you have installed and configured the Pathauto module, this URL will probably look more user-friendly. Ours looks like this:
These aliases work great for normal taxonomy page links, like the ones that appear at the bottom of our blog posts. However, Site Map doesn't use them by default. Instead of linking to the friendly URLs, it appends “/all” to the original URLs, so all the taxonomy links look like this:
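Why does a trailing “/all” lose the friendly URL? Here is a simplified Python model. It assumes what a plain Pathauto setup does: each alias is stored against one exact system path, and a link only picks up an alias when its path matches exactly. The helper `outbound_url` and the alias `blog/tags/drupal` are hypothetical and for illustration only. This is not Drupal's actual code.

```python
from typing import Dict

# Hypothetical alias table of the kind Pathauto builds for term pages.
# Each alias is keyed by one exact system path.
ALIASES: Dict[str, str] = {
    "taxonomy/term/5": "blog/tags/drupal",
}

def outbound_url(system_path: str, aliases: Dict[str, str]) -> str:
    """Simplified model: return the alias for an exact path match, else the raw path."""
    return aliases.get(system_path, system_path)

# A normal term link picks up the friendly alias...
print(outbound_url("taxonomy/term/5", ALIASES))      # blog/tags/drupal
# ...but the same term with "/all" appended has no alias of its own.
print(outbound_url("taxonomy/term/5/all", ALIASES))  # taxonomy/term/5/all
```

Both URLs display the same term page. Search engines therefore see two addresses for one piece of content.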
The configuration option that controls this is here:
“All” is the default because websites with hierarchical taxonomy structures will likely want to display all content tagged with a term or any of its children. That’s what the “/all” option does. However, most sites, including this one, do not use hierarchical tags, so this option has no effect other than to create duplicate content. When Google and other search engines index the site map page and see these “/all” links, they will index them separately from your friendly taxonomy links, and you may end up being penalized for the duplicate content.
You should change the Site Map taxonomy setting to “-1” instead of “all”.
Once this is done, all the links on your Site Map page will switch to their Pathauto-generated aliases. For sites with a flat taxonomy, there is no downside to doing this. Sites that do have hierarchical taxonomies can replace the standard Drupal taxonomy pages with the taxonomy_term view that comes with Views. Once you do that, you can remove the depth argument and manage the depth you want to display with the term argument.
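The next sketch shows why “-1” makes the difference. It models the depth setting as a function. The post only describes two values: “all” appends “/all”, and “-1” appends nothing. The model also assumes that any other depth value is appended the same way as “all”. That assumption, the function name `site_map_term_link`, and the example alias are hypothetical. This is not the Site Map module's real code.

```python
from typing import Dict, Union

def site_map_term_link(tid: int, depth: Union[int, str], aliases: Dict[str, str]) -> str:
    """Hypothetical model of how the Site Map depth setting shapes a term link."""
    path = f"taxonomy/term/{tid}"
    if str(depth) != "-1":
        # Assumption: any depth other than -1 is appended as an extra path
        # segment, which yields a system path with no alias of its own.
        path = f"{path}/{depth}"
    # Exact-match alias lookup, as in a plain Pathauto setup.
    return aliases.get(path, path)

aliases = {"taxonomy/term/5": "blog/tags/drupal"}  # hypothetical alias
print(site_map_term_link(5, "all", aliases))  # taxonomy/term/5/all
print(site_map_term_link(5, -1, aliases))     # blog/tags/drupal
```

With “-1”, the link is the bare term path, so the existing alias applies. The site map and the rest of the site then point to the same URL.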
Over 30,000 sites currently use the Site Map module. I’m betting that many of them have never noticed this issue, since, as site builders, we rarely look at the site map. Luckily, this problem is pretty easy to fix once you know it’s there. It’s just one more thing to add to your pre-launch checklist.
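To turn this into a checklist item, you could scan the HTML of your site map page for links that still point at raw term paths. The sketch below uses only Python's standard library and takes the page HTML as a string, however you choose to fetch it. It assumes Drupal's default term path pattern, taxonomy/term/[term ID], with an optional suffix such as “/all”. The function name `suspicious_term_links` is hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit

# Unaliased term paths such as taxonomy/term/5 or taxonomy/term/5/all.
TERM_PATH = re.compile(r"(?:^|/)taxonomy/term/\d+(?:/[^/]+)?/?$")

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def suspicious_term_links(html: str) -> List[str]:
    """Return links whose path is a raw (unaliased) taxonomy term path."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Compare only the path part, ignoring any query string or fragment.
    return [h for h in parser.hrefs if TERM_PATH.search(urlsplit(h).path)]

sample = (
    '<ul><li><a href="/taxonomy/term/5/all">Drupal</a></li>'
    '<li><a href="/blog/tags/seo">SEO</a></li></ul>'
)
print(suspicious_term_links(sample))  # ['/taxonomy/term/5/all']
```

An empty result suggests the site map is linking to aliased term pages. Any hits are worth checking against the Site Map taxonomy setting.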