Over the past few years there has been much talk about content spamming and duplicate content penalties. In fact not that long ago a malicious webmaster could essentially scrape the content from another site, claim it as their own, and begin ranking for those pages – even over and above the original site! Aside from the obvious copyright violations, they were also effectively stealing business from those honest and unsuspecting website owners.
Naturally, Google and others had to crack down on this, and crack down they did. This is why you’ll often hear about the dreaded “duplicate content penalty” if you’re tuned in to SEO circles.
Truth is, it’s not so much a penalty as it is a filter. Unless you’re seriously and blatantly trying to abuse the system, Google will not systematically penalize or ban you from their results for having duplicate content.
Duplicate content, however, can and will work against your search engine rankings. But before we dive into the downsides, let’s first get a clearer understanding of what is and is not duplicate content.
What is Duplicate Content?
Duplicate content issues occur when two or more pages, each with a significant percentage of the same content as the other(s), are openly available on the web and thus available for a search engine spider to crawl. Quite simply, it’s two or more web pages, on the same domain or on different domains, that have mostly the same content.
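To make “a significant percentage of the same content” a little more concrete, here is a minimal Python sketch that scores how much two passages overlap. It compares overlapping three-word sequences (often called shingles) and reports the share the two texts have in common. The function names, the three-word window, and the sample sentences are all assumptions made for illustration. Search engines do not publish how they measure this, so treat it as a way to think about the idea, not as Google’s method.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical sample passages, invented for this example.
original = "Our blue widget ships free and comes with a two year warranty."
scraped = "Our blue widget ships free and comes with a two year guarantee."
unrelated = "Read our guide to planting tomatoes in early spring."

print(round(similarity(original, scraped), 2))    # one word changed
print(round(similarity(original, unrelated), 2))  # nothing in common
```

A score near 1.0 means the two passages are mostly the same text, and a score near 0.0 means they share almost nothing. Where exactly “duplicate” begins on that scale is a judgment call this sketch does not try to answer.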
Below are eight common ways duplicate content gets created, most of them inadvertently, by an otherwise honest webmaster:
1. Purposely using doorway pages that have different domains but with the same content on them. While the pages eventually direct the user to the home site, search engines don’t take the purpose of doorway pages into consideration.
2. If you syndicate your content, for example by submitting your articles to different sites and directories, don’t be surprised if search engines choose the copy on one of these other sites as the authoritative version.
3. If you run an affiliate program and use affiliate tracking links to track referrals and conversions, search engines might see affiliate landing pages as being duplicate content. For example, you might use http://www.yoursite.com while your affiliate uses http://www.yoursite.com?affiliateid=1234. This can also be a huge problem for so-called “canned” and “turn-key” websites because it means that your chances of ranking, as the owner of one of these “slices” of a larger website, are slim to none. I’ve seen this first hand with real estate agents who buy into a turn-key web “solution”; it is also rampant in MLM systems and across many other industries.
4. One or more pages on your site might be reachable through several different URLs, even though each URL leads to the same page. This is often due to URL parameterization. For example, even though http://www.yoursite.com/?yourproduct=1&type=52 and http://www.yoursite.com/?type=52 might point to the same product page, search engines will still treat them as duplicate content even though there was no intention to cheat. This is a common problem for many sites, particularly those built with content management systems.
5. A subdomain such as http://aboutus.yoursite.com that has the same content as http://www.yoursite.com/aboutus/page.html is also duplicate content.
6. Press releases published on the main company website might also appear on a subsidiary company’s website. Cross-publishing content across a family of brands is not uncommon. However, it can still be considered duplicate content.
7. One scenario that is often overlooked but is probably the most common duplicate content problem is http://yoursite.com being exactly the same as http://www.yoursite.com. Although nearly every website encounters this problem, not many address it properly, even though it can be solved with a simple “301 redirect” that consolidates the “www” and “non-www” versions of your website.
8. Another very common scenario is the inclusion of convenient “print” and “pdf” versions of a page on your website, which lead to the same content as the primary page formatted for printing or viewing in Adobe Acrobat.
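Several of the scenarios above (the affiliate tracking link, the parameterized URLs, and the “www” versus “non-www” split) come down to one page answering to more than one URL. The Python sketch below shows one way to collapse such variants into a single preferred form, and to decide when a 301 redirect is in order. The preferred host, the list of tracking parameters, and the function names are assumptions for illustration only; your own site’s rules will differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical settings for this example; adjust for a real site.
BARE_HOST = "yoursite.com"
PREFERRED_HOST = "www.yoursite.com"
TRACKING_PARAMS = {"affiliateid"}


def canonical_url(url: str) -> str:
    """Map the URL variants of one page to a single preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the "www" host as the same site.
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # Drop parameters that only track referrals, and sort the rest so
    # that parameter order alone does not produce a "different" URL.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


def redirect_for(url: str):
    """Return (status, location) if the non-www host should redirect."""
    parts = urlsplit(url)
    if parts.netloc.lower() != BARE_HOST:
        return None
    target = urlunsplit(parts._replace(netloc=PREFERRED_HOST))
    return "301 Moved Permanently", target


print(canonical_url("http://yoursite.com"))
print(canonical_url("http://www.yoursite.com?affiliateid=1234"))
print(redirect_for("http://yoursite.com/aboutus/page.html"))
```

Note that the redirect helper only consolidates the host. Redirecting away an affiliate parameter would likely break referral tracking, so for those URLs the canonical form is better declared on the page than enforced with a redirect.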
Duplicate Content Filter
Search engines mainly deal with duplicate content by choosing one of the pages to be authoritative, and filtering out all the rest from the search engine results. After all, Google is in the business of providing hyper-relevant results, so why would they want to present what amounts to the same page multiple times to a searcher?
Now, all those other pages are not necessarily penalized in any way. They are just not returned in search results, and often they are not even indexed at all.
For example, if a search engine finds duplicate content on two or more different domains, it will choose only one of these pages to list in its index. And you can still lose out on being listed in search engine results even if you own the content.
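The filtering behavior can be pictured with a small Python sketch: group pages by their content, keep one, and set the rest aside. This version assumes exact matches after whitespace and case are normalized, and it simply keeps whichever copy was crawled first. Both choices are stand-ins for illustration, since real search engines weigh a number of indicators that are not public. The URLs and page text are hypothetical.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(pages):
    """Keep one URL per distinct content.

    pages is a list of (url, text) pairs in crawl order. Returns
    (kept, filtered): kept lists the URLs treated as authoritative, and
    filtered maps each dropped URL to the URL that was kept instead.
    """
    first_seen = {}
    kept, filtered = [], {}
    for url, text in pages:
        key = fingerprint(text)
        if key in first_seen:
            filtered[url] = first_seen[key]
        else:
            first_seen[key] = url
            kept.append(url)
    return kept, filtered


# Hypothetical crawl: the scraper's copy happens to be found first.
pages = [
    ("http://scraper.example/post", "Ten tips for  better widgets."),
    ("http://www.yoursite.com/post", "Ten tips for better widgets."),
    ("http://www.yoursite.com/about", "About our widget company."),
]
kept, filtered = filter_duplicates(pages)
print(kept)
print(filtered)
```

In this made-up crawl the original page on yoursite.com is the one that gets filtered out, which is exactly the situation described above: nothing is penalized, but the content owner still loses the listing.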
Duplicate Content Penalty
If search engines like Google or Yahoo find that you are indeed deliberately and aggressively trying to cheat the system by using the same content to rank multiple times in their results, it can lead to a ban. Rest assured, however, that this is less of a concern for your average website.
The Problem With Duplicate Content and How to Avoid It
It’s best to avoid duplicate content, not because the penalty is as scary as many search engine myths make it out to be, but because it works against your search engine and online business success: it keeps your pages from getting good rankings, or even from showing up in search results at all.
Google, for example, continuously screens for duplicate content and uses a number of indicators to ultimately choose the authoritative page to index while ignoring all the other duplicate pages.
Even if the duplicate pages are all on your own site, you still end up diluting your ability to rank for that particular page by forcing those pages to compete against one another in the search engines.
Ultimately you should take the time to shut down any potential source of duplicate content by ensuring the search engines only index your authoritative page.
For pages on your own site, keep duplicate pages out of the index with the “meta robots” page tag set to “noindex”. The “nofollow” link attribute can discourage crawlers from following links to those pages, but on its own it will not keep a page out of the index if other links point to it. Another more recent approach is to include a “rel=canonical” link tag in the head of the page to specify the authoritative URL for a particular page of content. This is particularly useful with content management systems that present the same content under a variety of different URLs.
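If you want to check what a given page actually declares, a short script can read the tags back out of its HTML. The sketch below uses Python’s standard html.parser module to report a page’s canonical URL and whether it carries a “noindex” robots directive. The class name, function name, and the two sample pages (including the product URL) are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collect the canonical URL and robots directives declared by a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        name = (attrs.get("name") or "").lower()
        if tag == "link" and "canonical" in rel:
            self.canonical = attrs.get("href")
        elif tag == "meta" and name == "robots":
            content = (attrs.get("content") or "").lower()
            self.robots = [d.strip() for d in content.split(",") if d.strip()]


def audit(html: str) -> dict:
    """Report the canonical URL (or None) and whether noindex is set."""
    parser = HeadAudit()
    parser.feed(html)
    return {"canonical": parser.canonical, "noindex": "noindex" in parser.robots}


# Hypothetical pages: a print version kept out of the index, and a
# parameterized product page that points at its authoritative URL.
PRINT_VERSION = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Printable page</body></html>"
)
PARAM_VERSION = (
    '<html><head><link rel="canonical" '
    'href="http://www.yoursite.com/yourproduct.html"></head>'
    "<body>Product page</body></html>"
)

print(audit(PRINT_VERSION))
print(audit(PARAM_VERSION))
```

The two sample pages use one technique each on purpose: the print version is excluded with “noindex”, while the parameterized page stays indexable and names its authoritative URL instead.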
As you add new pages to your site, it’s also important to get them indexed by the search engines as quickly as possible. Being indexed first gives the search engine a strong indicator that your page is the original and therefore authoritative. The flip side is that scrapers and spammers can pass off your content as their own if they beat you to the punch and get their copy indexed first.
Content Checkers and Further Reading
If you’re worried about other sites duplicating your content, a quick check on Copyscape will let you know whether there are other sites on the web that have content significantly similar to yours.
For more information on duplicate content straight from the horse’s mouth, read this excellent Google Webmaster Central blog post on duplicate content.
What are your strategies for avoiding duplicate content? What tools do you like to use? Let us know in the comments below.