The Google Duplicate Content Penalty: the Truth


Here’s an interesting article by Peter Nisbet of Article-Writing.com on Google’s duplicate content penalty (or lack thereof). Go figure … :>

======================================================

The truth of the Google duplicate content penalty is quite simply that there is none! If that confuses you, then you have been reading too many misinformed forums or blogs where people seize on a popular term without knowing what it means, and then profess to be experts.

The only experts on the Google duplicate content penalty, and the only people who are qualified to define it, are Google, and in Google’s own words “There is no such thing as a duplicate content penalty”. This comes directly from Google’s Webmaster Central Blog.

That should be the end of this article, at precisely 96 words excluding title as I define my word count. But it is not. Why? Because even though this blog is operated by Google, and even though much the same has been stated by Matt Cutts, the Google software engineer who heads its webspam team, and other Google experts, people still argue and complain about the Google ‘duplicate content penalty’.

So here is the truth: you might ask who am I to know the truth, but I read all the Google blogs and their official statements, and in applying what I learn, I achieve excellent results for my web pages on Google search engine listings: and those of Yahoo, MSN and Bing. So I am coming from a sound base that my results can prove.

As a professional article writer whose customers trust me to get them the best results from the articles I write, I have to be very aware of the policies of each of the major search engines and the way their algorithms work, and so I am as qualified as anybody to comment on myths such as this.

The Truth of the Google Duplicate Content Penalty

There is no duplicate content penalty. Google’s major search engine function is to provide a customer the best possible results for a search, based upon the search term (keywords) that the customer has used in the Google search box.

Google’s customers are not:

1. You, who use it to get your web pages listed.

2. Adwords advertisers that use Adwords to advertise their products.

3. Corporations or individuals that use it to have their web pages listed.

4. Internet marketers who recommend others to use Google for advertising or searching.

Google’s customers are those seeking information, whether that is to solve a problem, where to purchase a product at the cheapest price, find a sports result or to get directions to a specific location. Everybody that uses Google uses a search term to find some information that they need. That search term is what you and I refer to as a keyword.

If Google detects several web pages offering exactly the same content, its algorithms will select that which best offers the information required and list that. It might also list one or two other pages offering exactly the same content if there are good reasons for it doing so (e.g. more links to other relevant websites, more other relevant pages on the domain, and so on).

So, not all duplicate content pages will be refused a listing. If these duplicates are articles, then the algorithms that the spiders carry on their backs will take into consideration the links from these articles, the authority of the directory on which each is published, and other factors, before deciding which should be listed. It is wrong to believe that this decision has a chronological factor. However, if you include a link in your article Resource section to your web page that contains the same article, then your page is liable to be listed above the others, partially because of a greater number of links back to it from the other copies, and partially because your entire site is liable to be more relevant than these others to the information being sought by Google’s customer.

This is not because yours was created first, but because it better meets Google’s criterion for authoritative back-links. However, if the rest of your website is not equally authoritative, your page might be listed behind another with the same content or even not listed at all.

All of this is designed by Google so that its customer is offered the most relevant range of results to the keywords they used. That is what Google is for, and is its ultimate objective. Google will not penalize any individual or any website for publishing what you refer to as ‘duplicate content’, and it will take your version into consideration for publication just as any other version.

What counts in the long run is which version Google’s algorithms believe to be most likely to provide the best possible information to the person seeking it, and if that means not publishing a whole host of duplicate information, then that is only fair, isn’t it? If you used Google to find some information, you wouldn’t want to find page after page saying exactly the same thing, would you?

No, and neither does Google. A Google listing comes from its indexing of billions of web pages that contain the keywords used by the searcher: both in relation to the entire phrase and to the individual words used in the search term. If you want your copy to be different, make some minor changes and perhaps change the form of the keywords, but most importantly, change the title and the introductory paragraph, of which the crawlers take special notice.

You then have a better chance of your version being listed along with some of the others, but remember: the next time you use the term ‘duplicate content’ you are using a term that does not exist in Google’s vocabulary for any reason other than to deny its existence. The Google Duplicate Content Penalty does not exist: the truth!

About The Author
For more information on the mythical duplicate content penalty visit www.article-services.com where Peter Nisbet will also explain how to earn money using article marketing.


Duplicate Content on Google, Bing & Yahoo


Google: Cross-Domain Canonical Tag This Year

Duplicate content is a common occurrence on the web and in many cases can hurt search engine rankings. While the search engines may not always technically penalize webmasters for duplicate content, there are still a lot of ways it can hurt.

WebProNews is covering the Search Marketing Expo (SMX) East in New York, where representatives from the three major search engines (Google, Yahoo, and Bing) discussed how their respective web properties handle duplicate content issues. Following are some takeaways from each.

Duplicate Content in Google

The way Google handles duplicate content has been discussed a lot in recent memory. This is largely due to a video Google’s Greg Grothaus uploaded, in which he discusses at length the way Google handles a variety of different elements of the duplicate content conversation.

Joachim Kupke, Sr. Software Engineer of Google’s Indexing Team reiterated much of what Grothaus said. He also said that Google has a ton of infrastructure for content duplication elimination:

- redirects
- detection of recurrent URL patterns (the ability to ‘learn’ recurrent url patterns to find duplicated content)
- actual contents
- most recently crawled version
- earlier content
- contents minus things that don’t change on a site
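To make the last item in that list concrete, here is a minimal Python sketch of comparing pages by their contents “minus things that don’t change on a site”. It is only an illustration and not Google’s method: the function name `fingerprint`, the sample navigation strings and the idea of passing the boilerplate in by hand are all assumptions made for the example.

```python
import hashlib
import re


def fingerprint(page_text: str, boilerplate: list[str]) -> str:
    """Hash the page text after removing known site-wide boilerplate."""
    for chunk in boilerplate:
        page_text = page_text.replace(chunk, " ")
    # Collapse whitespace and case so trivial formatting changes do not matter.
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical pages: the same article body under two different navigation bars.
NAV_A = "Home | About | Contact"
NAV_B = "Start | Company | Help"
page_a = f"{NAV_A}\nSwedish Fish are a chewy candy.\n"
page_b = f"{NAV_B}\nSwedish  fish are a chewy candy."

known_boilerplate = [NAV_A, NAV_B]
same = fingerprint(page_a, known_boilerplate) == fingerprint(page_b, known_boilerplate)
print(same)
```

An exact hash like this only catches copies that are identical once the boilerplate is gone; anything looser would need a similarity measure instead.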

Kupke said to avoid dynamic URLs when possible (although Google is “rather good” at eliminating dupes). If all else fails, use the canonical link element. Kupke calls this a “Swiss Army Knife” for duplicate content issues.

Google says the canonical link element has been tremendously successful. It didn’t even exist a year ago, and its use has grown exponentially. It has had a huge impact on Google’s canonicalization decisions, and 2 out of 3 times, the canonical tag actually alters the organic decision in Google.

Google says a common mistake is designating a 404 as canonical, and this is typically caused by unnecessary relative links. So, avoid changing rel=”canonical” designations, and avoid designating permanent redirects as canonical.

Also, do not use disallow directives in robots.txt to annotate duplicate content. It makes it harder to detect dupes, and disallowed 404s are a nuisance. There is an exception, however: interstitial login pages may be a good candidate to “robot out,” according to Kupke.
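A short Python sketch shows why a disallow rule gets in the way. It uses the standard library’s `urllib.robotparser`; the robots.txt contents, the crawler name `ExampleBot` and the `/print/` path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to hide printer-friendly duplicates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(url: str) -> bool:
    """True if a well-behaved crawler may fetch the URL under ROBOTS_TXT."""
    return parser.can_fetch("ExampleBot", url)


print(crawlable("http://www.example.com/article.html"))
print(crawlable("http://www.example.com/print/article.html"))
```

A crawler that obeys the rule never fetches the `/print/` copy, so it cannot compare that copy with the original or read any rel="canonical" hint placed on it.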

Kupke says that canonical works, but indexing takes time. “Be patient and we WILL use your designated canonicals.” Cleaning up an existing part of the index takes even longer, and this may leave dupes serving for a while despite rel=canonical, Kupke adds.

At SMX, Google announced that cross-domain rel=canonical is coming within this year. So, for example, if a Chicago Tribune article also appears on the New York Times site, and the rel=canonical there points to the Chicago Tribune, then Google will only credit the Chicago Tribune with the content.

Duplicate Content in Bing

As far as how Bing views duplicate content, intention is key. If your intent is to manipulate the search engine, you will be penalized.

Sasi Parthasarathy, Program Manager of Bing says to consolidate all versions of a page under one URL. “Less is more, in terms of duplicate content.” If possible, use only one URL per piece of content.

Bing isn’t supporting the canonical link element (as a ranking factor) yet, but it is coming. They do say to use it, but it’s just not really a ranking factor in Bing yet. Bing says that there has been an increase in the usage of canonical tags in the past 6 months, but adoption issues still exist. According to Parthasarathy, 30% of canonical tags point to the same domain (which is fine), and 9% use it to point to other domains. This could be a mistake or it could be manipulative. Bing says they will look for other factors to try and determine which it is.

Bing says canonical tags are hints and not directives. “Use it with caution,” and not as an alternative to good web design.

With regards to www vs non-www, just pick one and stick with it consistently. Remove default filenames at the end of your URLs. Bing also says 301 redirects are your best friend for redirecting, use rel=”nofollow” on useless pages, and use robots.txt to keep content you don’t want crawled out.

Duplicate Content in Yahoo

If everything goes according to plan, you’re going to need to worry about how Bing handles duplicate content if you’re worried about how Yahoo handles it, but Yahoo’s Cris Pierry, Sr. Director of Search, offered a few additional tips.

Pierry says descriptive URLs should be easily readable, and it’s not a good idea to change URLs every year. In addition, use canonical, avoid case sensitivity, and avoid session IDs and parameters.

Pierry also says to use sitemaps, and submit them to Yahoo Site Explorer. Improve indexing by proper robots.txt usage, and use Site Explorer to delete URLs that you don’t want Yahoo to index. Finally, provide feeds to Yahoo Site Explorer, and report spam sites linking to you in Site Explorer.

Yahoo says metadata and SearchMonkey are enhancing presentation.

WebProNews reporter Mike McDonald contributed to this article from SMX East.

About the author:
Chris Crum has been a part of the WebProNews team and the iEntry Network of B2B Publications since 2003. Twitter: @CCrum237


About rel="canonical"


 
What is a canonical page? Why specify a canonical page?
A canonical page is the preferred version of a set of pages with highly similar content.

It’s common for a site to have several pages listing the same set of products. One page might display products sorted in alphabetical order, while other pages display the same products listed by price or by rating. For example:

http://www.example.com/product.php?item=swedish-fish&trackingid=1234567&sort=alpha&sessionid=5678asfasdfasfd

http://www.example.com/product.php?item=swedish-fish&trackingid=1234567&sort=price&sessionid=5678asfasdfasfd

 

If Google knows that these pages have the same content, we may index only one version for our search results. Our algorithms select the page we think best answers the user’s query. Now, however, users can specify a canonical page to search engines by adding a <link> element with the attribute rel=”canonical” to the <head> section of the non-canonical version of the page. Adding this link and attribute lets site owners identify sets of identical content and suggest to Google: “Of all these pages with identical content, this page is the most useful. Please prioritize it in search results.”
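As a rough illustration, a site owner could work out which of their own URLs belong to the same set with a few lines of Python. This is a sketch under stated assumptions: the list of parameters to ignore (`trackingid`, `sessionid`, `sort`) is taken from the example URLs above, and the function name is made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change what the page is about.
IGNORED_PARAMS = {"trackingid", "sessionid", "sort"}


def strip_ignored_params(url: str) -> str:
    """Drop tracking, session and sort parameters, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


by_alpha = ("http://www.example.com/product.php?item=swedish-fish"
            "&trackingid=1234567&sort=alpha&sessionid=5678asfasdfasfd")
by_price = ("http://www.example.com/product.php?item=swedish-fish"
            "&trackingid=1234567&sort=price&sessionid=5678asfasdfasfd")

print(strip_ignored_params(by_alpha) == strip_ignored_params(by_price))
```

Both example URLs reduce to the same address, which is a natural candidate for the canonical page.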

How do I specify a canonical page?
To specify a canonical link to the page http://www.example.com/product.php?item=swedish-fish, create a <link> element as follows:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>

 
Copy this link into the <head> section of all non-canonical versions of the page, such as http://www.example.com/product.php?item=swedish-fish&sort=price.
If you publish content on both http://www.example.com/product.php?item=swedish-fish and https://www.example.com/product.php?item=swedish-fish, you can specify the canonical version of the page. Create the <link> element:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>

 

Add this link to the <head> section of https://www.example.com/product.php?item=swedish-fish.

Is rel=”canonical” a suggestion or a directive?
This new option lets site owners suggest the version of a page that Google should treat as canonical. Google will take this into account, in conjunction with other signals, when determining which URL sets contain identical content, and calculating the most relevant of these pages to display in search results.

Can the link be relative or absolute?
The rel=”canonical” attribute can be used with relative or absolute links, but we recommend using absolute links to minimize potential confusion or difficulties. If your document specifies a base link, any relative links will be relative to that base link.
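The following Python sketch shows why relative links cause confusion. It resolves the same relative href from two different places using the standard library’s `urljoin`; the `/shop/` directory and the helper name `resolve_canonical` are hypothetical.

```python
from urllib.parse import urljoin
from typing import Optional


def resolve_canonical(page_url: str, href: str, base_href: Optional[str] = None) -> str:
    """Resolve a canonical href the way a relative link is resolved.

    If the document declares a base link, it replaces the page URL as the
    starting point for resolution.
    """
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


relative = "product.php?item=swedish-fish"

# The same relative href on a page inside a hypothetical /shop/ directory...
in_shop = resolve_canonical("http://www.example.com/shop/list.php?sort=price", relative)
# ...and on the same page when the document declares a base link.
with_base = resolve_canonical(
    "http://www.example.com/shop/list.php?sort=price",
    relative,
    base_href="http://www.example.com/",
)

print(in_shop)
print(with_base)
```

The first result points inside `/shop/` and the second at the site root, so one of them may well be a page that does not exist. An absolute href avoids the question.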

Must the content on a set of pages be similar to the content on the canonical version?
Yes. The rel=”canonical” attribute should be used only to specify the preferred version of many pages with identical content (although minor differences, such as sort order, are okay).

For instance, if a site has a set of pages for the same model of dance shoe, each varying only by the color of the shoe pictured, it may make sense to set the page highlighting the most popular color as the canonical version so that Google may be more likely to show that page in search results. However, rel=”canonical” would not be appropriate if that same site simply wanted a gel insole page to rank higher than the shoe page.

What happens if rel=”canonical” points to a non-existent page? Or if more than one page in a set is specified as the canonical version?
We’ll do our best to algorithmically determine an appropriate canonical page, just as we’ve done in the past.
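Rather than leave it to the algorithm, a site owner can check their own pages for a missing or doubled declaration. Below is a minimal Python sketch using the standard library’s `html.parser`; the class and function names are made up for the example, and it does not check that the element sits inside <head>.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_canonicals(html: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.hrefs


page = """<html><head>
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>
</head><body>Swedish Fish</body></html>"""

found = find_canonicals(page)
print(len(found), found)
```

A result with no entries or more than one entry is worth fixing by hand. Checking that the target actually exists would need a separate HTTP request, which is left out here.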

Can Google follow a chain of rel=”canonical” designations?
Yes, to some extent, but to ensure optimal canonicalization, we strongly recommend that you update links to point to a single canonical page.
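To find chains on your own site, you could record each page’s declared canonical in a table and follow it to the end. The Python sketch below assumes such a table already exists as a plain dictionary; the function name, the `max_hops` limit and the sample URLs are hypothetical.

```python
def final_canonical(url: str, canonical_of: dict[str, str], max_hops: int = 5) -> str:
    """Follow declared canonicals until reaching a page that names itself or nothing."""
    seen = {url}
    for _ in range(max_hops):
        target = canonical_of.get(url)
        if target is None or target == url:
            return url
        if target in seen:
            raise ValueError(f"canonical loop detected at {target}")
        seen.add(target)
        url = target
    raise ValueError("canonical chain is longer than max_hops")


# Hypothetical site: the price-sorted page points at the alpha-sorted page,
# which in turn points at the plain product page.
declared = {
    "http://www.example.com/p?sort=price": "http://www.example.com/p?sort=alpha",
    "http://www.example.com/p?sort=alpha": "http://www.example.com/p",
}

print(final_canonical("http://www.example.com/p?sort=price", declared))
```

Any page whose final target differs from its declared target is part of a chain and can be updated to point straight at the final page.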

Can rel=”canonical” be used to suggest a canonical url on a completely different domain?
No. To migrate to a completely different domain, permanent (301) redirects are more appropriate. Google currently will take canonicalization suggestions into account across subdomains (or within a domain), but not across domains. So site owners can specify a canonical page on www.example.com from a set of pages on example.com or help.example.com, but not on example-widgets.com.

 

About This Article:
This article is from Google Webmaster Tools / site management – “About rel=”canonical” – Webmasters/Site owners Help” – Explained by Matt Cutts

 


The Tricky Issue of Duplicate Content & What Google Says About It


Being a full-time online marketer means you have to keep a close watch on how Google is ranking pages on the web… one very serious concern is the whole issue of duplicate content. More importantly, how does having duplicate content on your site and on other people’s sites, affect your keyword rankings in Google and the other search engines?

Now, recently it seems that Google is much more open about just how it ranks content. I say “seems” because with Google there are years and years of mistrust when it comes to how they treat content and webmasters. Google’s whole “do as I say” attitude leaves a bitter taste in most webmasters’ mouths. So much so, that many have had more than enough of Google’s attitude and ignore what Google and their pundits say altogether.

This is probably very emotionally fulfilling, but is it the right route or attitude to take? Probably not!

Mainly because, regardless of whether you love or hate Google, there’s no denying they are King of online search and you must play by their rules or leave a lot of serious online revenue on the table. Now, for my major keyword content/pages even a loss of just a few places in the rankings can mean I lose hundreds of dollars in daily commissions, so anything affecting my rankings obviously gets my immediate attention.

So the whole tricky issue of duplicate content has caused me some concern and I have made an ongoing mental note to myself to find out everything I can about it. I am mainly worried about my content being ranked lower because the search engines think it is duplicate content and penalize it.

My situation is compounded by the fact that I am heavily into article marketing – the same articles are featured on hundreds, sometimes thousands of sites across the web. Naturally, I am worried these articles will dilute or lower my rankings rather than accomplish their intended purpose of getting higher rankings.

I try to vary the anchor text/keyword link in the resource boxes of these articles. I don’t use the same keyword phrase over and over again, as I am nearly 99% positive Google has a “keyword use” quota – repeat the same keyword phrase too often and your highly linked content will be lowered around 50 or 60 places, basically taking it out of the search results. Been there, done that!

I even like submitting unique articles to certain popular sites so only that site has the article, thus eliminating the whole duplicate content issue. This also makes for a great SEO strategy, especially for beginning online marketers: your site will take some time to get to a PR6 or PR7, but you can place your content and links on high PR7 or PR8 authority sites immediately. This will bring in quality traffic and help your own site get established.

Another way I combat this issue is by using a 301 re-direct so that traffic and PageRank flow to the URL I want ranked. You can also use your Google Webmaster Tools account to show which version of your site you want ranked or featured: with or without the www.

The whole reason for doing any of this has to do with PageRank juice – you want to pass along this ranking juice to the appropriate page or content. This can raise your rankings, especially in Google.

Thankfully, there is the relatively new “canonical tag” you can use to tell the search engines which page/content you want featured or ranked. Just add this link element to the <head> of each duplicate page, with the href pointing to the version you want ranked or featured, as in the example given below:

<link rel="canonical" href="place your preferred link here">

Anyway, this whole duplicate issue has many faces and sides, so I like going directly to Google for my information. Experience has shown me that Google doesn’t always give you the full monty, but for the most part, you can follow what they say. Lately, over the last year or so, Google seems to have made a major policy change and is telling webmasters a lot more about how they (Google) rank their index.

So if you’re concerned or interested in finding out more about duplicate content and what Google says about it, try these helpful links. The first is a very informative video on the subject entitled “Duplicate Content & Multiple Site Issues”, which is presented by Greg Grothaus, who works for Google.

Another great link is this page from Google Webmasters Support Answers by Matt Cutts. It has a lot of helpful information, including a video on the Canonical Link Element.

In yet another post, Matt Cutts discusses the related issue of content scraping and advises webmasters not to worry about it. This is a slightly different matter: other webmasters and unmentionables may use software to scrape your site and place your content on their site. This has happened to me, countless times, including when my content has been reduced to scrambled nonsense. Cutts says not to worry about this matter as Google can usually tell the original source of the material. In fact, having links in this duplicate content may just help your rankings in Google.

“There are some people who really hate scrapers and try to crack down on them and try to get every single one deleted or kicked off their web host,” says Cutts. “I tend to be the sort of person who doesn’t really worry about it, because the vast, vast, vast majority of the time, it’s going to be you that comes up, not the scraper. If the guy is scraping and scrapes the content that has a link to you, he’s linking to you, so worst case, it won’t hurt, but in some weird cases, it might actually help a little bit.”

As a full-time online marketer I am not so easily convinced. I mainly have pressing concerns about unscrupulous competitors using these scrapings and duplicate content to undermine my rankings in Google by triggering some keyword spam filter. Whether this actually happens, only Google knows for sure, but it is just another indication that, despite the very detailed and helpful information given above, duplicate content and the issues surrounding it will still present serious concerns for online marketers and webmasters in the future.

 

About The Author
By Titus Hoskins 2009
The author is a full-time online marketer who has numerous websites. For the latest web marketing tools try: Internet Marketing Tools. If you liked the article above, why not try this Free 7 Day Marketing Course here: Marketing Tools Copyright 2009 Titus Hoskins. This article may be freely distributed if this resource box stays attached.


Google Busts the Duplicate Content Myth


Talks Ways to Avoid Related Ranking Issues

While Google’s Matt Cutts has certainly provided a wealth of helpful tips via the company’s Webmaster Central YouTube channel, he is not the only one to do so. Greg Grothaus of the Search Quality Team has posted a video (along with a presentation on the Webmaster Central Blog) covering duplicate content and multiple site issues that webmasters continue to face when trying to rank well in Google.

Greg begins by clearing up a popular myth about duplicate content, and that is that Google penalizes sites for having duplicate content. This is not the case. That’s not to say that duplicate content can’t have a negative impact on your rankings, but Google itself is not penalizing you for it.

Have you believed that Google penalizes sites for having duplicate content? Comment below.

Greg says people see messages like the one below and think their content is getting omitted from Google’s results, when in fact it really may just be being omitted for that particular query. Greg stresses that duplicate content is simply a factor on a “by query” basis.

In order to show you the most relevant results, we have omitted some entries very similar to the 20 already displayed.
If you like, you can repeat the search with the omitted results included.

“What’s actually happening, is that we’re looking at the query that the user’s doing, and we’re saying that we want diversity in the results we’re going to show a user,” says Grothaus. He says those who think their content is being omitted because it is duplicate, will likely find that if they adjust their query to more specifically reflect the missing piece, they may just find that it shows up in results after all.

Google recognizes that most duplicate content is not created to be deceptive. There are of course exceptions, which are considered spam. Grothaus says even spam sites aren’t being penalized for having duplicate content though. They’re being penalized for being spam. Just like some spammers use bold tags, he says. They don’t penalize people just for using them. And they don’t penalize people just for having duplicate content.

Duplicate Content:

  • example.com/
  • example.com/?
  • example.com/index.html
  • example.com/Home.aspx
  • www.example.com/
  • www.example.com/?
  • www.example.com/index.html
  • www.example.com/Home.aspx

The above list from Grothaus’s presentation shows examples of URLs that are different, but show the same content. Google will recognize that they’re the same, and will try to pick the right one (although sometimes they pick the wrong one). Greg says Webmasters are the best people to know which one is best, so it helps to only use one.
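As an illustration of “only use one”, the Python sketch below maps all eight URLs from the list onto a single preferred form. The choice of `www.example.com` as the preferred host, the `http` scheme and the set of default document names are assumptions for the example, not something the presentation prescribes.

```python
from urllib.parse import urlsplit

PREFERRED_HOST = "www.example.com"               # assumption: the www form is preferred
KNOWN_HOSTS = {"example.com", "www.example.com"}
DEFAULT_DOCUMENTS = {"index.html", "home.aspx"}  # assumption: both serve the home page


def preferred_url(url: str) -> str:
    """Map the known variants of a URL onto one preferred form."""
    if "://" not in url:
        url = "http://" + url  # the listed URLs carry no scheme
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in KNOWN_HOSTS:
        host = PREFERRED_HOST
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"
    result = f"http://{host}{path}"
    if parts.query:  # a bare trailing "?" has an empty query and is dropped
        result += "?" + parts.query
    return result


variants = [
    "example.com/", "example.com/?", "example.com/index.html", "example.com/Home.aspx",
    "www.example.com/", "www.example.com/?", "www.example.com/index.html",
    "www.example.com/Home.aspx",
]

print({preferred_url(v) for v in variants})
```

All eight variants collapse to one address, which is the form to use in internal links.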

You will not be penalized for using more than one, but there are some issues that can arise that may negatively affect your rankings. For one, your link popularity will be diluted. Backlinks pointing to several different URL versions of the same content will make it harder to accumulate link juice for one URL. Greg says that user-unfriendly URLs in search results may offset branding efforts and decrease usability as well. Plus, with multiple versions of the same thing, Google will spend more time crawling the same content, meaning it will have less time to go deeper into your site, and you run the risk of having content not get indexed.

Fixing the Issues

To avoid such issues, Grothaus suggests using a “canonical” version of the URL, meaning the simplest, most significant form. He says to pick one for each page and link consistently within your site. You can also use the rel=”canonical” link element as explained by Matt Cutts in the following clip:

Rules for rel=”canonical”

There are rules for the rel=”canonical” link element to consider. For one, it should be used between pages that are on the same domain. It works across different hosts. For example, blog.webpronews.com could suggest www.webpronews.com as a canonical URL, but it doesn’t work across domains. So www.webpronews.com couldn’t suggest www.smallbusinessnewz.com.
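The same-domain rule described here can be sketched in a few lines of Python. Note the loud assumption: the helper `naive_site` simply keeps the last two labels of the host name, which is wrong for suffixes such as `.co.uk`; a real check would need the Public Suffix List. Both function names are made up for the example.

```python
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    """Last two labels of a host name. Assumption: breaks on suffixes like .co.uk."""
    return ".".join(host.lower().split(".")[-2:])


def same_domain(page_url: str, canonical_url: str) -> bool:
    """True if both URLs are on hosts under the same (naively detected) domain."""
    page_host = urlsplit(page_url).hostname or ""
    canonical_host = urlsplit(canonical_url).hostname or ""
    return naive_site(page_host) == naive_site(canonical_host)


print(same_domain("http://blog.webpronews.com/post", "http://www.webpronews.com/post"))
print(same_domain("http://www.webpronews.com/post", "http://www.smallbusinessnewz.com/post"))
```

The first pair differs only by host and passes; the second crosses domains and does not.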

You can use the element for protocols, such as http:// vs. https://, and you can use it for ports. Pages don’t have to be identical, but they should be similar. Slight differences are ok. You don’t have to use the rel=”canonical” link element. It is just another option, or “another tool in your arsenal,” as Grothaus says.

Another option is to make all non-canonical URLs do a permanent (301) redirect to the canonical (or preferred) URL. In addition, in Google’s Webmaster Tools, you can specify www. vs. non-www. 301 redirects are commonly used when moving sites.
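For readers who want to see what such a redirect looks like in code, here is a minimal Python sketch written as a WSGI application. The preferred host `www.example.com`, the `http` scheme and the function name are assumptions; on a real site this is more often done in the web server’s own configuration.

```python
CANONICAL_HOST = "www.example.com"  # assumption: the preferred host for this site


def redirect_to_canonical_host(environ, start_response):
    """WSGI app: answer 301 for any other host, otherwise serve the page."""
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    if host != CANONICAL_HOST:
        location = f"http://{CANONICAL_HOST}{path}"
        if query:
            location += "?" + query
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]
    body = b"canonical page"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, the function can be passed to `make_server` from the standard library’s `wsgiref.simple_server` module.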

Multiple Domains

Lastly, Grothaus discusses multiple domains. This is in reference to when you have content for different audiences, such as by country, language, etc.

There are concerns here. You have to consider your reputation being distributed across multiple domains, and Google will only show what it perceives to be the best page for a particular query.

One interesting factor to consider, which may often go overlooked, is that with multiple domains, you’re potentially losing the advantage of Google’s tabbed user interface. You know how sometimes search results are expandable and point you to different links within the site? If your content is spread out across multiple domains, you may be missing extra clicks, because Google can’t link to another domain here.

Grothaus explains all of the above and elaborates on each point in the following fifteen-minute video. The information is based on his presentation from the recent Search Engine Strategies conference in San Jose.

See a WebProNews interview from SES with Grothaus here as well:

Did this information clear up any misconceptions you had about duplicate content? Let WebProNews know.

 

 
About the Author:
Chris Crum has been a part of the WebProNews team and the iEntry Network of B2B Publications since 2003. Twitter: @CCrum237
