Scrapers are the bane of any blogger’s existence. A scraper sweeps in, steals your content, and claims it as its own, and sometimes there is no way to prove otherwise. Surprisingly, Google has often failed to identify the original author of the content. My Google Alerts frequently notify me of scraped copies of my articles rather than my original (guest) posts, and I’ve often seen scrapers outrank the original articles for long-tail searches.
Occasionally you hear of a blogger who managed to get the rights to their content back, but it’s like tilting at windmills: shut down one scraping blog and dozens more spring up overnight. It is therefore much better to try to prevent scraping (or at least to get labeled as the original author) than to count on being one of those rare successes.
Plugins to Prevent Web Scraping
Google has been fighting scrapers for ages, and one of its patents (part of the so-called AuthorRank patents) suggests using authorship to:
“. . .detect and protect against revision of content after it has been posted by a person or entity.”
Implementing Google Authorship is much easier nowadays (here’s a quick guide), but on many blog set-ups (where there’s no author byline, for example), it can still cause confusion. In these cases, this plugin will help.
It lets you add your G+ profile picture to search results, confirm authorship, and even grant authorship to multiple authors. Setup follows an easy three-step process, and there are no bugs to worry about.
2. Feed Delay
Much of the risk for a small- to medium-sized blog is a scraper bot picking up your content, republishing it without attribution, and getting its copy indexed first (oddly enough, Google hasn’t been able to knock these sites out of its results or even identify the original owner of the content).
Since there are probably at least a couple of bots hiding among your RSS subscribers, your best bet is to delay your feed, so new posts reach scrapers only after search engines have had a chance to index the originals. This plugin does that for you.
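The idea behind a feed delay is simple enough to sketch. The Python below is not the plugin’s code (WordPress plugins are written in PHP); it is a minimal illustration that assumes a hypothetical `Post` record and an arbitrary 30-minute delay, and simply leaves out any post that is younger than the delay when the feed is generated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    title: str
    published: datetime  # timezone-aware publication time


def delayed_feed(posts, now, delay=timedelta(minutes=30)):
    """Return only the posts published at least `delay` before `now`.

    Hypothetical sketch: the 30-minute default is an assumption,
    not a setting taken from any particular plugin.
    """
    cutoff = now - delay
    return [p for p in posts if p.published <= cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    posts = [
        Post("Old post", now - timedelta(hours=2)),
        Post("Fresh post", now - timedelta(minutes=5)),
    ]
    print([p.title for p in delayed_feed(posts, now)])  # ['Old post']
```

The fresh post still appears on the blog itself; it only enters the feed once the delay has passed, which gives search engines a head start on the original.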
3. Anti Feed-Scraper Message
Most scraping is done by bots without any human oversight, so nobody checks what content gets published, or how. This works in your favor: you can add a link to your blog to every post in your feed, and it will show up wherever the content is reposted.
Anti Feed-Scraper Message does exactly this, showing Google and all readers where the post originally came from. It also keeps any accusations out of the message, which protects you from counterclaims by the scrapers. The message reads: [Post Name] originally appeared on [Site Name] on [Post Date].
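For illustration, here is a minimal Python sketch of the same idea: appending a neutral attribution line, with links back to the original post and site, to each feed item. The function name, its parameters, and the HTML layout are assumptions made for this example, not the plugin’s actual output format.

```python
from datetime import date
from html import escape


def add_attribution(content_html, post_title, post_url, site_name, site_url, post_date):
    """Append an '[Post Name] originally appeared on [Site Name] on [Post Date].' line.

    Hypothetical helper: titles and URLs are HTML-escaped so they cannot
    break the surrounding markup of the feed item.
    """
    notice = (
        f'<p><a href="{escape(post_url)}">{escape(post_title)}</a> '
        f'originally appeared on <a href="{escape(site_url)}">{escape(site_name)}</a> '
        f'on {post_date.strftime("%B %d, %Y")}.</p>'
    )
    return content_html + notice


if __name__ == "__main__":
    item = add_attribution(
        "<p>Body text.</p>",
        "My Post",
        "https://example.com/my-post",
        "Example Blog",
        "https://example.com",
        date(2013, 5, 7),
    )
    print(item)
```

Because a scraper bot usually copies the whole feed item, the links travel with the stolen copy and point readers (and search engines) back to the source.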
This plugin can be used alongside the one above. It lets you digitally certify your ownership at the time of publication, creating a certificate you can show if someone steals your content. It also adds a copyright, licensing, and attribution notice to every post, plus an optional anti-theft feature.
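The core of such a certificate is typically a cryptographic fingerprint of the post taken at publication time. The sketch below uses hypothetical function names and a made-up record format, not the plugin’s implementation: it computes a SHA-256 hash of the post text alongside its UTC publication time. Keep in mind that a record you generate yourself proves little on its own; its value comes from being registered with a trusted third party when the post goes live, which is the part a certification service handles.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_certificate(title, content, published):
    """Build a minimal fingerprint record for a post (hypothetical format)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {
        "title": title,
        "sha256": digest,
        "published_utc": published.astimezone(timezone.utc).isoformat(),
    }


def matches(certificate, content):
    """Return True if `content` is byte-for-byte identical to the certified text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest() == certificate["sha256"]


if __name__ == "__main__":
    cert = make_certificate(
        "My Post",
        "Original text.",
        datetime(2013, 5, 7, 9, 0, tzinfo=timezone.utc),
    )
    print(json.dumps(cert, indent=2))
    print(matches(cert, "Original text."))  # True
```

An exact hash only matches verbatim copies; a scraper that alters even one character produces a different digest, so the certificate mainly serves as dated evidence of what you published and when.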
Do you know of a good plugin for protecting content against scrapers? What about outside of WordPress?