The more popular your blog becomes, the more likely your content is to be stolen by blog scrapers. This can be incredibly frustrating, particularly when you have spent a lot of time researching and writing your articles. Any blog which has over a thousand subscribers is likely having its content scraped; it’s just something which popular WordPress sites need to deal with.
You may have read some articles on how you can stop blog scrapers. In reality, the only way to stop people from stealing content from your blog is to not offer a full feed (i.e. change from full text to summary on the wp-admin/options-reading.php page). Many large websites do just that, but for most people it isn’t an option, as RSS subscribers would drop considerably. After all, the reason most people subscribe to a blog’s RSS feed is so they can read the website through their newsreader.
So if you do offer a full feed (which I believe you should), you will inevitably have your content scraped at some point. I can speak from experience when I say that there is little you can do when you find a website which is stealing your content. You can report the website to any company it advertises through (e.g. Google AdSense), and you can contact the website’s host too. However, you will rarely get a response, and even if you do and the site is removed, you will find that several more sites are scraping your content, and the whole process starts again.
Put simply, it’s better to spend your time working on your website rather than chasing these content thieves 24/7 (though admittedly, if I find someone scraping my blog I usually complain via Google AdSense using their form).
Get your own back!
Do not despair! Content scrapers are, by their very nature, incredibly lazy. So whilst there might be no great way of stopping RSS scrapers, there are lots of ways to get your own back.
Blog scrapers automatically pull data from your RSS feed and rarely change any of the content, which means that it’s in your best interests to link back to your own blog heavily within your posts. It’s worth adding a copyright notice too. You would think that these measures would discourage them from scraping your content in the first place, but in practice you’ll find that most scrapers don’t even bother to check such things.
Unfortunately, you can’t stop them from hotlinking your images. On a normal website you can use .htaccess to stop external websites from hotlinking your article images. However, doing this would stop many feed readers from displaying your images too, so it cannot be used here.
I will now show you some great WordPress plugins which will help you reduce the damage done by blog scrapers.
WordPress Plugins to help you fight Blog Scraping
Adding a copyright notice to your RSS content is a quick and easy way to inform the world that the content they are reading is from your blog. It’s also useful for linking back to the original article and/or your website.
Below is a small list of copyright plugins which are available for WordPress. For WP Mods I’m currently using Yoast’s RSS Footer plugin to add a link back to the article and the website at the top of every post, and Simple Feed Copyright for a standard copyright notice at the end of the post.
- RSS Footer
- Simple Feed Copyright
- Blog Copyright
- Auto Copyright
- WP Copyrighted Post
- Anti Feed-Scraper Message
- Copyright Proof
- RSS License
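To make it clearer what these plugins do, here is a minimal sketch of the idea in Python. It is an illustration only, not the code of any plugin listed above: a real WordPress plugin is written in PHP and hooks into the feed, and the function name, wording and example.com addresses below are all made up for the example. It wraps a post’s feed content with a link back to the original article at the top and a copyright notice at the end.

```python
from html import escape


def add_feed_footer(content, post_title, post_url, blog_name, blog_url):
    """Return feed content with an attribution header and a copyright footer.

    Hypothetical helper for illustration; all names are assumptions.
    """
    # Escape values so titles containing & or < do not break the feed markup.
    title = escape(post_title)
    name = escape(blog_name)
    header = (
        f'<p><a href="{escape(post_url)}">{title}</a> was first published on '
        f'<a href="{escape(blog_url)}">{name}</a>.</p>'
    )
    footer = f"<p>Copyright &copy; {name}. All rights reserved.</p>"
    return header + "\n" + content + "\n" + footer


# Example feed item; the title and URLs are placeholders.
item = add_feed_footer(
    "<p>Full text of the post.</p>",
    "Fighting Blog Scrapers",
    "https://example.com/fighting-blog-scrapers/",
    "Example Blog",
    "https://example.com/",
)
print(item)
```

Because scrapers rarely change the content they pull, both the link and the notice usually end up on the scraper’s site as well.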
Anti Scraper Bot Plugins
There are plugins available which try to stop known agents which scrape content. These work well, but since bots and unfriendly agents are constantly changing the IP addresses they use, they are not foolproof. However, it is another step which you can take to reduce RSS scraping.
- TTC WordPress Security Tool
- Shantz WordPress Prefix Suffix
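The general approach can be sketched in a few lines of Python. This is only an assumption about how such a check might look, not how either plugin above is implemented. The blocklist entries are invented for the example (the IP range is one reserved for documentation), and a real plugin would keep its own lists up to date.

```python
import ipaddress

# Hypothetical blocklists; real plugins maintain and update their own.
BLOCKED_AGENT_FRAGMENTS = ("scraperbot", "contentgrabber")
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(user_agent, remote_ip):
    """Return True if the request matches a blocked agent or network."""
    # Compare in lower case so "ScraperBot" and "scraperbot" both match.
    agent = user_agent.lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS):
        return True
    # Check whether the visitor's address falls inside a blocked range.
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("Mozilla/5.0 ScraperBot/1.0", "198.51.100.7"))
print(is_blocked("Mozilla/5.0", "198.51.100.7"))
```

The sketch also shows the weakness: a scraper only has to change its user agent string or move to a new IP address to slip past the list.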
More Information in your RSS Feed
It’s possible to add more information to your RSS feed. For example, the Yet Another Related Posts Plugin adds a related posts list to your website and your RSS feed. This encourages readers to read more articles on your site and adds more links back to your site in your RSS feed.
Many social media plugins also let you add voting buttons to your feed, though I prefer to use the FeedFlare service which FeedBurner offers.
You can also expand things even further. Why not replace a text link copyright with a banner image linking back to your blog, to increase the exposure you get on content-stealing websites?
Taking it Further
If copyright infringement and plagiarism are becoming a really big problem with one or two sites in particular, you may wish to take things further and go down a legal route. You can sometimes catch scrapers as they send a pingback to your blog. This doesn’t always happen though, so it’s worth checking your site at a plagiarism checker such as CopyGator or Copyscape.
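If you already have a suspect page in mind, you can also do a rough check yourself. The Python sketch below is not how CopyGator or Copyscape work (I don’t know their internals); it is simply one common way to compare two texts, by counting how many five-word sequences from your article also appear on the suspect page. The function names and sample text are made up for the example.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Short texts become a single sequence (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original, suspect, size=5):
    """Return the fraction of the original's sequences found in the suspect."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)


article = "the quick brown fox jumps over the lazy dog near the river bank"
scraped = "Latest news! " + article + " Read more on our site."
print(overlap(article, scraped))
```

A score close to 1.0 suggests the page contains your article more or less word for word, while a low score suggests it only shares the odd phrase.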
Jonathan Bailey, who used to work with me closely on one of my other blogs, knows a lot about copyright issues and discusses them regularly on his blog Plagiarism Today. I encourage you to search through his website to find out more about what you can do to stop content theft. In particular you should find the Cease and Desist letters helpful :)