Content scraping is rampant on the web. Why be original when you can just steal someone else’s work? Of course, we know this is wrong, but it is rather easy to do. Just grab a site’s RSS feed link, import it into your own site and voilà, instant content.
Many scrapers will pull snippets and text from websites that rank high for keywords they have targeted. This way they hope to rank highly in the search engine results pages (SERPs). (Wikipedia)
Scraping content on the Internet is nothing new. As a blogger, you don’t want other sites passing off your content as their own. Search engines will still see you as the originator of the content, and the scraping site will have “duplicate content” with little or no value, but that site may still get traffic and make money off your work. Also, if a spam site is linking to your blog post, it can hurt you in the eyes of Google and other search engines.
The best thing to do is block the scrapers and scammers. This is a simple process, and I will show you how.
1. First, you must find their IP address. If you use WordPress, this is pretty easy.
In WordPress you can see who is linking to your blog posts through the comment system, where links from other sites show up as pingbacks and trackbacks.
2. Click on Comments in the left navigation bar and look for any sites that appear to be linking to you. You will see each site’s IP address listed below its URL.
3. Once you have their IP address, write it down or copy it.
4. Launch the FTP software you use to log into your site. I suggest FileZilla; it’s free.
You will need to know your site’s FTP user name and password to log in. You can get this from your web host or even create it yourself with your site’s Control Panel.
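5. Once you log into FTP, scroll through the Filename list on the right-hand side of the screen (the remote site section) and double-click on www to access the root of your site. On some hosts this folder is called public_html instead.
6. Look for the .htaccess file. Because its name starts with a dot, it may be hidden; if you don’t see it, choose Server > Force showing hidden files in FileZilla.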
7. Right-click on the .htaccess file and left-click View/Edit.
The .htaccess file will open in your default text editor (Notepad on Windows).
8. Scroll down to the bottom of the file and press Enter a couple of times.
9. Type “deny from” followed by the IP address of the scraper site.
Example: deny from 203.0.113.45 (a placeholder address; use the scraper’s actual IP)
10. Save the .htaccess file and click OK to save the changes to your server.
You will need to add a new Deny line for each IP you find that is linking to you or scraping your content.
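If you end up blocking more than a handful of addresses, you can script the manual steps. The Python sketch below is one way to do it, and it rests on a few assumptions: your host runs Apache and honors `deny from` in .htaccess, the file lives in the `www` folder from step 5, it is UTF-8 text, and you connect with plain FTP through Python’s standard `ftplib` module. The function names `parse_ips`, `add_deny_rules` and `update_remote_htaccess` are hypothetical helpers, not part of WordPress or FileZilla. As in step 9, the sketch writes one `deny from` line per address and skips any address the file already blocks. On Apache 2.4, `deny from` only works if the host has mod_access_compat enabled (the native 2.4 directive is `Require not ip`, which this sketch does not generate), so check with your host if the rules seem to have no effect.

```python
import io
import ipaddress
import re


def parse_ips(text):
    """Return the valid IPv4/IPv6 addresses in `text`, in order, without duplicates.

    Accepts addresses separated by whitespace, commas or semicolons, e.g. a list
    you copied from the WordPress Comments screen. Non-IP tokens are skipped.
    """
    ips = []
    for token in re.split(r"[\s,;]+", text):
        if not token:
            continue
        try:
            # Normalizes the address (IPv6 becomes compressed lowercase).
            ip = str(ipaddress.ip_address(token))
        except ValueError:
            continue  # not an IP address
        if ip not in ips:
            ips.append(ip)
    return ips


def add_deny_rules(htaccess_text, ips):
    """Append one 'deny from <ip>' line per address that is not already denied.

    Existing content is left untouched; new rules go at the bottom after a
    blank line, mirroring steps 8 and 9.
    """
    already_denied = set()
    for line in htaccess_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == "deny" and parts[1].lower() == "from":
            already_denied.update(p.lower() for p in parts[2:])

    new_lines = []
    for ip in ips:
        if ip.lower() not in already_denied:
            new_lines.append(f"deny from {ip}")
            already_denied.add(ip.lower())  # also skips repeats within `ips`

    if not new_lines:
        return htaccess_text  # nothing new to block

    text = htaccess_text
    if text and not text.endswith("\n"):
        text += "\n"
    separator = "\n" if text else ""  # blank line before the new rules
    return text + separator + "\n".join(new_lines) + "\n"


def update_remote_htaccess(ftp, ips, remote_dir="www"):
    """Download .htaccess over an open, logged-in ftplib.FTP connection,
    append deny rules, and upload the result back to the same folder.

    Raises ftplib.error_perm if .htaccess does not exist in `remote_dir`.
    """
    ftp.cwd(remote_dir)
    buffer = io.BytesIO()
    ftp.retrbinary("RETR .htaccess", buffer.write)
    updated = add_deny_rules(buffer.getvalue().decode("utf-8"), ips)
    ftp.storbinary("STOR .htaccess", io.BytesIO(updated.encode("utf-8")))
    return updated
```

To run it, open a connection with `ftplib.FTP(host)`, call its `login(user, passwd)` method with the FTP user name and password from step 4, and pass that connection plus the output of `parse_ips()` to `update_remote_htaccess()`. Keep a backup copy of your .htaccess first, since a mistake in that file can take your whole site offline.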