How to Block Content Scrapers



Content scraping is proliferating on the web. Why be original when you can just steal someone else’s work? And it’s rather easy: grab a site’s RSS feed link, import it into your site, and voilà, instant content.

Many scrapers will pull snippets and text from websites that rank high for keywords they have targeted. This way they hope to rank highly in the search engine results pages (SERPs). (Source: Wikipedia)

Scraping content on the Internet is nothing new. As a blogger, you don’t want other sites passing off your content as their own. Search engines will still treat you as the originator of the content, and the scraper site will only have “duplicate content” of little or no value. Even so, that site may still get traffic and make money off your work. Also, if a spam site is linking to your blog post, it can hurt you in the eyes of Google and other search engines.

The best thing to do is block the scrapers and scammers. This is a simple process and I will show you how.

1. First, you must find their IP address. If you use WordPress, this is pretty easy.

In WordPress, you can see who is linking to your blog posts through the comment system, where incoming links show up as pingbacks and trackbacks.

2. Click Comments in the left-hand admin menu and look for any sites that appear to be linking to you. Each site’s IP address is listed below the URL of that site.


3. Once you have their IP address, write it down or copy it.

4. Launch the FTP software you use to log into your site. I suggest FileZilla; it’s free.

You will need your site’s FTP username and password to log in. You can get these from your web host or create them yourself in your site’s control panel.



5. Once you are logged in, scroll through the Filename list on the right-hand side of the screen (the Remote site pane) and double-click www to open your site’s root folder. On some hosts this folder is named public_html instead.


6. Look for the .htaccess file.


7. Right-click the .htaccess file and choose View/Edit.


The .htaccess file will open in a text editor such as Notepad.

8. Scroll down to the bottom of the file and press Enter a couple of times.

9. Type “deny from” followed by a space and the IP address of the scraper site.

Example (192.0.2.1 is a placeholder; substitute the scraper’s IP address):

deny from 192.0.2.1

10. Save the .htaccess file, then confirm the prompt to upload the changes to your server.

You will need to add a new Deny line for each IP you find that is linking to you or scraping your content.
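If you collect more than a handful of addresses, a short script can build the “deny from” lines for you.

The Python sketch here is hypothetical. The names add_deny_rules and update_htaccess_file are illustrative and are not part of WordPress or FileZilla. It assumes you have downloaded a local copy of .htaccess and does the following:

- Checks each address with Python’s standard ipaddress module.
- Skips addresses that already have a “deny from” line.
- Appends the rest at the end of the file, as in steps 8 and 9.

It only emits the older Apache 2.2-style “deny from” syntax used in this post. On Apache 2.4 that directive works only when mod_access_compat is enabled; the native 2.4 equivalent is “Require not ip”.

```python
from __future__ import annotations

import ipaddress
from pathlib import Path


def add_deny_rules(htaccess_text: str, ips: list[str]) -> str:
    """Hypothetical helper: append one 'deny from <ip>' line per new IP address.

    Raises ValueError for anything that is not a valid IPv4/IPv6 address.
    IPs already listed on an existing 'deny from' line are skipped.
    """
    # Gather IPs already listed on existing "deny from" lines
    # (Apache accepts several hosts on one line, so take every token after "from").
    existing = set()
    for line in htaccess_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == "deny" and parts[1].lower() == "from":
            existing.update(parts[2:])

    new_lines = []
    for raw in ips:
        # ip_address() rejects typos such as "192.0.2.300" with a ValueError.
        ip = str(ipaddress.ip_address(raw.strip()))
        if ip not in existing:
            new_lines.append(f"deny from {ip}")
            existing.add(ip)

    if not new_lines:
        return htaccess_text  # nothing new to block; leave the file untouched

    # Drop trailing newlines/whitespace, then leave one blank line before the new rules.
    body = htaccess_text.rstrip()
    prefix = body + "\n\n" if body else ""
    return prefix + "\n".join(new_lines) + "\n"


def update_htaccess_file(path: str, ips: list[str]) -> None:
    """Hypothetical wrapper: rewrite a local copy of .htaccess in place."""
    p = Path(path)
    text = p.read_text(encoding="utf-8") if p.exists() else ""
    p.write_text(add_deny_rules(text, ips), encoding="utf-8")


if __name__ == "__main__":
    # 192.0.2.10 and 198.51.100.7 are reserved documentation addresses used as placeholders.
    sample = "# BEGIN WordPress\n# END WordPress\n"
    print(add_deny_rules(sample, ["192.0.2.10", "198.51.100.7", "192.0.2.10"]))
```

Running the demo prints the two WordPress marker lines, a blank line, and one “deny from” line for each of the two distinct placeholder addresses; the repeated 192.0.2.10 is added only once. Check the result, then upload the file back to your site’s root folder as in step 10.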
