Your support helps keep our site running! We might earn a small referral fee when you purchase from links in this post, at no extra cost to you, which we REALLY appreciate. All opinions remain our own as always.
Trick to Bypass Bot Blockers When Web Scraping or Crawling
Gathering Competitor Data on Amazon and eBay with Web Scraping CAN Be Done!
If you find yourself scraping sites like Amazon or eBay, you will eventually notice that Amazon responds with a message that amounts to “thank you, but no more.” Amazon and many other sites limit how much of their websites you can scrape. These limits are usually called ‘bot blockers.’
DISCLAIMER: This article is written with the intention of using web scraping for the SOLE purpose of research and analytics. It is NEVER acceptable to use web scraping to copy another site’s information for personal use or for anything other than data research. Please use a strong sense of ETHICS any time you consider scraping a website for information, and be aware of the laws governing how you use the data you acquire.
The problem with gathering data from websites:
Websites often block web crawlers (the automated programs that scraping relies on to gather information), supposedly to keep the site from being overloaded. I personally think they use bot blockers to keep people from gathering their data for free, but that is just me.
The Master Trick revealed:
If you find yourself in this type of situation, here is a neat trick to try. This should work regardless of whether you use R (my tool of choice) or Python (another excellent choice).
The idea behind this tip is to save the page’s HTML as a text file. That’s all HTML really is: a bunch of text with neat little symbols (tags) around certain words. So let’s forget the symbols for a while, shall we?
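To make “forget the symbols” concrete, here is a minimal sketch in Python using only the standard library’s html.parser module. It is an illustration of the idea, not the author’s own R or Python script: it walks through an HTML string, drops the tags, skips the contents of script and style elements, and keeps just the words.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not code or styling.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Returns the text of an HTML string with the tags stripped out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(html_to_text("<p>Price: <b>$19.99</b></p><script>var x = 1;</script>"))
```

Running it prints the text between the tags and nothing from the script element.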
Two of the common packages you can use to successfully scrape a website are:
You will need to copy the page’s HTML from inside your browser. I use Chrome: right-click anywhere on the page and select Inspect.
Or, if you are a keyboard shortcut fan, press Control+Shift+I on a Windows computer (Command+Option+I on a Mac).
This opens Chrome’s developer tools, which display the website’s code (typically in a panel on the right side of the window).
Now go to the top of the code and look for the <html> tag.
It will look something like <html>, or it may begin with <html lang if the page declares a language.
Hover over this tag, right-click it, and select Copy. (In Chrome, Copy > Copy element grabs the tag along with everything inside it, which is the whole page.)
Next, open a plain-text editor such as Notepad and paste the code there.
Then save the document as a .txt file.
And bam, you have bypassed the bot blocker: your script never has to request the page from the site, because you already have its code.
Now run your script on the text file you just saved instead of on the live site.
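As a minimal sketch of that step, here is one way a Python script could read the saved file using only the standard library. The file name and the class name are hypothetical placeholders: it assumes the page was saved as a .txt file and that the data you want sits in elements carrying a class such as price (inspect your own page to find the real one). It also assumes reasonably well-formed markup, since it tracks nesting by matching opening and closing tags.

```python
from html.parser import HTMLParser
from pathlib import Path

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextFinder(HTMLParser):
    """Collects the <title> text and the text of every element carrying a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class  # hypothetical, e.g. "price"
        self.title_parts = []
        self.matches = []
        self._in_title = False
        self._depth = 0    # how deep we are inside a matching element (0 = outside)
        self._buffer = []  # raw text collected for the current match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "title":
            self._in_title = True
        if self._depth:
            self._depth += 1  # a child element nested inside the match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "title":
            self._in_title = False
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces as-is, then collapse whitespace runs into single spaces.
                self.matches.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        if self._depth:
            self._buffer.append(data)


def parse_saved_page(path, target_class):
    """Reads an HTML page saved as a .txt file; returns (title, list of matched texts)."""
    html = Path(path).read_text(encoding="utf-8")
    finder = ClassTextFinder(target_class)
    finder.feed(html)
    finder.close()
    title = " ".join("".join(finder.title_parts).split())
    return title, finder.matches
```

With your own file, a call such as parse_saved_page("saved_page.txt", "price") would return the page title plus a list of the texts found in elements with that class. The same parsing works whether the HTML came from a saved file or a live request, which is what lets you build it against the file first.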
I would only recommend doing this if you need a dozen or so pages scraped. Beyond that, it becomes a pain in the rear, since copying and saving each page by hand takes quite a while.
You will also need your script built ahead of time to parse the saved code. You can develop it using the saved HTML text file alongside the actual site, checking on the live page which elements hold the data you want. This is a great way to get your script ready for future pages.
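Once the script works on one saved page, the same parsing step can be reused on every page you save later. The sketch below is standard-library Python and assumes, hypothetically, that all your saved pages sit as .txt files in one folder. For brevity it only pulls each page’s <title> and writes the results to a CSV file you could open in a spreadsheet; page_title is a stand-in for whatever extraction your own script performs.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Grabs the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False  # ignore later <title> tags, e.g. inside inline SVG icons
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def page_title(html):
    """Hypothetical per-page extraction step: swap in your own parsing logic."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def scrape_saved_pages(folder, out_csv):
    """Runs the same parsing step over every saved .txt page in a folder and writes a CSV."""
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        html = path.read_text(encoding="utf-8")
        rows.append({"file": path.name, "title": page_title(html)})
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Keeping the per-page parsing in its own function is the design choice that matters here: you perfect it against one saved file, and the loop simply applies it to each new page you add to the folder.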
This approach is also a great way to save money and grow your business, since it gives you a handy market analysis tool.