Amazon Product Scraper – Amazon Data Extractor

What is Amazon Product Scraper

Have you decided to start a business on Amazon? Do you know what to sell and how to research your competitors? Fortunately, it is not that hard to sell on Amazon or to identify the best-selling products and niches at the best price. All you need is some research, and you are ready to start selling big on Amazon.

This is where Amazon Product Scraper fits into your business. Of course, you can do all the research manually. But time is money, and Amazon Product Scraper does the job for you.

Automate Data Extraction From Amazon

Amazon Product Scraper automates data extraction from Amazon. It is software that extracts important data about products sold on Amazon, such as:

  • Product name,
  • Price,
  • Short description and full product description,
  • Image URL,
  • Number of reviews and ratings,
  • ASINs,
  • Product’s category,
  • Quantity,
  • Shipping costs,
  • Number of sellers, and
  • Much more.

With Amazon Product Scraper, you collect information about different products and online retailers. With all the information on simple dashboards, you can sell on Amazon faster, better, and at lower cost.

How to Take Advantage of the Data From Scraper

Amazon product scraper gives you insights on what products you should sell on Amazon. 

Boost Amazon product positioning by optimizing description, pictures, and other details according to proven best practices.

How to Start Product Scraping

There are two ways you can start product scraping. The first is to connect with Loginworks Softwares, and we’ll provide you with the data you need.

The second option is that you do it yourself. We’ll give you an example of an Amazon product scraper using Python 3. To start, you will need: 

  • Python Requests, to make requests and download the HTML content of the Amazon product pages, and
  • the SelectorLib Python package, to extract data using the YAML file
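
Before going further, it can help to confirm that both packages are importable. The following is a minimal sketch, not part of the original scraper; the helper name missing_packages is hypothetical, and it assumes the packages are imported under the names requests and selectorlib.

```python
import importlib.util

# Import names of the two packages listed above.
REQUIRED = ["requests", "selectorlib"]


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        # Suggest the usual pip command for whatever is not installed yet.
        print("Install first: pip3 install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```

If anything is reported missing, installing it with pip3 and running the check again should print the success message.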

Select Scraping Data

Once you decide which data you want to extract, mark it up using Selectorlib and save the file in the same directory as the code. Let’s name the file “product.data.yml”.

Selectorlib is a combination of tools for developers that makes marking up and extracting data from web pages easy.

images:
    css: '.imgTagWrapper img'
    type: Attribute
    attribute: data-a-dynamic-image
rating:
    css: span.arp-rating-out-of-text
    type: Text
number_of_reviews:
    css: 'a.a-link-normal h2'
    type: Text
variants:
    css: 'form.a-section li'
    multiple: true
    type: Text
    children:
        name:
            css: ""
            type: Attribute
            attribute: title
        asin:
            css: ""
            type: Attribute
            attribute: data-defaultasin
product_description:
    css: '#productDescription'
    type: Text
sales_rank:
    css: 'li#SalesRank'
    type: Text
link_to_all_reviews:
    css: 'div.card-padding a.a-link-emphasis'
    type: Link
name:
    css: '#productTitle'
    type: Text
price:
    css: '#price_inside_buybox'
    type: Text
short_description:
    css: '#featurebullets_feature_div'
    type: Text
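
Note that the images selector returns the raw data-a-dynamic-image attribute rather than a list of URLs. The sketch below shows one way to post-process it. It rests on an assumption this article does not confirm: that the attribute value is a JSON object mapping each image URL to its [width, height]. The function name largest_image_url and the sample value are hypothetical.

```python
import json


def largest_image_url(dynamic_image_attr):
    """Return the URL with the largest pixel area, or None if nothing usable is present.

    Assumes the attribute is a JSON object of the form {url: [width, height], ...}.
    """
    if not dynamic_image_attr:
        return None
    sizes = json.loads(dynamic_image_attr)
    if not sizes:
        return None
    # Compare candidates by width * height.
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])


# Hypothetical attribute value, for illustration only.
sample = '{"https://example.com/a.jpg": [300, 200], "https://example.com/b.jpg": [600, 400]}'
print(largest_image_url(sample))
```

If the attribute turns out to have a different shape on the pages you scrape, json.loads or the size lookup will fail, which is a useful signal that the assumption needs revisiting.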

The Code

Create a folder called amazon-scraper and save your SelectorLib YAML template file in it as “product.data.yml”.

Create a file called amazon.py and paste the code below into it. It will: 

  • Read a list of Amazon Product URLs from a file called urls.txt
  • Scrape the data
  • Save the data as a JSON Lines file

from selectorlib import Extractor
import requests
import json
from time import sleep

# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('product.data.yml')

def scrape(url):
    # Browser-like headers so the request resembles a normal page visit
    headers = {
        'authority': 'www.amazon.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'none',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }
    # Download the page using requests
    print("Downloading %s" % url)
    r = requests.get(url, headers=headers)
    # Simple check to see if the page was blocked (usually 503)
    if r.status_code > 500:
        if "To discuss automated access to Amazon data please contact" in r.text:
            print("Page %s was blocked by Amazon. Please try using better proxies\n" % url)
        else:
            print("Page %s must have been blocked by Amazon as the status code was %d" % (url, r.status_code))
        return None
    # Pass the HTML of the page to the extractor and return the extracted data
    return e.extract(r.text)

with open("urls.txt", 'r') as urllist, open('output.jsonl', 'w') as outfile:
    for url in urllist.readlines():
        url = url.strip()  # drop the trailing newline
        if not url:
            continue  # skip blank lines
        data = scrape(url)
        if data:
            json.dump(data, outfile)
            outfile.write("\n")
            # sleep(5)
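
Product URLs copied from search results usually carry tracking parameters, so the same product can appear in urls.txt several times under different URLs. The following is a rough sketch of one way to normalize them before scraping. It assumes the ASIN is the 10-character code that follows /dp/ in the URL; the helper names extract_asin and canonical_url are hypothetical and not part of the original scraper.

```python
import re

# Assumption: the ASIN is the 10-character code right after "/dp/".
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url):
    """Return the ASIN found in a product URL, or None if the pattern is absent."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


def canonical_url(url):
    """Reduce a product URL to a short /dp/<ASIN> form; leave other URLs unchanged."""
    asin = extract_asin(url)
    return f"https://www.amazon.com/dp/{asin}" if asin else url.strip()


def unique_products(urls):
    """Drop duplicate products while keeping the original order."""
    seen = set()
    result = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

Running unique_products over the lines of urls.txt before the main loop would avoid downloading the same product page twice.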

Run Amazon Product scraper 

You can get the full code from Github – https://github.com/scrapehero-code/amazon-scraper

You can start your scraper by typing the command: python3 amazon.py

Once the scrape is complete, you should see a file called output.jsonl with your data. Here is an example for the URL

https://www.amazon.com/Apple-iPhone-Locked-Carrier-Subscription/dp/B08L5MJTCP/ref=sr_1_4?crid=3VMQ0B0AH5G6S&dchild=1&keywords=iphone+12&qid=1608275049&sprefix=iphone+%2Caps%2C383&sr=8-4

{'name': 'New Apple iPhone 12 (256GB, Blue) [Locked] + Carrier Subscription',
 'price': None,
 'short_description': None,
 'images': None,
 'rating': None,
 'number_of_reviews': '23 customer reviews',
 'variants': None,
 'product_description': None,
 'sales_rank': None,
 'link_to_all_reviews': '/Apple-iPhone-Locked-Carrier-Subscription/product-reviews/B08L5MJTCP/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews'}
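
To work with the results afterwards, the JSON Lines file can be read back one record per line. The sketch below is illustrative only; read_jsonl and review_count are hypothetical helper names, and the assumption that the review field always starts with a number is based solely on the sample output above.

```python
import json
import re


def read_jsonl(lines):
    """Parse an iterable of JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def review_count(text):
    """Turn a string such as '23 customer reviews' into the integer 23."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


# Typical use, once output.jsonl exists:
#
# with open("output.jsonl") as f:
#     for product in read_jsonl(f):
#         print(product["name"], review_count(product["number_of_reviews"]))
```

Fields that came back empty, as several did in the example above, remain None after loading, so downstream code should check for that before using them.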

Why Use Amazon Scraper and Not a Manual Search

Amazon is very likely to flag you as a “BOT” if you start scraping hundreds of products – manually or by some extension, program, or DIY tool. The idea is to avoid getting flagged as a BOT while scraping and running into problems. How do we solve such challenges?

If you decide to do manual research, mimic human behavior as much as possible. But why take the chance when you can turn to Loginworks, which eliminates the risk for you and provides over a million scraped data records in a short period of time?
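
For the do-it-yourself route, one simple way to make request timing less uniform is to pause for a random interval between pages. This is a minimal sketch under stated assumptions: the 3 to 8 second bounds are arbitrary, the names polite_delay and crawl are hypothetical, and random pauses alone are no guarantee against being flagged.

```python
import random
import time


def polite_delay(min_seconds=3.0, max_seconds=8.0):
    """Return a random pause length, in seconds, between the two bounds."""
    return random.uniform(min_seconds, max_seconds)


def crawl(urls, scrape, sleep=time.sleep):
    """Call scrape(url) for each URL, pausing a random interval between requests.

    The sleep function is injectable so the loop can be tried without waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(polite_delay())  # no pause before the very first request
        results.append(scrape(url))
    return results
```

Here scrape stands for any function that takes a URL and returns the extracted data, such as the one defined in amazon.py.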

Are you just starting to sell on Amazon and don’t know what products you should sell on Amazon? We provide some guidance and ideas for you in the blog.

Latest posts by Rahul Huria (see all)
