Getting product data from Amazon with Python 3.7

A few months ago I launched an affiliate site written in Python, using the Flask web framework to serve content. The project made for an interesting (and potentially profitable!) way to learn how to build basic web apps with Python. Here’s a little bit of that code!

There are two ways we can get bulk data from Amazon: by scraping it, or through the official Amazon Product API. In this project I’ll focus on the API, but scraping has its time and place.

To access the Product API, you’ll need to provide your access key, secret key and affiliate tag. If you don’t already have those handy, then your first step is to sign up for the Amazon Associates affiliate program.
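Since these keys are secrets, one option is to keep them out of the script entirely and read them from environment variables. Here's a minimal sketch of that idea. The variable names (AMAZON_ACCESS_KEY and friends) and the load_credentials helper are my own placeholders, not anything Amazon or bottlenose requires.

```python
import os


def load_credentials(environ=os.environ):
    """Return (access_key, secret_key, affiliate_tag) from the environment.

    The variable names below are hypothetical; pick whatever suits your setup.
    """
    names = ('AMAZON_ACCESS_KEY', 'AMAZON_SECRET_KEY', 'AMAZON_ASSOCIATE_TAG')
    # Fail early with a readable message instead of sending empty credentials
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return tuple(environ[name] for name in names)
```

Calling load_credentials() with no arguments reads from the real environment; passing a plain dict makes it easy to try out.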

I’m using Python 3.7+. You’ll also want the bottlenose Amazon API wrapper library, which abstracts away much of the complexity of Amazon’s API. Let’s start by installing the library.

you@yourterm: pip install bottlenose

This first part is easy: you need a script. I named mine something like amazon_product_api.py. Begin by importing the bottlenose library and defining the credentials it needs to talk to the Product API.

import bottlenose

ACCESS_KEY = 'your-access-key'
SECRET_KEY = 'your-secret-key'
AFFILIATE_TAG = 'your-amazon-associate-tag'

amazon = bottlenose.Amazon(ACCESS_KEY, SECRET_KEY, AFFILIATE_TAG, MaxQPS=0.9)

Here the MaxQPS parameter (maximum queries per second) refers to the query limit enforced by Amazon: one query per second per associate tag. I set mine to 0.9, just under one query per second. Let’s not risk it. Avoid being sighted by the netopticon!
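To make the idea concrete, here's a rough sketch of what a queries-per-second limit boils down to: enforce a minimum gap of 1 / MaxQPS seconds between calls. This is my own illustration, not bottlenose's actual code, and the Throttle class is a made-up name.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (illustrative only)."""

    def __init__(self, max_qps, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_qps  # 0.9 QPS -> about 1.11 seconds
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        slept = 0.0
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last_call = now
        return slept
```

You would call wait() right before each request. The clock and sleep arguments exist only so the class can be exercised without actually waiting.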

The ASIN is the unique identifier Amazon assigns to each product. You’ll see it in almost any Amazon product URL, like this one: https://www.amazon.com/dp/1491946008/?tag=ecm04-20

The ASIN is typically in BXXXXXXXXX format. In this particular case, 1491946008 is actually an ISBN, which references a book. Both ASINs and ISBNs are used by a bottlenose ItemLookup in the same way, so it’s not necessary to make the distinction now.
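If you're starting from a pile of product URLs, pulling the identifier out is a one-regex job. The sketch below assumes the ID is the ten-character segment after /dp/ or /gp/product/; Amazon has other URL shapes that this won't catch, and extract_item_id is just a name I picked.

```python
import re
from urllib.parse import urlparse

# Ten uppercase letters or digits following /dp/ or /gp/product/
ITEM_ID_PATTERN = re.compile(r'/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)')


def extract_item_id(url):
    """Return the ASIN/ISBN found in an Amazon product URL, or None."""
    path = urlparse(url).path  # ignore the query string (affiliate tag etc.)
    match = ITEM_ID_PATTERN.search(path)
    return match.group(1) if match else None
```

Either kind of identifier that comes back can be handed straight to ItemLookup.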

You can see this when you perform a simple lookup of product 1491946008.

# Look up a single product by its ASIN (or ISBN)
response = amazon.ItemLookup(ItemId="1491946008")

print(response)

You should see the product XML printed in your terminal. Hello, world!

<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2013-08-01">
	<OperationRequest>
		<HTTPHeaders>
			<Header Name="UserAgent" Value="Python-urllib/3.6"></Header>
		</HTTPHeaders>
		<RequestId>68b5acdc-ac38-43d7-ad75-61fad0592d29</RequestId>
		<Arguments>
			<Argument Name="AWSAccessKeyId" Value="censored-access-key"></Argument>
			<Argument Name="AssociateTag" Value="ecm04-20">
			
			...
			

Before you go any further, the script should be fleshed out with rudimentary error handling, so it behaves sensibly when something goes wrong.

Amazon will respond with a 2XX error if a required parameter is missing. A 4XX error suggests bad syntax or some other issue in the client-side request. If you see 400 Bad Request errors, you should check your Amazon Associates API status. It may be as simple as submitting the wrong affiliate tag, or it could mean your API privileges have been revoked.

5XX is a server error. For example, a 503 error means you’re sending requests too fast and have been throttled. You don’t want to see that very often. With the error handler below, whenever bottlenose hits a 503 it waits two seconds and retries, and keeps doing so until the server stops responding with 503s.

Here’s the code so far.

import bottlenose
import time
from urllib.error import HTTPError  # urllib2 only exists on Python 2

ACCESS_KEY = 'your-access-key'
SECRET_KEY = 'your-secret-key'
AFFILIATE_TAG = 'your-amazon-associate-tag'


def error_handler(err):
    ex = err['exception']
    # Throttled: wait two seconds, then tell bottlenose to retry
    if isinstance(ex, HTTPError) and ex.code == 503:
        time.sleep(2)
        return True
    # Anything else is not retried
    return False


amazon = bottlenose.Amazon(ACCESS_KEY, SECRET_KEY, AFFILIATE_TAG, MaxQPS=0.9, ErrorHandler=error_handler)

response = amazon.ItemLookup(ItemId="1491946008")

print(response)  
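One possible variation, if retrying forever makes you nervous: give the handler a retry budget and back off a little longer each time. This is only a sketch, not what my script does. make_error_handler and its parameters are names I made up, and the counter covers the handler's whole lifetime rather than resetting per request.

```python
import random
import time
from urllib.error import HTTPError


def make_error_handler(max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Build a handler that retries 503s with exponential backoff and jitter."""
    state = {'retries': 0}  # total retries used so far (never reset)

    def error_handler(err):
        ex = err['exception']
        is_throttled = isinstance(ex, HTTPError) and ex.code == 503
        if is_throttled and state['retries'] < max_retries:
            # 1s, 2s, 4s, ... plus up to half a second of random jitter
            delay = base_delay * 2 ** state['retries'] + random.uniform(0, 0.5)
            state['retries'] += 1
            sleep(delay)
            return True  # retry the request
        return False  # give up; the exception propagates

    return error_handler
```

You would pass ErrorHandler=make_error_handler() in place of error_handler when constructing bottlenose.Amazon.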

Okay, it’s nice, but it’s not very useful yet. You’ll need to parse that XML to make any sense of it.

To do that, you need another third-party library. There are a number of XML parsing libraries for Python to be found on GitHub, but I stuck with the solid and well-known BeautifulSoup. Its 'xml' mode relies on lxml, so install both.

you@yourterm: pip install beautifulsoup4 lxml

Now we can do just about anything with the XML. In my original project, I chose to save the XML data to a file in a directory. I’ll skip that step in this tutorial, but it’s left in as an exercise for the reader :p
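If you do want to try the save-to-file step, it could look roughly like this. Treat it as a sketch under my own assumptions: the save_xml name, the xml_cache directory and the one-file-per-item layout are all placeholders, not what my original project used.

```python
from pathlib import Path


def save_xml(xml_text, item_id, directory='xml_cache'):
    """Write one response to <directory>/<item_id>.xml and return the path."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)  # create on first use
    path = target_dir / f'{item_id}.xml'
    path.write_text(xml_text, encoding='utf-8')
    return path
```

The function expects a string, so if your response is a parsed BeautifulSoup object, pass str(response) instead.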

We don’t necessarily need a named function to handle the XML. The bottlenose library enables us to inject BeautifulSoup as a parser via a lambda function.

from bs4 import BeautifulSoup

# ...

amazon = bottlenose.Amazon(
    ACCESS_KEY, SECRET_KEY, AFFILIATE_TAG, MaxQPS=0.9,
    ErrorHandler=error_handler,
    Parser=lambda text_response: BeautifulSoup(text_response, 'xml'))

# ...

You can pull out individual items from the XML response.

response = amazon.ItemLookup(ItemId="1491946008")

print(response.Title.string)
print(response.ASIN.string)
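One thing to watch for: a tag that isn't in the response comes back as None, and calling .string on it blows up. Here's a small sketch of a lookup that tolerates missing tags. To keep it runnable without extra installs, I've swapped in the standard library's xml.etree.ElementTree instead of BeautifulSoup. The sample XML is an abbreviated, made-up stand-in for a real response, and first_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the ItemLookupResponse root element
NS = '{http://webservices.amazon.com/AWSECommerceService/2013-08-01}'

# Abbreviated, invented sample; a real response has many more elements
SAMPLE = """<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2013-08-01">
  <Items>
    <Item>
      <ASIN>1491946008</ASIN>
      <ItemAttributes>
        <Title>Example Title</Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemLookupResponse>"""


def first_text(root, tag):
    """Return the text of the first <tag> anywhere under root, or None."""
    element = next(root.iter(NS + tag), None)
    return element.text if element is not None else None


root = ET.fromstring(SAMPLE)
print(first_text(root, 'Title'))
print(first_text(root, 'ASIN'))
print(first_text(root, 'FormattedPrice'))  # tag is absent from the sample
```

The same None check works with BeautifulSoup: test the tag before reaching for .string.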

At this point you can continue to display these results on your site or insert them into a database.

Here’s the full code sample:

import bottlenose
import time
from urllib.error import HTTPError
from bs4 import BeautifulSoup

ACCESS_KEY = 'your-access-key'
SECRET_KEY = 'your-secret-key'
AFFILIATE_TAG = 'your-amazon-associate-tag'

def error_handler(err):
    ex = err['exception']
    if isinstance(ex, HTTPError) and ex.code == 503:
        time.sleep(2)
        return True

amazon = bottlenose.Amazon(
    ACCESS_KEY, SECRET_KEY, AFFILIATE_TAG, MaxQPS=0.9,
    ErrorHandler=error_handler,
    Parser=lambda text_response: BeautifulSoup(
        text_response,
        'xml'))

response = amazon.ItemLookup(ItemId="1491946008")

print(response.Title.string)
print(response.ASIN.string)
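# FormattedPrice is only there when the response includes price data
price = response.FormattedPrice
print(price.string if price is not None else 'price not available')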

Okay, this is a good starting point for a script to pull product data to use in affiliate marketing efforts. But there’s really not much to it yet — it needs a lot of work to be truly useful.

In a future post, I’ll show more in-depth usage of Amazon’s API to search for a larger number of products by title. This will enable us to acquire bulk product data from Amazon. We’ll use SQLAlchemy to populate a database with this information.
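In the meantime, here's a rough idea of what storing a product could look like, using the standard library's sqlite3 rather than SQLAlchemy. The products table, its two columns and the save_product helper are placeholders of mine, not the schema from my site.

```python
import sqlite3


def save_product(conn, asin, title):
    """Insert or update one product row, keyed by ASIN."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS products (asin TEXT PRIMARY KEY, title TEXT)')
    # Parameterized query: never format values into the SQL string yourself
    conn.execute(
        'INSERT OR REPLACE INTO products (asin, title) VALUES (?, ?)',
        (asin, title))
    conn.commit()


conn = sqlite3.connect(':memory:')  # swap in a file path to persist data
save_product(conn, '1491946008', 'Example Title')
```

Because the ASIN is the primary key, saving the same product twice updates the row instead of duplicating it.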
