How to use the urllib2 Python module

urllib2 is a Python 2 module for fetching URLs. It offers a very simple interface in the form of the urlopen() function. urllib2 can fetch URLs over a variety of protocols, and it also provides a somewhat more complex interface for handling common situations such as basic authentication, cookies and proxies. (In Python 3, urllib2 was merged into urllib.request and urllib.error.)

With urllib2 we can write small web clients to automate web tasks, to test services, or just for debugging. urllib2 supports fetching URLs for many URL schemes, such as ‘http’, ‘ftp’ and ‘file’. Now let's look at some basic examples of using urllib2:

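import urllib2

# Send a GET request and get back a file-like response object
response = urllib2.urlopen('http://sec-art.net')
print response.info()      # response headers
print response.getcode()   # HTTP status code, e.g. 200
html = response.read()     # body of the page as a string
print html
print response.geturl()    # URL actually retrieved (after any redirects)
response.close()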

The code above sends a GET request: urlopen() sends a request to the given URL and returns a response object for it. This response is a file-like object, so methods such as read() and readlines() work on it. The info() method returns the meta-information of the page, such as the headers. To print individual headers separately, use:

print response.info()['Server']
print response.info()['Date']
print response.info()['Content-Type']

The getcode() method returns the status code of the response; response.code holds the same value. In the code above, html = response.read() stores the response body in the html variable, and print html prints all of the returned data. The geturl() method returns the URL of the resource that was actually retrieved, which differs from the requested URL if a redirect was followed. The close() method closes the response.
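You can try the response methods described above without a network connection by opening a local file through a ‘file’ URL, one of the schemes urllib2 supports. The following is a minimal sketch under a few assumptions. It targets a Unix-like system, where the URL is built as 'file://' plus pathname2url() of the path. It writes a temporary file just for the demo. It also adds a try/except import fallback to Python 3's urllib.request, only so that it runs there too. For a file URL, getcode() returns None because there is no HTTP status. Against an http URL it would return a code such as 200.

```python
import os
import tempfile
from contextlib import closing

try:  # Python 2: the module this article covers
    from urllib2 import urlopen
    from urllib import pathname2url
except ImportError:  # assumption: Python 3 fallback so the sketch also runs there
    from urllib.request import urlopen, pathname2url

# Write a small local file so no network access is needed.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'hello from a local file\n')

url = 'file://' + pathname2url(path)  # assumes a Unix-style path

# closing() guarantees response.close() runs even if read() raises.
with closing(urlopen(url)) as response:
    headers = response.info()                 # header object (meta-information)
    content_type = headers['Content-Type']    # header lookup ignores case
    status = response.getcode()               # None for file: URLs
    final_url = response.geturl()             # URL the data was read from
    body = response.read()                    # whole body as a byte string

os.remove(path)
print(content_type)
```

The headers object also supports headers.get('Server') for optional headers, which returns None when the header is missing. Indexing a missing header behaves differently between Python 2 and Python 3, so get() is the safer choice.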

Generally, urlopen() is all you need to send a request. If you need to attach data or customize the request, urllib2 provides the Request class, which gives more control over what is sent. A Request object is created from a URL and, optionally, data (and headers). Note that in urllib2 a request sent without any data is a GET request, while a request sent with data becomes a POST request. Now let's look at some examples:

import urllib2

req = urllib2.Request('http://sec-art.net')   # no data, so this is a GET
response = urllib2.urlopen(req)
html = response.read()
headers = response.info()                     # all response headers
print headers['Content-Type']

In the code above, the last two lines store all the response headers in the headers variable and then print the ‘Content-Type’ header.
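In urllib2, a Request without data is sent as GET and one with data is sent as POST. Request.get_method() reports which verb urlopen() would use, so you can check this rule without sending anything. The following is a sketch. The httpbin.org URLs are only placeholders, and no request is made. The try/except fallback to Python 3's urllib.request and urllib.parse is an addition, so the sketch also runs on Python 3.

```python
try:  # Python 2 (urllib2 / urllib)
    from urllib2 import Request
    from urllib import urlencode
except ImportError:  # assumption: Python 3 equivalents, only so the sketch runs there
    from urllib.request import Request
    from urllib.parse import urlencode

# No data argument: urlopen() would issue a GET.
get_req = Request('http://httpbin.org/get')

# With a data argument, the same class switches to POST.
# .encode('ascii') is a no-op on Python 2 and gives bytes on Python 3.
body = urlencode([('name', 'Ajay')]).encode('ascii')
post_req = Request('http://httpbin.org/post', body)

# get_method() only reports the verb; nothing is sent here.
print(get_req.get_method())
print(post_req.get_method())
```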

Sending POST Request :

To send a POST request, we store the POST data in a dictionary of (key, value) pairs, encode that data, and then send it along with the request. To encode the data, we use the urlencode() function, which is available in the urllib module (not urllib2). So the code will be:

import urllib
import urllib2

url = 'http://httpbin.org/post'
values = {
  'name' : 'Ajay',
  'Location' : 'Earth',
  'Activity' : 'unknown',
  'Language'  : 'Binary',
  'Age' : '30+'
}

data = urllib.urlencode(values)      # 'key=value&key=value...' string
req = urllib2.Request(url, data)     # supplying data makes this a POST
response = urllib2.urlopen(req)
html = response.read()
print html
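The following minimal sketch shows what urlencode() produces for the fields above. It passes a list of (key, value) pairs instead of a dictionary so that the output order is fixed, because on Python 2 a dictionary's iteration order is arbitrary. Notice that the '+' in '30+' is percent-encoded as %2B. Without that, the server would decode it as a space. The import fallback to Python 3's urllib.parse is an addition so the sketch runs there too.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # assumption: Python 3 fallback

# Same fields as the POST example, as ordered (key, value) pairs.
fields = [
    ('name', 'Ajay'),
    ('Location', 'Earth'),
    ('Activity', 'unknown'),
    ('Language', 'Binary'),
    ('Age', '30+'),
]
data = urlencode(fields)
print(data)  # name=Ajay&Location=Earth&Activity=unknown&Language=Binary&Age=30%2B
```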

We can also send a POST request without creating a Request object, by passing the encoded data directly as the second argument of urlopen(). For example:

import urllib
import urllib2

url = 'http://httpbin.org/post'
values = {
  'name' : 'Ajay',
  'Location' : 'Earth',
  'Activity' : 'unknown',
  'Language'  : 'Binary',
  'Age' : '30+'
}

data = urllib.urlencode(values)
response = urllib2.urlopen(url, data)   # passing data makes urlopen() send a POST
print response.read()

Sending GET Request :

Sending a GET request is very easy in urllib2. To supply arguments, encode them with urlencode() as we did above and append them to the URL as a query string after a '?'. For example:

import urllib
import urllib2

args = {'name' : 'Ajay', 'class' : '3rd year'}
encoded_args = urllib.urlencode(args)
url = 'https://httpbin.org/get?' + encoded_args
print urllib2.urlopen(url).read()
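Concatenating 'https://httpbin.org/get?' with the encoded arguments works only when the base URL has no query string yet. The helper below, build_get_url(), is a hypothetical name introduced here and is not part of urllib or urllib2. It is a sketch that uses '&' when the URL already carries a query, and it assumes the URL has no #fragment. The import fallback to Python 3's urllib.parse is an addition so it runs there too.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # assumption: Python 3 fallback


def build_get_url(base_url, params):
    """Hypothetical helper: append URL-encoded params to base_url.

    Assumes base_url has no #fragment.
    """
    query = urlencode(params)
    if not query:
        return base_url
    if base_url.endswith(('?', '&')):
        separator = ''            # URL already ends with a separator
    elif '?' in base_url:
        separator = '&'           # extend the existing query string
    else:
        separator = '?'           # start a new query string
    return base_url + separator + query


url = build_get_url('https://httpbin.org/get',
                    [('name', 'Ajay'), ('class', '3rd year')])
print(url)  # https://httpbin.org/get?name=Ajay&class=3rd+year
```

You can then pass the resulting URL to urllib2.urlopen() exactly as in the example above.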

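Sending data by using the Request object’s add_data() method

Note that add_data() attaches the data as the request body, so urllib2 sends this as a POST request, not a GET. The URL must therefore accept POST; a GET-only endpoint such as http://httpbin.org/get would reject it. To send GET arguments, append them to the URL as shown above.

import urllib
import urllib2

args = {'name' : 'Ajay', 'class' : '3rd year'}

req = urllib2.Request('http://httpbin.org/post')
req.add_data(urllib.urlencode(args))   # sets the body, so urllib2 sends a POST
response = urllib2.urlopen(req)
print response.read()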

Custom Headers :

Sending requests with custom headers :

import urllib2

req = urllib2.Request('http://httpbin.org')
req.add_header('Referer', 'http://sec-art.net')
req.add_header('User-Agent', 'Mozilla/58.0')
response = urllib2.urlopen(req)
print response.read()
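When adding headers, note that Request.add_header() stores each header name via str.capitalize(), so 'User-Agent' is kept as 'User-agent'. The following sketch builds the same request as above without sending it, then inspects the stored headers with get_header() and header_items(). The import fallback to Python 3's urllib.request is an addition so it also runs there.

```python
try:
    from urllib2 import Request           # Python 2
except ImportError:
    from urllib.request import Request    # assumption: Python 3 fallback

req = Request('http://httpbin.org')
req.add_header('Referer', 'http://sec-art.net')
req.add_header('User-Agent', 'Mozilla/58.0')

# The name was normalized with capitalize(), so look it up as 'User-agent'.
print(req.get_header('User-agent'))       # Mozilla/58.0
print(req.get_header('User-Agent'))       # None: spelling does not match
print(sorted(req.header_items()))
```

Because no data was added, this request is still a GET.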

Sending headers with post data :

import urllib
import urllib2

url = 'http://httpbin.org/post'

user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
headers = {'User-Agent': user_agent}

values = {
    'name': 'Michael Foord',
    'location': 'Northampton',
    'language': 'Python' 
}
data = urllib.urlencode(values)

req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
print response.read()
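Passing a headers dictionary as the third argument of Request is equivalent to calling add_header() for each entry, because the constructor loops over the dictionary and calls add_header() itself. The following sketch checks that equivalence offline. It also confirms that the request is a POST, since data is supplied. The .encode('ascii') call and the Python 3 import fallback are additions so the sketch runs on both versions; Python 3 requires bytes for POST data.

```python
try:  # Python 2
    from urllib2 import Request
    from urllib import urlencode
except ImportError:  # assumption: Python 3 fallback
    from urllib.request import Request
    from urllib.parse import urlencode

url = 'http://httpbin.org/post'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
data = urlencode([('name', 'Michael Foord'),
                  ('language', 'Python')]).encode('ascii')

# Headers passed to the constructor...
req_a = Request(url, data, headers)

# ...end up the same as headers added one by one afterwards.
req_b = Request(url, data)
for name, value in headers.items():
    req_b.add_header(name, value)

print(req_a.get_method())                                       # POST
print(sorted(req_a.header_items()) == sorted(req_b.header_items()))  # True
```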
