Using proxies with Python Requests is essential for web scraping and automating workflows. Proxies allow you to hide your real IP address, bypass geographic restrictions, rotate IPs to avoid getting blocked, and more.
In this comprehensive guide, you'll learn how to configure, authenticate, and rotate proxies with the Python Requests library.
Why Use Proxies with Python Requests?
Here are some of the main reasons you may want to use proxy servers with Requests:
- Avoid getting blocked – Websites have bot detection systems that can blacklist your IP if you send too many requests. Proxies allow you to rotate IPs and avoid getting blocked.
- Bypass geographic restrictions – Some sites restrict content based on location. Proxies let you appear to connect from different countries and access geo-restricted content.
- Hide your identity – Your real IP reveals your approximate location and can be tied to your identity. Proxies add a layer of anonymity to your web requests.
- Access the internet from anywhere – If you are in a network with firewalls or internet restrictions, proxies allow you to bypass them.
- Improve throughput – Spreading requests across several IPs lets you run more requests in parallel without tripping per-IP rate limits, although each individual request is usually slightly slower through a proxy.
- Debug requests – Logging requests through proxies makes it easier to monitor and troubleshoot your scripts.
Prerequisites
Before using proxies with Python Requests, you'll need:
- Python 3 – A currently supported Python 3 release. Recent versions of Requests do not run on Python 2.
- Requests module – Install it with: pip install requests
- A code editor – Any editor such as VS Code or Sublime Text.
- Proxy addresses – A list of proxy IPs and credentials. You can get free public proxies or purchase private ones.
Basic Proxy Setup
Setting up a proxy with Requests involves just passing the proxy URL with your request:
import requests

proxies = {
    'http': 'http://192.168.1.1:8000',
    'https': 'http://192.168.1.1:8000',
}
response = requests.get('https://example.com', proxies=proxies)
Here we define a dictionary called proxies with the proxy URLs for HTTP and HTTPS traffic. The keys refer to the scheme of the URL being requested, not to the proxy's own protocol, which is why the https entry can still point to an http:// proxy address. The IP address and port will depend on your specific proxy server.
Then we pass proxies as a parameter to requests.get(), which routes this request through the proxy server instead of connecting directly from your own IP.
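The dictionary above can be generated by a small helper when the same proxy serves both schemes. This is a minimal sketch: build_proxies is a hypothetical helper name, not part of Requests, and the timeout value in the comment is an arbitrary choice.

```python
def build_proxies(host, port, scheme="http"):
    """Return a Requests-style proxies mapping for a single proxy server.

    The same proxy URL is used for both plain-HTTP and HTTPS targets,
    mirroring the dictionary shown above. (Hypothetical helper.)
    """
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("192.168.1.1", 8000)
print(proxies)

# Pass the mapping to any request, ideally with a timeout so that a dead
# proxy fails quickly instead of hanging:
#   requests.get("https://example.com", proxies=proxies, timeout=10)
```

Keeping the host and port as arguments makes it easier to swap proxies later without touching the request code.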
Using SOCKS Proxies
Besides basic HTTP proxies, Requests also supports SOCKS proxies such as SOCKS4 and SOCKS5. SOCKS support is an optional extra, so install it first with: pip install "requests[socks]"
Here's how to use a SOCKS5 proxy:
import requests

proxies = {
    'http': 'socks5://192.168.1.1:8000',
    'https': 'socks5://192.168.1.1:8000',
}
requests.get('https://example.com', proxies=proxies)
Notice the URL scheme is now socks5:// instead of http://. The rest of the code remains the same.
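One detail worth knowing: with the socks5 scheme, hostnames are resolved on your machine, while the socks5h scheme asks the proxy to resolve them. The sketch below picks between the two; socks_proxies and its remote_dns parameter are hypothetical names used only for illustration.

```python
def socks_proxies(host, port, remote_dns=True):
    """Build a proxies mapping for a SOCKS5 proxy. (Hypothetical helper.)

    remote_dns=True uses the socks5h scheme, so DNS lookups happen on the
    proxy. remote_dns=False uses socks5, so lookups happen locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = socks_proxies("192.168.1.1", 8000)

# Requires the optional SOCKS dependency: pip install "requests[socks]"
#   requests.get("https://example.com", proxies=proxies, timeout=10)
```

Resolving names on the proxy is usually the better default for scraping, because local DNS lookups can reveal which sites you are contacting.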
Authenticating Proxies
Some proxies require authentication with a username and password before use.
To authenticate proxies, include the credentials in the proxy URL like so:
proxies = {
    'http': 'http://user:password@192.168.1.1:8000',
    'https': 'http://user:password@192.168.1.1:8000',
}
Make sure to use the actual username, password, and address of your proxy server in place of the sample values.
Requests then sends these credentials to the proxy on every request routed through it.
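Credentials that contain characters such as @, : or / break the URL unless they are percent-encoded. A minimal sketch using the standard library follows; authenticated_proxy_url is a hypothetical helper name, and the sample password is made up.

```python
from urllib.parse import quote


def authenticated_proxy_url(username, password, host, port, scheme="http"):
    """Return a proxy URL with percent-encoded credentials.

    quote(..., safe="") encodes every reserved character, so a password
    containing '@' or ':' cannot be mistaken for part of the URL structure.
    (Hypothetical helper, not part of Requests.)
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


proxy_url = authenticated_proxy_url("user", "p@ss:word/1", "192.168.1.1", 8000)
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)
```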
Using Proxy Sessions
When you need to make multiple requests with the same proxy, it's better to use a session instead of passing the proxies dict every time.
Sessions reuse the underlying connections, which is usually faster:
import requests

session = requests.Session()
proxies = {
    'http': 'http://192.168.1.1:8000',
    'https': 'http://192.168.1.1:8000',
}
session.proxies = proxies
response = session.get('https://example.com')
Here we create a new Session() object and set the proxies on it. All further requests made through this session will route through the assigned proxy.
This avoids reconfiguring the proxy on each request.
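One caveat: the Requests documentation warns that proxies set on a session can be overridden by proxy environment variables. The sketch below shows one way to guard against that by turning off trust_env. make_proxy_session is a hypothetical helper name, and note that trust_env = False also disables other environment-driven settings such as .netrc credentials.

```python
import requests


def make_proxy_session(proxy_url, use_env_proxies=False):
    """Create a Session that sends HTTP and HTTPS traffic through proxy_url.

    With use_env_proxies=False the session ignores http_proxy/https_proxy
    environment variables, so the proxy set here is the one actually used.
    (Hypothetical helper, not part of Requests.)
    """
    session = requests.Session()
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    session.trust_env = use_env_proxies
    return session


session = make_proxy_session("http://192.168.1.1:8000")

# Every call on this session now goes through the proxy:
#   session.get("https://example.com", timeout=10)
```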
Setting Proxy Environment Variables
Hardcoding proxy credentials in your script isn't ideal, especially in teams. A better practice is using environment variables.
Here's how to configure proxy environment variables:
On Windows:
set http_proxy=http://username:password@192.168.1.1:8000
set https_proxy=http://username:password@192.168.1.1:8000
On Linux/macOS:
export http_proxy=http://username:password@192.168.1.1:8000
export https_proxy=http://username:password@192.168.1.1:8000
Then access them in Python:
import os
import requests

proxies = {
    'http': os.environ['http_proxy'],
    'https': os.environ['https_proxy'],
}
requests.get('https://example.com', proxies=proxies)
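This keeps the proxy credentials separate from code. Requests also reads these variables on its own by default, so once they are set, a plain requests.get('https://example.com') is routed through the proxy even without the proxies argument.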
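Indexing os.environ directly raises a KeyError when a variable is missing. A more forgiving lookup might look like the sketch below, which skips unset variables and accepts both lowercase and uppercase names. proxies_from_env is a hypothetical helper name, and the optional environ parameter exists only to make the function easy to test.

```python
import os


def proxies_from_env(environ=None):
    """Build a proxies mapping from http_proxy / https_proxy style variables.

    Checks the lowercase name first, then the uppercase one, and leaves out
    any scheme whose variable is not set. (Hypothetical helper.)
    """
    env = os.environ if environ is None else environ
    proxies = {}
    for scheme in ("http", "https"):
        value = env.get(f"{scheme}_proxy") or env.get(f"{scheme.upper()}_PROXY")
        if value:
            proxies[scheme] = value
    return proxies


proxies = proxies_from_env()

# An empty mapping means no proxy variables were found:
#   requests.get("https://example.com", proxies=proxies, timeout=10)
```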
Rotating Proxies with Python Requests
Rotating proxies helps distribute requests across multiple IPs and avoid getting blocked by websites.
To rotate proxies:
- Create a list of proxy URLs you want to rotate through (the addresses here are placeholders):
proxy_list = [
    'http://user:password@192.168.1.1:8000',
    'http://user:password@192.168.1.2:8000',
    'socks5://user:password@192.168.1.3:8000',
]
- On each request, randomly select a proxy:
import random

# Select a random proxy
proxy = random.choice(proxy_list)
proxies = {'http': proxy, 'https': proxy}
response = requests.get(url, proxies=proxies)
- Alternatively, you can iterate through the list sequentially, cycling back to the start once every proxy has been used:
import itertools

# cycle() restarts from the first proxy instead of raising StopIteration
proxy_iter = itertools.cycle(proxy_list)
for url in url_list:
    proxy = next(proxy_iter)
    # Code to make request
This way consecutive requests are spread across different proxies, making it harder to detect and block your traffic.
Make sure to use a large pool of proxies and implement automatic proxy rotation in your code for seamless scraping at scale.
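Automatic rotation usually also means moving on to the next proxy when one fails. The following is a minimal sketch of that idea, not a production implementation. fetch_with_rotation, its parameters, and the default of three attempts are all assumptions made for illustration.

```python
import itertools

import requests


def fetch_with_rotation(url, proxy_pool, max_attempts=3, get=requests.get, timeout=10):
    """Try url through successive proxies until one attempt succeeds.

    proxy_pool is an iterator of proxy URLs, for example
    itertools.cycle(proxy_list). The get parameter defaults to requests.get
    and can be replaced in tests. (Hypothetical helper.)
    """
    last_error = None
    for _ in range(max_attempts):
        proxy = next(proxy_pool)
        try:
            response = get(url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
            response.raise_for_status()  # treat HTTP error statuses as failures
            return response
        except requests.RequestException as exc:
            last_error = exc  # remember the failure and try the next proxy
    raise RuntimeError(f"all {max_attempts} attempts failed for {url}") from last_error


# Usage, with proxy_list defined as in the steps above:
#   pool = itertools.cycle(proxy_list)
#   response = fetch_with_rotation("https://example.com", pool)
```

Sharing one cycling iterator across calls keeps the rotation going from where the previous request left off, rather than always starting at the first proxy.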
Conclusion
Configuring proxies in Python Requests is straightforward and offers advantages such as added anonymity, access to geo-restricted content, and a lower risk of bans.
The key steps are:
- Define a proxy dictionary with URLs
- Pass the proxies parameter to route requests through them
- Use proxy sessions for multiple requests
- Authenticate if required
- Rotate proxies randomly or sequentially
With this guide, you should be able to start using proxies for Python web scraping and automation. Combine Requests with a good proxy service to extract data faster and more reliably.