Data Science Portfolio

Scrape Proxy Table with pandas

Overview

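This post fetches the proxy list page at hidemy.name with requests and parses its HTML table into a pandas DataFrame using pd.read_html.

[Image: proxy-table-1.png]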

Import libraries

import requests
import pandas as pd

Set the request headers and fetch the page

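url = 'https://hidemy.name/en/proxy-list/'

# Browser-like headers; some sites reject the default "python-requests/<version>" User-Agent
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}

# Fetch the page; fail loudly on an HTTP error instead of parsing an error page
r = requests.get(url, headers=header, timeout=10)
r.raise_for_status()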
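To see which headers actually go out, you can build the request without sending it. The sketch below reuses the url and header values from above and relies on requests.Session.prepare_request, which merges the session's default headers (such as Accept-Encoding) with the ones passed in. It makes no network call. It also prints requests' own default User-Agent, which is what the server would see if no custom header were set.

```python
import requests

url = "https://hidemy.name/en/proxy-list/"
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}

# The User-Agent requests would send if we did not override it
print(requests.utils.default_user_agent())  # e.g. python-requests/2.x

with requests.Session() as session:
    # Prepare (but do not send) the same GET request; explicit headers override session defaults
    prepared = session.prepare_request(requests.Request("GET", url, headers=header))

print(prepared.method)  # GET
print(prepared.url)     # https://hidemy.name/en/proxy-list/
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Inspecting the prepared request is a cheap way to confirm the custom User-Agent replaced the default before troubleshooting a blocked or empty response.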

Get all tables from the HTML

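from io import StringIO

# Wrap the HTML in StringIO: pandas 2.1+ deprecates passing a literal HTML string
tables = pd.read_html(StringIO(r.text))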

print(f'Total tables: {len(tables)}')
Total tables: 1
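pd.read_html returns a list with one DataFrame per table element it finds, which is why the result above is indexed with tables[0]. The offline sketch below uses a small hand-written HTML string (hypothetical, using documentation-reserved IP addresses) with two tables, and shows the match argument, which keeps only tables whose text matches a regular expression. read_html needs a parser backend installed: lxml, or BeautifulSoup4 with html5lib.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with one proxy-like table and one unrelated table
html = """
<html><body>
<table>
  <thead><tr><th>IP address</th><th>Port</th></tr></thead>
  <tbody>
    <tr><td>192.0.2.1</td><td>8080</td></tr>
    <tr><td>192.0.2.2</td><td>3128</td></tr>
  </tbody>
</table>
<table>
  <tr><th>Note</th></tr>
  <tr><td>unrelated</td></tr>
</table>
</body></html>
"""

# One DataFrame per <table>
tables = pd.read_html(StringIO(html))
print(f"Total tables: {len(tables)}")  # Total tables: 2

# Keep only tables whose text matches the regex "IP address"
proxy_tables = pd.read_html(StringIO(html), match="IP address")
print(len(proxy_tables))  # 1
print(proxy_tables[0])
```

On a page with several tables, match (or checking df.columns) is a more robust way to pick the right one than relying on its position in the list.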

Store the proxy list table in a pandas DataFrame

df = tables[0]

df.head()
   IP address       Port   Country, City        Speed     Type  Anonymity  Latest update
0  180.180.170.188  8080   Thailand             4560 ms   HTTP  no         41 seconds
1  223.27.194.66    63141  Thailand             1900 ms   HTTP  no         43 seconds
2  101.51.55.153    8080   Thailand Don Chedi   3040 ms   HTTP  no         43 seconds
3  122.154.35.190   8080   Thailand Panare      1880 ms   HTTP  no         43 seconds
4  203.23.106.190   80     Cyprus               480 ms    HTTP  no         1 minutes
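The Speed column comes back as text such as "4560 ms", so it cannot be sorted numerically as-is. The sketch below rebuilds three rows from the df.head() output above (these proxies are almost certainly stale by now), extracts the number of milliseconds, and builds an http://ip:port URL per row. It assumes plain HTTP proxies, as in the rows shown; the speed_ms and proxy_url column names are hypothetical.

```python
import pandas as pd

# Three rows copied from the df.head() output above
df = pd.DataFrame({
    "IP address": ["180.180.170.188", "223.27.194.66", "203.23.106.190"],
    "Port": [8080, 63141, 80],
    "Speed": ["4560 ms", "1900 ms", "480 ms"],
})

# Pull the digits out of "4560 ms" and convert them to integers
df["speed_ms"] = df["Speed"].str.extract(r"(\d+)", expand=False).astype(int)

# Build an HTTP proxy URL for each row (assumes HTTP proxies)
df["proxy_url"] = "http://" + df["IP address"] + ":" + df["Port"].astype(str)

# Fastest proxies first
fastest = df.sort_values("speed_ms").reset_index(drop=True)
print(fastest[["proxy_url", "speed_ms"]])

# Mapping in the shape accepted by the proxies= argument of requests.get (not sent here)
proxies = {"http": fastest.loc[0, "proxy_url"], "https": fastest.loc[0, "proxy_url"]}
```

Converting units once into a numeric column keeps later filtering, for example df[df["speed_ms"] < 2000], simple and avoids string comparisons like "480 ms" > "1900 ms".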