The `read_html()` method will be used to extract all tables from the HTML. We fetch the page with the `get()` method from the requests library, then pass `r.text` (the text of the response) as input to `read_html()`, which returns a list of all tables found in the HTML stored in the response `r`:

```python
import requests
import pandas as pd

url = 'https://hidemy.name/en/proxy-list/'
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
r = requests.get(url, headers=header)
tables = pd.read_html(r.text)
print(f'Total tables: {len(tables)}')
```
```
Total tables: 1
```
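To see why `read_html()` returns a *list*, here is a minimal, self-contained sketch that parses an inline HTML snippet instead of the live page (the HTML string and IP address below are made up for illustration; `read_html()` relies on an installed parser such as lxml or html5lib):

```python
from io import StringIO
import pandas as pd

# A tiny HTML document containing a single table
html = """
<table>
  <tr><th>IP address</th><th>Port</th></tr>
  <tr><td>203.0.113.1</td><td>8080</td></tr>
</table>
"""

# read_html() scans the document and returns one DataFrame per <table>
tables = pd.read_html(StringIO(html))
print(f'Total tables: {len(tables)}')  # Total tables: 1
```

Wrapping the string in `StringIO` also avoids the deprecation warning that newer pandas versions emit when a literal HTML string is passed directly.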
Since `tables` contains only one element, we take the first one. Calling `df.head()` and comparing the result with the screenshot above confirms that the table has been extracted successfully:

```python
df = tables[0]
df.head()
```
| | IP address | Port | Country, City | Speed | Type | Anonymity | Latest update |
|---|---|---|---|---|---|---|---|
| 0 | 180.180.170.188 | 8080 | Thailand | 4560 ms | HTTP | no | 41 seconds |
| 1 | 223.27.194.66 | 63141 | Thailand | 1900 ms | HTTP | no | 43 seconds |
| 2 | 101.51.55.153 | 8080 | Thailand Don Chedi | 3040 ms | HTTP | no | 43 seconds |
| 3 | 122.154.35.190 | 8080 | Thailand Panare | 1880 ms | HTTP | no | 43 seconds |
| 4 | 203.23.106.190 | 80 | Cyprus | 480 ms | HTTP | no | 1 minutes |
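Note that the extracted columns are still plain text, e.g. the Speed column holds strings like `4560 ms`. Below is a minimal sketch of cleaning that column and filtering for fast proxies; the sample DataFrame mirrors the table layout shown above, and the column names are assumptions based on that screenshot:

```python
import pandas as pd

# Hypothetical sample mimicking the extracted table's layout
df = pd.DataFrame({
    'IP address': ['180.180.170.188', '203.23.106.190'],
    'Port': [8080, 80],
    'Speed': ['4560 ms', '480 ms'],
    'Type': ['HTTP', 'HTTP'],
})

# Strip the ' ms' suffix and convert to integers so the column can be compared numerically
df['Speed'] = df['Speed'].str.replace(' ms', '', regex=False).astype(int)

# Keep only proxies that responded in under one second
fast = df[df['Speed'] < 1000]
print(fast[['IP address', 'Port', 'Speed']])
```

The same pattern (strip units, convert, filter) applies to any of the text columns `read_html()` produces.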