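In this tutorial, I will show you how to look up registration information for a domain name or IP address using the whois library in Python. WHOIS is a protocol, and a set of public records, for finding out who registered a domain. A record typically lists the registrar, the registration and expiration dates, and contact details for the person, group, or company that registered the domain, although many registries now redact personal contact data. The whois library queries WHOIS servers directly to retrieve this information. It is published on PyPI as python-whois but imported as whois, and you can install it with the following command:

```
pip install python-whois
```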
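Before processing a whole list, it can help to look up a single domain and see which fields come back. The following is a minimal sketch, not part of the original workflow: `whois_summary` is a hypothetical helper that calls `whois.whois` and collects the fields used later in this tutorial (`domain_name`, `registrar`, `whois_server`, `creation_date`, `expiration_date`). Its `lookup` parameter is an assumption added so the function can be tried without network access; by default it uses the library's own lookup function.

```python
def whois_summary(domain_name, lookup=None):
    """Return a few common WHOIS fields for one domain as a dict.

    `lookup` is a hypothetical injection point (not part of python-whois);
    it defaults to the library's whois.whois function.
    """
    if lookup is None:
        import whois  # installed with `pip install python-whois`
        lookup = whois.whois
    record = lookup(domain_name)
    # Registries fill in different fields, so fall back to None for any
    # field that the returned record does not provide.
    fields = ("domain_name", "registrar", "whois_server",
              "creation_date", "expiration_date")
    return {field: getattr(record, field, None) for field in fields}
```

With network access, `whois_summary("python.org")` returns the live values for that domain. Which fields a registry fills in varies, so any of them may be `None`.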
In a previous tutorial, I demonstrated how to scrape a list of free proxies from the spys.one website using Selenium. Let's look up the proxies we scraped in that tutorial, which were saved to a CSV file. First, we'll write a function that checks whether a domain name is registered.
```python
import whois
import pandas as pd


# Return a boolean indicating whether a domain is registered
def is_registered(domain_name):
    try:
        w = whois.whois(domain_name)
    except Exception:
        # Treat any lookup error as "not registered"
        return False
    else:
        return bool(w.domain_name)
```
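As written, the loop below queries each domain twice: once inside `is_registered` and again to read the record. A hedged alternative, assuming the same "a non-empty `domain_name` means registered" rule, is a hypothetical `lookup_registration` helper that performs a single query and returns either the record or `None`. The `lookup` parameter is again an illustrative injection point that defaults to `whois.whois`.

```python
def lookup_registration(domain_name, lookup=None):
    """Query WHOIS once; return the record if the domain looks registered, else None.

    Hypothetical helper combining is_registered() and the second
    whois.whois() call. `lookup` defaults to python-whois's whois.whois.
    """
    if lookup is None:
        import whois
        lookup = whois.whois
    try:
        record = lookup(domain_name)
    except Exception:
        # Same policy as is_registered(): any lookup error counts as unregistered.
        return None
    # Same test as is_registered(): an empty domain_name means "not registered".
    return record if getattr(record, "domain_name", None) else None
```

Inside the loop, `whois_info = lookup_registration(domain_name)` followed by `if whois_info is not None:` would then replace both the `is_registered` check and the second `whois.whois` call.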
Next, we load the proxies into a DataFrame, look up the host of each proxy, and record the registrar, WHOIS server, creation date, and expiration date for every registered domain:

```python
# Load proxies into DataFrame
df = pd.read_csv("spys-proxy-list-30.csv")

# Add new columns
df["Registered"] = ""
df["Registrar"] = ""
df["WHOIS server"] = ""
df["Creation date"] = ""
df["Expiration date"] = ""

# Iterate over rows
for i in range(len(df)):
    # "Proxy address" holds "host:port"; keep only the host
    domain_name = df.loc[i, "Proxy address"].split(":")[0]

    # Check if proxy is registered and assign values to DataFrame accordingly
    if is_registered(domain_name):
        whois_info = whois.whois(domain_name)
        df.loc[i, "Registered"] = "yes"
        df.loc[i, "Registrar"] = str(whois_info.registrar)
        df.loc[i, "WHOIS server"] = str(whois_info.whois_server)

        # A record may list several creation dates; if so, keep the last one
        if isinstance(whois_info.creation_date, list):
            df.loc[i, "Creation date"] = whois_info.creation_date[-1].strftime("%Y-%m-%d %H:%M:%S")
        else:
            df.loc[i, "Creation date"] = str(whois_info.creation_date)

        # Same check for expiration_date
        if isinstance(whois_info.expiration_date, list):
            df.loc[i, "Expiration date"] = whois_info.expiration_date[-1].strftime("%Y-%m-%d %H:%M:%S")
        else:
            df.loc[i, "Expiration date"] = str(whois_info.expiration_date)

    # If domain is not registered, leave everything else blank
    else:
        df.loc[i, "Registered"] = "no"
        df.loc[i, "Registrar"] = None
        df.loc[i, "WHOIS server"] = None
        df.loc[i, "Creation date"] = None
        df.loc[i, "Expiration date"] = None

df.head()
```
While the loop ran, the library printed one connection error:

```
Error trying to connect to socket: closing socket
```
| | Proxy address | Proxy type | Registered | Registrar | WHOIS server | Creation date | Expiration date |
|---|---|---|---|---|---|---|---|
| 0 | 122.99.125.85:80 | http | no | None | None | None | None |
| 1 | 220.116.226.105:80 | http | no | None | None | None | None |
| 2 | 103.219.194.13:80 | http | no | None | None | None | None |
| 3 | 110.170.126.13:3128 | https | yes | THNIC | None | None | None |
| 4 | 91.150.189.122:30389 | http | yes | home.pl S.A. | None | 2003-06-14 08:45:04 | 2025-06-13 14:00:00 |
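As row 3 shows, a record may come back with no dates at all, and the library can return a date either as a single `datetime` or as a list. The loop handles this with two near-identical checks. A minimal sketch of a hypothetical `format_whois_date` helper folds them into one function and keeps the last entry of a list, as the loop does. One deliberate difference: a missing date becomes `None` rather than the string `"None"` that `str(None)` produces.

```python
from datetime import datetime


def format_whois_date(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Normalize a WHOIS date field to a string, or None if it is missing.

    Hypothetical helper: accepts a datetime, a list of datetimes (the last
    entry is kept, matching the loop), None, or any other value.
    """
    if isinstance(value, list):
        value = value[-1] if value else None
    if value is None:
        return None
    if isinstance(value, datetime):
        return value.strftime(fmt)
    # Anything else (for example a date the parser left as text) is passed
    # through as a string.
    return str(value)
```

In the loop, each date assignment would then become `df.loc[i, "Creation date"] = format_whois_date(whois_info.creation_date)`, and likewise for `expiration_date`.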
```python
df.tail()
```

| | Proxy address | Proxy type | Registered | Registrar | WHOIS server | Creation date | Expiration date |
|---|---|---|---|---|---|---|---|
| 25 | 47.241.245.186:80 | http | no | None | None | None | None |
| 26 | 185.91.116.140:80 | http | yes | None | None | 2020-06-16 00:00:00 | 2022-11-23 00:00:00 |
| 27 | 185.72.27.98:8080 | http | no | None | None | None | None |
| 28 | 45.77.233.110:80 | http | yes | ENOM, INC. | WHOIS.ENOM.COM | 2022-03-10 16:58:11 | 2027-03-10 16:58:11 |
| 29 | 80.179.140.189:80 | http | yes | Domain The Net Technologies Ltd | None | None | 2023-04-11 00:00:00 |
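Every address shown above is a literal IPv4 address, so `split(":")[0]` is enough to drop the port. A bracketed IPv6 address such as `[2001:db8::1]:8080` would instead be cut at its first colon. The sketch below is one possible alternative using two hypothetical helpers: `proxy_host`, which relies on the standard library's `urllib.parse.urlsplit` to extract the host, and `is_ip_address`, which uses `ipaddress.ip_address` to tell literal addresses apart from domain names. The distinction matters because IP addresses are allocated by regional internet registries, while domain names are registered through domain registrars.

```python
import ipaddress
from urllib.parse import urlsplit


def proxy_host(address):
    """Return the host part of a "host:port" proxy address.

    Prefixing "//" makes urlsplit treat the string as a network location,
    so bracketed IPv6 hosts are handled. Note that .hostname is lowercased.
    """
    host = urlsplit("//" + address).hostname
    if host is None:
        raise ValueError(f"no host in proxy address: {address!r}")
    return host


def is_ip_address(host):
    """True if host is a literal IPv4 or IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True
```

In the loop, `domain_name = proxy_host(df.loc[i, "Proxy address"])` would be a drop-in replacement for the `split` call.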
The safest proxies to use are arguably those whose domains were registered a long time ago (at least 10 years is a reasonable threshold). We can keep only the proxies whose domain creation date falls before 2012 and drop the rest, including those with no creation date on record.
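```python
# Creation dates are stored as "YYYY-MM-DD HH:MM:SS" strings, so comparing them
# to "2012" as strings orders them chronologically. Rows without a creation
# date (None, or the string "None") never compare as less than "2012".
df2 = df[df["Creation date"] < "2012"]
df2 = df2.reset_index(drop=True)
df2.head()
```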
| | Proxy address | Proxy type | Registered | Registrar | WHOIS server | Creation date | Expiration date |
|---|---|---|---|---|---|---|---|
| 0 | 91.150.189.122:30389 | http | yes | home.pl S.A. | None | 2003-06-14 08:45:04 | 2025-06-13 14:00:00 |
| 1 | 50.233.42.98:51696 | http | yes | CSC CORPORATE DOMAINS, INC. | whois.corporatedomains.com | 2000-07-27 17:53:12 | 2025-07-27 17:53:12 |
| 2 | 5.167.141.239:3128 | http | yes | RU-CENTER-RU | None | 2001-03-13 21:00:00 | 2023-03-14 21:00:00 |
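The cutoff above is a fixed year, so "at least 10 years old" only holds relative to the date the filter was written. A minimal sketch of a hypothetical `registered_at_least` helper measures the age against the current date instead. It assumes the naive `"YYYY-MM-DD HH:MM:SS"` string format stored by the loop; a date string with a time-zone suffix would make `strptime` raise.

```python
from datetime import datetime


def registered_at_least(creation, min_years=10, today=None, fmt="%Y-%m-%d %H:%M:%S"):
    """True if a "Creation date" string lies at least min_years before today.

    Hypothetical helper. Missing values (None or the string "None") count
    as not old enough. Assumes the naive format used by the loop.
    """
    if creation is None or creation == "None":
        return False
    created = datetime.strptime(creation, fmt)
    if today is None:
        today = datetime.now()
    try:
        cutoff = today.replace(year=today.year - min_years)
    except ValueError:
        # today is 29 February and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - min_years, day=28)
    return created <= cutoff
```

For example, `df[df["Creation date"].apply(registered_at_least)]` would keep the rows whose domains are at least 10 years old as of the day the code runs.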
Only three of the 30 proxies have domains created before 2012, that is, more than 10 years old at the time of writing. Finally, let's save the filtered list to a CSV file.
```python
# index=False keeps the DataFrame's row numbers out of the CSV
df2.to_csv("spys-proxy-list-30-filtered.csv", index=False)
```