Data Science Portfolio

Get Domain Information with WHOIS

In this tutorial, I will show you how to look up registration information for a domain name or IP address using the python-whois library in Python. WHOIS is a query protocol and public record system that lists the registrar, the registration and expiration dates, and (where not redacted) contact details for the person, group, or company that registered a particular domain. The library, imported as whois, queries WHOIS servers and parses the response into named fields. You can install it with the following command:

pip install python-whois

In a previous tutorial, I demonstrated how to scrape a list of free proxies from the spys.one website using Selenium. Let’s use the proxies we scraped in that tutorial, which were saved in a CSV file. First, we’ll write a function to check whether a domain name is registered.

import whois
import pandas as pd

# Return a boolean indicating whether a domain is registered
def is_registered(domain_name):
    try:
        w = whois.whois(domain_name)
    except Exception:
        return False
    else:
        return bool(w.domain_name)

# Load proxies into DataFrame
df = pd.read_csv("spys-proxy-list-30.csv")

# Add new columns
df["Registered"] = ""
df["Registrar"] = ""
df["WHOIS server"] = ""
df["Creation date"] = ""
df["Expiration date"] = ""

# Iterate over rows
for i in range(len(df)):
    domain_name = df.loc[i, "Proxy address"].split(":")[0]
    
    # Check if proxy is registered and assign values to DataFrame accordingly
    if is_registered(domain_name):
        whois_info = whois.whois(domain_name)
        
        # Assign values to DataFrame
        df.loc[i, "Registered"] = "yes"
        df.loc[i, "Registrar"] = str(whois_info.registrar)
        df.loc[i, "WHOIS server"] = str(whois_info.whois_server)
        
        # creation_date may be a list of datetimes; if so, use the last entry
        if isinstance(whois_info.creation_date, list):
            df.loc[i, "Creation date"] = whois_info.creation_date[-1].strftime("%Y-%m-%d %H:%M:%S")
        else:
            df.loc[i, "Creation date"] = str(whois_info.creation_date)
            
        # expiration_date may be a list of datetimes; if so, use the last entry
        if isinstance(whois_info.expiration_date, list):
            df.loc[i, "Expiration date"] = whois_info.expiration_date[-1].strftime("%Y-%m-%d %H:%M:%S")
        else:
            df.loc[i, "Expiration date"] = str(whois_info.expiration_date)
    # If domain is not registered, leave everything else blank
    else:
        df.loc[i, "Registered"] = "no"
        df.loc[i, "Registrar"] = None
        df.loc[i, "WHOIS server"] = None
        df.loc[i, "Creation date"] = None
        df.loc[i, "Expiration date"] = None

df.head()
Error trying to connect to socket: closing socket
    Proxy address         Proxy type  Registered  Registrar     WHOIS server  Creation date        Expiration date
0   122.99.125.85:80      http        no          None          None          None                 None
1   220.116.226.105:80    http        no          None          None          None                 None
2   103.219.194.13:80     http        no          None          None          None                 None
3   110.170.126.13:3128   https       yes         THNIC         None          None                 None
4   91.150.189.122:30389  http        yes         home.pl S.A.  None          2003-06-14 08:45:04  2025-06-13 14:00:00

df.tail()
    Proxy address      Proxy type  Registered  Registrar                        WHOIS server    Creation date        Expiration date
25  47.241.245.186:80  http        no          None                             None            None                 None
26  185.91.116.140:80  http        yes         None                             None            2020-06-16 00:00:00  2022-11-23 00:00:00
27  185.72.27.98:8080  http        no          None                             None            None                 None
28  45.77.233.110:80   http        yes         ENOM, INC.                       WHOIS.ENOM.COM  2022-03-10 16:58:11  2027-03-10 16:58:11
29  80.179.140.189:80  http        yes         Domain The Net Technologies Ltd  None            None                 2023-04-11 00:00:00
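
The loop above queries WHOIS twice for every registered address (once inside is_registered and once more to read the fields) and repeats the same list handling for both dates. As a minimal sketch of one way to fold that into a single lookup, the helper below takes the query function as an argument. The names pick_date, whois_row, and query are hypothetical and not part of python-whois; query is assumed to behave like whois.whois, returning an object with the same attributes used in the loop.

```python
from datetime import datetime

COLUMNS = ("Registered", "Registrar", "WHOIS server",
           "Creation date", "Expiration date")


def pick_date(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Return one date as text, or None when no date is available."""
    # Like the loop above, keep the last entry when a list comes back.
    if isinstance(value, list):
        value = value[-1] if value else None
    if isinstance(value, datetime):
        return value.strftime(fmt)
    # None stays None; any other value is kept as plain text.
    return None if value is None else str(value)


def text_or_none(value):
    """Convert a field to text, but keep missing values as None."""
    return None if value is None else str(value)


def whois_row(host, query):
    """Run one WHOIS query and return the five column values as a dict."""
    row = dict.fromkeys(COLUMNS)  # every value starts as None
    row["Registered"] = "no"
    try:
        record = query(host)  # assumption: behaves like whois.whois
    except Exception:
        return row
    if not record.domain_name:
        return row
    row["Registered"] = "yes"
    row["Registrar"] = text_or_none(record.registrar)
    row["WHOIS server"] = text_or_none(record.whois_server)
    row["Creation date"] = pick_date(record.creation_date)
    row["Expiration date"] = pick_date(record.expiration_date)
    return row
```

With the real library this would be called as whois_row(host, whois.whois). The returned dictionaries could be collected in a list and turned into a table with pd.DataFrame(rows), which also avoids assigning cells one at a time.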

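As a rough heuristic, the safest proxies to use are those whose domains were registered a long time ago (at least 10 years is a good threshold). We can filter out all the proxies whose creation date is less than 10 years old. Because the creation dates are stored as strings in YYYY-MM-DD HH:MM:SS form, comparing them against the string "2012" orders them the same way a date comparison would, and rows without a creation date are dropped.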

df2 = df[df["Creation date"] < "2012"]
df2 = df2.reset_index(drop=True)
df2.head()
    Proxy address         Proxy type  Registered  Registrar                    WHOIS server                Creation date        Expiration date
0   91.150.189.122:30389  http        yes         home.pl S.A.                 None                        2003-06-14 08:45:04  2025-06-13 14:00:00
1   50.233.42.98:51696    http        yes         CSC CORPORATE DOMAINS, INC.  whois.corporatedomains.com  2000-07-27 17:53:12  2025-07-27 17:53:12
2   5.167.141.239:3128    http        yes         RU-CENTER-RU                 None                        2001-03-13 21:00:00  2023-03-14 21:00:00

We can see that only three of the proxies have a creation date more than 10 years old. Finally, let’s output our filtered list to a CSV file.

df2.to_csv("spys-proxy-list-30-filtered.csv")
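
The 10-year filter used earlier compares strings against a hard-coded "2012", so the cutoff has to be edited by hand as time passes. A minimal sketch of an age check that parses the date instead is shown below, using only the standard library. The function name is_older_than is hypothetical, and the reference date in the example is an assumption chosen for illustration; the sample dates are taken from the output tables above.

```python
from datetime import datetime


def is_older_than(creation, years, today):
    """Return True if `creation` is at least `years` years before `today`."""
    # Missing or unparsed values ("", None, "None") never pass the filter.
    try:
        created = datetime.strptime(str(creation), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    # Move the reference date back by `years`; Feb 29 falls back to Feb 28.
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        cutoff = today.replace(year=today.year - years, day=28)
    return created <= cutoff


today = datetime(2022, 11, 1)  # assumed reference date, for illustration only
dates = ["2003-06-14 08:45:04", "2020-06-16 00:00:00", None]
print([is_older_than(d, 10, today) for d in dates])  # [True, False, False]
```

With the DataFrame from this tutorial, the same check could be applied as df[df["Creation date"].apply(is_older_than, args=(10, datetime.now()))].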