Why Automate
The WhoisExtractor API refreshes data once per day at 6:00 AM IST (UTC+5:30). If your research or sales process depends on fresh domain, WHOIS, or website data, manually downloading files each morning is error-prone and easy to forget. A scheduled automated job handles it reliably while your team focuses on using the data.
This article shows how to schedule automated downloads with cron (Linux/macOS) and Task Scheduler (Windows), and how to script them in Python and PHP.
Before setting up automation, read Getting Started with the API to confirm your API key and product IDs.
When to Schedule the Job
Data becomes available after 6:00 AM IST each day. To avoid running the job before the data is ready, schedule it for 7:00 AM IST or later, which is 01:30 UTC.
If you are in a different timezone:
| Your timezone | Schedule at |
|---|---|
| UTC | 01:30 |
| US Eastern (EST) | 8:30 PM previous day |
| US Pacific (PST) | 5:30 PM previous day |
| UK (GMT) | 01:30 |
| Australia Eastern (AEST) | 11:30 AM |
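If your timezone is not in the table, the conversion can be computed with Python's standard `zoneinfo` module. A quick sketch (the IANA zone names are assumptions; note that zones observing daylight saving time shift the local run time by an hour for part of the year):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 7:00 AM IST on an arbitrary date (IST observes no daylight saving time)
run_at = datetime(2024, 1, 15, 7, 0, tzinfo=ZoneInfo("Asia/Kolkata"))

for tz in ["UTC", "America/New_York", "America/Los_Angeles", "Australia/Brisbane"]:
    local = run_at.astimezone(ZoneInfo(tz))
    print(f"{tz}: {local.strftime('%a %H:%M')}")
```

Substitute your own IANA zone name in the list to get the matching local schedule time.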
Linux / macOS - Cron
Open your crontab:
crontab -e
Add a line to run the download script every day at 02:00, comfortably after the data is ready if your server clock is set to UTC. Note that cron interprets schedules in the server's local timezone, so adjust the hour if your server runs in another zone:
0 2 * * * /bin/bash /home/user/scripts/download-whois.sh >> /home/user/logs/whois-download.log 2>&1
Example Shell Script
#!/usr/bin/env bash
set -euo pipefail
API_KEY="YOUR_API_KEY"
PID="3"
DATE=$(date -u +%Y-%m-%d)
OUTPUT_DIR="/data/whois"
OUTPUT_FILE="${OUTPUT_DIR}/whois-${DATE}.zip"
mkdir -p "$OUTPUT_DIR"
echo "[$(date -u)] Downloading WHOIS data for ${DATE}..."
curl --silent --show-error --fail \
"https://get.whoisextractor.com/?api=${API_KEY}&pid=${PID}&date=${DATE}" \
--output "$OUTPUT_FILE"
echo "[$(date -u)] Saved to ${OUTPUT_FILE} ($(du -sh "$OUTPUT_FILE" | cut -f1))"
# Optional: unzip immediately
unzip -o "$OUTPUT_FILE" -d "${OUTPUT_DIR}/csv/${DATE}/"
echo "[$(date -u)] Done."
Make it executable:
chmod +x /home/user/scripts/download-whois.sh
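A truncated transfer or an error body saved with a `.zip` name will break the later `unzip` step. A small guard can catch that before unzipping; this is a sketch (the helper name is made up for this example, and the two-byte magic check only detects content that is not a ZIP at all):

```shell
# Return success only if the file begins with the ZIP magic bytes "PK".
is_zip() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Example usage inside the download script, before the unzip step:
# is_zip "$OUTPUT_FILE" || { echo "Download is not a ZIP archive" >&2; exit 1; }
```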
Windows - Task Scheduler
PowerShell Script
Save this as C:\Data\WhoisDownload\download-whois.ps1:
$ApiKey = "YOUR_API_KEY"
# Avoid naming this $Pid: $PID is a reserved automatic variable in PowerShell
$ProductId = "3"
$Date = (Get-Date).ToString("yyyy-MM-dd")
$OutDir = "C:\Data\WhoisDownload"
$OutFile = "$OutDir\whois-$Date.zip"
if (-not (Test-Path $OutDir)) { New-Item -ItemType Directory -Path $OutDir | Out-Null }
$Url = "https://get.whoisextractor.com/?api=$ApiKey&pid=$ProductId&date=$Date"
Write-Host "Downloading WHOIS data for $Date..."
Invoke-WebRequest -Uri $Url -OutFile $OutFile
Write-Host "Saved to $OutFile"
# Unzip
Expand-Archive -Path $OutFile -DestinationPath "$OutDir\csv\$Date" -Force
Write-Host "Done."
Creating the Task Scheduler Job
- Open Task Scheduler (search in Start menu)
- Click Create Basic Task
- Set the trigger to Daily at 07:30 AM IST (or converted to your local time)
- Set the action to Start a program, with powershell.exe as the program
- Add the arguments: -NonInteractive -File "C:\Data\WhoisDownload\download-whois.ps1" (if script execution is blocked by policy, add -ExecutionPolicy Bypass before -File)
- Finish and enable the task
Python Automation Script
This script downloads multiple product IDs in one run, useful if you subscribe to more than one database:
import os
import requests
from datetime import date
API_KEY = "YOUR_API_KEY"
OUTPUT_DIR = "/data/whois"
PRODUCTS = {
"whois-global": 3,
"domains-daily": 10,
"website-daily": 11,
}
today = date.today().strftime("%Y-%m-%d")
os.makedirs(OUTPUT_DIR, exist_ok=True)
for name, pid in PRODUCTS.items():
    url = f"https://get.whoisextractor.com/?api={API_KEY}&pid={pid}&date={today}"
    out_path = os.path.join(OUTPUT_DIR, f"{name}-{today}.zip")
    print(f"Downloading {name} for {today}...")
    response = requests.get(url, stream=True, timeout=300)
    if response.status_code == 200:
        with open(out_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        size_mb = os.path.getsize(out_path) / (1024 * 1024)
        print(f"  Saved {out_path} ({size_mb:.1f} MB)")
    else:
        print(f"  ERROR {response.status_code}: {response.text}")
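A failed or truncated download can also leave a corrupt archive on disk with a 200 status. It can be worth validating the ZIP before anything downstream consumes it; a minimal sketch using only the standard library (the helper name is invented for this example):

```python
import zipfile

def is_valid_zip(path):
    """Return True if path is a well-formed ZIP with no corrupt members."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the first bad member's name, or None if all pass
        return zf.testzip() is None
```

Call it right after saving; if it returns False, delete the file so the next run (or a retry) fetches a fresh copy.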
Handling Errors and Retries
The API occasionally returns a 404 if called before that day's data generation is complete. Add a retry:
import time
MAX_RETRIES = 3
RETRY_DELAY_SECONDS = 600 # wait 10 minutes before retrying
for attempt in range(1, MAX_RETRIES + 1):
    response = requests.get(url, stream=True, timeout=300)
    if response.status_code == 200:
        # save file...
        break
    elif response.status_code == 404 and attempt < MAX_RETRIES:
        print(f"  Data not ready yet - retrying in {RETRY_DELAY_SECONDS // 60} minutes...")
        time.sleep(RETRY_DELAY_SECONDS)
    else:
        print(f"  Failed after {attempt} attempt(s): {response.status_code}")
        break
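If you download several products, the same loop can be factored into one reusable helper so every download shares a single retry policy. A sketch (the helper name is invented here; `fetch` is any zero-argument callable you supply that returns a response object):

```python
import time

def get_with_retry(fetch, max_retries=3, delay_seconds=600):
    """Call fetch() until it returns a response with status_code 200.

    Retries on 404 (that day's data not generated yet); any other
    status, or exhausting the retries, raises RuntimeError.
    """
    for attempt in range(1, max_retries + 1):
        response = fetch()
        if response.status_code == 200:
            return response
        if response.status_code == 404 and attempt < max_retries:
            time.sleep(delay_seconds)
        else:
            raise RuntimeError(
                f"failed after {attempt} attempt(s): {response.status_code}"
            )
```

Usage with the earlier script would look like `response = get_with_retry(lambda: requests.get(url, stream=True, timeout=300))`.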
Keeping Only Recent Files
To avoid filling up disk space, clean up files older than 7 days:
# Linux/macOS
find /data/whois -name "*.zip" -mtime +7 -delete
# Windows PowerShell
Get-ChildItem "C:\Data\WhoisDownload\*.zip" |
Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-7) } |
Remove-Item
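If your downloads already flow through the Python script above, the cleanup can live there too, giving you one cross-platform job instead of two OS-specific one-liners. A sketch (the function name is invented; the 7-day window mirrors the commands above):

```python
import time
from pathlib import Path

def prune_old_archives(directory, days=7):
    """Delete *.zip files older than `days` days; return the deleted names."""
    cutoff = time.time() - days * 86400
    removed = []
    for path in Path(directory).glob("*.zip"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Call it at the end of the daily run, e.g. `prune_old_archives(OUTPUT_DIR)`.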