Overview
The Website Database is built from a daily crawl of live business websites. It extracts email addresses and phone numbers that businesses have actively published on their own websites - the same contact information a prospective customer or partner would find by visiting those pages.
This article describes every column in the Website Database CSV so you can confidently import, filter, and act on the data.
For access and pricing, see the Website Database page.
How the Data Is Collected
The crawler visits each domain's website and extracts contact information from publicly accessible pages. Only information that is intentionally published by the business is included - the crawler does not access password-protected pages, bypass authentication, or store data from non-public sources.
Contact information extracted includes:
- Email addresses found in page content, mailto: links, and structured contact sections
- Phone numbers found in page content, tel: links, and contact sections
Multiple email addresses and phone numbers may be collected per domain if the website publishes more than one.
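As an illustration only, the sketch below shows how mailto: and tel: links can be pulled out of a page's HTML with Python's standard library. It is not the crawler's actual implementation, which this article does not describe. The class name ContactLinkParser and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class ContactLinkParser(HTMLParser):
    """Collects the targets of mailto: and tel: links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.emails = []
        self.phones = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        scheme, _, target = href.partition(":")
        # Drop any ?subject=... query part and decode %-escapes
        target = unquote(target.split("?", 1)[0]).strip()
        if not target:
            return
        if scheme.lower() == "mailto":
            self.emails.append(target.lower())
        elif scheme.lower() == "tel":
            self.phones.append(target)


# Made-up sample page
sample_html = """
<p>Write to <a href="mailto:Office@example.com?subject=Hi">us</a>
or call <a href="tel:+1-800-555-0199">+1-800-555-0199</a>.</p>
"""

parser = ContactLinkParser()
parser.feed(sample_html)
print(parser.emails)
print(parser.phones)
```

Addresses and numbers that appear only as visible text, rather than as links, would need a separate text-matching step that this sketch leaves out.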
CSV Column Reference
domain
The registered domain name whose website was crawled.
- Type: String
- Example: acmeconsulting.com
- Always present: Yes
website_url
The specific URL that was crawled to collect the data (may differ from the bare domain if the website redirects).
- Type: String (URL)
- Example: https://www.acmeconsulting.com
- Always present: Yes
emails
A pipe-delimited (|) list of email addresses found on the website.
- Type: String (pipe-separated)
- Example: [email protected]|[email protected]
- Always present: No. Empty if no email addresses were found on publicly accessible pages.
- Notes: Split on | to get an array. Addresses are stored in lowercase. Deduplication is applied per domain.
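A minimal sketch of that split, using only Python's standard library. The sample rows and the example.com addresses are made up for illustration, and the column order shown is an assumption; the column names are the ones documented in this article.

```python
import csv
import io

# Made-up sample using the documented column names (column order is assumed)
sample_csv = io.StringIO(
    "domain,website_url,emails,phones,crawled_at,country\n"
    "example.com,https://www.example.com,info@example.com|jane@example.com,,2026-04-15,US\n"
    "example.org,https://example.org,,,2026-04-15,\n"
)


def split_pipe(cell):
    """Split a pipe-delimited cell into a list, skipping empty entries."""
    return [part.strip() for part in cell.split("|") if part.strip()]


emails_by_domain = {
    row["domain"]: split_pipe(row["emails"]) for row in csv.DictReader(sample_csv)
}
print(emails_by_domain)
```

A row with an empty emails cell yields an empty list, so downstream code does not need a separate check for missing values.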
phones
A pipe-delimited (|) list of phone numbers found on the website.
- Type: String (pipe-separated)
- Example: +1-800-555-0199|+1-212-555-0188
- Always present: No. Empty if no phone numbers were found.
- Notes: Numbers are stored as-found (including formatting characters like +, -, spaces, and parentheses). Normalise before importing into a CRM if your CRM requires E.164 format.
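One possible way to normalise as-found numbers is sketched below. The helper to_e164_candidate and its default_country_code parameter are hypothetical. The sketch only strips formatting characters and does not validate national numbering plans, so treat its output as a candidate rather than a verified E.164 number.

```python
import re


def to_e164_candidate(raw, default_country_code=None):
    """Return a '+digits' string, or None if the number cannot be normalised.

    Hypothetical helper: strips formatting only, performs no validation.
    """
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if raw.startswith("+"):
        candidate = "+" + digits
    elif default_country_code:
        # Assumption: numbers without '+' belong to the caller's default country.
        # Assumption: a leading trunk '0' is dropped, which suits many
        # countries but not all of them.
        candidate = "+" + default_country_code + digits.lstrip("0")
    else:
        return None
    # E.164 allows at most 15 digits after the '+'
    return candidate if 1 < len(candidate) <= 16 else None


cell = "+1-800-555-0199|(212) 555-0188"
normalised = [to_e164_candidate(p, default_country_code="1") for p in cell.split("|")]
print(normalised)
```

Numbers with extensions or vanity letters will not survive this treatment. For production imports, a dedicated phone-number parsing library is likely a safer choice.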
crawled_at
The date the website was crawled and this record was collected.
- Type: Date string (YYYY-MM-DD)
- Example: 2026-04-15
- Always present: Yes
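Because the value is an ISO-style date, it can be parsed with Python's standard library. The sketch below keeps only records crawled within 30 days of a reference date; the rows, the reference date, and the 30-day window are made up for illustration.

```python
from datetime import date, timedelta

# Made-up rows using the documented column names
rows = [
    {"domain": "example.com", "crawled_at": "2026-04-15"},
    {"domain": "example.org", "crawled_at": "2026-01-02"},
]

reference_day = date(2026, 4, 20)  # hypothetical "today"
cutoff = reference_day - timedelta(days=30)

# date.fromisoformat parses YYYY-MM-DD strings directly
recent = [r["domain"] for r in rows if date.fromisoformat(r["crawled_at"]) >= cutoff]
print(recent)
```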
country
The country associated with the domain, where detectable.
- Type: String (ISO 3166-1 alpha-2 country code)
- Example: US, IN, AU, GB
- Always present: No. Empty when the country cannot be determined.
Working With the Data
Extracting All Emails Into One Row Per Email
The pipe-separated format stores multiple emails in one cell. To work with one email per row in Python:
import pandas as pd
df = pd.read_csv("website-data-2026-04-15.csv")
# Explode emails so each email gets its own row
df["emails"] = df["emails"].str.split("|")
df_expanded = df.explode("emails").dropna(subset=["emails"])
df_expanded = df_expanded[df_expanded["emails"].str.strip() != ""]
print(df_expanded[["domain", "emails"]].head(10))
Filtering Out Generic Email Addresses
Outreach performs better when targeted at role-specific or personal addresses. Filter out generic addresses before importing into a CRM:
generic_prefixes = ("info@", "contact@", "support@", "admin@", "noreply@", "no-reply@", "hello@", "sales@")
df_expanded = df_expanded[
    ~df_expanded["emails"].str.lower().str.startswith(generic_prefixes)
]
Filtering by Country
# Australia only
au_data = df[df["country"] == "AU"]
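One pitfall worth checking when filtering by country with pandas: by default, read_csv treats the text NA as a missing value, and NA is also the ISO 3166-1 alpha-2 code for Namibia. The sketch below, using made-up rows, shows the effect and one way to avoid it with the keep_default_na option.

```python
import io

import pandas as pd

# Made-up sample rows
sample_csv = (
    "domain,country\n"
    "example.com,AU\n"
    "example.org,NA\n"
    "example.net,\n"
)

# Default parsing: the string "NA" is read as a missing value
default_df = pd.read_csv(io.StringIO(sample_csv))
print((default_df["country"] == "NA").sum())

# keep_default_na=False keeps "NA" as text; empty cells become ""
safe_df = pd.read_csv(io.StringIO(sample_csv), dtype=str, keep_default_na=False)
na_data = safe_df[safe_df["country"] == "NA"]
print(na_data["domain"].tolist())
```

With this option, empty emails and phones cells are read as empty strings rather than missing values, so filter them with a comparison against "" instead of dropna.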
Combining With Domain Registration Data
The Website Database can be combined with the Domains Database or WHOIS Database using the domain field as the join key. This lets you:
- Filter website contacts by the date the domain was registered (to prioritise new businesses)
- Add registrant name or organisation from WHOIS to supplement website contact data
- Cross-reference nameservers to understand hosting infrastructure
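A sketch of the first use case, joining on domain and keeping recently registered domains. The registration data frame and its create_date column are hypothetical; check the Domains Database or WHOIS Database documentation for the actual column names. The cutoff date is also made up.

```python
import pandas as pd

# Made-up website contacts using the documented column names
website = pd.DataFrame({
    "domain": ["example.com", "example.org"],
    "emails": ["jane@example.com", "office@example.org"],
})

# Hypothetical registration data; "create_date" is an assumed column name
registrations = pd.DataFrame({
    "domain": ["example.com", "example.org"],
    "create_date": ["2026-03-30", "2019-06-01"],
})

# Left join keeps every website record, even without a registration match
merged = website.merge(registrations, on="domain", how="left")
merged["create_date"] = pd.to_datetime(merged["create_date"])

# Keep domains registered on or after the (made-up) cutoff
new_businesses = merged[merged["create_date"] >= pd.Timestamp("2026-01-01")]
print(new_businesses["domain"].tolist())
```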
Data Freshness
The daily details feed (pid=11) is updated once per day at 6:00 AM IST. Each day's file includes newly crawled websites and updated records for sites where contact information has changed since the last crawl.
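Since a daily file can contain updated records for domains you already hold, one way to maintain a local copy is to keep only the newest record per domain. The sketch below assumes rows have already been read into dictionaries; merge_daily and the sample rows are hypothetical.

```python
def merge_daily(existing, daily_rows):
    """Upsert daily rows into a dict keyed by domain, keeping the newest crawl."""
    for row in daily_rows:
        current = existing.get(row["domain"])
        # YYYY-MM-DD strings sort chronologically, so string comparison is enough
        if current is None or row["crawled_at"] >= current["crawled_at"]:
            existing[row["domain"]] = row
    return existing


store = {}
# Day 1 file (made-up rows)
merge_daily(store, [
    {"domain": "example.com", "emails": "old@example.com", "crawled_at": "2026-04-14"},
])
# Day 2 file: one updated record and one new domain
merge_daily(store, [
    {"domain": "example.com", "emails": "new@example.com", "crawled_at": "2026-04-15"},
    {"domain": "example.org", "emails": "", "crawled_at": "2026-04-15"},
])
print(store["example.com"]["emails"])
```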
For a snapshot of all domains currently in the database, see the all domains list. For historical data, see the archived website database.