Data & Formats

Website Database CSV Format Reference

A complete column-by-column reference for the Website Database CSV file - what each field contains, how contact data is extracted from live websites, and how to use the data for B2B outreach.

Overview
How the Data Is Collected
CSV Column Reference: domain, website_url, emails, phones, crawled_at, country
Working With the Data: Extracting All Emails Into One Row Per Email, Filtering Out Generic Email Addresses, Filtering by Country, Combining With Domain Registration Data
Data Freshness
Related Articles

April 15, 2026 Data & Formats WhoisExtractor Team

Overview

The Website Database is built from a daily crawl of live business websites. It extracts email addresses and phone numbers that businesses have actively published on their own websites - the same contact information a prospective customer or partner would find by visiting those pages.

This article describes every column in the Website Database CSV so you can confidently import, filter, and act on the data.

For access and pricing, see the Website Database page.

How the Data Is Collected

The crawler visits each domain's website and extracts contact information from publicly accessible pages. Only information that is intentionally published by the business is included - the crawler does not access password-protected pages, bypass authentication, or store data from non-public sources.

Contact information extracted includes:

Email addresses found in page content, mailto: links, and structured contact sections
Phone numbers found in page content, tel: links, and contact sections

Multiple email addresses and phone numbers may be collected per domain if the website publishes more than one.
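
The crawler's actual implementation is not documented in this article. Purely as an illustration of what reading mailto: and tel: links can look like, the sketch below uses Python's standard html.parser module. The class name ContactLinkParser and the sample HTML are hypothetical, and page-content extraction is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class ContactLinkParser(HTMLParser):
    """Collects the targets of mailto: and tel: links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.emails = []
        self.phones = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop any ?subject=... query string and lowercase the address
            address = unquote(href[7:].split("?", 1)[0]).strip().lower()
            if address and address not in self.emails:
                self.emails.append(address)
        elif href.lower().startswith("tel:"):
            # Keep the number as written, formatting characters included
            number = unquote(href[4:]).strip()
            if number and number not in self.phones:
                self.phones.append(number)


# Sample page fragment (made up for illustration)
html = (
    '<a href="mailto:Jane@Example.com?subject=Hi">Email us</a>'
    '<a href="tel:+1-800-555-0199">Call us</a>'
)
parser = ContactLinkParser()
parser.feed(html)
print(parser.emails, parser.phones)
```

Lowercasing and de-duplicating the addresses here mirrors the behaviour described for the emails column, while phone numbers are left as found.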

CSV Column Reference

`domain`

The registered domain name whose website was crawled.

Type: String
Example: acmeconsulting.com
Always present: Yes

`website_url`

The specific URL that was crawled to collect the data (may differ from the bare domain if the website redirects).

Type: String (URL)
Example: https://www.acmeconsulting.com
Always present: Yes
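
Because website_url can reflect a redirect, it may be useful to check whether the crawled host still belongs to the listed domain. The following is a minimal sketch using only the standard library; the function name host_matches_domain and the second sample URL are hypothetical.

```python
from urllib.parse import urlsplit


def host_matches_domain(website_url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlsplit(website_url).hostname or "").lower()
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


print(host_matches_domain("https://www.acmeconsulting.com", "acmeconsulting.com"))
print(host_matches_domain("https://acme.example.net/home", "acmeconsulting.com"))
```

Rows where this check fails are not necessarily wrong; they simply indicate that the site redirected to a different host, which may be worth reviewing before outreach.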

`emails`

A pipe-delimited (|) list of email addresses found on the website.

Type: String (pipe-separated)
Example: [email protected]|[email protected]
Always present: No. Empty if no email addresses were found on publicly accessible pages.
Notes: Split on | to get an array. Addresses are stored in lowercase. Deduplication is applied per domain.

`phones`

A pipe-delimited (|) list of phone numbers found on the website.

Type: String (pipe-separated)
Example: +1-800-555-0199|+1-212-555-0188
Always present: No. Empty if no phone numbers were found.
Notes: Numbers are stored as-found (including formatting characters like +, -, spaces, and parentheses). Normalise before importing into a CRM if your CRM requires E.164 format.
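
As a rough sketch of that normalisation step, the function below strips formatting characters from numbers that already carry a leading + and country code. The name to_e164 is hypothetical, and numbers without a country code are deliberately left unresolved, since choosing one would be a guess.

```python
import re
from typing import Optional


def to_e164(raw: str) -> Optional[str]:
    """Return an E.164-style string, or None if the number cannot be
    normalised without guessing a country code."""
    raw = raw.strip()
    if not raw.startswith("+"):
        return None  # no country code: needs region-aware handling
    digits = re.sub(r"[^0-9]", "", raw)
    # E.164 allows at most 15 digits, and country codes never start with 0
    if not digits or len(digits) > 15 or digits[0] == "0":
        return None
    return "+" + digits


# Sample cell in the pipe-delimited format (third number is made up)
cell = "+1-800-555-0199|+1 (212) 555-0188|020 7946 0000"
print([to_e164(p) for p in cell.split("|")])
```

This simple approach does not handle notations such as a bracketed trunk prefix, for example +44 (0)20. For production use, a dedicated library such as phonenumbers is likely a better fit.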

`crawled_at`

The date the website was crawled and this record was collected.

Type: Date string YYYY-MM-DD
Example: 2026-04-15
Always present: Yes
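
Since crawled_at is an ISO-style date, it can be parsed with the standard library and used to skip stale records. The sketch below assumes a 30-day cut-off, which is an arbitrary choice for illustration; the function name is_fresh is hypothetical.

```python
from datetime import date, timedelta


def is_fresh(crawled_at: str, today: date, max_age_days: int = 30) -> bool:
    """True if the record was crawled within max_age_days before today."""
    crawled = date.fromisoformat(crawled_at)  # expects YYYY-MM-DD
    return timedelta(0) <= today - crawled <= timedelta(days=max_age_days)


reference_day = date(2026, 4, 15)
print(is_fresh("2026-04-01", reference_day))  # crawled 14 days earlier
print(is_fresh("2026-01-10", reference_day))  # crawled 95 days earlier
```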

`country`

The country associated with the domain, where detectable.

Type: String (ISO 3166-1 alpha-2 country code)
Example: US, IN, AU, GB
Always present: No. Empty when the country cannot be determined.
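
Taken together, the six columns can be read into a small typed record. The sketch below uses only the standard library and assumes a comma-delimited file whose header row uses the column names listed above. The WebsiteRecord class, the helper names, and the sample rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class WebsiteRecord:
    domain: str
    website_url: str
    emails: List[str]
    phones: List[str]
    crawled_at: str
    country: str  # "" when the country could not be determined


def split_cell(cell: str) -> List[str]:
    """Split a pipe-delimited cell, dropping empty entries."""
    return [part.strip() for part in (cell or "").split("|") if part.strip()]


def read_records(handle) -> List[WebsiteRecord]:
    """Read every CSV row from an open text handle into a WebsiteRecord."""
    records = []
    for row in csv.DictReader(handle):
        records.append(WebsiteRecord(
            domain=row["domain"],
            website_url=row["website_url"],
            emails=split_cell(row["emails"]),
            phones=split_cell(row["phones"]),
            crawled_at=row["crawled_at"],
            country=row["country"],
        ))
    return records


# Made-up sample rows; a real file would be opened with open(path, newline="")
sample = io.StringIO(
    "domain,website_url,emails,phones,crawled_at,country\n"
    "example.com,https://www.example.com,jane@example.com|team@example.com,+1-800-555-0199,2026-04-15,US\n"
    "example.org,https://example.org,,,2026-04-15,\n"
)
records = read_records(sample)
print(records[0].emails, records[1].emails)
```

Empty emails and phones cells become empty lists, so downstream code does not need to special-case rows without contact data.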

Working With the Data

Extracting All Emails Into One Row Per Email

The pipe-separated format stores multiple emails in one cell. To work with one email per row in Python:

import pandas as pd

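# Read every column as text; keep_default_na=False stops pandas from
# treating the ISO country code "NA" (Namibia) as a missing value
df = pd.read_csv("website-data-2026-04-15.csv", dtype=str, keep_default_na=False)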

# Explode emails so each email gets its own row
df["emails"] = df["emails"].str.split("|")
df_expanded = df.explode("emails").dropna(subset=["emails"])
df_expanded = df_expanded[df_expanded["emails"].str.strip() != ""]

print(df_expanded[["domain", "emails"]].head(10))

Filtering Out Generic Email Addresses

Outreach tends to perform better when it is addressed to a named person rather than a shared mailbox such as info@ or support@. Filter out these generic addresses before importing into a CRM:

generic_prefixes = ("info@", "contact@", "support@", "admin@", "noreply@", "no-reply@", "hello@", "sales@")

# Passing a tuple of prefixes to str.startswith needs pandas 1.5 or later
df_expanded = df_expanded[
    ~df_expanded["emails"].str.lower().str.startswith(generic_prefixes)
]

Filtering by Country

# Australia only
au_data = df[df["country"] == "AU"]

Combining With Domain Registration Data

The Website Database can be combined with the Domains Database or WHOIS Database using the domain field as the join key. This lets you:

Filter website contacts by the date the domain was registered (to prioritise new businesses)
Add registrant name or organisation from WHOIS to supplement website contact data
Cross-reference nameservers to understand hosting infrastructure
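
A join on the domain field might look like the sketch below. It uses only the standard library and made-up sample data. The column name create_date is an assumption chosen for illustration; check the Domains Database CSV Format Reference for the real registration-date column.

```python
import csv
import io

# Made-up sample data; "create_date" is a hypothetical column name
website_csv = io.StringIO(
    "domain,emails,country\n"
    "example.com,jane@example.com,US\n"
    "example.org,team@example.org,GB\n"
)
domains_csv = io.StringIO(
    "domain,create_date\n"
    "example.com,2026-04-10\n"
    "example.net,2025-01-01\n"
)

# Index registration rows by lowercased domain for fast lookup
registered = {row["domain"].lower(): row for row in csv.DictReader(domains_csv)}

joined = []
for row in csv.DictReader(website_csv):
    match = registered.get(row["domain"].lower())
    if match is None:
        continue  # inner join: skip websites with no registration row
    joined.append({**row, "create_date": match["create_date"]})

# Keep only domains registered on or after a cut-off (ISO dates sort as text)
recent = [r for r in joined if r["create_date"] >= "2026-04-01"]
print([r["domain"] for r in recent])
```

The same pattern extends to WHOIS fields such as registrant name or nameservers by copying additional keys from the matched row.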

Data Freshness

The daily details feed (pid=11) is updated once per day at 6:00 AM IST. Each day's file includes newly crawled websites and updated records for sites where contact information has changed since the last crawl.
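
If you maintain your own master copy, each daily file can be applied as an upsert keyed by domain. The sketch below assumes that a newer row for a domain fully replaces the older one, which this article does not state explicitly; the function name apply_daily_update and the sample rows are hypothetical.

```python
def apply_daily_update(master: dict, daily_rows: list) -> dict:
    """Upsert daily rows into a master dict keyed by domain,
    keeping whichever row has the later crawled_at date."""
    for row in daily_rows:
        key = row["domain"].lower()
        current = master.get(key)
        # YYYY-MM-DD strings compare correctly as plain text
        if current is None or row["crawled_at"] >= current["crawled_at"]:
            master[key] = row
    return master


# Made-up sample data for illustration
master = {
    "example.com": {"domain": "example.com", "emails": "old@example.com", "crawled_at": "2026-04-01"},
}
daily = [
    {"domain": "example.com", "emails": "new@example.com", "crawled_at": "2026-04-15"},
    {"domain": "example.org", "emails": "team@example.org", "crawled_at": "2026-04-15"},
]
apply_daily_update(master, daily)
print(sorted(master), master["example.com"]["emails"])
```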

For a snapshot of all domains currently in the database, see the all domains list. For historical data, see the archived website database.


Related Articles

Getting Started with the WhoisExtractor API
Understanding WHOIS Data Fields
Automating Daily Data Delivery With the API
Domains Database CSV Format Reference
How to Download Your Database Files
