Website Database CSV Format Reference

A complete column-by-column reference for the Website Database CSV file: what each field contains, how contact data is extracted from live websites, and how to use the data for B2B outreach.

April 15, 2026 Data & Formats WhoisExtractor Team

Overview

The Website Database is built from a daily crawl of live business websites. It extracts email addresses and phone numbers that businesses have actively published on their own websites, which is the same contact information a prospective customer or partner would find by visiting those pages.

This article describes every column in the Website Database CSV so you can confidently import, filter, and act on the data.

For access and pricing, see the Website Database page.

How the Data Is Collected

The crawler visits each domain's website and extracts contact information from publicly accessible pages. Only information that the business intentionally publishes is included: the crawler does not access password-protected pages, bypass authentication, or store data from non-public sources.

The extracted contact information includes:

  • Email addresses found in page content, mailto: links, and structured contact sections
  • Phone numbers found in page content, tel: links, and contact sections
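The crawler's own implementation is not published here. Purely as an illustration of the mailto: and tel: part of the extraction, the sketch below uses Python's standard-library html.parser. The class and function names are hypothetical, and the sketch ignores addresses and numbers that appear only in plain page text. It lowercases and deduplicates addresses and joins results with |, mirroring the emails and phones column conventions in this reference.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class ContactLinkParser(HTMLParser):
    """Hypothetical sketch: collect mailto: addresses and tel: numbers from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: list = []
        self.phones: list = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        scheme, _, rest = href.partition(":")
        if scheme.lower() == "mailto":
            # Drop any ?subject=... query and allow comma-separated recipients
            for addr in unquote(rest.split("?", 1)[0]).split(","):
                addr = addr.strip().lower()  # stored in lowercase
                if addr and addr not in self.emails:  # deduplicate
                    self.emails.append(addr)
        elif scheme.lower() == "tel":
            number = unquote(rest).strip()  # kept as found, formatting included
            if number and number not in self.phones:
                self.phones.append(number)


def extract_contacts(html: str) -> tuple:
    """Return (emails, phones) as pipe-delimited strings, like the CSV cells."""
    parser = ContactLinkParser()
    parser.feed(html)
    parser.close()
    return "|".join(parser.emails), "|".join(parser.phones)
```

For example, a page with links to `mailto:Info@Acme.com?subject=Hi` and `tel:+1-800-555-0199` would yield `("info@acme.com", "+1-800-555-0199")`.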

Multiple email addresses and phone numbers may be collected per domain if the website publishes more than one.

CSV Column Reference

domain

The registered domain name whose website was crawled.

  • Type: String
  • Example: acmeconsulting.com
  • Always present: Yes

website_url

The specific URL that was crawled to collect the data. This may differ from the bare domain, for example when the website redirects to another address.

  • Type: String (URL)
  • Example: https://www.acmeconsulting.com
  • Always present: Yes
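Because website_url can differ from domain after a redirect, you may want to flag rows where the crawled site ended up on a different domain altogether (for example, a brand that redirects to its parent company). A minimal sketch, assuming that the bare domain and any of its subdomains (including www.) count as the same site; the function name is hypothetical and no public-suffix logic is applied.

```python
from urllib.parse import urlsplit


def redirected_off_domain(domain: str, website_url: str) -> bool:
    """Hypothetical helper: True if website_url is not on domain or a subdomain of it."""
    host = urlsplit(website_url).hostname  # urllib lowercases the hostname
    domain = domain.strip().lower().rstrip(".")
    if not host:
        return True  # unparseable URL: flag it for review
    return not (host == domain or host.endswith("." + domain))
```

With the example values from this reference, `redirected_off_domain("acmeconsulting.com", "https://www.acmeconsulting.com")` returns `False`.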

emails

A pipe-delimited (|) list of email addresses found on the website.

  • Type: String (pipe-separated)
  • Example: info@acmeconsulting.com|careers@acmeconsulting.com
  • Always present: No. Empty if no email addresses were found on publicly accessible pages.
  • Notes: Split on | to get a list of addresses. Addresses are stored in lowercase, and duplicate addresses within a domain are removed.

phones

A pipe-delimited (|) list of phone numbers found on the website.

  • Type: String (pipe-separated)
  • Example: +1-800-555-0199|+1-212-555-0188
  • Always present: No. Empty if no phone numbers were found.
  • Notes: Numbers are stored as-found (including formatting characters like +, -, spaces, and parentheses). Normalise before importing into a CRM if your CRM requires E.164 format.
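If your CRM requires E.164, the sketch below shows one naive way to normalise a phones cell. It is an assumption-laden illustration, not the database's own processing: it only converts numbers written with an international prefix (+ or 00), strips the common "(0)" trunk notation, and returns nothing for national-format numbers, because those cannot be interpreted without knowing the country. Extensions and other edge cases are not handled; for production use, a dedicated library such as the Python port of Google's libphonenumber (`phonenumbers`) is the more robust choice.

```python
import re
from typing import List, Optional

# E.164: "+" then 2 to 15 digits, the first of which is non-zero
E164 = re.compile(r"\+[1-9]\d{1,14}")


def to_e164(raw: str) -> Optional[str]:
    """Naive sketch: convert one as-found number to E.164, or return None."""
    raw = raw.strip().replace("(0)", "")  # e.g. "+44 (0)20 ..." -> "+44 20 ..."
    digits = re.sub(r"\D", "", raw)  # drop +, -, spaces, parentheses, dots
    if raw.startswith("+"):
        candidate = "+" + digits
    elif raw.startswith("00"):
        candidate = "+" + digits[2:]  # "00" international prefix
    else:
        return None  # national format: country unknown, leave for manual review
    return candidate if E164.fullmatch(candidate) else None


def phones_to_e164(cell: str) -> List[str]:
    """Split a pipe-delimited phones cell and keep the numbers that normalise."""
    results: List[str] = []
    for part in cell.split("|"):
        number = to_e164(part) if part.strip() else None
        if number and number not in results:
            results.append(number)
    return results
```

Applied to the example value above, `phones_to_e164("+1-800-555-0199|+1-212-555-0188")` returns `["+18005550199", "+12125550188"]`.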

crawled_at

The date the website was crawled and this record was collected.

  • Type: Date string YYYY-MM-DD
  • Example: 2026-04-15
  • Always present: Yes

country

The country associated with the domain, where detectable.

  • Type: String (ISO 3166-1 alpha-2 country code)
  • Example: US, IN, AU, GB
  • Always present: No. Empty when the country cannot be determined.

Working With the Data

Extracting All Emails Into One Row Per Email

The pipe-separated format stores multiple emails in one cell. To work with one email per row in Python:

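import pandas as pd

# Read every column as text. keep_default_na=False stops pandas from turning
# empty cells, and the country code "NA" (Namibia), into NaN.
df = pd.read_csv("website-data-2026-04-15.csv", dtype=str, keep_default_na=False)

# Explode emails so each email gets its own row (df itself is left unchanged).
# A single-character split pattern like "|" is treated literally, not as a regex.
df_expanded = df.assign(emails=df["emails"].str.split("|")).explode("emails")
df_expanded["emails"] = df_expanded["emails"].str.strip()
df_expanded = df_expanded[df_expanded["emails"] != ""]

print(df_expanded[["domain", "emails"]].head(10))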

Filtering Out Generic Email Addresses

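Outreach tends to perform better when it reaches role-specific or personal addresses rather than shared inboxes. To drop common generic addresses before importing into a CRM:

# Continues from the previous example: df_expanded has one email per row
generic_prefixes = ("info@", "contact@", "support@", "admin@", "noreply@", "no-reply@", "hello@", "sales@")

# str.startswith accepts a tuple of prefixes; ~ keeps the rows that do NOT match
df_expanded = df_expanded[
    ~df_expanded["emails"].str.lower().str.startswith(generic_prefixes)
]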

Filtering by Country

Filter on the two-letter ISO 3166-1 alpha-2 code in the country column:

# Australia only
au_data = df[df["country"] == "AU"]
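Two details are worth handling when filtering by country. First, pandas' default missing-value parsing converts the literal string NA, which is the ISO 3166-1 code for Namibia, into a missing value; reading with keep_default_na=False prevents that. Second, the country column is empty when the country cannot be determined, so those rows drop out of any equality filter and may deserve separate handling. The sketch below uses a small inline sample with hypothetical domains so that it runs on its own.

```python
import io

import pandas as pd

# Tiny inline sample standing in for a real daily CSV (hypothetical data)
sample = io.StringIO(
    "domain,website_url,emails,phones,crawled_at,country\n"
    "example-au.com.au,https://example-au.com.au,a@example-au.com.au,,2026-04-15,AU\n"
    "example-na.com,https://example-na.com,,,2026-04-15,NA\n"
    "example-gb.co.uk,https://example-gb.co.uk,,,2026-04-15,GB\n"
    "example-xx.com,https://example-xx.com,,,2026-04-15,\n"
)

# keep_default_na=False keeps "NA" as text; empty cells become ""
df = pd.read_csv(sample, dtype=str, keep_default_na=False)

# Several countries at once
selected = df[df["country"].isin(["AU", "NA"])]

# Rows where the country could not be determined
unknown_country = df[df["country"] == ""]
```

Without keep_default_na=False, the Namibia row would silently disappear from `selected`.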

Combining With Domain Registration Data

The Website Database can be combined with the Domains Database or WHOIS Database using the domain field as the join key. This lets you:

  • Filter website contacts by the date the domain was registered (to prioritise new businesses)
  • Add registrant name or organisation from WHOIS to supplement website contact data
  • Cross-reference nameservers to understand hosting infrastructure
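A minimal pandas sketch of such a join follows. The inline data is hypothetical, and the registration column name create_date is an assumption made for illustration only; check the Domains Database or WHOIS Database reference for the actual column names. The domain key is normalised on both sides before merging, and validate="many_to_one" makes pandas raise an error if the registration data unexpectedly contains more than one row per domain.

```python
import io

import pandas as pd

website = pd.read_csv(io.StringIO(
    "domain,emails,country\n"
    "acmeconsulting.com,info@acmeconsulting.com,US\n"
    "oldshop.com,sales@oldshop.com,GB\n"
), dtype=str, keep_default_na=False)

# Hypothetical registration extract: "create_date" is an assumed column name
registrations = pd.read_csv(io.StringIO(
    "domain,create_date\n"
    "acmeconsulting.com,2026-04-10\n"
    "oldshop.com,2015-06-01\n"
), dtype=str, keep_default_na=False)

# Normalise the join key on both sides before merging
for frame in (website, registrations):
    frame["domain"] = frame["domain"].str.strip().str.lower()

# Left join: keep every website record, even without registration data
merged = website.merge(registrations, on="domain", how="left", validate="many_to_one")
merged["create_date"] = pd.to_datetime(merged["create_date"], errors="coerce")

# Prioritise businesses whose domain was registered recently
new_businesses = merged[merged["create_date"] >= pd.Timestamp("2026-01-01")]
```

A left join keeps website contacts that have no matching registration row; their create_date is NaT and they are excluded by the date filter.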

Data Freshness

The daily details feed (pid=11) is updated once per day at 6:00 AM IST. Each day's file includes newly crawled websites and updated records for sites where contact information has changed since the last crawl.
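If you keep each day's file rather than replacing it, the same domain can appear in several files. One way to build a current view, assuming a newer crawl supersedes an older one for the same domain, is to concatenate the files, sort by crawled_at, and keep the last row per domain. In practice you would read the real daily files (for example with glob); the two inline "days" below are hypothetical stand-ins so the sketch runs on its own.

```python
import io

import pandas as pd

header = "domain,website_url,emails,phones,crawled_at,country\n"
day1 = io.StringIO(
    header
    + "acmeconsulting.com,https://www.acmeconsulting.com,info@acmeconsulting.com,,2026-04-14,US\n"
    + "oldshop.com,https://oldshop.com,sales@oldshop.com,,2026-04-14,GB\n"
)
day2 = io.StringIO(
    header
    + "acmeconsulting.com,https://www.acmeconsulting.com,info@acmeconsulting.com|hr@acmeconsulting.com,,2026-04-15,US\n"
)

frames = [pd.read_csv(f, dtype=str, keep_default_na=False) for f in (day1, day2)]
combined = pd.concat(frames, ignore_index=True)

# Latest crawl wins: sort by crawled_at, then keep the last row per domain
combined["crawled_at"] = pd.to_datetime(combined["crawled_at"], format="%Y-%m-%d")
latest = (
    combined.sort_values("crawled_at", kind="stable")
    .drop_duplicates(subset="domain", keep="last")
    .reset_index(drop=True)
)
```

Sorting on crawled_at, rather than relying on the order in which files are read, keeps the result correct even if files are loaded out of date order.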

For a snapshot of all domains currently in the database, see the all domains list. For historical data, see the archived website database.
