YellowPages Quickstart

Automated data catalog and metadata discovery.

Status: YellowPages is currently in active development. The architecture and core features described below are the target design. Early access will be available soon.

What It Does

YellowPages connects to your existing data infrastructure and builds a searchable catalog automatically. No manual tagging, no spreadsheets, no Confluence pages that go stale in a week.

How It Works

Connect your sources — PostgreSQL, MySQL, S3, BigQuery, Snowflake, and more
Automatic discovery — schemas, tables, columns, and relationships are cataloged on a schedule
Lineage tracking — see how data flows from source to destination
Quality scoring — freshness, completeness, and uniqueness checks built in
Search everything — full-text search across your entire data estate
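
To make the quality scoring step concrete, here is a minimal Python sketch of how the three checks could be computed for a single column or table. YellowPages is still in development, so the function names (`completeness`, `uniqueness`, `is_fresh`) and the exact metric definitions are assumptions for illustration, not the product's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def completeness(values):
    """Fraction of values that are not None (1.0 for an empty column)."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def uniqueness(values):
    """Fraction of non-null values that are distinct (1.0 if there are none)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0
    return len(set(non_null)) / len(non_null)


def is_fresh(last_updated, max_age, now=None):
    """True if last_updated falls within max_age of now (hypothetical rule)."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Sample column: four rows, one null, one duplicate.
emails = ["a@example.com", None, "b@example.com", "a@example.com"]
print(completeness(emails))  # 3 of 4 values are present
print(uniqueness(emails))    # 2 distinct values among 3 non-null ones
```

A real catalog would most likely push these calculations down to the source as SQL aggregates rather than pull rows into memory; the sketch only shows what each score means.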

Architecture

YellowPages is designed to run as a single container (or binary) that connects to your data sources and serves a web UI. All configuration lives in one YAML file.

# Example configuration (target format)
sources:
  - name: production-db
    type: postgresql
    connection:
      host: localhost
      port: 5432
      database: production
      user: readonly_user
      password: ${YP_DB_PASSWORD}

catalog:
  refresh_interval: 1h
  quality_checks: true

server:
  port: 8080
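
The password in the configuration above is meant to come from the environment rather than sit in the file. As a rough sketch, assuming `${VAR}`-style placeholders and a config that has already been parsed into plain dicts and lists, substitution could look like this in Python. The `expand_env` helper is hypothetical and not part of YellowPages.

```python
import os
import re

# Matches ${NAME} where NAME is a typical environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR} placeholders in strings inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in env:
                # Fail loudly instead of connecting with an empty password.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(lookup, value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value  # numbers, booleans, None pass through unchanged


# Parsed form of part of the example configuration (assumed structure).
config = {
    "sources": [
        {
            "name": "production-db",
            "connection": {"port": 5432, "password": "${YP_DB_PASSWORD}"},
        }
    ]
}
resolved = expand_env(config, env={"YP_DB_PASSWORD": "s3cret"})
print(resolved["sources"][0]["connection"]["password"])  # s3cret
```

Raising on a missing variable is a deliberate choice in this sketch: a catalog that silently starts with blank credentials is harder to debug than one that refuses to start.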

Planned Features

Automated schema discovery and change detection
Data lineage visualization
Built-in quality checks (freshness, completeness, uniqueness)
Full-text search across all metadata
Collaboration — descriptions, tags, ownership
REST and GraphQL APIs for integrations
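
To illustrate what schema change detection involves, the following Python sketch compares two schema snapshots and lists the differences. The snapshot shape (`{table: {column: type}}`), the `diff_schema` function, and the change labels are all assumptions made for this example; the real feature may store and report changes differently.

```python
def diff_schema(old, new):
    """Compare two {table: {column: type}} snapshots and list the changes."""
    changes = []
    for table in sorted(old.keys() | new.keys()):
        if table not in new:
            changes.append(("table_removed", table))
            continue
        if table not in old:
            changes.append(("table_added", table))
            continue
        old_cols, new_cols = old[table], new[table]
        for col in sorted(old_cols.keys() | new_cols.keys()):
            if col not in new_cols:
                changes.append(("column_removed", f"{table}.{col}"))
            elif col not in old_cols:
                changes.append(("column_added", f"{table}.{col}"))
            elif old_cols[col] != new_cols[col]:
                changes.append(("type_changed", f"{table}.{col}"))
    return changes


# Snapshots from two consecutive discovery runs (made-up data).
previous = {"users": {"id": "integer", "email": "text"}}
current = {
    "users": {"id": "bigint", "email": "text", "created_at": "timestamp"},
    "orders": {"id": "integer"},
}
for change in diff_schema(previous, current):
    print(change)
```

Running a comparison like this on every scheduled refresh would surface a widened `id` column or a new table as soon as it appears.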

Get Early Access

We're looking for teams who want to shape the product. If you're dealing with data catalog problems today, reach out and we'll get you on the early access list.

Want us to handle it? Our Managed Service team can deploy and operate YellowPages for you once it ships.