YellowPages Quickstart

Automated data catalog and metadata discovery.

Status: YellowPages is currently in active development. The architecture and core features described below are the target design. Early access will be available soon.

What It Does

YellowPages connects to your existing data infrastructure and builds a searchable catalog automatically. No manual tagging, no spreadsheets, no Confluence pages that go stale in a week.

How It Works

  1. Connect your sources — PostgreSQL, MySQL, S3, BigQuery, Snowflake, and more
  2. Automatic discovery — schemas, tables, columns, and relationships are cataloged on a schedule
  3. Lineage tracking — see how data flows from source to destination
  4. Quality scoring — freshness, completeness, and uniqueness checks built in
  5. Search everything — full-text search across your entire data estate
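The quality checks in step 4 can be illustrated with a minimal Python sketch. YellowPages has not published how it computes these scores, so the function names and definitions below (completeness as the share of non-null values, uniqueness as the share of distinct non-null values, freshness as a maximum age) are assumptions chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone


def completeness(values):
    """Fraction of values that are not None (1.0 for an empty column)."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def uniqueness(values):
    """Fraction of non-null values that are distinct (1.0 if none are present)."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    return len(set(present)) / len(present)


def is_fresh(last_updated, max_age, now=None):
    """True if last_updated (timezone-aware) is within max_age of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Hypothetical sample column: four rows, one null, one duplicate.
emails = ["a@example.com", "b@example.com", None, "a@example.com"]
print(completeness(emails))  # 3 of 4 values are present
print(uniqueness(emails))    # 2 distinct values among 3 present

# Hypothetical timestamps: the table was last written 30 minutes before "now".
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(minutes=30), timedelta(hours=1), now=now))
```

A real catalog would most likely compute these as aggregate queries inside the source database instead of pulling rows into memory; the sketch only pins down what each score could mean.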

Architecture

YellowPages runs as a single container (or binary) that connects to your data sources and serves a web UI. You configure it with a single YAML file.

# Example configuration (target format)
sources:
  - name: production-db
    type: postgresql
    connection:
      host: localhost
      port: 5432
      database: production
      user: readonly_user
      password: ${YP_DB_PASSWORD}

catalog:
  refresh_interval: 1h
  quality_checks: true

server:
  port: 8080
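
The password entry in the configuration is written as an environment-variable placeholder, so the secret stays out of the file. How YellowPages will expand such placeholders is not documented yet; the Python sketch below shows one plausible approach using only the standard library. The function name `resolve_placeholders` is hypothetical.

```python
import os
from string import Template


def resolve_placeholders(text, env=None):
    """Replace ${NAME} placeholders in text with values from env.

    Raises KeyError if a referenced variable is missing, so a
    misconfigured deployment fails early instead of connecting
    with an empty password.
    """
    if env is None:
        env = os.environ
    return Template(text).substitute(env)


# Hypothetical config fragment and environment, for illustration only.
raw = "user: readonly_user\npassword: ${YP_DB_PASSWORD}\n"
print(resolve_placeholders(raw, {"YP_DB_PASSWORD": "example-secret"}))
```

`string.Template` treats every `$` as the start of a placeholder, so a literal dollar sign in the file would have to be written as `$$` under this approach.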

Planned Features

  • Automated schema discovery and change detection
  • Data lineage visualization
  • Built-in quality checks (freshness, completeness, uniqueness)
  • Full-text search across all metadata
  • Collaboration — descriptions, tags, ownership
  • REST and GraphQL APIs for integrations
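
Schema change detection, the first item above, can be pictured as a diff between two catalog snapshots. The Python sketch below assumes a snapshot is a plain mapping of table name to column name to column type; that shape, the `diff_schema` name, and the change labels are illustrative assumptions, not the YellowPages data model.

```python
def diff_schema(old, new):
    """Compare two {table: {column: type}} snapshots.

    Returns a list of (change, table, column) tuples; column is None
    for table-level changes.
    """
    changes = []
    for table in sorted(old.keys() - new.keys()):
        changes.append(("table_removed", table, None))
    for table in sorted(new.keys() - old.keys()):
        changes.append(("table_added", table, None))
    for table in sorted(old.keys() & new.keys()):
        before, after = old[table], new[table]
        for column in sorted(before.keys() - after.keys()):
            changes.append(("column_removed", table, column))
        for column in sorted(after.keys() - before.keys()):
            changes.append(("column_added", table, column))
        for column in sorted(before.keys() & after.keys()):
            if before[column] != after[column]:
                changes.append(("type_changed", table, column))
    return changes


# Hypothetical snapshots taken one refresh interval apart.
yesterday = {"users": {"id": "integer", "email": "text"}}
today = {
    "users": {"id": "bigint", "email": "text", "created_at": "timestamp"},
    "orders": {"id": "bigint"},
}
for change in diff_schema(yesterday, today):
    print(change)
```

Running a diff like this on each scheduled refresh would be enough to surface added, removed, and retyped columns; renames would show up as a removal plus an addition unless extra heuristics are applied.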

Get Early Access

We're looking for teams who want to shape the product. If you're dealing with data catalog problems today, reach out and we'll get you on the early access list.

Want us to handle it? Our Managed Service team can deploy and operate YellowPages for you once it ships.