YellowPages Quickstart
Automated data catalog and metadata discovery.
Status: YellowPages is in active development. The architecture and core features described below are the target design, not a shipped release. Early access is planned but not yet open.
What It Does
YellowPages connects to your existing data infrastructure and builds a searchable catalog automatically. No manual tagging, no spreadsheets, no Confluence pages that go stale in a week.
How It Works
- Connect your sources — PostgreSQL, MySQL, S3, BigQuery, Snowflake, and more
- Automatic discovery — schemas, tables, columns, and relationships are cataloged on a schedule
- Lineage tracking — see how data flows from source to destination
- Quality scoring — freshness, completeness, and uniqueness checks built in
- Search everything — full-text search across your entire data estate
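To make the quality scoring item concrete, here is a minimal Python sketch of how freshness, completeness, and uniqueness could be computed for a single column. The function names and the exact definitions are assumptions for illustration; YellowPages has not published how it scores these checks.

```python
from datetime import datetime, timedelta, timezone


def completeness(values):
    """Fraction of values that are not None (1.0 for an empty column)."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def uniqueness(values):
    """Fraction of non-null values that are distinct (1.0 if there are none)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0
    return len(set(non_null)) / len(non_null)


def is_fresh(last_updated, max_age, now=None):
    """True if last_updated is no older than max_age relative to now."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Hypothetical sample column: one null and one duplicate.
emails = ["a@example.com", "b@example.com", None, "a@example.com"]
print(completeness(emails))  # 0.75
print(round(uniqueness(emails), 3))  # 0.667
```

Each metric is a ratio between 0 and 1, which makes it easy to combine them into one score or to alert on a threshold. A real catalog would likely compute these in SQL against the source rather than pulling rows into memory.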
Architecture
YellowPages runs as a single container (or binary) that connects to your data sources and serves a web UI. All configuration lives in one YAML file.
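The example configuration in this section reads the database password from an environment variable (`YP_DB_PASSWORD`) rather than storing it in the file. The Python sketch below shows one way such placeholders could be expanded before the YAML is parsed. The `${NAME}` syntax, the `expand_env` helper, and the fail-on-missing behaviour are assumptions, not confirmed YellowPages behaviour.

```python
import os
import re

# Matches ${NAME}, where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env=None):
    """Replace each ${NAME} in text with env[NAME]; raise KeyError if unset."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, text)


# Hypothetical usage with an explicit mapping instead of the real environment.
line = "password: ${YP_DB_PASSWORD}"
print(expand_env(line, {"YP_DB_PASSWORD": "s3cret"}))  # password: s3cret
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a silently empty password tends to surface later as a confusing connection error.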
    # Example configuration (target format)
    sources:
      - name: production-db
        type: postgresql
        connection:
          host: localhost
          port: 5432
          database: production
          user: readonly_user
          password: ${YP_DB_PASSWORD}
    catalog:
      refresh_interval: 1h
      quality_checks: true
    server:
      port: 8080

Planned Features
- Automated schema discovery and change detection
- Data lineage visualization
- Built-in quality checks (freshness, completeness, uniqueness)
- Full-text search across all metadata
- Collaboration — descriptions, tags, ownership
- REST and GraphQL APIs for integrations
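As an illustration of what schema change detection involves, the Python sketch below compares two snapshots of a table's columns and reports what was added, removed, or retyped. The snapshot shape (a dict of column name to type) and the `diff_schema` helper are hypothetical and do not describe the planned implementation.

```python
def diff_schema(old, new):
    """Compare two {column: type} snapshots of the same table."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Columns present in both snapshots whose declared type changed.
    retyped = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots taken on two consecutive catalog refreshes.
before = {"id": "integer", "email": "text", "age": "integer"}
after = {"id": "bigint", "email": "text", "signup_date": "date"}
print(diff_schema(before, after))
# {'added': ['signup_date'], 'removed': ['age'], 'retyped': ['id']}
```

A rename shows up here as one removal plus one addition. Telling the two apart would need extra signals, such as column position or data similarity.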
Get Early Access
We're looking for teams who want to shape the product. If you're dealing with data catalog problems today, reach out and we'll get you on the early access list.
Want us to handle it? Our Managed Service team can deploy and operate YellowPages for you once it ships.