# Agent Context Kit
The DataHub Agent Context Kit is a set of guides, SDKs, and an MCP server that help you build AI agents with access to the capabilities and context in your DataHub instance — business definitions, context documents, data ownership, lineage, quality signals, sample queries, and more.
## What Can You Build?
### Data Analytics Agent (Text-to-SQL)
An agent that answers business questions by finding the right data, then querying it.
- Find & understand trustworthy data — Search DataHub for relevant datasets, read descriptions, check glossary terms, review sample queries, and look at usage stats to pick the right table.
- Execute SQL — Generate and run SQL against your data warehouse (Snowflake, BigQuery, Databricks, etc.).
"What was our revenue by region last quarter?" → finds the revenue table in DataHub, confirms it's the certified source, generates the SQL, runs it.
### Data Quality Agent
An agent that provisions data quality checks and reports on the health of your data estate.
- Find important tables — Identify high-usage or business-critical datasets using usage stats and ownership from DataHub.
- Add assertions — Provision data quality assertions in DataHub (and in external tools).
- Generate health reports — Daily or on-demand, broken down by domain or owning team, using assertion results and incident data.
"Set up freshness checks on all tables owned by the Finance team, and send me a weekly health summary."
### Data Steward / Governance Agent
An agent that applies descriptions and compliance-related glossary terms to tables and columns, then reports on coverage.
- Find target tables — Filter by platform, domain, or ownership to find datasets that need attention.
- Apply context — Cross-reference schema information against critical glossary terms, then apply glossary terms, descriptions, and tags.
- Report on compliance — Generate reports on where sensitive data lives, PII usage across the organization, or gaps in documentation.
"Tag all columns containing email addresses with the PII glossary term across our Snowflake datasets, then show me a coverage report."
## Where Do Your Agents Run?
### AI Coding Assistants
| Platform | Guide |
|---|---|
| Cursor | Guide |
| Claude (Code & Desktop) | Guide |
| Google Gemini CLI | Guide |
| Snowflake Cortex Code | Guide |
### Agent Frameworks (SDK)
| Platform | Guide |
|---|---|
| LangChain | Guide |
| Google ADK | Guide |
| Custom / Direct MCP | See Getting Started below |
### Managed Agent Platforms
| Platform | Guide |
|---|---|
| Databricks Genie Code | Guide |
| Databricks Agent Bricks | Guide |
| Snowflake Cortex Agents | Guide |
| Google Vertex AI | Guide |
| Microsoft Copilot Studio | Guide |
## Getting Started
The fastest way to connect any agent to DataHub is through the MCP server — just point your agent at the endpoint:
- DataHub Cloud: `https://<tenant>.acryl.io/integrations/ai/mcp`
- Self-hosted: `http://<gms-host>:8080/mcp`
See the MCP Server Guide for authentication and setup.
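As one example, many MCP-capable clients accept a JSON server entry along these lines. Exact key names vary by client, and the bearer-token header shown here is an assumption; consult the MCP Server Guide for the authentication details your setup requires.

```json
{
  "mcpServers": {
    "datahub": {
      "url": "https://<tenant>.acryl.io/integrations/ai/mcp",
      "headers": {
        "Authorization": "Bearer <personal-access-token>"
      }
    }
  }
}
```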
For Python SDK usage (LangChain, Google ADK, etc.):

```shell
pip install datahub-agent-context
```
Requirements: Python 3.10+, a DataHub instance, and a personal access token.
## Available Tools
Agents discover these automatically via MCP. See the MCP Server Guide for details.
| Tool | What it does |
|---|---|
| `search` | Find datasets, dashboards, and other entities by keyword |
| `get_entities` | Get full details for a specific entity (schema, ownership, docs, tags) |
| `get_lineage` | Trace upstream or downstream lineage |
| `list_schema_fields` | List columns for a dataset |
| `get_dataset_queries` | Get SQL queries associated with a dataset |
| `search_documents` / `grep_documents` | Search knowledge base articles and docs |
| `add_tags` / `remove_tags` | Manage tags |
| `update_description` | Set or update descriptions |
| `add_glossary_terms` / `remove_glossary_terms` | Link to glossary terms |
| `set_domains` / `add_owners` / `save_document` | Manage domains, ownership, and documents |
Need help? DataHub Slack · GitHub Issues · DataHub Cloud: support@acryl.io