Agent Context Kit

The DataHub Agent Context Kit is a set of guides, SDKs, and an MCP server that help you build AI agents with access to the capabilities and context in your DataHub instance — business definitions, context documents, data ownership, lineage, quality signals, sample queries, and more.

What Can You Build?

Data Analytics Agent (Text-to-SQL)

An agent that answers business questions by finding the right data, then querying it.

  1. Find & understand trustworthy data — Search DataHub for relevant datasets, read descriptions, check glossary terms, review sample queries, and look at usage stats to pick the right table.
  2. Execute SQL — Generate and run SQL against your data warehouse (Snowflake, BigQuery, Databricks, etc.).

"What was our revenue by region last quarter?" → finds the revenue table in DataHub, confirms it's the certified source, generates the SQL, runs it.
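The two-step flow above can be sketched in Python. Note that `search_datahub`, `pick_trusted_dataset`, and `generate_sql` below are hypothetical stand-ins for the DataHub MCP search tool, the agent's selection logic, and an LLM SQL-generation call; they are not the real API.

```python
# Illustrative sketch of the text-to-SQL flow: find candidates, prefer the
# certified source, then generate SQL. All data here is hard-coded for clarity.

def search_datahub(keyword: str) -> list[dict]:
    # Stand-in for the MCP "search" tool: returns candidate datasets.
    return [
        {"name": "finance.revenue", "certified": True},
        {"name": "staging.revenue_tmp", "certified": False},
    ]

def pick_trusted_dataset(candidates: list[dict]) -> dict:
    # Step 1: prefer the certified source over ad-hoc copies.
    return next(d for d in candidates if d["certified"])

def generate_sql(table: str) -> str:
    # Step 2: in a real agent, the LLM writes this from the schema + question.
    return f"SELECT region, SUM(amount) FROM {table} WHERE quarter = 'Q3' GROUP BY region"

dataset = pick_trusted_dataset(search_datahub("revenue"))
sql = generate_sql(dataset["name"])
print(sql)
```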

Data Quality Agent

An agent that provisions data quality checks and reports on the health of your data estate.

  1. Find important tables — Identify high-usage or business-critical datasets using usage stats and ownership from DataHub.
  2. Add assertions — Provision data quality assertions in DataHub (and in external tools).
  3. Generate health reports — Daily or on-demand, broken down by domain or owning team, using assertion results and incident data.

"Set up freshness checks on all tables owned by the Finance team, and send me a weekly health summary."
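A minimal sketch of that request, with hypothetical stand-ins (`find_tables_owned_by`, `add_freshness_assertion`) for the DataHub ownership search and assertion-provisioning tools; the real MCP tool names and assertion schema differ.

```python
# Illustrative quality workflow: find Finance-owned tables, provision a
# freshness assertion on each, then summarize. Data is hard-coded for clarity.

def find_tables_owned_by(team: str) -> list[str]:
    # Stand-in for a DataHub search filtered by owning team.
    return ["finance.revenue", "finance.invoices", "finance.ledger"]

def add_freshness_assertion(table: str, max_staleness_hours: int) -> dict:
    # Stand-in for provisioning a freshness assertion in DataHub.
    return {"table": table, "type": "freshness",
            "max_staleness_hours": max_staleness_hours}

assertions = [add_freshness_assertion(t, 24) for t in find_tables_owned_by("Finance")]
summary = f"{len(assertions)} freshness checks provisioned for Finance-owned tables"
print(summary)
```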

Data Steward / Governance Agent

An agent that applies descriptions and compliance-related glossary terms to tables and columns, then reports on coverage.

  1. Find target tables — Filter by platform, domain, or ownership to find datasets that need attention.
  2. Apply context — Cross-reference schema information against critical glossary terms, then apply glossary terms, descriptions, and tags.
  3. Report on compliance — Generate reports on where sensitive data lives, PII usage across the organization, or gaps in documentation.

"Tag all columns containing email addresses with the PII glossary term across our Snowflake datasets, then show me a coverage report."
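Steps 2 and 3 of the governance workflow can be sketched as follows; the column list, PII term URN, and `apply_glossary_term` helper are hypothetical. A real agent would discover columns and apply terms through the list_schema_fields and add_glossary_terms tools.

```python
# Illustrative governance sketch: detect email columns, tag them with a PII
# glossary term, and report coverage. All inputs are hard-coded for clarity.

columns = ["customer_id", "email_address", "signup_date", "contact_email"]
PII_TERM = "urn:li:glossaryTerm:PII"  # hypothetical term URN

def apply_glossary_term(column: str, term: str) -> str:
    # Stand-in for the add_glossary_terms MCP tool.
    return f"{column} -> {term}"

tagged = [apply_glossary_term(c, PII_TERM) for c in columns if "email" in c]
coverage = len(tagged) / len(columns)
print(f"Tagged {len(tagged)} of {len(columns)} columns ({coverage:.0%})")
```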

Where Do Your Agents Run?

AI Coding Assistants

| Platform | Guide |
| --- | --- |
| Cursor | Guide |
| Claude (Code & Desktop) | Guide |
| Google Gemini CLI | Guide |
| Snowflake Cortex Code | Guide |

Agent Frameworks (SDK)

| Platform | Guide |
| --- | --- |
| LangChain | Guide |
| Google ADK | Guide |
| Custom / Direct MCP | See Getting Started below |

Managed Agent Platforms

| Platform | Guide |
| --- | --- |
| Databricks Genie Code | Guide |
| Databricks Agent Bricks | Guide |
| Snowflake Cortex Agents | Guide |
| Google Vertex AI | Guide |
| Microsoft Copilot Studio | Guide |

Getting Started

The fastest way to connect any agent to DataHub is through the MCP server — just point your agent at the endpoint:

  • DataHub Cloud: https://<tenant>.acryl.io/integrations/ai/mcp
  • Self-hosted: http://<gms-host>:8080/mcp

See the MCP Server Guide for authentication and setup.

For Python SDK usage (LangChain, Google ADK, etc.):

pip install datahub-agent-context

Requirements: Python 3.10+, a DataHub instance, and a personal access token.

Available Tools

Agents discover these automatically via MCP. See the MCP Server Guide for details.

| Tool | What it does |
| --- | --- |
| search | Find datasets, dashboards, and other entities by keyword |
| get_entities | Get full details for a specific entity (schema, ownership, docs, tags) |
| get_lineage | Trace upstream or downstream lineage |
| list_schema_fields | List columns for a dataset |
| get_dataset_queries | Get SQL queries associated with a dataset |
| search_documents / grep_documents | Search knowledge base articles and docs |
| add_tags / remove_tags | Manage tags |
| update_description | Set or update descriptions |
| add_glossary_terms / remove_glossary_terms | Link to glossary terms |
| set_domains / add_owners / save_document | Manage domains, ownership, and documents |
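Under MCP, each of these tools is invoked with a standard JSON-RPC tools/call request. After the initialize handshake, a search call might look like the fragment below; the argument shape shown is illustrative, not the tool's documented schema.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": { "query": "revenue" }
  }
}
```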

Need help? DataHub Slack · GitHub Issues · DataHub Cloud: support@acryl.io