DataHub GraphQL CLI

The datahub graphql command provides a powerful interface to interact with DataHub's GraphQL API directly from the command line. This enables you to query metadata, perform mutations, and explore the GraphQL schema without writing custom applications.

Quick Start

# Get current user info
datahub graphql --operation me

# Search for datasets
datahub graphql --operation searchAcrossEntities --variables '{"input": {"query": "users", "types": ["DATASET"]}}'

# Execute raw GraphQL
datahub graphql --query "query { me { username } }"

Core Features

1. Schema Discovery

Discover available operations and understand their structure:

# List all available operations
datahub graphql --list-operations

# List only queries or mutations
datahub graphql --list-queries
datahub graphql --list-mutations

2. Smart Description

The --describe option searches both operations and types:

# Describe an operation
datahub graphql --describe searchAcrossEntities

# Describe a GraphQL type
datahub graphql --describe SearchInput

# Describe enum types to see allowed values
datahub graphql --describe FilterOperator

When both an operation and a type exist with the same name, both are shown:

datahub graphql --describe someConflictingName
# Output:
# === OPERATION ===
# Operation: someConflictingName
# Type: Query
# ...
#
# === TYPE ===
# Type: someConflictingName
# Kind: INPUT_OBJECT
# ...

3. Recursive Type Exploration

Use --recurse with --describe to explore all nested types:

# Explore operation with all its input types
datahub graphql --describe searchAcrossEntities --recurse

# Explore type with all nested dependencies
datahub graphql --describe SearchInput --recurse

Example recursive output:

Operation: searchAcrossEntities
Type: Query
Description: Search across all entity types
Arguments:
  - input: SearchInput!

Input Type Details:

SearchInput:
  query: String
  types: [EntityType!]
  filters: SearchFilter

SearchFilter:
  criteria: [FacetFilterInput!]

FacetFilterInput:
  field: String! - Name of field to filter by
  values: [String!]! - Values, one of which the intended field should match
  condition: FilterOperator - Condition for the values

FilterOperator:
  EQUAL - Represents the relation: field = value
  GREATER_THAN - Represents the relation: field > value
  LESS_THAN - Represents the relation: field < value

4. Operation Execution

Execute operations by name without writing full GraphQL:

# Execute operation by name
datahub graphql --operation me

# Execute with variables
datahub graphql --operation searchAcrossEntities --variables '{"input": {"query": "datasets", "types": ["DATASET"]}}'

# Execute with variables from file
datahub graphql --operation createGroup --variables ./group-data.json

5. Raw GraphQL Execution

Execute any custom GraphQL query or mutation:

# Simple query
datahub graphql --query "query { me { username } }"

# Query with variables
datahub graphql --query "query GetUser($urn: String!) { corpUser(urn: $urn) { info { email } } }" --variables '{"urn": "urn:li:corpuser:john"}'

# Query from file
datahub graphql --query ./complex-query.graphql --variables ./variables.json

# Mutation
datahub graphql --query "mutation { addTag(input: {resourceUrn: \"urn:li:dataset:...\", tagUrn: \"urn:li:tag:Important\"}) }"

6. File Support

Both queries and variables can be loaded from files:

# Load query from file
datahub graphql --query ./queries/search-datasets.graphql

# Load variables from file
datahub graphql --operation searchAcrossEntities --variables ./variables/search-params.json

# Both from files
datahub graphql --query ./query.graphql --variables ./vars.json
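
A variables file is plain JSON matching the operation's input type. The sketch below writes a reusable search-params.json and runs it; the file path is illustrative, and the exact fields accepted depend on the SearchInput type in your DataHub version:

# Sketch: create a reusable variables file (illustrative path and values)
mkdir -p ./variables
cat > ./variables/search-params.json <<'EOF'
{
  "input": {
    "query": "customer",
    "types": ["DATASET"],
    "start": 0,
    "count": 10
  }
}
EOF

datahub graphql --operation searchAcrossEntities --variables ./variables/search-params.json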

7. LLM-Friendly JSON Output

Use --format json to get structured JSON output perfect for LLM consumption:

# Get operations as JSON for LLM processing
datahub graphql --list-operations --format json

# Describe operation with complete type information
datahub graphql --describe searchAcrossEntities --recurse --format json

# Get type details in structured format
datahub graphql --describe SearchInput --format json

Example JSON output for --list-operations --format json:

{
  "schema": {
    "queries": [
      {
        "name": "me",
        "type": "Query",
        "description": "Get current user information",
        "arguments": []
      },
      {
        "name": "searchAcrossEntities",
        "type": "Query",
        "description": "Search across all entity types",
        "arguments": [
          {
            "name": "input",
            "type": {
              "kind": "NON_NULL",
              "ofType": {
                "name": "SearchInput",
                "kind": "INPUT_OBJECT"
              }
            },
            "required": true,
            "description": "Search input parameters"
          }
        ]
      }
    ],
    "mutations": [...]
  }
}

Example JSON output for --describe searchAcrossEntities --recurse --format json:

{
  "operation": {
    "name": "searchAcrossEntities",
    "type": "Query",
    "description": "Search across all entity types",
    "arguments": [...]
  },
  "relatedTypes": {
    "SearchInput": {
      "name": "SearchInput",
      "kind": "INPUT_OBJECT",
      "fields": [
        {
          "name": "query",
          "type": {"name": "String", "kind": "SCALAR"},
          "description": "Search query string"
        },
        {
          "name": "filters",
          "type": {"name": "SearchFilter", "kind": "INPUT_OBJECT"},
          "description": "Optional filters"
        }
      ]
    },
    "SearchFilter": {...},
    "FilterOperator": {
      "name": "FilterOperator",
      "kind": "ENUM",
      "values": [
        {
          "name": "EQUAL",
          "description": "Represents the relation: field = value",
          "deprecated": false
        }
      ]
    }
  },
  "meta": {
    "query": "searchAcrossEntities",
    "recursive": true
  }
}

8. Custom Schema Path

When introspection is disabled or for local development:

# Use local GraphQL schema files
datahub graphql --list-operations --schema-path ./local-schemas/

# Describe with custom schema
datahub graphql --describe searchAcrossEntities --schema-path ./graphql-schemas/

# Get JSON format with custom schema
datahub graphql --list-operations --schema-path ./schemas/ --format json
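
As a sketch of where local schema files might come from: if you have a checkout of the DataHub source, you can point --schema-path at its GraphQL schema directory. The repository path below is an assumption and may vary by version; any directory containing .graphql files works:

# Assumption: schema files live under datahub-graphql-core/src/main/resources in the DataHub repo
git clone --depth 1 https://github.com/datahub-project/datahub.git
datahub graphql --list-operations --schema-path ./datahub/datahub-graphql-core/src/main/resources/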

Command Reference

Global Options

Option             Type    Description
--query            string  GraphQL query/mutation string or path to .graphql file
--variables        string  Variables as JSON string or path to .json file
--operation        string  Execute named operation from DataHub's schema
--describe         string  Describe operation or type (searches both)
--recurse          flag    Recursively explore nested types with --describe
--list-operations  flag    List all available operations
--list-queries     flag    List available query operations
--list-mutations   flag    List available mutation operations
--schema-path      string  Path to GraphQL schema files directory
--no-pretty        flag    Disable pretty-printing of JSON output (default: pretty-print)
--format           choice  Output format: human (default) or json for LLM consumption

Usage Patterns

# Discovery
datahub graphql --list-operations
datahub graphql --describe <name> [--recurse]

# Execution
datahub graphql --operation <name> [--variables <json>]
datahub graphql --query <graphql> [--variables <json>]

Advanced Examples

Complex Search with Filters

datahub graphql --operation searchAcrossEntities --variables '{
  "input": {
    "query": "customer",
    "types": ["DATASET", "DASHBOARD"],
    "filters": [{
      "field": "platform",
      "values": ["mysql", "postgres"]
    }],
    "start": 0,
    "count": 20
  }
}'

Adding Tags to Multiple Entities

# Add Important tag to a dataset
datahub graphql --query 'mutation AddTag($input: TagAssociationInput!) {
  addTag(input: $input)
}' --variables '{
  "input": {
    "resourceUrn": "urn:li:dataset:(urn:li:dataPlatform:mysql,db.users,PROD)",
    "tagUrn": "urn:li:tag:Important"
  }
}'
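
To tag multiple entities, one approach is to loop over URNs in the shell and issue the mutation once per entity. This is a sketch using the same mutation as above; the dataset URNs are placeholders:

# Apply the same tag to several datasets, one mutation per URN (placeholder URNs)
for URN in \
  "urn:li:dataset:(urn:li:dataPlatform:mysql,db.users,PROD)" \
  "urn:li:dataset:(urn:li:dataPlatform:mysql,db.orders,PROD)"; do
  datahub graphql --query 'mutation AddTag($input: TagAssociationInput!) {
    addTag(input: $input)
  }' --variables "{\"input\": {\"resourceUrn\": \"$URN\", \"tagUrn\": \"urn:li:tag:Important\"}}"
done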

Batch User Queries

# Get multiple users using raw GraphQL
datahub graphql --query 'query GetUsers($urns: [String!]!) {
  users: batchGet(urns: $urns) {
    ... on CorpUser {
      urn
      username
      properties {
        email
        displayName
      }
    }
  }
}' --variables '{"urns": ["urn:li:corpuser:alice", "urn:li:corpuser:bob"]}'

Schema Introspection

DataHub's GraphQL CLI provides two modes for schema discovery:

Schema Discovery Modes

  1. Live Introspection (default): Queries the live GraphQL endpoint when no --schema-path is provided
  2. Local Schema Files: Uses .graphql files from the specified directory when --schema-path is provided

Note: These modes are mutually exclusive, with no fallback between them. If introspection fails, or if the local schema files are invalid, the command fails with an error.

Schema File Structure

When using --schema-path, the directory should contain .graphql files with:

# queries.graphql
extend type Query {
  me: AuthenticatedUser
  searchAcrossEntities(input: SearchInput!): SearchResults
}

# mutations.graphql
extend type Mutation {
  addTag(input: TagAssociationInput!): String
  deleteEntity(urn: String!): String
}

Error Handling

The CLI provides clear error messages for common issues:

# Operation not found
datahub graphql --describe nonExistentOp
# Error: 'nonExistentOp' not found as an operation or type. Use --list-operations to see available operations or try a specific type name.

# Missing required arguments
datahub graphql --operation searchAcrossEntities
# Error: Operation 'searchAcrossEntities' requires arguments: input. Provide them using --variables '{"input": "value", ...}'

# Invalid JSON variables
datahub graphql --operation me --variables '{invalid json}'
# Error: Invalid JSON in variables: Expecting property name enclosed in double quotes
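
In scripts, you can also branch on the command's exit status. This sketch assumes the CLI exits with a non-zero status when a call fails:

# Sketch: assumes datahub graphql exits non-zero on failure
if ! RESULT=$(datahub graphql --operation me --no-pretty); then
  echo "GraphQL call failed" >&2
  exit 1
fi
echo "$RESULT"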

Output Formats

Pretty Printing (Default)

{
  "me": {
    "corpUser": {
      "urn": "urn:li:corpuser:datahub",
      "username": "datahub"
    }
  }
}

Compact Output

datahub graphql --operation me --no-pretty
{"me":{"corpUser":{"urn":"urn:li:corpuser:datahub","username":"datahub"}}}

Integration Examples

Shell Scripts

#!/bin/bash
# Get all datasets for a platform
PLATFORM="mysql"
RESULTS=$(datahub graphql --operation searchAcrossEntities --variables "{
  \"input\": {
    \"query\": \"*\",
    \"types\": [\"DATASET\"],
    \"filters\": [{\"field\": \"platform\", \"values\": [\"$PLATFORM\"]}]
  }
}" --no-pretty)

echo "Found $(echo "$RESULTS" | jq '.searchAcrossEntities.total') datasets"

CI/CD Pipelines

# GitHub Actions example
- name: Tag Important Datasets
  run: |
    datahub graphql --operation addTag --variables '{
      "input": {
        "resourceUrn": "${{ env.DATASET_URN }}",
        "tagUrn": "urn:li:tag:Production"
      }
    }'

LLM Integration

The --format json option makes the CLI perfect for LLM integration:

Benefits for AI Assistants

  1. Schema Understanding: LLMs can parse the complete GraphQL schema structure
  2. Query Generation: AI can generate accurate GraphQL queries based on available operations
  3. Type Validation: LLMs understand required vs optional arguments and their types
  4. Documentation: Rich descriptions and examples help AI provide better user assistance

Use Cases

# AI assistant gets complete schema knowledge
datahub graphql --list-operations --format json | ai-assistant process-schema

# Generate queries for user requests
datahub graphql --describe searchAcrossEntities --recurse --format json | ai-helper generate-query --user-intent "find mysql tables"

# Validate user input against schema
datahub graphql --describe createGroup --format json | validate-user-input
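
As a concrete example of consuming the same JSON without an AI tool, you can list operation names with jq, based on the --list-operations --format json structure shown earlier (this assumes mutation entries share the same shape as query entries):

# List query and mutation names from the structured schema output
datahub graphql --list-operations --format json | \
  jq -r '.schema.queries[].name, .schema.mutations[].name'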

JSON Schema Benefits

  • Structured data: No parsing of human-readable text required
  • Complete type information: Includes GraphQL type wrappers (NON_NULL, LIST)
  • Rich metadata: Descriptions, deprecation info, argument requirements
  • Consistent format: Predictable structure across all operations and types
  • Recursive exploration: Complete dependency graphs for complex types

Tips and Best Practices

  1. Start with Discovery: Use --list-operations and --describe to understand available operations
  2. Use --recurse: When learning about complex operations, --describe --recurse shows the complete type structure
  3. LLM Integration: Use --format json when building AI assistants or automation tools
  4. File-based Variables: For complex variables, use JSON files instead of inline JSON
  5. Error Handling: The CLI provides detailed error messages - read them carefully for debugging
  6. Schema Evolution: Operations and types can change between DataHub versions - use discovery commands to stay current

Troubleshooting

Common Issues

"Introspection not available": Use --schema-path to point to local GraphQL schema files

"Operation not found": Check spelling and use --list-operations to see available operations

"Type not found": Verify type name casing (GraphQL types are case-sensitive)

Environment issues: Ensure DataHub server is running and accessible at the configured endpoint