Matillion DPC

Overview

Matillion Data Productivity Cloud (DPC) is a cloud-native data integration platform for building, orchestrating, and monitoring data pipelines. Learn more in the official Matillion documentation.

The DataHub integration for Matillion DPC ingests pipelines, streaming pipelines, projects, and environments as DataHub entities. It captures table- and column-level lineage via the Matillion OpenLineage API, pipeline execution history as operational metadata, and child pipeline dependency relationships for end-to-end orchestration visibility.

Concept Mapping

| Source Concept | DataHub Concept | Notes |
|---|---|---|
| Project | Container | Top-level grouping of pipelines within a Matillion account. |
| Environment | Container | Deployment environment within a project (e.g. Production, Staging). |
| Pipeline | DataFlow | An orchestration pipeline that transforms or moves data. |
| Pipeline Component / Step | DataJob | An individual step within a pipeline. |
| Streaming Pipeline | DataFlow | A CDC or streaming pipeline, emitted with `pipeline_type=streaming`. |
| Pipeline Execution | DataProcessInstance | A single run of a pipeline, including status and timing. |
| OpenLineage table reference | Dataset | An upstream or downstream dataset referenced via OpenLineage events. |
| Table/column lineage edge | Lineage edge | Extracted from OpenLineage events; column-level lineage via SQL parsing. |

Module matillion-dpc

Incubating

Important Capabilities

| Capability | Status | Notes |
|---|---|---|
| Column-level Lineage | ✅ | Enabled by default; can be disabled via the `parse_sql_for_lineage` configuration. |
| Detect Deleted Entities | ✅ | Enabled via stateful ingestion. |
| Platform Instance | ✅ | Enabled by default. |
| Table-Level Lineage | ✅ | Enabled by default via OpenLineage data from pipeline executions. |

Overview

The matillion-dpc module ingests metadata from Matillion Data Productivity Cloud (DPC) into DataHub. It extracts pipelines, streaming pipelines, projects, environments, execution history, and table and column-level lineage via the Matillion OpenLineage API.

Prerequisites

Obtain API Credentials

The connector uses OAuth2 client credentials and automatically handles token generation and refresh.

  1. Log into Matillion Data Productivity Cloud as a Super Admin
  2. Navigate to Profile & Account → API credentials
  3. Click Set an API Credential
  4. Provide a descriptive name (e.g., "DataHub Integration")
  5. Assign an Account Role with read permissions to required APIs
  6. Click Save and immediately copy the Client Secret (not shown again)
  7. Note the Client ID (remains visible)

For detailed instructions, see Matillion API Authentication.
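
If your deployment is reachable only through a VPC endpoint or another non-default address, the `custom_base_url` and `custom_oauth_token_url` options (documented under Config Details below) override the regional defaults. A minimal sketch, using placeholder internal URLs:

```yaml
source:
  type: matillion-dpc
  config:
    api_config:
      client_id: "${MATILLION_CLIENT_ID}"
      client_secret: "${MATILLION_CLIENT_SECRET}"
      region: "US1"
      # Placeholder endpoints -- substitute your own private URLs
      custom_base_url: "https://matillion.internal.example.com/api"
      custom_oauth_token_url: "https://matillion.internal.example.com/oauth/token"
```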

Required Permissions

The API credentials must have an Account Role with Read permissions to:

  • Projects (/v1/projects)
  • Environments (/v1/environments)
  • Pipelines (/v1/pipelines)
  • Schedules (/v1/schedules)
  • Lineage Events (/v1/lineage/events)
  • Pipeline Executions (/v1/pipeline-executions) - optional
  • Streaming Pipelines (/v1/streaming-pipelines) - optional

If using an account role other than Super Admin, grant project- and environment-level roles as needed.

See Matillion RBAC documentation for details.

Lineage and Dependencies

The connector automatically extracts:

  1. Table and Column-Level Lineage - From OpenLineage Events API (/v1/lineage/events) (docs)
  2. Operational Metadata - Pipeline execution history from Pipeline Executions API (/v1/pipeline-executions) emitted as DataProcessInstance entities (docs)
  3. Child Pipeline Dependencies - Automatically tracks when pipelines call other pipelines, creating step-to-step dependency relationships for comprehensive pipeline orchestration visibility
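
Execution-history ingestion (item 2 above) is governed by `max_executions_per_pipeline`; a small sketch showing how to cap it, or to disable execution ingestion entirely:

```yaml
source:
  type: matillion-dpc
  config:
    # api_config omitted; see the starter recipe below
    max_executions_per_pipeline: 5   # ingest only the 5 most recent runs per pipeline
    # max_executions_per_pipeline: 0 # disables execution ingestion entirely
```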

OpenLineage Namespace Mapping (Optional)

Optional: Map OpenLineage namespace URIs to DataHub platform instances for lineage connections. If not configured, the connector extracts the platform type from the URI (e.g., postgresql://... → postgres) and uses the default environment (PROD).

When to use: Configure this when you need lineage to connect to existing datasets with platform instances.

Example namespaces: postgresql://host:5432, snowflake://account.snowflakecomputing.com, bigquery://project

```yaml
namespace_to_platform_instance:
  "postgresql://prod-db.us-east-1.rds.amazonaws.com:5432":
    platform_instance: postgres_prod
    env: PROD
    database: analytics
    schema: public

  "snowflake://prod-account.snowflakecomputing.com":
    platform_instance: snowflake_prod
    env: PROD
    convert_urns_to_lowercase: true
```

Platform instances must match those used when ingesting the source data platforms.
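
For example, if your Postgres tables were ingested with `platform_instance: postgres_prod`, the mapping above produces dataset URNs that line up with them. A hypothetical sketch of the matching Postgres source recipe:

```yaml
source:
  type: postgres
  config:
    host_port: "prod-db.us-east-1.rds.amazonaws.com:5432"
    database: analytics
    platform_instance: postgres_prod  # must match the namespace mapping above
    env: PROD
```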

Install the Plugin

```shell
pip install 'acryl-datahub[matillion-dpc]'
```

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

```yaml
source:
  type: matillion-dpc
  config:
    api_config:
      client_id: "${MATILLION_CLIENT_ID}"
      client_secret: "${MATILLION_CLIENT_SECRET}"
      region: "EU1" # EU1 or US1

    env: "PROD"

    # Optional: Map OpenLineage namespaces to DataHub platform instances
    # Required if existing datasets use platform instances
    namespace_to_platform_instance:
      "postgresql://prod-db.us-east-1.rds.amazonaws.com:5432":
        platform_instance: postgres_prod
        env: PROD
        database: analytics
        schema: public

      "snowflake://prod-account.snowflakecomputing.com":
        platform_instance: snowflake_prod
        env: PROD
        convert_urns_to_lowercase: true

      "bigquery://my-gcp-project":
        platform_instance: bigquery_prod
        env: PROD

    include_streaming_pipelines: true
    include_unpublished_pipelines: true
    max_executions_per_pipeline: 10
    extract_projects_to_containers: true

    # Optional: Filter projects, environments, pipelines using regex patterns
    # project_patterns:
    #   allow: ["^prod-.*", "^staging-.*"]
    #   deny: [".*-deprecated$", ".*-archived$"]

    # environment_patterns:
    #   allow: ["^production$", "^staging$"]

    # pipeline_patterns:
    #   deny: ["^test_.*", ".*_backup$"]

    # streaming_pipeline_patterns:
    #   allow: ["^cdc_.*"]

    stateful_ingestion:
      enabled: true

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

Field | Description
api_config 
MatillionAPIConfig
api_config.client_id 
string(password)
Matillion API Client ID for OAuth2 authentication.
api_config.client_secret 
string(password)
Matillion API Client Secret for OAuth2 authentication.
api_config.custom_base_url
One of string, null
Custom API base URL for VPC endpoints or on-premise installations.
Default: None
api_config.custom_oauth_token_url
One of string, null
Custom OAuth2 token endpoint URL for VPC endpoints or on-premise installations.
Default: None
api_config.region
Enum
One of: "EU1", "US1"
api_config.request_timeout_sec
integer
Request timeout in seconds
Default: 30
bucket_duration
Enum
One of: "DAY", "HOUR"
end_time
string(date-time)
Latest date of lineage/usage to consider. Default: Current time in UTC
extract_projects_to_containers
boolean
Whether to extract Matillion projects as DataHub containers. When enabled, pipelines are organized under project containers, providing hierarchical navigation.
Default: True
include_streaming_pipelines
boolean
Whether to ingest Matillion streaming pipelines (CDC pipelines). Streaming pipelines are emitted as separate DataFlows with pipeline_type='streaming'.
Default: True
include_unpublished_pipelines
boolean
Whether to discover and ingest unpublished pipelines from recent execution history. When enabled, the connector will discover pipelines that have been executed but not yet published. Disable this to only ingest published pipelines from the published-pipelines API.
Default: True
lineage_platform_mapping
One of string, null
Override platform name mappings from OpenLineage namespaces to DataHub platforms. Only needed for non-standard platforms. See documentation for list of pre-mapped platforms. Example: {"customdb": "postgres", "mywarehouse": "snowflake"}
Default: None
max_executions_per_pipeline
integer
Maximum number of recent pipeline executions to ingest per pipeline. Set to 0 to disable execution ingestion.
Default: 10
parse_sql_for_lineage
boolean
Whether to parse SQL from OpenLineage events to extract additional column-level lineage. Requires DataHub graph access. When enabled, SQL queries are parsed to infer lineage beyond what's explicitly provided in OpenLineage column mappings.
Default: True
platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to
Default: None
start_time
string(date-time)
Earliest date of lineage/usage to consider. Default: Last full day in UTC (or hour, depending on bucket_duration). You can also specify relative time with respect to end_time, such as '-7 days' or '-7d'.
Default: None
env
string
The environment that all assets produced by this ingestion source belong to
Default: PROD
environment_patterns
AllowDenyPattern
Allow/deny regex patterns for filtering environments.
environment_patterns.ignoreCase
One of boolean, null
Whether to ignore case during pattern matching.
Default: True
environment_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
environment_patterns.allow.string
string
environment_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
environment_patterns.deny.string
string
namespace_to_platform_instance
One of NamespacePlatformMapping, null
Maps OpenLineage namespace prefixes to platform instance/environment using longest prefix matching. Unmapped namespaces extract platform from URI with defaults (env=PROD). Example: {"snowflake://prod-account": {"platform_instance": "snowflake_prod", "env": "PROD"}}
Default: None
namespace_to_platform_instance.key.platform_instance
One of string, null
DataHub platform instance to use for datasets from this namespace
Default: None
namespace_to_platform_instance.key.convert_urns_to_lowercase
boolean
Whether to convert dataset URNs to lowercase for this namespace.
Default: False
namespace_to_platform_instance.key.database
One of string, null
Default database name to prepend if dataset name doesn't include database context
Default: None
namespace_to_platform_instance.key.schema
One of string, null
Default schema name to prepend if dataset name doesn't include schema context
Default: None
namespace_to_platform_instance.key.env
string
Environment (PROD, DEV, etc.) to use for datasets from this namespace
Default: PROD
pipeline_patterns
AllowDenyPattern
Allow/deny regex patterns for filtering pipelines.
pipeline_patterns.ignoreCase
One of boolean, null
Whether to ignore case during pattern matching.
Default: True
pipeline_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
pipeline_patterns.allow.string
string
pipeline_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
pipeline_patterns.deny.string
string
project_patterns
AllowDenyPattern
Allow/deny regex patterns for filtering projects.
project_patterns.ignoreCase
One of boolean, null
Whether to ignore case during pattern matching.
Default: True
project_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
project_patterns.allow.string
string
project_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
project_patterns.deny.string
string
streaming_pipeline_patterns
AllowDenyPattern
Allow/deny regex patterns for filtering streaming pipelines.
streaming_pipeline_patterns.ignoreCase
One of boolean, null
Whether to ignore case during pattern matching.
Default: True
streaming_pipeline_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
streaming_pipeline_patterns.allow.string
string
streaming_pipeline_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
streaming_pipeline_patterns.deny.string
string
stateful_ingestion
One of StatefulStaleMetadataRemovalConfig, null
Stateful ingestion configuration.
Default: None
stateful_ingestion.enabled
boolean
Whether or not to enable stateful ingestion. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False
Default: False
stateful_ingestion.fail_safe_threshold
number
Prevents a large number of accidental soft deletes: if the relative change in entities compared to the previous state exceeds the fail_safe_threshold percentage, the state is not committed, guarding against accidental changes to the source configuration.
Default: 75.0
stateful_ingestion.remove_stale_metadata
boolean
Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.
Default: True

Capabilities

OpenLineage Namespace Mapping

Optional configuration to map OpenLineage namespace URIs to DataHub platform information. Without it, the connector extracts the platform type from the URI and uses the default environment.

Fields:

  • platform_instance: Platform instance identifier (must match source ingestion)
  • database / schema: Defaults for incomplete dataset names from OpenLineage
    • 3-tier platforms (Snowflake, Postgres, Redshift): database.schema.table
    • 2-tier platforms (MySQL, Hive): schema.table
  • convert_urns_to_lowercase: Normalize URNs to lowercase (use true for Snowflake)
  • env: Environment tag (PROD, DEV, etc.)

Fallback behavior: Unmapped namespaces extract the platform type from the URI (e.g., postgresql://... → postgres) without a platform instance assignment.
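
As an illustration, assuming a hypothetical table `analytics.public.orders` reported under an unmapped `postgresql://` namespace, the fallback and mapped URNs would differ roughly as follows:

```yaml
# Unmapped namespace "postgresql://prod-db:5432" (hypothetical) resolves to:
#   urn:li:dataset:(urn:li:dataPlatform:postgres,analytics.public.orders,PROD)
# With platform_instance: postgres_prod mapped, the instance is prefixed:
#   urn:li:dataset:(urn:li:dataPlatform:postgres,postgres_prod.analytics.public.orders,PROD)
```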

SQL Parsing for Column-Level Lineage

Enable parse_sql_for_lineage: true to parse SQL queries from OpenLineage events for additional column-level lineage.

Requirements:

  • DataHub graph connection configured
  • Schema information in OpenLineage events
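
A minimal sketch, assuming the `datahub-rest` sink supplies the graph connection the parser needs:

```yaml
source:
  type: matillion-dpc
  config:
    api_config:
      client_id: "${MATILLION_CLIENT_ID}"
      client_secret: "${MATILLION_CLIENT_SECRET}"
      region: "EU1"
    parse_sql_for_lineage: true  # default; set to false to rely only on explicit OpenLineage column mappings

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```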

Platform-Specific Handling

Snowflake: Use convert_urns_to_lowercase: true in namespace mapping

BigQuery: 3-tier naming (project.dataset.table). Set database: project-id, schema: dataset-name

MySQL / 2-tier: 2-tier naming (schema.table). Set schema only

Postgres / Redshift: 3-tier naming (database.schema.table). Set both database and schema
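
For instance, a BigQuery namespace mapping might look like the following (the project and dataset names are hypothetical):

```yaml
namespace_to_platform_instance:
  "bigquery://my-gcp-project":
    platform_instance: bigquery_prod
    env: PROD
    database: my-gcp-project   # the BigQuery project fills the database tier
    schema: analytics_dataset  # the BigQuery dataset fills the schema tier
```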

Filtering Options

The connector supports flexible regex-based filtering to control what metadata is ingested.

Project Filtering

```yaml
project_patterns:
  allow: ["^prod-.*", "^staging-.*"]
  deny: [".*-deprecated$"]
```

Environment Filtering

```yaml
environment_patterns:
  allow: ["^production$", "^staging$"]
  deny: ["^sandbox.*"]
```

Pipeline Filtering

```yaml
pipeline_patterns:
  allow: [".*"]
  deny: ["^test_.*", ".*_backup$"]
```

Streaming Pipeline Filtering

```yaml
streaming_pipeline_patterns:
  allow: ["^cdc_.*"]
  deny: [".*_test$"]
```

All patterns are case-insensitive by default and support full regex syntax. Deny patterns take precedence over allow patterns.

Child Pipeline Dependencies

The connector automatically detects and tracks when pipelines call other pipelines (via "Run Pipeline" components). This creates step-level dependency relationships in DataHub, showing:

  • Which pipeline steps trigger child pipelines
  • Complete execution lineage across pipeline orchestrations
  • Cross-pipeline data flow for comprehensive impact analysis

No configuration needed — this feature is automatic when execution history is ingested.

Published vs Unpublished Pipelines

The connector can discover pipelines from two sources:

  1. Published Pipelines — Pipelines explicitly published in Matillion DPC (fetched from /published-pipelines API)
  2. Unpublished Pipelines — Pipelines discovered from recent execution history (fetched from /pipeline-executions API)

By default, both types are ingested. To only ingest published pipelines:

```yaml
include_unpublished_pipelines: false
```

This is useful when:

  • You want to control what appears in DataHub via Matillion's publish workflow
  • You have many development/test pipelines that run but shouldn't be documented
  • You want to reduce ingestion time and API calls

Limitations

  • SQL parsing for column-level lineage requires a DataHub graph connection and schema information in OpenLineage events. Unsupported SQL dialects or complex queries are skipped with a warning.
  • Column-level lineage is only available when Matillion pipelines emit SQL via OpenLineage; transformations without SQL output will have coarse-grained lineage only.

Troubleshooting

Lineage Not Showing Up

  1. Verify namespace mapping matches source ingestion platform instances
  2. Check logs for `Processing OpenLineage event` messages
  3. Confirm dataset names in OpenLineage match actual tables

Column-Level Lineage Missing

Enable parse_sql_for_lineage: true (requires DataHub graph connection).

Execution History Not Appearing

  1. Adjust start_time to query further back in time if needed
  2. Verify API permissions for Pipeline Executions API
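
A sketch widening the lookback window with a relative `start_time` (relative-time syntax per Config Details above):

```yaml
source:
  type: matillion-dpc
  config:
    # ...
    start_time: "-30 days"  # look back 30 days instead of the default window
```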

Performance Issues

  1. Reduce time window by adjusting start_time (e.g., only last 7 days instead of 30)
  2. Use filtering patterns to reduce scope:
    • project_patterns to filter projects
    • environment_patterns to filter environments
    • pipeline_patterns to filter pipelines
    • streaming_pipeline_patterns to filter streaming pipelines
  3. Disable include_streaming_pipelines if not needed
  4. Increase api_config.request_timeout_sec if needed
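
Putting several of those levers together, a hedged tuning sketch (the project pattern is hypothetical):

```yaml
source:
  type: matillion-dpc
  config:
    api_config:
      client_id: "${MATILLION_CLIENT_ID}"
      client_secret: "${MATILLION_CLIENT_SECRET}"
      region: "EU1"
      request_timeout_sec: 60          # raise the timeout for slow endpoints
    start_time: "-7 days"              # narrow the lineage/execution window
    include_streaming_pipelines: false # skip CDC pipelines if not needed
    max_executions_per_pipeline: 3     # fewer runs per pipeline
    project_patterns:
      allow: ["^prod-.*"]              # limit ingestion to production projects
```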

Code Coordinates

  • Class Name: datahub.ingestion.source.matillion_dpc.matillion.MatillionSource
  • Browse on GitHub

Questions?

If you've got any questions on configuring ingestion for Matillion DPC, feel free to ping us on our Slack.

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.

Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.