
Fivetran

Overview

Fivetran is an automated data movement (ELT) platform. Learn more in the official Fivetran documentation.

The DataHub integration for Fivetran covers connectors, their source and destination datasets, and their sync runs. It also captures column-level lineage and supports stateful deletion detection.

Concept Mapping

| Fivetran      | DataHub             |
|---------------|---------------------|
| Connector     | DataJob             |
| Source        | Dataset             |
| Destination   | Dataset             |
| Connector Run | DataProcessInstance |

The source and destination are mapped to Datasets, which form the inputs and outputs of the Connector's DataJob.

Module fivetran

Certified

Important Capabilities

| Capability              | Status | Notes                                                                      |
|-------------------------|--------|----------------------------------------------------------------------------|
| Column-level Lineage    | ✅     | Enabled by default; can be disabled via the include_column_lineage config. |
| Detect Deleted Entities | ✅     | Enabled by default via stateful ingestion.                                 |
| Platform Instance       | ✅     | Enabled by default.                                                        |
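
For example, to turn off column-level lineage, set include_column_lineage at the top of the source config (a minimal sketch; sink omitted):

source:
  type: fivetran
  config:
    include_column_lineage: false
    fivetran_log_config:
      # ... destination config ...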

Overview

The fivetran module ingests metadata from Fivetran into DataHub. It is intended for production ingestion workflows; module-specific capabilities are documented below.

Integration Details

This source extracts the following:

  • Connectors in Fivetran as Data Pipelines and Data Jobs, representing lineage between source and destination.
  • Connector sources as DataJob input Datasets.
  • Connector destinations as DataJob output Datasets.
  • Connector runs as DataProcessInstances (DataJob runs).

Configuration Notes

Prerequisites:

  1. Set up and complete an initial sync of the Fivetran Platform Connector
  2. Enable automatic schema updates (default) to avoid sync inconsistencies
  3. Configure the destination platform (Snowflake, BigQuery, Databricks, or Managed Data Lake) in your recipe

Prerequisites

Before running ingestion, ensure network connectivity to the source, valid authentication credentials, and read permissions for metadata APIs required by this module.

To use the Fivetran REST API integration, you need:

Required API Permissions:

  • Read access to connection details (GET /v1/connections/{connection_id})
  • The API key must be associated with a user or service account that has access to the connectors you want to ingest
  • The API key inherits permissions from the user or service account it's associated with

Fivetran Managed Data Lake Service

The Fivetran Managed Data Lake Service replicates data to S3 as Iceberg tables and exposes them through an Iceberg REST Catalog (Polaris / Snowflake Open Catalog) or AWS Glue.

Use log_source: rest_api for Managed Data Lake destinations. The REST mode reads the Fivetran log via API and discovers each destination's service per-call — no Snowflake catalog-linked database setup is required. By default, REST-discovered MDL destinations emit Iceberg URNs:

urn:li:dataset:(urn:li:dataPlatform:iceberg, <schema>.<table>, <env>)

The namespace is the Fivetran connector schema verbatim (no fivetran_ prefix). This matches DataHub's iceberg source convention so URNs align if the same Iceberg / Polaris catalog is also ingested directly via the Iceberg source connector.

Example recipe:

source:
  type: fivetran
  config:
    log_source: rest_api
    api_config:
      api_key: "${FIVETRAN_API_KEY}"
      api_secret: "${FIVETRAN_API_SECRET}"

    # Optional: align URNs with a separate Iceberg source connector
    destination_to_platform_instance:
      my_fivetran_destination_id:
        platform_instance: "polaris_us_west" # must match the Iceberg source recipe
        env: PROD
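
For reference, a matching recipe for DataHub's Iceberg source might look like the sketch below. The catalog name and URI are hypothetical; the only value that must line up with the Fivetran recipe above is platform_instance:

source:
  type: iceberg
  config:
    catalog:
      polaris_catalog:            # hypothetical catalog name
        type: rest
        uri: "https://polaris.example.com/api/catalog"
    platform_instance: "polaris_us_west" # same value as the Fivetran recipe
    env: PROD
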
Managed Data Lake routing (Iceberg / Glue / S3 / GCS / ADLS)

Fivetran's REST API reports service: managed_data_lake for every Managed Data Lake destination, regardless of whether the underlying object storage is AWS S3, Google Cloud Storage, or Azure Data Lake Storage Gen2, and regardless of whether it's fronted by an Iceberg REST catalog (Polaris) or AWS Glue. The default URN routing is iceberg — correct for Polaris / Iceberg-REST setups across any of the three clouds. Override per destination by pinning platform in destination_to_platform_instance:

config:
  destination_to_platform_instance:
    polaris_warehouse_a:
      platform: iceberg # default; emit `iceberg.<schema>.<table>` URNs
    glue_warehouse_b:
      platform: glue # emit `glue.<database>.<schema>.<table>` URNs
      database: <actual Glue database name from your AWS Glue console>
      platform_instance: "glue_us_west" # match the Glue source recipe
      env: PROD
    s3_lake_c:
      platform: s3 # emit `s3.<bucket>/<prefix_path>/<schema>/<table>` URNs
    gcs_lake_d:
      platform: gcs # emit `gcs.<bucket>/<prefix_path>/<schema>/<table>` URNs
    adls_lake_e:
      platform: abs # emit `abs.<storage_account>/<container>/<prefix_path>/<schema>/<table>` URNs

Iceberg (default, or platform: iceberg). Emits urn:li:dataset:(iceberg, <schema>.<table>, env). No extra config needed for Polaris / Iceberg-REST destinations — this is the fallback default for any MDL destination whose config doesn't trigger another auto-detect rule.

Glue (platform: glue, or auto-detected). Emits urn:li:dataset:(glue, <database>.<schema>.<table>, env) aligned with DataHub's Glue source. Auto-detected in two cases — no explicit platform: glue needed in either:

  • The destination has Fivetran's should_maintain_tables_in_glue: true toggle set (visible via /v1/destinations/{id}), OR
  • The user supplied database on the destination entry (a database name only makes sense for Glue among MDL platforms, so the connector treats it as a glue-intent signal and avoids silently dropping it on an iceberg/s3/gcs/abs route).

You must supply database yourself for Glue routing. Fivetran's REST API does not expose the actual Glue database name it creates, and the Fivetran docs do not document the Glue-table-naming convention (Fivetran shares one Glue database per region across all destinations in that region). Inspect your AWS Glue console to find the actual database name and configure it on the destination entry:

destination_to_platform_instance:
  glue_warehouse_a:
    # platform: glue can be auto-detected from the MDL toggle, but you can
    # also pin it explicitly.
    database: fivetran_managed_data_lake_us_west_2 # actual Glue database name
    platform_instance: "glue_us_west" # match the Glue source recipe

The connector then composes urn:li:dataset:(glue, <database>.<schema>.<table>, env) using the <schema>.<table> from Fivetran's lineage record verbatim as the Glue table name. Verify against your Glue catalog that Fivetran's table names are formatted this way; if they aren't, the URN won't align with DataHub's Glue source URNs and lineage won't render.

Until database is set, Glue lineage edges are skipped with a structured warning (one per destination, not per edge — repeated edges on a misconfigured destination are silently skipped after the first warning).

Object-storage routing (platform: s3, gcs, or abs). Emits a path-style URN aligned with DataHub's S3, GCS, or Azure Blob / ADLS sources. The path prefix is composed from fields in the Fivetran destination response (/v1/destinations/{id}) — no extra recipe configuration required:

| Platform | URN shape | Source of prefix in /v1/destinations/{id}.config |
|----------|-----------|--------------------------------------------------|
| s3  | urn:li:dataset:(s3, <bucket>/<prefix_path>/<schema>/<table>, env) | bucket + prefix_path (AWS-backed MDL) |
| gcs | urn:li:dataset:(gcs, <bucket>/<prefix_path>/<schema>/<table>, env) | bucket + prefix_path (GCS-backed MDL) |
| abs | urn:li:dataset:(abs, <storage_account>/<container>/<prefix_path>/<schema>/<table>, env) | storage_account_name + container_name + prefix_path (ADLS Gen2) |

For example, an AWS-backed MDL destination with bucket: example-fivetran-lake and prefix_path: fivetran writing the sales.orders table emits urn:li:dataset:(s3, example-fivetran-lake/fivetran/sales/orders, PROD). To point lineage at a different layout (e.g., the same data mirrored under a different prefix in DataHub's storage source), override database on the same destination_to_platform_instance entry; the override is used as the URN prefix verbatim.
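
A minimal sketch of that override; the destination id and prefix are hypothetical:

destination_to_platform_instance:
  s3_lake_c:
    platform: s3
    database: example-fivetran-lake/mirrored_prefix # used verbatim as the URN prefix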

Storage-source URN alignment (important). Fivetran's MDL writes Iceberg-format tables to S3 / GCS / ADLS — meaning each <schema>/<table>/ folder contains an Iceberg metadata/ directory plus Parquet data files under data/. The Fivetran connector emits one URN per logical table at folder-level granularity (<bucket>/<prefix>/<schema>/<table>).

For lineage to render against the dataset URNs produced by DataHub's S3, GCS, or ABS source, that source must be configured to produce table-level URNs that match this shape — typically by setting path_specs to treat each <schema>/<table>/ directory as a single dataset (e.g., path: s3://example-fivetran-lake/fivetran/{table}/ with table_name resolved from the directory name). If the data-lake source is instead configured to emit one URN per Parquet file, the Fivetran-emitted URN will not align and lineage won't render.

When in doubt, prefer platform: iceberg (default) and ingest the Polaris / Iceberg REST catalog with DataHub's Iceberg source — the URNs align by construction without requiring additional path-spec coordination.
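
Under those assumptions, a sketch of an S3 source path_spec that yields one dataset per <schema>/<table>/ directory; the bucket, prefix, and data/ layout are illustrative:

source:
  type: s3
  config:
    platform_instance: "s3_lake" # hypothetical; match the Fivetran destination entry
    path_specs:
      # the wildcard matches the schema directory; {table} captures the table
      # directory; Parquet data files live under each table's data/ folder
      - include: "s3://example-fivetran-lake/fivetran/*/{table}/data/*.parquet"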

Overriding the URN platform per destination

For non-MDL destinations, or to align with a source connector whose platform name doesn't match Fivetran's discovered service (e.g. Unity Catalog), declare the platform explicitly:

destination_to_platform_instance:
  my_fivetran_destination_id:
    platform: unity_catalog
    platform_instance: "unity_us_west"
    env: PROD

The user override always wins; REST destination discovery still runs in the background to fill in any fields the user didn't pin (e.g. database from the discovered config when not overridden). For Glue routing, also set destination_to_platform_instance.<destination_id>.database to the actual Glue database name from your AWS Glue console.

Hybrid deployments and destination discovery

If your Fivetran setup has a single account-level Fivetran Platform Connector delivering log data to one destination (typically Snowflake) but actual data is spread across destinations of different types (e.g., Snowflake for some connectors, Managed Data Lake for others), the per-recipe destination_platform field can only describe one destination's type at a time.

Whenever api_config is set, the connector automatically consults the Fivetran REST API for each destination whose platform isn't pinned in destination_to_platform_instance, and emits URNs based on the discovered service:

source:
  type: fivetran
  config:
    fivetran_log_config:
      destination_platform: snowflake # where the log lives
      snowflake_destination_config:
        # ... your Snowflake log destination details ...
    api_config:
      api_key: "${FIVETRAN_API_KEY}"
      api_secret: "${FIVETRAN_API_SECRET}"

Discovery results are cached per-ingest, so each unique destination_id triggers at most one REST call.

Precedence: declarative entries in destination_to_platform_instance always win over discovery — use them to override an inaccurate REST result or fix one destination without touching the rest. Discovery still runs (one cached call per destination per ingest) so unpinned fields like database are auto-populated from the destination response when relevant.

Failures: if the REST call fails for a destination, the connector logs a structured warning and falls back to the recipe's default destination_platform. The ingest does not abort. Set the override explicitly via destination_to_platform_instance to bypass discovery for that destination.

MDL destinations: REST-discovered Managed Data Lake destinations default to iceberg URN routing. Override per-destination via destination_to_platform_instance.<id>.platform if you need a different platform (e.g. glue).
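
A sketch combining one pinned destination with discovery for the rest; the ids and names are hypothetical:

config:
  api_config:
    api_key: "${FIVETRAN_API_KEY}"
    api_secret: "${FIVETRAN_API_SECRET}"
  destination_to_platform_instance:
    mdl_destination_id: # pinned: the REST-discovered service is overridden here
      platform: glue
      database: my_glue_database
  # every other destination is routed via cached REST discovery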

Choosing between log_database and rest_api modes

The connector reads metadata from two possible providers — the Fivetran Platform Connector log warehouse (DB) and the Fivetran REST API. Each provider supplies a different set of capabilities; the connector composes them based on which credentials you provide.

Capability matrix
| Feature | DB log | REST API |
|---------|--------|----------|
| Connector list / metadata | ✅ (1 SQL query) | ✅ (paginated per group) |
| Source platform (from connector type) | ✅ | ✅ |
| Destination platform routing (Snowflake / BigQuery / Databricks) | ✅ via destination_platform | ✅ via /v1/destinations/{id} discovery |
| Managed Data Lake → Iceberg URN (Polaris / Iceberg REST) | ❌ | ✅ default for service: managed_data_lake |
| Managed Data Lake → Glue / S3 / GCS / ABS URN routing | ❌ | ✅ via destination_to_platform_instance.<id>.platform (covers AWS, GCS, ADLS Gen2 backings) |
| Read Fivetran log when log destination is Managed Data Lake | ⚠️ via Snowflake catalog-linked database (see preserve_case) | n/a — REST does not read a log database |
| Table lineage (with historical, including disabled tables) | ✅ historical | ⚠️ current enabled config only |
| Column lineage with source/destination column names | ✅ (column_lineage table with explicit names) | ✅ full coverage via per-table fetch from /v1/connections/{id}/schemas/{schema}/tables/{table}/columns (the bulk schemas-config endpoint only returns user-modified columns; the per-table endpoint fills the rest) |
| User / owner emails | ✅ (1 SQL query) | ✅ (paginated per group) |
| Sync history → DataProcessInstance events | ✅ | ❌ (Fivetran REST has no sync-history endpoint; restored in REST-primary hybrid via DB log) |
| Rich sync-failure detail (end_message_data JSON) | ✅ | ❌ |
| Hashed / PII column flags | ❌ | ⚠️ partial |
| Google Sheets connection config (sheet_id, named_range) | ❌ | ✅ (REST is the only source) |

Credential coverage — what's available per config combination
| Configuration | What you get | What's not available |
|---------------|--------------|----------------------|
| fivetran_log_config only | Full happy path for Snowflake / BigQuery / Databricks log setups: connectors, lineage with historical view, owner emails, DPI events, rich failure detail. | Managed Data Lake destinations (Iceberg / Polaris); per-destination URN routing for hybrid accounts with mixed destination types; Google Sheets connection config. |
| api_config only | All structural metadata for any destination type, including Managed Data Lake (Iceberg / Glue / S3 / GCS / ABS URN routing). Full table+column lineage via /v1/connections/{id}/schemas plus per-table column fetches. Works without warehouse credentials. | DataProcessInstance events (no REST sync-history endpoint); rich failure detail (DB log's end_message_data JSON); historical lineage of disabled tables. |
| Both fivetran_log_config and api_config | Recommended for full coverage. DB-primary by default (DB owns connectors / lineage / users / jobs; REST owns destination routing and Google Sheets details). Set log_source: rest_api for REST-primary hybrid — REST owns connectors / lineage / routing, DB log fills in DPI events plus higher-fidelity lineage. | Connectors visible only in destinations other than the configured fivetran_log_config warehouse won't get DPI events (run history requires log access). |

Which log_source value to pick

log_source is optional — leave it unset and the connector infers the right value from the credential blocks you supply:

| Credentials provided | Inferred log_source | What you get |
|----------------------|---------------------|--------------|
| fivetran_log_config only | log_database | DB owns everything: connectors, table+column lineage, users, jobs, run history. No destination discovery. |
| api_config only | rest_api | REST owns connectors, full table+column lineage, users, destination routing, Google Sheets. No DataProcessInstance run-history (no sync-history endpoint). |
| Both blocks | log_database (DB-primary) | DB owns connectors / lineage / users / jobs. REST fills in destination routing + Google Sheets details. Recommended for full coverage. |
| Both blocks + log_source: rest_api | rest_api (REST-primary) | REST owns connectors / schemas / users / destination routing. DB log fills in per-run DataProcessInstance events (REST has no sync-history endpoint) AND lineage (DB log carries explicit source_column_name / destination_column_name from sync events, slightly higher fidelity than REST schemas-config). |

Set log_source explicitly only when you have both blocks and want REST-primary routing — the explicit value overrides the DB-primary default.

Hybrid mode (REST API + DB log)

REST mode by itself emits structural metadata (DataFlow, DataJob, datasets, full table+column lineage from /v1/connections/{id}/schemas) but no per-run DataProcessInstance events — Fivetran's REST API doesn't expose a sync-history endpoint.

REST-primary hybrid plugs that gap. Provide both blocks with log_source: rest_api and the connector queries the DB log warehouse for two things:

  • Sync history — DB log's sync_logs query produces DataProcessInstance events, one per recent sync run (controlled by history_sync_lookback_period).
  • Higher-fidelity lineage — the DB log's column_lineage table carries explicit source_column_name / destination_column_name written by the Fivetran Platform Connector during each sync. When the DB lineage reader is wired in, the REST reader prefers it over the schemas-config endpoint. If the DB lineage query fails transiently, the REST reader falls back to the schemas-config endpoint and emits a one-shot warning.

The fallback chain in REST-primary hybrid is: DB log lineage → REST schemas-config endpoint. Both produce full column lineage; the DB log is preferred when available.

source:
  type: fivetran
  config:
    log_source: rest_api
    api_config:
      api_key: "${FIVETRAN_API_KEY}"
      api_secret: "${FIVETRAN_API_SECRET}"

    # When this block is present alongside `log_source: rest_api`, REST owns
    # connectors / schemas / users / destination routing, and the log
    # warehouse fills in two things: per-run sync history (DataProcessInstance
    # events) and higher-fidelity lineage from the column_lineage table.
    fivetran_log_config:
      destination_platform: snowflake
      snowflake_destination_config:
        account_id: "${SNOWFLAKE_ACCOUNT_ID}"
        warehouse: "${SNOWFLAKE_WAREHOUSE}"
        username: "${SNOWFLAKE_USER}"
        password: "${SNOWFLAKE_PASS}"
        role: "${SNOWFLAKE_ROLE}"
        database: "${SNOWFLAKE_LOG_DB}"
        log_schema: "fivetran_log"

Tradeoffs: REST mode makes one or more API calls per connector instead of bulk SQL queries. For accounts with hundreds of connectors, expect noticeably more API requests during ingest (typically still well under Fivetran's per-minute rate limits). REST-only mode (no fivetran_log_config block) emits no DataProcessInstance events because Fivetran's REST API has no sync-history endpoint — use REST-primary hybrid as shown above to restore them.

Performance and rate limits (REST mode)

Per-connector schema and sync-history fetches run in parallel. Two knobs control the trade-off between wall-clock time and Fivetran rate-limit pressure:

  • rest_api_max_workers (default 4, range 1–32) — number of worker threads issuing concurrent HTTP calls. Higher values issue more requests/sec against the Fivetran API and reduce wall-clock time on accounts with many connectors. Set to 1 for fully sequential behaviour.
  • rest_api_per_connector_timeout_sec (default 300) — hard cap per connector. If a single REST call hangs, that connector is skipped with a warning instead of stalling the whole run.

If you start hitting Fivetran rate limits (HTTP 429s in the ingest log), lower rest_api_max_workers rather than raising it. The retry logic backs off on 429s, but reducing concurrency avoids the retries entirely.

For accounts with very large connectors that legitimately take minutes per call, raise rest_api_per_connector_timeout_sec. For smaller accounts where you'd rather fail fast on a hung request, lower it.
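
A sketch with both knobs tuned for a rate-limited account (the values are illustrative):

source:
  type: fivetran
  config:
    log_source: rest_api
    api_config:
      api_key: "${FIVETRAN_API_KEY}"
      api_secret: "${FIVETRAN_API_SECRET}"
    rest_api_max_workers: 2 # lower concurrency to stay clear of HTTP 429s
    rest_api_per_connector_timeout_sec: 600 # allow very large connectors more time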

The per-connector limits — max_jobs_per_connector, max_table_lineage_per_connector, max_column_lineage_per_connector — apply equally in REST mode and bound the per-connector lineage payload to match the DB reader's behaviour.

Fivetran REST API Configuration

The Fivetran REST API configuration is required for Google Sheets connectors and optional for other use cases. It provides access to connection details that aren't available in the Platform Connector logs.

Setup

To obtain API credentials:

  1. Log in to your Fivetran account
  2. Go to Settings → API Config
  3. Create or use an existing API key and secret
api_config:
  api_key: "your_api_key"
  api_secret: "your_api_secret"
  base_url: "https://api.fivetran.com" # Optional, defaults to this
  request_timeout_sec: 30 # Optional, defaults to 30 seconds

Google Sheets Connector Support

Google Sheets connectors require special handling because Google Sheets is not yet natively supported as a DataHub source. As a workaround, the Fivetran source creates Dataset entities for Google Sheets and includes them in the lineage.

Requirements
  • Fivetran REST API configuration (api_config) is required for Google Sheets connectors
  • The API is used to fetch connection details that aren't available in Platform Connector logs
What Gets Created

For each Google Sheets connector, two Dataset entities are created:

  1. Google Sheet Dataset: Represents the entire Google Sheet

    • Platform: google_sheets
    • Subtype: GOOGLE_SHEETS
    • Contains the sheet ID extracted from the Google Sheets URL
  2. Named Range Dataset: Represents the specific named range being synced

    • Platform: google_sheets
    • Subtype: GOOGLE_SHEETS_NAMED_RANGE
    • Contains the named range identifier
    • Has upstream lineage to the Google Sheet Dataset
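
Illustratively, the two entities might look like the following; the exact dataset-name format is an assumption, not confirmed by this page. The second dataset has upstream lineage to the first:

urn:li:dataset:(urn:li:dataPlatform:google_sheets, <sheet_id>, PROD)
urn:li:dataset:(urn:li:dataPlatform:google_sheets, <sheet_id>.<named_range>, PROD)
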
Limitations
  • Column lineage is disabled for Google Sheets connectors due to stale metadata issues in the Fivetran Platform Connector (as of October 2025)
  • This is a workaround that will be removed once DataHub natively supports Google Sheets as a source
  • If the Fivetran API is unavailable or the connector details can't be fetched, the connector will be skipped with a warning
Example Configuration
source:
  type: fivetran
  config:
    # Required for Google Sheets connectors
    api_config:
      api_key: "your_api_key"
      api_secret: "your_api_secret"

    # ... other configuration ...

Install the Plugin

pip install 'acryl-datahub[fivetran]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: fivetran
  config:
    # Optional - Choose how to read Fivetran log data. Leave unset to let the
    # connector infer this from which credential blocks you provide:
    #   only fivetran_log_config → log_database
    #   only api_config          → rest_api
    #   both (no explicit log_source) → log_database (DB-primary; REST still
    #                                   fills in destination routing + Google
    #                                   Sheets details)
    # Set explicitly to override — e.g. `rest_api` with both blocks present
    # runs REST-primary and uses the DB log only for per-run sync history.
    # log_source: rest_api

    # NOTE: Whenever `api_config` is set, the connector automatically fetches
    # each destination's `service` (snowflake / bigquery / databricks /
    # managed_data_lake / ...) via the Fivetran REST API and routes URN
    # construction accordingly. This is required for hybrid deployments where
    # the Fivetran log lives in one destination but data is spread across
    # multiple destinations of different types. Pin `platform` on a per-
    # destination entry under `destination_to_platform_instance` to skip the
    # REST round-trip for that destination.

    # Fivetran log connector destination server configurations
    fivetran_log_config:
      destination_platform: snowflake
      # Optional - If destination platform is 'snowflake', provide snowflake configuration.
      snowflake_destination_config:
        # Coordinates
        account_id: "abc48144"
        warehouse: "COMPUTE_WH"
        database: "MY_SNOWFLAKE_DB"
        log_schema: "FIVETRAN_LOG"

        # Credentials
        username: "${SNOWFLAKE_USER}"
        password: "${SNOWFLAKE_PASS}"
        role: "snowflake_role"
      # Optional - If destination platform is 'bigquery', provide bigquery configuration.
      bigquery_destination_config:
        # Credentials
        credential:
          private_key_id: "project_key_id"
          project_id: "project_id"
          client_email: "client_email"
          client_id: "client_id"
          private_key: "private_key"
        dataset: "fivetran_log_dataset"
      # Optional - If destination platform is 'databricks', provide databricks configuration.
      databricks_destination_config:
        # Credentials
        token: "token"
        workspace_url: "workspace_url"
        warehouse_id: "warehouse_id"

        # Coordinates
        catalog: "fivetran_catalog"
        log_schema: "fivetran_log"

    # NOTE: For Managed Data Lake destinations (Iceberg / Polaris / Glue),
    # use `log_source: rest_api` (see above) instead of fivetran_log_config.

    # Optional - filter for certain connector names instead of ingesting everything.
    # connector_patterns:
    #   allow:
    #     - connector_name

    # Optional - A mapping of each connector's sources to their database.
    # sources_to_database:
    #   connector_id: source_db

    # Optional - Fivetran REST API configuration (required for Google Sheets connectors)
    # api_config:
    #   api_key: "your_api_key"
    #   api_secret: "your_api_secret"
    #   base_url: "https://api.fivetran.com" # Optional
    #   request_timeout_sec: 30 # Optional

    # Optional - Only required to configure platform-instance for sources.
    # A mapping of Fivetran connector id to data platform instance.
    # sources_to_platform_instance:
    #   connector_id:
    #     platform_instance: cloud_instance
    #     env: DEV

    # Optional - Only required to configure platform-instance for destinations.
    # A mapping of Fivetran destination id to data platform instance.
    # For Managed Data Lake destinations, set this to match the platform_instance
    # used by your Iceberg / Glue source recipe so emitted URNs align with the
    # ones that source connector emits and lineage renders end-to-end. Use
    # `platform: glue` (or another platform) here to override the default
    # `iceberg` URN routing for MDL destinations.
    # destination_to_platform_instance:
    #   destination_id:
    #     platform: iceberg
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

history_sync_lookback_period
integer
The number of days to look back when extracting connectors' sync history.
Default: 7
include_column_lineage
boolean
Populates table->table column lineage.
Default: True
log_source
One of Enum, null
Where to read the Fivetran log from. Leave unset to let the connector infer this from which credential blocks you provide:
- Only fivetran_log_config → log_database.
- Only api_config → rest_api.
- Both → log_database (DB-primary; REST still owns destination routing and Google Sheets details).
Set this explicitly to override the default routing — e.g. rest_api with a fivetran_log_config block also present runs REST-primary with the DB log only providing per-run sync history.
Default: None
max_column_lineage_per_connector
integer
Maximum number of column lineage entries to retrieve per connector.
Default: 1000
max_jobs_per_connector
integer
Maximum number of sync jobs to retrieve per connector.
Default: 500
max_table_lineage_per_connector
integer
Maximum number of table lineage entries to retrieve per connector.
Default: 120
platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
rest_api_max_workers
integer
Number of worker threads used to fetch per-connector data (schemas + sync history) in parallel when log_source: rest_api. Values >1 issue concurrent HTTP calls to the Fivetran REST API and meaningfully speed up ingestion for accounts with hundreds of connectors. Set to 1 for fully sequential behaviour. Lower this (not raise it) if you hit Fivetran rate limits. Ignored in log_database mode.
Default: 4
rest_api_per_connector_timeout_sec
integer
Hard wall-clock timeout (seconds) for fetching a single connector's schema + sync history when log_source: rest_api. If exceeded, that connector is emitted without lineage / run history and a warning is recorded — the rest of the ingest continues. Guards against a single hung HTTP call stalling the whole run. Healthy connectors finish in seconds; bump only if you have very large connectors that legitimately need more.
Default: 300
env
string
The environment that all assets produced by this connector belong to
Default: PROD
api_config
One of FivetranAPIConfig, null
Fivetran REST API configuration, used to provide wider support for connections.
Default: None
api_config.api_key 
string(password)
Fivetran API key
api_config.api_secret 
string(password)
Fivetran API secret
api_config.base_url
string
Fivetran API base URL
api_config.request_timeout_sec
integer
Request timeout in seconds
Default: 30
connector_patterns
AllowDenyPattern
A class to store allow deny regexes
connector_patterns.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
connector_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
connector_patterns.allow.string
string
connector_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
connector_patterns.deny.string
string
destination_patterns
AllowDenyPattern
A class to store allow deny regexes
destination_patterns.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
destination_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
destination_patterns.allow.string
string
destination_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
destination_patterns.deny.string
string
destination_to_platform_instance
map(str,PlatformDetail)
destination_to_platform_instance.key.platform
One of string, null
Override the platform type detection.
Default: None
destination_to_platform_instance.key.database
One of string, null
The database that all assets produced by this connector belong to. For destinations, this defaults to the fivetran log config's database.
Default: None
destination_to_platform_instance.key.database_lowercase
boolean
Lowercase the database segment when constructing the dataset URN. Defaults to True to match DataHub's standard lowercase URN convention (and to preserve the long-standing Fivetran connector behaviour). Set False to keep the case Fivetran reports — useful when aligning with another DataHub source whose URN preserves the database casing (e.g. some Glue or Iceberg setups). Schema and table segments are always passed through unchanged.
Default: True
destination_to_platform_instance.key.include_schema_in_urn
boolean
Include schema in the dataset URN. In some cases, the schema is not relevant to the dataset URN and Fivetran sets it to the source and destination table names in the connector.
Default: True
destination_to_platform_instance.key.platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to
Default: None
destination_to_platform_instance.key.env
string
The environment that all assets produced by DataHub platform ingestion source belong to
Default: PROD
fivetran_log_config
One of FivetranLogConfig, null
Fivetran Platform Connector log destination configuration. Required for log_database mode (the inferred default whenever this block is present). Optional in rest_api mode — when supplied alongside api_config, the REST reader uses the DB log only for per-run sync history (which the REST API doesn't expose).
Default: None
fivetran_log_config.destination_platform
Enum
One of: "snowflake", "bigquery", "databricks"
Default: snowflake
fivetran_log_config.bigquery_destination_config
One of BigQueryDestinationConfig, null
If destination platform is 'bigquery', provide bigquery configuration.
Default: None
fivetran_log_config.bigquery_destination_config.dataset 
string
The fivetran connector log dataset.
fivetran_log_config.bigquery_destination_config.extra_client_options
object
Additional options to pass to google.cloud.logging_v2.client.Client.
Default: {}
fivetran_log_config.bigquery_destination_config.project_on_behalf
One of string, null
[Advanced] The BigQuery project in which queries are executed. Will be passed when creating a job. If not passed, falls back to the project associated with the service account.
Default: None
fivetran_log_config.bigquery_destination_config.credential
One of GCPCredential, null
BigQuery credential information
Default: None
fivetran_log_config.bigquery_destination_config.credential.client_email 
string
Client email
fivetran_log_config.bigquery_destination_config.credential.client_id 
string
Client Id
fivetran_log_config.bigquery_destination_config.credential.private_key 
string(password)
Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n'
fivetran_log_config.bigquery_destination_config.credential.private_key_id 
string
Private key id
fivetran_log_config.bigquery_destination_config.credential.auth_provider_x509_cert_url
string
Auth provider x509 certificate url
fivetran_log_config.bigquery_destination_config.credential.auth_uri
string
Authentication uri
fivetran_log_config.bigquery_destination_config.credential.client_x509_cert_url
One of string, null
If not set, it will default to https://www.googleapis.com/robot/v1/metadata/x509/client_email
Default: None
fivetran_log_config.bigquery_destination_config.credential.project_id
One of string, null
Project id to set the credentials
Default: None
fivetran_log_config.bigquery_destination_config.credential.token_uri
string
Token uri
fivetran_log_config.bigquery_destination_config.credential.type
string
Authentication type
Default: service_account
fivetran_log_config.databricks_destination_config
One of DatabricksDestinationConfig, null
If destination platform is 'databricks', provide databricks configuration.
Default: None
fivetran_log_config.databricks_destination_config.catalog 
string
The fivetran connector log catalog.
fivetran_log_config.databricks_destination_config.log_schema 
string
The fivetran connector log schema.
fivetran_log_config.databricks_destination_config.workspace_url 
string
Databricks workspace url. e.g. https://my-workspace.cloud.databricks.com
fivetran_log_config.databricks_destination_config.client_id
One of string, null
Databricks service principal client ID
Default: None
fivetran_log_config.databricks_destination_config.client_secret
One of string(password), null
Databricks service principal client secret
Default: None
fivetran_log_config.databricks_destination_config.extra_client_options
object
Additional options to pass to Databricks SQLAlchemy client.
Default: {}
fivetran_log_config.databricks_destination_config.scheme
string
Default: databricks
fivetran_log_config.databricks_destination_config.token
One of string(password), null
Databricks personal access token
Default: None
fivetran_log_config.databricks_destination_config.warehouse_id
One of string, null
SQL Warehouse id, for running queries. Must be explicitly provided to enable SQL-based features. Required for the following features that need SQL access: 1) Tag extraction (include_tags=True) - queries system.information_schema.tags 2) Hive Metastore catalog (include_hive_metastore=True) - queries legacy hive_metastore catalog 3) System table lineage (lineage_data_source=SYSTEM_TABLES) - queries system.access.table_lineage/column_lineage 4) Data profiling (profiling.enabled=True) - runs SELECT/ANALYZE queries on tables. When warehouse_id is missing, these features will be automatically disabled (with warnings) to allow ingestion to continue.
Default: None
fivetran_log_config.databricks_destination_config.azure_auth
One of AzureAuthConfig, null
Azure configuration
Default: None
fivetran_log_config.databricks_destination_config.azure_auth.client_id 
string
Azure application (client) ID. This is the unique identifier for the registered Azure AD application.
fivetran_log_config.databricks_destination_config.azure_auth.client_secret 
string(password)
Azure application client secret used for authentication. This is a confidential credential that should be kept secure.
fivetran_log_config.databricks_destination_config.azure_auth.tenant_id 
string
Azure tenant (directory) ID. This identifies the Azure AD tenant where the application is registered.
fivetran_log_config.snowflake_destination_config
One of SnowflakeDestinationConfig, null
If destination platform is 'snowflake', provide snowflake configuration.
Default: None
fivetran_log_config.snowflake_destination_config.account_id 
string
Snowflake account identifier. e.g. xy12345, xy12345.us-east-2.aws, xy12345.us-central1.gcp, xy12345.central-us.azure, xy12345.us-west-2.privatelink. Refer to Account Identifiers for more details.
fivetran_log_config.snowflake_destination_config.database 
string
The fivetran connector log database.
fivetran_log_config.snowflake_destination_config.log_schema 
string
The fivetran connector log schema.
fivetran_log_config.snowflake_destination_config.authentication_type
string
The type of authenticator to use when connecting to Snowflake. Supports "DEFAULT_AUTHENTICATOR", "OAUTH_AUTHENTICATOR", "EXTERNAL_BROWSER_AUTHENTICATOR" and "KEY_PAIR_AUTHENTICATOR".
Default: DEFAULT_AUTHENTICATOR
fivetran_log_config.snowflake_destination_config.connect_args
One of object, null
Connect args to pass to Snowflake SqlAlchemy driver
Default: None
fivetran_log_config.snowflake_destination_config.options
object
Any options specified here will be passed to SQLAlchemy.create_engine as kwargs.
fivetran_log_config.snowflake_destination_config.password
One of string(password), null
Snowflake password.
Default: None
fivetran_log_config.snowflake_destination_config.preserve_case
boolean
Pass database and log_schema identifiers verbatim when issuing USE DATABASE / USE SCHEMA, instead of Snowflake's default uppercasing of unquoted identifiers. Useful when the log lives in a Snowflake schema created with quoted lowercase names, or any other case-preserving setup where the uppercasing path would query identifiers that don't exist. For Managed Data Lake destinations specifically: prefer log_source: rest_api over a Snowflake catalog-linked database (CLD) — REST mode reads the log directly via API and avoids the identifier-casing issue altogether.
Default: False
fivetran_log_config.snowflake_destination_config.private_key
One of string(password), null
Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n' if using key pair authentication. Encrypted version of private key will be in a form of '-----BEGIN ENCRYPTED PRIVATE KEY-----\nencrypted-private-key\n-----END ENCRYPTED PRIVATE KEY-----\n' See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html
Default: None
fivetran_log_config.snowflake_destination_config.private_key_password
One of string(password), null
Password for your private key. Required if using key pair authentication with encrypted private key.
Default: None
fivetran_log_config.snowflake_destination_config.private_key_path
One of string, null
The path to the private key if using key pair authentication. Ignored if private_key is set. See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html
Default: None
fivetran_log_config.snowflake_destination_config.role
One of string, null
Snowflake role.
Default: None
fivetran_log_config.snowflake_destination_config.snowflake_domain
string
Snowflake domain. Use 'snowflakecomputing.com' for most regions or 'snowflakecomputing.cn' for China (cn-northwest-1) region.
Default: snowflakecomputing.com
fivetran_log_config.snowflake_destination_config.token
One of string(password), null
OAuth token from external identity provider. Not recommended for most use cases because it will not be able to refresh once expired.
Default: None
fivetran_log_config.snowflake_destination_config.username
One of string, null
Snowflake username.
Default: None
fivetran_log_config.snowflake_destination_config.warehouse
One of string, null
Snowflake warehouse.
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config
One of OAuthConfiguration, null
oauth configuration - https://docs.snowflake.com/en/user-guide/python-connector-example.html#connecting-with-oauth
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.authority_url 
string
Authority url of your identity provider
fivetran_log_config.snowflake_destination_config.oauth_config.client_id 
string
client id of your registered application
fivetran_log_config.snowflake_destination_config.oauth_config.provider 
Enum
One of: "microsoft", "okta"
fivetran_log_config.snowflake_destination_config.oauth_config.scopes 
array
scopes required to connect to snowflake
fivetran_log_config.snowflake_destination_config.oauth_config.scopes.string
string
fivetran_log_config.snowflake_destination_config.oauth_config.client_secret
One of string(password), null
client secret of the application if use_certificate = false
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.encoded_oauth_private_key
One of string(password), null
base64 encoded private key content if use_certificate = true
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.encoded_oauth_public_key
One of string, null
base64 encoded certificate content if use_certificate = true
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.use_certificate
boolean
Whether to use a certificate and private key to authenticate via OAuth
Default: False
sources_to_platform_instance
map(str,PlatformDetail)
sources_to_platform_instance.key.platform
One of string, null
Override the platform type detection.
Default: None
sources_to_platform_instance.key.database
One of string, null
The database that all assets produced by this connector belong to. For destinations, this defaults to the fivetran log config's database.
Default: None
sources_to_platform_instance.key.database_lowercase
boolean
Lowercase the database segment when constructing the dataset URN. Defaults to True to match DataHub's standard lowercase URN convention (and to preserve the long-standing Fivetran connector behaviour). Set False to keep the case Fivetran reports — useful when aligning with another DataHub source whose URN preserves the database casing (e.g. some Glue or Iceberg setups). Schema and table segments are always passed through unchanged.
Default: True
sources_to_platform_instance.key.include_schema_in_urn
boolean
Include schema in the dataset URN. In some cases, the schema is not relevant to the dataset URN and Fivetran sets it to the source and destination table names in the connector.
Default: True
sources_to_platform_instance.key.platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to
Default: None
sources_to_platform_instance.key.env
string
The environment that all assets produced by DataHub platform ingestion source belong to
Default: PROD
stateful_ingestion
One of StatefulStaleMetadataRemovalConfig, null
Fivetran Stateful Ingestion Config.
Default: None
stateful_ingestion.enabled
boolean
Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False
Default: False
stateful_ingestion.fail_safe_threshold
number
Guards against accidental changes to the source configuration: if the relative change in entities compared to the previous state exceeds the fail_safe_threshold percentage, the state is not committed, preventing a large wave of soft deletes.
Default: 75.0
stateful_ingestion.remove_stale_metadata
boolean
Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.
Default: True

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

Database and Schema Name Handling

The Fivetran source uses quoted identifiers for database and schema names to properly handle special characters and case-sensitive names. This follows Snowflake's quoted identifier convention, which is then transpiled to the target database dialect (Snowflake, BigQuery, or Databricks).

Important Notes:

  • Database names are automatically wrapped in double quotes (e.g., use database "my-database")
  • Schema names are automatically wrapped in double quotes (e.g., "my-schema".table_name)
  • This ensures proper handling of database and schema names containing:
    • Hyphens (e.g., my-database)
    • Spaces (e.g., my database)
    • Special characters (e.g., my.database)
    • Case-sensitive names (e.g., MyDatabase)

Migration Impact:

  • If you have database or schema names with special characters, they will now be properly quoted in SQL queries
  • This change ensures consistent behavior across all supported destination platforms
  • No configuration changes are required - the quoting is handled automatically

Case Sensitivity Considerations:

  • Important: In Snowflake, unquoted identifiers are automatically converted to uppercase when stored and resolved (e.g., mydatabase becomes MYDATABASE), while double-quoted identifiers preserve the exact case as entered (e.g., "mydatabase" stays as mydatabase). See Snowflake's identifier documentation for details.
  • Backward Compatibility: The system automatically handles backward compatibility for valid unquoted identifiers (identifiers containing only letters, numbers, and underscores). These identifiers are automatically uppercased before quoting to match Snowflake's behavior for unquoted identifiers. This means:
    • If your database/schema name is a valid unquoted identifier (e.g., fivetran_logs, MY_SCHEMA), it will be automatically uppercased to match existing Snowflake objects created without quotes
    • No configuration changes are required for standard identifiers (letters, numbers, underscores only)
  • Recommended: For best practices and to ensure consistency, maintain the exact case of your database and schema names in your configuration to match what's stored in Snowflake
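
For a log database created with quoted lowercase identifiers, a sketch of the relevant knob (the coordinates are illustrative):

fivetran_log_config:
  destination_platform: snowflake
  snowflake_destination_config:
    account_id: "abc48144"
    database: "fivetran_logs" # created quoted, so stored lowercase in Snowflake
    log_schema: "fivetran_log"
    preserve_case: true # pass identifiers verbatim instead of uppercasing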

Snowflake destination Configuration Guide

  1. If your Fivetran Platform Connector destination is Snowflake, you need to provide a user and role with the correct privileges in order to fetch metadata.
  2. A Snowflake system admin can follow this guide to create a fivetran_datahub role, assign it the required privileges, and assign it to a user by executing the following Snowflake commands from a user with the ACCOUNTADMIN role or MANAGE GRANTS privilege.
create or replace role fivetran_datahub;

// Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "<your-warehouse>" to role fivetran_datahub;

// Grant access to view database and schema in which your log and metadata tables exist
// Note: Database and schema names are automatically quoted, so use quoted identifiers if your names contain special characters
grant usage on DATABASE "<fivetran-log-database>" to role fivetran_datahub;
grant usage on SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant access to execute select query on schema in which your log and metadata tables exist
grant select on all tables in SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant the fivetran_datahub role to the Snowflake user.
grant role fivetran_datahub to user snowflake_user;

Bigquery destination Configuration Guide

  1. If your Fivetran Platform Connector destination is BigQuery, you need to set up a service account as per the BigQuery docs and grant it the BigQuery Data Viewer and BigQuery Job User IAM roles.
  2. Create and download a service account JSON key file, and provide the credentials in the bigquery destination config.

Databricks destination Configuration Guide

  1. Get your Databricks instance's workspace url
  2. Create a Databricks Service Principal
    1. You can skip this step and use your own account to get things running quickly, but we strongly recommend creating a dedicated service principal for production use.
  3. Generate a Databricks personal access token following one of these guides:
    1. Service Principals
    2. Personal Access Tokens
  4. Provision your service principal: to ingest your workspace's metadata and lineage, it must have all of the following:
    1. One of: metastore admin role, ownership of, or USE CATALOG privilege on any catalogs you want to ingest
    2. One of: metastore admin role, ownership of, or USE SCHEMA privilege on any schemas you want to ingest
    3. Ownership of or SELECT privilege on any tables and views you want to ingest
    4. Ownership documentation
    5. Privileges documentation
  5. Check the starter recipe below and replace workspace_url and token with your information from the previous steps.

Working with Platform Instances

If you have multiple instances of source or destination systems referenced in your Fivetran setup, you need to configure platform instances for these systems in the Fivetran recipe to generate correct lineage edges. Refer to the document Working with Platform Instances to understand more about this.

While configuring the platform instance for a source system, provide the connector id as the key; for a destination system, provide the destination id as the key. When creating the connection details in the Fivetran UI, make a note of the destination Group ID, as that is the key to use in the destination_to_platform_instance configuration.

In this case the configuration would be something like:

destination_to_platform_instance:
  greyish_positive: # destination Group ID taken from the BigQuery destination in the Fivetran UI
    database: <big query project ID>
    env: PROD

Example - Multiple Postgres source connectors, each reading from a different Postgres instance

# Map of connector source to platform instance
sources_to_platform_instance:
  postgres_connector_id1:
    platform_instance: cloud_postgres_instance
    env: PROD

  postgres_connector_id2:
    platform_instance: local_postgres_instance
    env: DEV

Example - Multiple Snowflake destinations, each writing to a different Snowflake instance

# Map of destination to platform instance
destination_to_platform_instance:
  snowflake_destination_id1:
    platform_instance: prod_snowflake_instance
    env: PROD

  snowflake_destination_id2:
    platform_instance: dev_snowflake_instance
    env: PROD

Limitations

Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.

Supported Destinations

The fivetran_log_config log warehouse works only for:

  • Snowflake destination
  • BigQuery destination
  • Databricks destination

Managed Data Lake destinations are supported via log_source: rest_api; see the Managed Data Lake section above.

Ingestion Limits

To prevent excessive data ingestion, the following configurable limits apply per connector. They apply equally in log_database and rest_api modes:

  • Sync History: max_jobs_per_connector (default: 500)
  • Table Lineage: max_table_lineage_per_connector (default: 120)
  • Column Lineage: max_column_lineage_per_connector (default: 1000)

Set them at the top of the source config:

source:
  type: fivetran
  config:
    max_jobs_per_connector: 1000 # Increase sync history limit
    max_table_lineage_per_connector: 500 # Increase table lineage limit
    max_column_lineage_per_connector: 5000 # Increase column lineage limit
    fivetran_log_config:
      # ... destination config ...

For backward compatibility, the same fields are still accepted under fivetran_log_config (with a deprecation warning); top-level placement wins on conflict. When these limits are exceeded, only the most recent entries are ingested.
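
For contrast, a sketch of the deprecated nested placement; it still works but logs a deprecation warning, and a conflicting top-level value wins:

source:
  type: fivetran
  config:
    fivetran_log_config:
      destination_platform: snowflake
      # Deprecated placement of the limit fields:
      max_jobs_per_connector: 1000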

Troubleshooting

If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.

Code Coordinates

  • Class Name: datahub.ingestion.source.fivetran.fivetran.FivetranSource
  • Browse on GitHub
Questions?

If you've got any questions on configuring ingestion for Fivetran, feel free to ping us on our Slack.

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.

Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.