QuickSight

Overview

Amazon QuickSight is AWS's serverless, cloud-scale business intelligence service for building interactive dashboards and paginated reports. Learn more in the official QuickSight documentation.

The DataHub integration for QuickSight covers BI entities such as dashboards, analyses, and datasets, along with the folder hierarchy that organizes them. It also stitches cross-platform table-level and column-level lineage from QuickSight datasets back to their upstream warehouse/database tables (Athena, Redshift, Snowflake, S3, and more), and optionally captures ownership, AWS resource tags, users/groups, and stateful deletion detection.

Concept Mapping

| QuickSight | DataHub | Notes |
| --- | --- | --- |
| Folder | Container | SubType "Folder"; nests via folder membership (Enterprise edition) |
| Namespace | Container | SubType "Namespace"; opt-in via add_namespace_container |
| Dataset | Dataset | SubType "Dataset"; schema from OutputColumns, upstream lineage |
| Analysis | Dashboard | SubType "Analysis" |
| Dashboard | Dashboard | SubType "Dashboard"; linked to its source Analysis |
| Visual | Chart | One Chart per visual; emitted when extract_dashboard_definitions is enabled |
| User | User (a.k.a. CorpUser) | Optionally extracted via extract_users_and_groups |
| Group | Group (a.k.a. CorpGroup) | Optionally extracted via extract_users_and_groups |

QuickSight data sources (the raw warehouse/database connections — Athena, Redshift, Snowflake, S3, etc.) are not modeled as their own entities. As with Tableau/Looker/PowerBI, the connection is used purely to resolve the upstream platform of each Dataset's tables for lineage; the lineage points directly at the warehouse table.

Account separation is handled via platform_instance (the Glue / Redshift / PowerBI convention) rather than an account-level container.

Module quicksight

Incubating

Important Capabilities

| Capability | Status | Notes |
| --- | --- | --- |
| Asset Containers | | Enabled by default. |
| Column-level Lineage | | Enabled via include_column_lineage. |
| Descriptions | | Enabled by default. |
| Detect Deleted Entities | | Enabled by default via stateful ingestion. |
| Extract Ownership | | Enabled via extract_ownership. |
| Extract Tags | | Enabled via extract_tags. |
| Platform Instance | | Enabled by default. |
| Schema Metadata | | Enabled by default. |
| Table-Level Lineage | | Enabled via extract_lineage. |
| Test Connection | | Enabled by default. |

Overview

The quicksight module ingests metadata from Amazon QuickSight into DataHub. It is intended for production ingestion workflows; its module-specific capabilities are documented below.

This source extracts the following:

  • Folders (and optionally Namespaces) as Containers that organize the browse hierarchy.
  • QuickSight Datasets as Datasets with the Dataset subtype, including schemaMetadata derived from OutputColumns.
  • Analyses and Dashboards as Dashboard entities (subtypes Analysis and Dashboard), with each published Dashboard linked back to the Analysis it was built from.
  • Visuals within a dashboard's sheets as Chart entities (when extract_dashboard_definitions is enabled).
  • Table-level lineage from QuickSight Datasets to their upstream warehouse/database tables, plus column-level lineage for CustomSql datasets parsed via sqlglot.
  • Optionally ownership (from resource permissions), AWS resource tags, and users/groups.

QuickSight is a regional service, so a single ingestion run targets one aws_region. Multi-region deployments run one recipe per region.
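The one-recipe-per-region rule can be sketched as a small generator. This is only an illustration: the `build_region_recipes` helper, the `quicksight-<region>` pipeline names, and the `http://localhost:8080` sink address are assumptions, not part of the connector.

```python
import json


def build_region_recipes(regions, server="http://localhost:8080"):
    """Return one recipe dict per region; each run targets a single aws_region."""
    recipes = {}
    for region in regions:
        recipes[region] = {
            # Assumption: a distinct pipeline_name per region keeps each run's
            # stateful-ingestion state separate.
            "pipeline_name": f"quicksight-{region}",
            "source": {
                "type": "quicksight",
                "config": {"aws_region": region},
            },
            # Assumption: a datahub-rest sink at a placeholder address.
            "sink": {"type": "datahub-rest", "config": {"server": server}},
        }
    return recipes


recipes = build_region_recipes(["us-east-1", "eu-west-1"])
for region, recipe in recipes.items():
    print(region, json.dumps(recipe["source"]))
```

Each resulting dict would be written to its own recipe file and run as a separate ingestion.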

Prerequisites

QuickSight enforces three independent permission layers — all three must be satisfied for ingestion to succeed.

Layer 1 — AWS IAM policy

There is no AWS-managed read-only policy for QuickSight, so attach the following custom policy to the ingesting principal:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:GetCallerIdentity",
        "quicksight:ListDashboards",
        "quicksight:ListAnalyses",
        "quicksight:ListDataSets",
        "quicksight:ListDataSources",
        "quicksight:ListFolders",
        "quicksight:ListFolderMembers",
        "quicksight:ListNamespaces",
        "quicksight:ListUsers",
        "quicksight:ListGroups",
        "quicksight:ListGroupMemberships",
        "quicksight:ListTagsForResource",
        "quicksight:DescribeDashboard",
        "quicksight:DescribeDashboardPermissions",
        "quicksight:DescribeAnalysis",
        "quicksight:DescribeAnalysisPermissions",
        "quicksight:DescribeDataSet",
        "quicksight:DescribeDataSetPermissions",
        "quicksight:DescribeDataSource",
        "quicksight:DescribeFolder",
        "quicksight:DescribeFolderPermissions"
      ],
      "Resource": "*"
    }
  ]
}

The API operations DescribeDashboardDefinition and DescribeAnalysisDefinition reuse the quicksight:DescribeDashboard / quicksight:DescribeAnalysis IAM actions — there is no separate *Definition action.
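To sanity-check an existing policy document against the action list above, a rough offline comparison along these lines may help. It is a simplified sketch: the `missing_actions` helper is hypothetical, and it looks only at Allow statements and `*`/`?` wildcards in action names, ignoring Deny statements, Resource scoping, and Conditions.

```python
from fnmatch import fnmatchcase

# The actions listed in the Layer 1 policy above.
REQUIRED_ACTIONS = [
    "sts:GetCallerIdentity",
    "quicksight:ListDashboards", "quicksight:ListAnalyses",
    "quicksight:ListDataSets", "quicksight:ListDataSources",
    "quicksight:ListFolders", "quicksight:ListFolderMembers",
    "quicksight:ListNamespaces", "quicksight:ListUsers",
    "quicksight:ListGroups", "quicksight:ListGroupMemberships",
    "quicksight:ListTagsForResource",
    "quicksight:DescribeDashboard", "quicksight:DescribeDashboardPermissions",
    "quicksight:DescribeAnalysis", "quicksight:DescribeAnalysisPermissions",
    "quicksight:DescribeDataSet", "quicksight:DescribeDataSetPermissions",
    "quicksight:DescribeDataSource",
    "quicksight:DescribeFolder", "quicksight:DescribeFolderPermissions",
]


def allowed_patterns(policy):
    """Collect lower-cased action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(action.lower() for action in actions)
    return patterns


def missing_actions(policy, required=REQUIRED_ACTIONS):
    """Return the required actions not matched by any Allow pattern."""
    patterns = allowed_patterns(policy)
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in patterns)
    ]


partial_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["quicksight:List*", "quicksight:DescribeDashboard"],
        "Resource": "*",
    }],
}
print(missing_actions(partial_policy))
```

An empty result only means the action names are covered; it says nothing about the QuickSight role or resource sharing described in the next two layers.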

Layer 2 — QuickSight user role

The service user's QuickSight role must be AUTHOR (or AUTHOR_PRO) or higher. READER is not sufficient: it is denied ListNamespaces, ListDataSources, ListAnalyses, and some definition calls. To upgrade an already-registered user (a user that does not exist yet must first be registered with aws quicksight register-user):

aws quicksight update-user \
  --aws-account-id <ACCOUNT_ID> --namespace default \
  --user-name <IAM_USER_NAME> --email <any-email> --role AUTHOR

AUTHOR is the least-privileged role that grants full read access; ADMIN works but is unnecessarily broad.

Layer 3 — Resource permissions

Each asset has its own "Share" permission list. The service user only sees assets it has been shared with. The recommended setup is to create one shared folder, grant the service user Read on it, and place all ingestable assets inside.

QuickSight's AccessDeniedException messages always blame IAM ("no identity-based policy allows...") even when the real cause is Layer 2 or Layer 3. Check all three layers when you see this error.

Install the Plugin

pip install 'acryl-datahub[quicksight]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: quicksight
  config:
    # QuickSight is regional: a single run targets one region.
    aws_region: us-east-1

    # AWS authentication (pick one). See AwsConnectionConfig for all options.
    # aws_profile: my-named-profile
    # -- or explicit keys --
    # aws_access_key_id: "${AWS_ACCESS_KEY_ID}"
    # aws_secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
    # -- or role assumption --
    # aws_role: arn:aws:iam::123456789012:role/datahub-quicksight-ingest

    # Optional - auto-detected via sts:GetCallerIdentity when omitted.
    # aws_account_id: "123456789012"

    # Lineage & enrichment (all default to true except extract_users_and_groups).
    extract_lineage: true
    include_column_lineage: true
    extract_ownership: true
    extract_tags: true
    # extract_users_and_groups: true  # opt-in; needs ListUsers/ListGroups perms

    # Container hierarchy
    # add_shared_folders_container: true  # synthesize a "Shared folders" root
    # add_namespace_container: true  # only for multi-namespace Enterprise accounts

    # Optional - cross-platform lineage stitching. Key by QuickSight DataSourceId
    # (UUID, preferred) or display name. Must mirror the upstream connector's
    # casing/env so URNs line up.
    # external_data_sources:
    #   "<data-source-uuid-or-name>":
    #     env: PROD
    #     platform_instance: prod-redshift
    #     convert_urns_to_lowercase: true
    #     default_database: analytics  # SQL-parser fallback for unqualified CustomSql tables
    #     default_schema: public

    # Optional - filters (allow/deny regex). Omit to ingest everything.
    # dashboard_pattern:
    #   allow:
    #     - ".*Sales.*"

    # Optional - automatic stale-entity (soft-delete) removal across runs.
    # stateful_ingestion:
    #   enabled: true

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

Field | Description
add_namespace_container
boolean
When enabled, QuickSight namespaces are emitted as containers and appear as a level in the browse hierarchy. Most accounts have only the single built-in default namespace, so this is off by default (assets sit directly under the platform / platform_instance, or under their folder). Enable it for Enterprise accounts that use multiple namespaces. Mirrors Tableau's add_site_container. Account separation is handled via platform_instance, not a container — matching the Glue / Redshift / PowerBI / Informatica convention.
Default: False
add_shared_folders_container
boolean
When enabled, a synthetic Shared folders container is emitted as the root of the folder hierarchy and every top-level QuickSight folder is nested beneath it — mirroring the Shared folders section in the QuickSight left-nav. (QuickSight's Shared folders is a UI category for SHARED-type folders, not a real folder, so it is only synthesized when this is on.) Off by default since we only ingest shared folders, making the level a constant prefix. Loose assets that belong to no folder are unaffected.
Default: False
aws_access_key_id
One of string, null
AWS access key ID. Can be auto-detected, see the AWS boto3 docs for details.
Default: None
aws_account_id
One of string, null
AWS account ID that owns the QuickSight assets. Auto-detected via sts:GetCallerIdentity when not provided.
Default: None
aws_advanced_config
object
Advanced AWS configuration options. These are passed directly to botocore.config.Config.
aws_endpoint_url
One of string, null
The AWS service endpoint. This is normally constructed automatically, but can be overridden here.
Default: None
aws_profile
One of string, null
The named profile to use from AWS credentials. Falls back to default profile if not specified and no access keys provided. Profiles are configured in ~/.aws/credentials or ~/.aws/config.
Default: None
aws_proxy
One of string, null
A set of proxy configs to use with AWS. See the botocore.config docs for details.
Default: None
aws_region
One of string, null
AWS region code.
Default: None
aws_retry_mode
Enum
One of: "legacy", "standard", "adaptive"
Default: adaptive
aws_retry_num
integer
Number of times to retry failed AWS requests. See the botocore.retry docs for details.
Default: 5
aws_secret_access_key
One of string(password), null
AWS secret access key. Can be auto-detected, see the AWS boto3 docs for details.
Default: None
aws_session_token
One of string(password), null
AWS session token. Can be auto-detected, see the AWS boto3 docs for details.
Default: None
extract_analysis_definitions
boolean
Whether to fetch full analysis definitions (sheets/visuals). These payloads are large.
Default: True
extract_dashboard_definitions
boolean
Whether to fetch full dashboard definitions (sheets/visuals). These payloads are large; disabling this skips Chart entity emission.
Default: True
extract_lineage
boolean
Whether to extract upstream lineage from QuickSight Datasets to their backing warehouse/database tables.
Default: True
extract_ownership
boolean
Whether to extract ownership from QuickSight resource permissions.
Default: True
extract_tags
boolean
Whether to extract AWS resource tags on QuickSight assets.
Default: True
extract_users_and_groups
boolean
Whether to extract QuickSight users and groups (opt-in; often noisy and requires additional permissions).
Default: False
include_column_lineage
boolean
Whether to extract column-level lineage for CustomSql datasets via sqlglot. Requires extract_lineage to be enabled.
Default: True
platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
read_timeout
number
The timeout for reading from the connection (in seconds).
Default: 60
strip_user_ids_from_email
boolean
When extracting ownership, strip the email domain from user identities so the CorpUser URN uses the bare username (e.g. jane@acme.com -> jane). Matches Looker's strip_user_ids_from_email. For IAM/SSO-federated principals the QuickSight identity is the role-session name (typically the email), which is used regardless of this flag.
Default: False
env
string
The environment that all assets produced by this connector belong to
Default: PROD
analysis_pattern
AllowDenyPattern
A class to store allow deny regexes
analysis_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
aws_role
One of string, array, null
AWS roles to assume. If using the string format, the role ARN can be specified directly. If using the object format, the role can be specified in the RoleArn field and additional available arguments are the same as boto3's STS.Client.assume_role.
Default: None
aws_role.union
One of string, AwsAssumeRoleConfig
aws_role.union.RoleArn 
string
ARN of the role to assume.
aws_role.union.ExternalId
One of string, null
External ID to use when assuming the role.
Default: None
dashboard_pattern
AllowDenyPattern
A class to store allow deny regexes
dashboard_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
data_source_pattern
AllowDenyPattern
A class to store allow deny regexes
data_source_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
dataset_pattern
AllowDenyPattern
A class to store allow deny regexes
dataset_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
external_data_sources
map(str,ExternalDataSourceConfig)
Per-data-source overrides used when stitching QuickSight Datasets to their upstream warehouse/database tables. Keys are the stable QuickSight DataSourceId (UUID), which is preferred because it survives renames in the QuickSight UI, with the data source display name accepted as a fallback. UUID matches take precedence over name matches.
external_data_sources.key.env
string
The environment that all assets produced by this connector belong to
Default: PROD
external_data_sources.key.convert_urns_to_lowercase
boolean
Whether to lower-case identifiers when constructing upstream Dataset URNs. Must match the convert_urns_to_lowercase setting used by the corresponding upstream connector recipe (e.g. Snowflake / BigQuery typically preserve case).
Default: False
external_data_sources.key.default_database
One of string, null
Default database/catalog name used as the SQL-parser fallback when a CustomSql definition references unqualified tables.
Default: None
external_data_sources.key.default_schema
One of string, null
Default schema name used as the SQL-parser fallback when a CustomSql definition references unqualified tables.
Default: None
external_data_sources.key.platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
folder_pattern
AllowDenyPattern
A class to store allow deny regexes
folder_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
namespace_pattern
AllowDenyPattern
A class to store allow deny regexes
namespace_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
tag_pattern
AllowDenyPattern
A class to store allow deny regexes
tag_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
stateful_ingestion
One of StatefulStaleMetadataRemovalConfig, null
Stateful ingestion configuration (enables stale entity removal).
Default: None
stateful_ingestion.enabled
boolean
Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False
Default: False
stateful_ingestion.fail_safe_threshold
number
Guards against accidental changes to the source configuration: if the relative change (in percent) in entities compared to the previous state is above fail_safe_threshold, the soft deletes are skipped and the state is not committed.
Default: 75.0
stateful_ingestion.remove_stale_metadata
boolean
Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.
Default: True

Capabilities

Cross-platform lineage

QuickSight Datasets are stitched to their upstream warehouse/database tables so lineage spans from a dashboard down to the source table. The connector resolves the upstream platform from the data source's connection parameters (Athena, Redshift, Snowflake, RDS variants, and more). S3-backed datasets are an exception — see the limitation below.

For the upstream Dataset URNs to line up with the platform's own ingested tables, the env, platform_instance, and URN casing must match the upstream connector's recipe. Configure these per data source via external_data_sources, keyed by the QuickSight DataSourceId (UUID, preferred — it survives renames) with the display name accepted as a fallback.
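The key-resolution rule described above (UUID first, display name as fallback) can be sketched as follows. The `resolve_override` helper and the sample UUID and data source names are hypothetical; the connector's internal lookup may be structured differently.

```python
def resolve_override(external_data_sources, data_source_id, data_source_name):
    """Return the override for a data source, preferring the UUID key."""
    if data_source_id in external_data_sources:
        return external_data_sources[data_source_id]
    # Fall back to the display name; returns None when neither key is present.
    return external_data_sources.get(data_source_name)


# Hypothetical overrides: one keyed by UUID, one keyed by display name.
overrides = {
    "3f2c9a10-0000-4000-8000-000000000001": {
        "env": "PROD",
        "platform_instance": "prod-redshift",
    },
    "Sales Redshift": {"env": "DEV", "platform_instance": "dev-redshift"},
}

# The UUID key wins even though the display name also matches an entry.
chosen = resolve_override(
    overrides, "3f2c9a10-0000-4000-8000-000000000001", "Sales Redshift"
)
print(chosen["platform_instance"])
```

Keying by UUID avoids silently losing the override when someone renames the data source in the QuickSight UI.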

Column-level lineage

CustomSql datasets carry a SQL definition that is parsed with sqlglot to derive column-level lineage. Unqualified table references are resolved using the default_database / default_schema configured for that data source in external_data_sources. Column-level lineage requires both extract_lineage and include_column_lineage to be enabled.
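How the default_database / default_schema fallback fills in unqualified table references can be illustrated with a deliberately naive helper. This is not how sqlglot resolves names; `qualify_table` is a hypothetical function that splits on dots and ignores quoted identifiers.

```python
def qualify_table(name, default_database=None, default_schema=None):
    """Prefix missing schema/database parts onto a table reference."""
    parts = name.split(".")
    if len(parts) == 1 and default_schema:
        parts = [default_schema] + parts      # table -> schema.table
    if len(parts) == 2 and default_database:
        parts = [default_database] + parts    # schema.table -> db.schema.table
    return ".".join(parts)


print(qualify_table("orders", "analytics", "public"))        # analytics.public.orders
print(qualify_table("sales.orders", "analytics", "public"))  # analytics.sales.orders
print(qualify_table("orders"))                               # orders
```

Without the defaults, an unqualified reference such as `orders` stays as written and is unlikely to match the fully qualified table ingested by the upstream connector.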

Ownership, tags, users and groups

With extract_ownership enabled, owners are derived from each asset's QuickSight resource permissions. IAM/SSO-federated principals are normalized to the role-session name (typically the user's email) so the resulting CorpUser URN matches DataHub's urn:li:corpuser:<email> convention; set strip_user_ids_from_email to use the bare username instead. extract_tags maps AWS resource tags to DataHub tags (filterable via tag_pattern). extract_users_and_groups (opt-in) emits CorpUser / CorpGroup entities and their memberships.
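The effect of strip_user_ids_from_email on the owner URN can be shown with a minimal sketch. The `corpuser_urn` helper is hypothetical and covers only the plain email case; it does not model the normalization of federated principals.

```python
def corpuser_urn(identity, strip_user_ids_from_email=False):
    """Build a CorpUser URN, optionally dropping the email domain."""
    user_id = identity.split("@", 1)[0] if strip_user_ids_from_email else identity
    return f"urn:li:corpuser:{user_id}"


print(corpuser_urn("jane@acme.com"))                                  # urn:li:corpuser:jane@acme.com
print(corpuser_urn("jane@acme.com", strip_user_ids_from_email=True))  # urn:li:corpuser:jane
```

Whichever form is chosen should match the CorpUser URNs already present in DataHub, otherwise ownership will point at users that do not exist.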

Stateful ingestion

Enable stateful_ingestion.enabled to automatically soft-delete entities that disappear from QuickSight between runs (stale entity removal).
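Conceptually, stale entity removal compares the URNs seen in the previous run with those seen in the current one, and the fail_safe_threshold option guards against mass deletions. The sketch below illustrates that idea only; `stale_removal_plan` is a hypothetical helper and the exact percentage computation inside DataHub may differ.

```python
def stale_removal_plan(previous_urns, current_urns, fail_safe_threshold=75.0):
    """Return (change_percent, urns_to_soft_delete) for one run."""
    previous, current = set(previous_urns), set(current_urns)
    stale = previous - current
    change_percent = 100.0 * len(stale) / len(previous) if previous else 0.0
    if change_percent > fail_safe_threshold:
        # Too much disappeared at once: treat it as a likely misconfiguration
        # and delete nothing.
        return change_percent, set()
    return change_percent, stale


previous_run = {"urn:a", "urn:b", "urn:c", "urn:d"}
print(stale_removal_plan(previous_run, {"urn:a", "urn:b", "urn:c"}))  # (25.0, {'urn:d'})
print(stale_removal_plan(previous_run, set()))                        # (100.0, set())
```

The second call shows the fail-safe: a run that suddenly sees no assets (for example after a filter typo) does not wipe out the previously ingested entities.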

Limitations

Regional scope

QuickSight is a regional service, so a single ingestion run only sees assets in one aws_region. Run one recipe per region for multi-region deployments.

Folder hierarchy requires Enterprise edition

Folders are a QuickSight Enterprise-edition feature. On Standard-edition accounts no folders exist, so assets are emitted directly under the platform / platform_instance.

Only account-level folders are ingested (not personal "My folders")

The connector ingests every folder returned by the ListFolders API — i.e. both SHARED and RESTRICTED folder types — and these are filterable by name via folder_pattern. Personal "My folders" are private to each user and are not exposed by any QuickSight API, so they cannot be ingested. Assets that live only in a user's "My folders" (and in no shared/restricted folder) are still ingested as entities — they simply attach to the namespace container (if enabled) or the platform root rather than to a folder.

Definition payloads

Chart (visual) entities are only emitted when extract_dashboard_definitions is enabled. Definition payloads are large; disable extract_dashboard_definitions / extract_analysis_definitions to reduce API cost at the expense of visual-level detail.

No upstream lineage for S3-backed datasets

QuickSight only exposes the manifest file location for S3 data sources (DescribeDataSource → S3Parameters.ManifestFileLocation); neither the data source nor the dataset's PhysicalTableMap.S3Source carries the underlying data file/prefix paths. Because DataHub's S3 source keys datasets by the data path/prefix, a URN built from the manifest key (bucket/manifest.json) would never match an S3-ingested dataset. The connector therefore skips upstream lineage for S3-backed datasets rather than emit a dangling edge (the skip count is surfaced in the ingestion report). Relational (Athena/Redshift/Snowflake/…) and CustomSql lineage are unaffected.

Troubleshooting

AccessDeniedException despite a correct IAM policy

QuickSight enforces three independent permission layers (AWS IAM policy, the QuickSight user role, and per-resource share permissions — see Prerequisites). Its AccessDeniedException messages always point at IAM ("no identity-based policy allows...") even when the real cause is the user's role being READER instead of AUTHOR, or the asset simply not being shared with the service user. When you hit this error, verify all three layers, not just the IAM policy.
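A small checklist helper can make the three-layer triage explicit. This is a hypothetical sketch: the inputs are facts gathered by hand (for example from the IAM console and the asset's Share dialog), and the set of sufficient roles is limited to the ones named in Prerequisites.

```python
# Roles named in Prerequisites as sufficient; other roles are not modeled here.
SUFFICIENT_ROLES = {"AUTHOR", "AUTHOR_PRO", "ADMIN"}


def failing_layers(missing_iam_actions, quicksight_role, asset_shared):
    """List every permission layer that could explain an AccessDeniedException."""
    problems = []
    if missing_iam_actions:
        problems.append(
            "Layer 1 (IAM): missing " + ", ".join(sorted(missing_iam_actions))
        )
    if quicksight_role.upper() not in SUFFICIENT_ROLES:
        problems.append(f"Layer 2 (role): {quicksight_role} is below AUTHOR")
    if not asset_shared:
        problems.append("Layer 3 (sharing): asset is not shared with the service user")
    return problems


# IAM is fine, but the user is a READER and the asset was never shared.
for problem in failing_layers([], "READER", asset_shared=False):
    print(problem)
```

In this example the error message would still blame IAM, even though only Layers 2 and 3 are at fault.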

Throttling / TPS errors

QuickSight applies per-API transactions-per-second limits. The connector uses adaptive retry mode, but very large accounts may still see throttling — re-run the ingestion, optionally narrowing scope with the *_pattern filters.
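To preview which assets a *_pattern filter would keep before re-running, the allow/deny semantics can be approximated as below. This is a sketch of the general behaviour (deny wins, patterns are matched from the start of the name, case is ignored by default), not the connector's own implementation; `is_allowed` is a hypothetical name.

```python
import re


def is_allowed(name, allow=(".*",), deny=(), ignore_case=True):
    """Approximate allow/deny regex filtering for a single asset name."""
    flags = re.IGNORECASE if ignore_case else 0
    # Any matching deny pattern rejects the name outright.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    # Otherwise at least one allow pattern must match.
    return any(re.match(pattern, name, flags) for pattern in allow)


names = ["Q3 Sales Overview", "Marketing KPIs", "sales draft"]
kept = [
    name for name in names
    if is_allowed(name, allow=[".*Sales.*"], deny=[".*draft.*"])
]
print(kept)  # ['Q3 Sales Overview']
```

Narrowing the allow list this way reduces the number of Describe calls per run, which is what relieves the TPS pressure.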

Unresolved upstream lineage

If dashboard-to-table lineage is missing, the upstream Dataset URN produced by QuickSight likely does not match the URN the upstream connector emits. Confirm the env, platform_instance, and convert_urns_to_lowercase in external_data_sources match the upstream recipe exactly.
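The mismatch is easier to see when the URN is spelled out. The sketch below assumes DataHub's usual dataset URN layout, with the platform instance prefixed to the table name; `upstream_dataset_urn` is a hypothetical helper, and the connector's exact ordering of lower-casing and prefixing may differ.

```python
def upstream_dataset_urn(platform, table, env="PROD",
                         platform_instance=None,
                         convert_urns_to_lowercase=False):
    """Build a dataset URN from the settings that must match upstream."""
    name = table.lower() if convert_urns_to_lowercase else table
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# What the upstream Redshift recipe emits (instance set, lower-cased names).
from_upstream = upstream_dataset_urn(
    "redshift", "Analytics.Public.Orders",
    platform_instance="prod-redshift", convert_urns_to_lowercase=True,
)
# What QuickSight would emit with no external_data_sources override.
from_quicksight = upstream_dataset_urn("redshift", "Analytics.Public.Orders")

print(from_upstream)
print(from_quicksight)
print(from_upstream == from_quicksight)  # False
```

Because the two strings differ, DataHub treats them as different datasets and the lineage edge points at an entity the upstream connector never created.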

Code Coordinates

  • Class Name: datahub.ingestion.source.quicksight.quicksight.QuickSightSource
  • Browse on GitHub

Questions?

If you've got any questions on configuring ingestion for QuickSight, feel free to ping us on our Slack.

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.
