Version: Next

Hex

This connector ingests Hex assets into DataHub.

Concept Mapping

| Hex Concept | DataHub Concept | Notes |
|---|---|---|
| "hex" | Data Platform | |
| Workspace | Container | |
| Project | Dashboard | Subtype Project |
| Component | Dashboard | Subtype Component |
| Collection | Tag | |

Other Hex concepts are not mapped to DataHub entities yet.

Limitations

Currently, the Hex API has some limitations that affect the completeness of the extracted metadata:

  1. Projects and Components Relationship: The API does not support fetching the many-to-many relationship between Projects and their Components.

  2. Metadata Access: There is no direct method to retrieve metadata for Collections, Status, or Categories. This information is only available indirectly through references within Projects and Components.

Please keep these limitations in mind when working with the Hex connector.

Lineage between Datasets and Hex Projects relies on the Hex query metadata feature. To extract lineage information, the setup must include:

  • A separate warehouse ingestion source (e.g. BigQuery, Snowflake, Redshift) with use_queries_v2 enabled, so that queries are ingested into DataHub as Query entities. Queries triggered by Hex will include the corresponding Hex query metadata.
  • A DataHub server at version >= SaaS 0.3.10 or > OSS 1.0.0, so that Query entities are properly indexed by source (Hex in this case). The Hex connector can then fetch and process them to emit the Dataset - Project lineage.

Please note:

  • Lineage is only captured for scheduled executions of the Project.
  • In cases where queries are handled by hextoolkit, Hex query metadata is not injected, which prevents capturing lineage.
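
As a rough sketch of the first requirement above, a warehouse recipe with use_queries_v2 enabled might look like the following. Snowflake is used purely as an illustration, and the connection settings are deliberately left out because they depend on your warehouse source.

```yaml
# Hypothetical warehouse recipe; Snowflake is only an example source type.
source:
  type: snowflake
  config:
    # ... connection settings for your warehouse go here ...
    use_queries_v2: true  # ingest queries as Query entities, as required for Hex lineage

sink:
  # sink configs
```

This recipe runs independently of the Hex recipe. The Hex connector only reads the resulting Query entities back from DataHub.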

Important Capabilities

| Capability | Status | Notes |
|---|---|---|
| Asset Containers | | Enabled by default |
| Descriptions | | Supported by default |
| Detect Deleted Entities | | Optionally enabled via stateful_ingestion.remove_stale_metadata |
| Extract Ownership | | Supported by default |
| Platform Instance | | Enabled by default |
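
To illustrate the Detect Deleted Entities capability, here is a minimal sketch of a recipe that turns on stateful ingestion. The pipeline name and the server URL are assumptions chosen for the example; replace them with your own values.

```yaml
pipeline_name: hex_ingestion  # hypothetical name; use any stable identifier
source:
  type: hex
  config:
    workspace_name: "<workspace_name>"
    token: "<your PAT or Workspace token>"
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true  # soft-delete entities missing from the current run

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080  # assumed local DataHub endpoint
```

The pipeline_name should stay the same across runs, since the state from the previous run is what stale entities are detected against.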

Prerequisites

Workspace name

Workspace name is required to fetch the data from Hex. You can find the workspace name in the URL of your Hex home page.

https://app.hex.tech/<workspace_name>

For example, in https://app.hex.tech/acryl-partnership, the workspace name is acryl-partnership.

Authentication

To authenticate with Hex, you need to provide a Hex API Bearer token. You can obtain a token by following the instructions in the Hex documentation.

Either a PAT (Personal Access Token) or a Workspace Token can be used as the API Bearer token:

  • (Recommended) With a Workspace Token, a read-only token is enough for ingestion.
  • With a PAT, ingestion runs with that user's permissions.
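
To keep the token out of the recipe file, one option is to reference it through an environment variable, which DataHub recipes can expand. The sketch below assumes the token has been exported as HEX_TOKEN, a variable name chosen only for this example.

```yaml
source:
  type: hex
  config:
    workspace_name: "<workspace_name>"
    token: "${HEX_TOKEN}"  # HEX_TOKEN is a hypothetical environment variable name
```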

CLI based Ingestion

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: hex
  config:
    workspace_name: # Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name>
    token: # Your PAT or Workspace token

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

| Field | Type | Description | Default |
|---|---|---|---|
| token | string(password) | Hex API token; either PAT or Workspace token - https://learn.hex.tech/docs/api/api-overview#authentication | |
| workspace_name | string | Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name> | |
| base_url | string | Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex. | |
| categories_as_tags | boolean | Emit Hex Category as tags | True |
| collections_as_tags | boolean | Emit Hex Collections as tags | True |
| datahub_page_size | integer | Number of items to fetch per DataHub API call. | 100 |
| include_components | boolean | | True |
| include_lineage | boolean | Include Hex lineage, which is fetched from DataHub. See the "Limitations" section in the docs for more details about the limitations of this feature. | True |
| lineage_end_time | string(date-time) | Latest date of lineage to consider. Default: Current time in UTC. You can specify absolute time like '2023-01-01' or relative time like '-1 day' or '-1d'. | |
| lineage_start_time | string(date-time) | Earliest date of lineage to consider. Default: 1 day before lineage end time. You can specify absolute time like '2023-01-01' or relative time like '-7 days' or '-7d'. | |
| page_size | integer | Number of items to fetch per Hex API call. | 100 |
| patch_metadata | boolean | Emit metadata as patch events | False |
| platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. | |
| set_ownership_from_email | boolean | Set ownership identity from owner/creator email | True |
| status_as_tag | boolean | Emit Hex Status as tags | True |
| env | string | The environment that all assets produced by this connector belong to | PROD |
| component_title_pattern | AllowDenyPattern | Regex pattern for component titles to filter in ingestion. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| component_title_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| component_title_pattern.allow | array | List of regex patterns to include in ingestion | ['.*'] |
| component_title_pattern.allow.string | string | | |
| component_title_pattern.deny | array | List of regex patterns to exclude from ingestion. | [] |
| component_title_pattern.deny.string | string | | |
| project_title_pattern | AllowDenyPattern | Regex pattern for project titles to filter in ingestion. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| project_title_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| project_title_pattern.allow | array | List of regex patterns to include in ingestion | ['.*'] |
| project_title_pattern.allow.string | string | | |
| project_title_pattern.deny | array | List of regex patterns to exclude from ingestion. | [] |
| project_title_pattern.deny.string | string | | |
| stateful_ingestion | StatefulStaleMetadataRemovalConfig | Configuration for stateful ingestion and stale metadata removal. | |
| stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingestion. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False | False |
| stateful_ingestion.fail_safe_threshold | number | Prevents a large number of soft deletes, and prevents the state from being committed, when accidental changes to the source configuration cause the relative change percent in entities compared to the previous state to exceed 'fail_safe_threshold'. | 75.0 |
| stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. | True |
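
As an illustration of how the dotted field names map onto nested YAML, the sketch below combines a few of the options listed above. The title patterns, the lineage window and the page size are made-up values for the example, not recommendations, and HEX_TOKEN is a hypothetical environment variable name.

```yaml
source:
  type: hex
  config:
    workspace_name: "<workspace_name>"
    token: "${HEX_TOKEN}"          # hypothetical environment variable
    project_title_pattern:         # project_title_pattern.allow / .deny
      allow:
        - '^Finance.*'             # hypothetical title prefix to include
      deny:
        - '.*\[draft\].*'          # hypothetical marker to exclude
    include_components: false
    lineage_start_time: '-7 days'  # relative time, as described above
    page_size: 50

sink:
  # sink configs
```

Because ignoreCase defaults to True, the patterns in this sketch would also match titles such as "finance report" unless project_title_pattern.ignoreCase is set to false.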

Code Coordinates

  • Class Name: datahub.ingestion.source.hex.hex.HexSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Hex, feel free to ping us on our Slack.