Okta
Overview
Okta is an identity and access management platform. Learn more in the official Okta documentation.
The DataHub integration for Okta covers identity entities such as users, groups, and memberships. Depending on module capabilities, it can also capture features such as lineage, usage, profiling, ownership, tags, and stateful deletion detection.
Concept Mapping
The mapping below provides a platform-level view. Module-specific mappings and nuances are documented in each module section.
This plugin extracts the following:
- Users
- Groups
- Group Membership
Modules on this platform: okta.
Module okta
Important Capabilities
| Capability | Status | Notes |
|---|---|---|
| Descriptions | ✅ | Optionally enabled via configuration. |
| Detect Deleted Entities | ✅ | Enabled by default via stateful ingestion. |
Overview
The okta module ingests metadata from Okta into DataHub. It is intended for production ingestion workflows and module-specific capabilities are documented below.
Note that any users ingested from this connector will not be able to log into DataHub unless you have Okta OIDC SSO enabled. You can, however, have these users ingested into DataHub before they log in for the first time if you would like to take actions like adding them to a group or assigning them a role.
For instructions on configuring Okta OIDC SSO, please read the documentation here.
Prerequisites
Before running ingestion, ensure network connectivity to the source, valid authentication credentials, and read permissions for metadata APIs required by this module.
As a prerequisite, you should create a DataHub Application within the Okta Developer Console with full permissions to read your organization's Users and Groups.
Compatibility
Validated against Okta API versions:
- 2021.07.2
Validated against load:
- User Count: 1000
- Group Count: 100
- Group Membership Edges: 1000 (1 per User)
- Run Time (Wall Clock): 2min 7sec
Install the Plugin
pip install 'acryl-datahub[okta]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: okta
  config:
    # Coordinates
    okta_domain: "dev-35531955.okta.com"
    # Credentials
    okta_api_token: "11be4R_M2MzDqXawbTHfKGpKee0kuEOfX1RCQSRx99"
sink:
  # sink configs
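To keep the API token out of version control, the recipe can reference an environment variable instead of a literal value. The sketch below assumes DataHub's recipe environment-variable substitution (the `${VAR}` syntax), a hypothetical variable name `OKTA_API_TOKEN`, a placeholder Okta domain, and a `datahub-rest` sink on a local server; adjust all of these to your own deployment.

```yaml
source:
  type: okta
  config:
    # Placeholder domain; replace with your own Okta domain (no protocol)
    okta_domain: "dev-00000000.okta.com"
    # Hypothetical environment variable holding the Okta API token
    okta_api_token: "${OKTA_API_TOKEN}"
sink:
  type: datahub-rest
  config:
    # Assumed local DataHub endpoint; replace with your server URL
    server: "http://localhost:8080"
```

With the variable exported in the shell, a recipe saved as `recipe.yml` (a hypothetical file name) can then be run with `datahub ingest -c recipe.yml`.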
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description |
|---|---|
| okta_api_token ✅ string(password) | An API token generated for the DataHub application inside your Okta Developer Console. e.g. 00be4R_M2MzDqXawbWgfKGpKee0kuEOfX1RCQSRx00 |
| okta_domain ✅ string | The location of your Okta Domain, without a protocol. Can be found in Okta Developer console. e.g. dev-33231928.okta.com |
| delay_seconds One of number, integer | Number of seconds to wait between calls to Okta's REST APIs. (Okta rate limits). Defaults to 10ms. Default: 0.01 |
| include_deprovisioned_users boolean | Whether to ingest users in the DEPROVISIONED state from Okta. Default: False |
| include_suspended_users boolean | Whether to ingest users in the SUSPENDED state from Okta. Default: False |
| ingest_group_membership boolean | Whether group membership should be ingested into DataHub. ingest_groups must be True if this is True. Default: True |
| ingest_groups boolean | Whether groups should be ingested into DataHub. Default: True |
| ingest_groups_users boolean | Only ingest users belonging to the selected groups. This option is only useful when ingest_users is set to False and ingest_group_membership to True. Default: True |
| ingest_users boolean | Whether users should be ingested into DataHub. Default: True |
| mask_group_id boolean | Default: True |
| mask_user_id boolean | Default: True |
| okta_groups_filter One of string, null | Okta filter expression (not regex) for ingesting groups. Only one of okta_groups_filter and okta_groups_search can be set. See (https://developer.okta.com/docs/reference/api/groups/#filters) for more info. Default: None |
| okta_groups_search One of string, null | Okta search expression (not regex) for ingesting groups. Only one of okta_groups_filter and okta_groups_search can be set. See (https://developer.okta.com/docs/reference/api/groups/#list-groups-with-search) for more info. Default: None |
| okta_profile_to_group_name_attr string | Which Okta Group Profile attribute to use as input to DataHub group name mapping. Default: name |
| okta_profile_to_group_name_regex string | A regex used to parse the DataHub group name from the attribute specified in okta_profile_to_group_name_attr. Default: (.*) |
| okta_profile_to_username_attr string | Which Okta User Profile attribute to use as input to DataHub username mapping. Common values used are - login, email. Default: email |
| okta_profile_to_username_regex string | A regex used to parse the DataHub username from the attribute specified in okta_profile_to_username_attr. Default: (.*) |
| okta_users_filter One of string, null | Okta filter expression (not regex) for ingesting users. Only one of okta_users_filter and okta_users_search can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-a-filter) for more info. Default: None |
| okta_users_search One of string, null | Okta search expression (not regex) for ingesting users. Only one of okta_users_filter and okta_users_search can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-search) for more info. Default: None |
| page_size integer | The number of entities requested from Okta's REST APIs in one request. Default: 100 |
| skip_users_without_a_group boolean | Whether to only ingest users that are members of groups. If this is set to False, all users will be ingested regardless of group membership. Default: False |
| stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null | Okta Stateful Ingestion Config. Default: None |
| stateful_ingestion.enabled boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. Default: False |
| stateful_ingestion.fail_safe_threshold number | Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0 |
| stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
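
The nested `stateful_ingestion.*` fields and the pacing options from the table above can be combined in a recipe roughly as follows. This is a sketch rather than a recommended configuration: the pipeline name, domain, environment variable, and server URL are hypothetical placeholders, and the numeric values simply restate the documented defaults.

```yaml
# Hypothetical pipeline name; stateful ingestion keys its saved state on it
pipeline_name: okta_identity_ingestion
source:
  type: okta
  config:
    okta_domain: "dev-00000000.okta.com"   # placeholder domain
    okta_api_token: "${OKTA_API_TOKEN}"    # hypothetical environment variable
    # Pacing of calls to Okta's REST APIs (documented defaults shown)
    page_size: 100
    delay_seconds: 0.01
    stateful_ingestion:
      enabled: true
      # Soft-delete users and groups that disappeared since the last successful run
      remove_stale_metadata: true
      # Guard against mass soft-deletes caused by an accidental config change
      fail_safe_threshold: 75.0
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"        # assumed local DataHub endpoint
```

If Okta rate limits are hit, raising `delay_seconds` is the documented lever for slowing requests.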
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"properties": {
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Okta Stateful Ingestion Config."
},
"okta_domain": {
"description": "The location of your Okta Domain, without a protocol. Can be found in Okta Developer console. e.g. dev-33231928.okta.com",
"title": "Okta Domain",
"type": "string"
},
"okta_api_token": {
"description": "An API token generated for the DataHub application inside your Okta Developer Console. e.g. 00be4R_M2MzDqXawbWgfKGpKee0kuEOfX1RCQSRx00",
"format": "password",
"title": "Okta Api Token",
"type": "string",
"writeOnly": true
},
"ingest_users": {
"default": true,
"description": "Whether users should be ingested into DataHub.",
"title": "Ingest Users",
"type": "boolean"
},
"ingest_groups": {
"default": true,
"description": "Whether groups should be ingested into DataHub.",
"title": "Ingest Groups",
"type": "boolean"
},
"ingest_group_membership": {
"default": true,
"description": "Whether group membership should be ingested into DataHub. ingest_groups must be True if this is True.",
"title": "Ingest Group Membership",
"type": "boolean"
},
"ingest_groups_users": {
"default": true,
"description": "Only ingest users belonging to the selected groups. This option is only useful when `ingest_users` is set to False and `ingest_group_membership` to True.",
"title": "Ingest Groups Users",
"type": "boolean"
},
"okta_profile_to_username_attr": {
"default": "email",
"description": "Which Okta User Profile attribute to use as input to DataHub username mapping. Common values used are - login, email.",
"title": "Okta Profile To Username Attr",
"type": "string"
},
"okta_profile_to_username_regex": {
"default": "(.*)",
"description": "A regex used to parse the DataHub username from the attribute specified in `okta_profile_to_username_attr`.",
"title": "Okta Profile To Username Regex",
"type": "string"
},
"okta_profile_to_group_name_attr": {
"default": "name",
"description": "Which Okta Group Profile attribute to use as input to DataHub group name mapping.",
"title": "Okta Profile To Group Name Attr",
"type": "string"
},
"okta_profile_to_group_name_regex": {
"default": "(.*)",
"description": "A regex used to parse the DataHub group name from the attribute specified in `okta_profile_to_group_name_attr`.",
"title": "Okta Profile To Group Name Regex",
"type": "string"
},
"include_deprovisioned_users": {
"default": false,
"description": "Whether to ingest users in the DEPROVISIONED state from Okta.",
"title": "Include Deprovisioned Users",
"type": "boolean"
},
"include_suspended_users": {
"default": false,
"description": "Whether to ingest users in the SUSPENDED state from Okta.",
"title": "Include Suspended Users",
"type": "boolean"
},
"page_size": {
"default": 100,
"description": "The number of entities requested from Okta's REST APIs in one request.",
"title": "Page Size",
"type": "integer"
},
"delay_seconds": {
"anyOf": [
{
"type": "number"
},
{
"type": "integer"
}
],
"default": 0.01,
"description": "Number of seconds to wait between calls to Okta's REST APIs. (Okta rate limits). Defaults to 10ms.",
"title": "Delay Seconds"
},
"okta_users_filter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Okta filter expression (not regex) for ingesting users. Only one of `okta_users_filter` and `okta_users_search` can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-a-filter) for more info.",
"title": "Okta Users Filter"
},
"okta_users_search": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Okta search expression (not regex) for ingesting users. Only one of `okta_users_filter` and `okta_users_search` can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-search) for more info.",
"title": "Okta Users Search"
},
"okta_groups_filter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Okta filter expression (not regex) for ingesting groups. Only one of `okta_groups_filter` and `okta_groups_search` can be set. See (https://developer.okta.com/docs/reference/api/groups/#filters) for more info.",
"title": "Okta Groups Filter"
},
"okta_groups_search": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Okta search expression (not regex) for ingesting groups. Only one of `okta_groups_filter` and `okta_groups_search` can be set. See (https://developer.okta.com/docs/reference/api/groups/#list-groups-with-search) for more info.",
"title": "Okta Groups Search"
},
"skip_users_without_a_group": {
"default": false,
"description": "Whether to only ingest users that are members of groups. If this is set to False, all users will be ingested regardless of group membership.",
"title": "Skip Users Without A Group",
"type": "boolean"
},
"mask_group_id": {
"default": true,
"title": "Mask Group Id",
"type": "boolean"
},
"mask_user_id": {
"default": true,
"title": "Mask User Id",
"type": "boolean"
}
},
"required": [
"okta_domain",
"okta_api_token"
],
"title": "OktaConfig",
"type": "object"
}
Capabilities
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
Extracting DataHub Users
User entities are extracted from Okta users APIs and mapped to DataHub CorpUser entities.
Usernames
Usernames serve as unique identifiers for users on DataHub. This connector derives the username from an attribute of the Okta User Profile: okta_profile_to_username_attr selects the attribute (default: email), and okta_profile_to_username_regex is applied to its value (default: (.*)). With these defaults, the full email value is used as the DataHub username.
If this is not how you wish to map to DataHub usernames, you can provide a custom mapping using the configuration options okta_profile_to_username_attr and okta_profile_to_username_regex. For example, to map full email addresses to user urns, you may use the following configuration:
okta_profile_to_username_attr: "email"
okta_profile_to_username_regex: ".*"
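To use only the part of the login before the "@" as the DataHub username, a mapping along these lines may work. It is a sketch that assumes the connector uses the text matched by the regex as the username; the pattern itself is illustrative and should be checked against your own login values.

```yaml
source:
  type: okta
  config:
    # Read the username from the Okta "login" profile attribute
    okta_profile_to_username_attr: "login"
    # Match everything up to (but not including) the first "@"
    okta_profile_to_username_regex: "([^@]+)"
```

Under that assumption, a login such as `alice@example.com` would be expected to map to the username `alice`.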
Profiles
This connector also extracts basic user profile information from Okta. The following fields of the Okta User Profile are extracted and mapped to the DataHub CorpUserInfo aspect:
- display name
- first name
- last name
- title
- department
- country code
Extracting DataHub Groups
Group entities are extracted from Okta groups APIs and mapped to DataHub CorpGroup entities.
Group Names
Group names serve as unique identifiers for groups on DataHub. This connector extracts group names using the "name" attribute of an Okta Group Profile. By default, a URL-encoded version of the full group name is used as the unique identifier (CorpGroupKey) and the raw "name" attribute is mapped as the display name that will appear in DataHub's UI.
If this is not how you wish to map to DataHub group names, you can provide a custom mapping using the configuration options okta_profile_to_group_name_attr and okta_profile_to_group_name_regex.
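As an illustration of a custom group name mapping, the fragment below keeps only the first space-delimited token of the Okta group name. It is a sketch that assumes the connector uses the text matched by the regex as the group name; the pattern is a hypothetical example, not a recommended default.

```yaml
source:
  type: okta
  config:
    # Read the group name from the Okta Group Profile "name" attribute
    okta_profile_to_group_name_attr: "name"
    # Match from the start of the value up to the first space
    okta_profile_to_group_name_regex: '^[^ ]+'
```

Under that assumption, an Okta group named `Engineering (Okta)` would be expected to map to `Engineering`. Make sure the resulting names stay unique, since they serve as group identifiers in DataHub.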
Profiles
This connector also extracts basic group information from Okta. The following fields of the Okta Group Profile are extracted and mapped to the DataHub CorpGroupInfo aspect:
- name
- description
Extracting Group Membership
This connector additionally extracts the edges between Users and Groups that are stored in Okta. These user-to-group membership edges are mapped to the GroupMembership aspect associated with DataHub users (CorpUsers).
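The membership-related flags from the configuration table interact with one another. The following sketch shows one possible combination, which ingests only the users that belong to the selected groups. The group search expression is an illustrative assumption about your group naming.

```yaml
source:
  type: okta
  config:
    # Do not ingest the full user directory
    ingest_users: false
    # Groups must be ingested for membership ingestion to be valid
    ingest_groups: true
    ingest_group_membership: true
    # Ingest only users that belong to the selected groups
    ingest_groups_users: true
    # Illustrative Okta search expression: groups whose name starts with "data-"
    okta_groups_search: 'profile.name sw "data-"'
```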
Filtering and Searching
You can ingest a subset of users or groups into DataHub by setting filter or search options, which scopes extraction to the relevant identities and reduces ingestion load. For users, set either okta_users_filter or okta_users_search (only one can be set at a time). For groups, set either okta_groups_filter or okta_groups_search (again, only one at a time). Note that these are Okta filter and search expressions, not regular expressions.
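A recipe fragment scoping both users and groups might look like the sketch below. The expressions follow the Okta expression syntax described in the Okta API documentation linked from the configuration table; the specific department value and group type are illustrative assumptions, so verify them against your own Okta organization.

```yaml
source:
  type: okta
  config:
    # Users: set either okta_users_search or okta_users_filter, never both
    okta_users_search: 'profile.department eq "Engineering"'
    # Groups: set either okta_groups_filter or okta_groups_search, never both
    okta_groups_filter: 'type eq "OKTA_GROUP"'
```

The single-quoted YAML strings let the expressions keep their double-quoted values without escaping.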
Limitations
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
Troubleshooting
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.
Code Coordinates
- Class Name: datahub.ingestion.source.identity.okta.OktaSource
- Browse on GitHub
If you've got any questions on configuring ingestion for Okta, feel free to ping us on our Slack.
This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.