Salesforce
Overview
The Salesforce connector is a metadata-focused DataHub integration. Learn more about the platform in the official Salesforce documentation.
The DataHub integration for Salesforce ingests Salesforce Standard and Custom Objects as datasets. As the capability table for the salesforce module shows, it can also capture table-level lineage, table-level profiling, tags, domains, and stateful deletion detection.
Concept Mapping
| Source Concept | DataHub Concept | Notes |
|---|---|---|
| Salesforce | Data Platform | |
| Standard Object | Dataset | subtype "Standard Object" |
| Custom Object | Dataset | subtype "Custom Object" |
Module salesforce
Important Capabilities
| Capability | Status | Notes |
|---|---|---|
| Data Profiling | ✅ | Only table level profiling is supported via profiling.enabled config field. Supported for types - Table. |
| Detect Deleted Entities | ✅ | Enabled by default via stateful ingestion. |
| Domains | ✅ | Supported via the domain config field. |
| Extract Tags | ✅ | Supported via the ingest_tags config field (the config schema lists its default as False). |
| Platform Instance | ✅ | Can be equivalent to Salesforce organization. |
| Schema Metadata | ✅ | Enabled by default. |
| Table-Level Lineage | ✅ | Extract table-level lineage for Salesforce objects. Supported for types - Custom Object, Object. |
Overview
The salesforce module ingests metadata from Salesforce into DataHub. It is intended for production ingestion workflows and module-specific capabilities are documented below.
This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance. The Python library simple-salesforce is used to authenticate and to call the Salesforce REST API to retrieve details from the Salesforce instance.
REST API Resources used in this integration
- Versions
- Tooling API Query on objects EntityDefinition, EntityParticle, CustomObject, CustomField
- Record Count
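For orientation, the three resource families listed above correspond to well-known Salesforce REST paths. The sketch below only builds example request URLs with the standard library. The helper name `salesforce_rest_urls`, the API version `59.0`, and the sample SOQL query are illustrative assumptions; the connector itself issues its requests through simple-salesforce and may use different queries.

```python
from urllib.parse import urlencode


def salesforce_rest_urls(instance_url: str, api_version: str, objects: list[str]) -> dict[str, str]:
    """Build example URLs for the REST resources named above (hypothetical helper)."""
    base = instance_url.rstrip("/")
    data = f"{base}/services/data"
    versioned = f"{data}/v{api_version}"
    # Illustrative Tooling API query; the connector's real SOQL may differ.
    soql = "SELECT QualifiedApiName FROM EntityDefinition"
    return {
        # Versions: lists the API versions the instance supports.
        "versions": f"{data}/",
        # Tooling API Query: metadata about objects and fields.
        "tooling_query": f"{versioned}/tooling/query/?" + urlencode({"q": soql}),
        # Record Count: approximate row counts for the named objects.
        "record_count": f"{versioned}/limits/recordCount?"
        + urlencode({"sObjects": ",".join(objects)}),
    }


urls = salesforce_rest_urls("https://mydomain.my.salesforce.com/", "59.0", ["Account", "Lead"])
for name, url in urls.items():
    print(f"{name}: {url}")
```

Every request also needs an authenticated session, which is what the authentication options in the prerequisites provide.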
Prerequisites
Before running ingestion, ensure network connectivity to the source, valid authentication credentials, and read permissions for metadata APIs required by this module.
Authentication options
To ingest metadata from Salesforce, you need one of:
- Salesforce username, password, and security token
- Salesforce username, consumer key, and private key for JSON web token access
- Salesforce instance URL and access token or session ID (suitable for one-shot ingestion only, as the access token typically expires after 2 hours of inactivity)
The account used to access Salesforce requires the following permissions for this integration to work:
- View Setup and Configuration
- View All Data
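The three authentication options line up with the three values of the `auth` config field. A minimal sketch of that mapping follows, assuming each mode needs exactly the credentials named in its bullet. The `REQUIRED_FIELDS` table and the `build_source_config` helper are hypothetical, the credential values are placeholders, and the connector's own validation may be stricter or looser.

```python
# Credentials each auth mode is assumed to need, using the documented field names.
REQUIRED_FIELDS = {
    "USERNAME_PASSWORD": ("username", "password", "security_token"),
    "JSON_WEB_TOKEN": ("username", "consumer_key", "private_key"),
    "DIRECT_ACCESS_TOKEN": ("instance_url", "access_token"),
}


def build_source_config(auth: str, **credentials: str) -> dict:
    """Return a `source` recipe section for one auth mode (hypothetical helper)."""
    if auth not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth mode: {auth}")
    missing = [name for name in REQUIRED_FIELDS[auth] if not credentials.get(name)]
    if missing:
        raise ValueError(f"{auth} is missing: {', '.join(missing)}")
    return {"type": "salesforce", "config": {"auth": auth, **credentials}}


jwt_source = build_source_config(
    "JSON_WEB_TOKEN",
    username="user@company",
    consumer_key="example-consumer-key",  # placeholder value
    private_key="example-private-key",  # placeholder value
)
print(jwt_source["config"]["auth"])
```

The resulting dictionary mirrors the `source` section of a YAML recipe, so the same keys can be written directly in a recipe file instead.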
Install the Plugin
```shell
pip install 'acryl-datahub[salesforce]'
```
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    # Assign objects whose names match these patterns to the "sales" domain.
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"
    # Only ingest objects whose names match these patterns.
    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
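The `$` anchors in the recipe's patterns matter: without them, a pattern such as `Opportunity` would also admit any object whose name merely begins with that text. A minimal sketch of this behavior follows, assuming patterns are applied from the start of the object name, as Python's `re.match` does, and case-insensitively, per the documented `ignoreCase` default. The `is_allowed` helper and the sample object names are illustrative, not the connector's code.

```python
import re
from typing import Sequence


def is_allowed(
    name: str,
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Apply allow/deny regex lists to one object name (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    # A deny match always wins over an allow match.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


allow = ["Account$", "Opportunity$", "Lead$"]
names = ["Account", "Opportunity", "OpportunityLineItem", "Lead", "LeadHistory", "Contact"]
selected = [name for name in names if is_allowed(name, allow)]
print(selected)  # ['Account', 'Opportunity', 'Lead']

# Without the trailing "$", the longer name is admitted as well.
print(is_allowed("OpportunityLineItem", ["Opportunity"]))  # True
```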
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Type | Description |
|---|---|---|
| access_token | One of string(password), null | Access token for instance url. Default: None |
| api_version | One of string, null | If specified, overrides default version used by the Salesforce package. Example value: '59.0'. Default: None |
| auth | Enum | One of: "USERNAME_PASSWORD", "DIRECT_ACCESS_TOKEN", "JSON_WEB_TOKEN". Default: USERNAME_PASSWORD |
| consumer_key | One of string(password), null | Consumer key for Salesforce JSON web token access. Default: None |
| ingest_tags | boolean | Ingest Tags from source. This will override Tags entered from UI. Default: False |
| instance_url | One of string, null | Salesforce instance url, e.g. https://MyDomainName.my.salesforce.com. Default: None |
| is_sandbox | boolean | Connect to Sandbox instance of your Salesforce. Default: False |
| password | One of string(password), null | Password for Salesforce user. Default: None |
| platform | string | Default: salesforce |
| platform_instance | One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
| private_key | One of string(password), null | Private key as a string for Salesforce JSON web token access. Default: None |
| security_token | One of string(password), null | Security token for Salesforce username. Default: None |
| use_referenced_entities_as_upstreams | boolean | (Experimental) If enabled, referenced entities will be treated as upstream entities. Default: False |
| username | One of string, null | Salesforce username. Default: None |
| env | string | The environment that all assets produced by this connector belong to. Default: PROD |
| domain | map(str,AllowDenyPattern) | A class to store allow deny regexes |
| domain.key.allow | array | List of regex patterns to include in ingestion. Default: ['.*'] |
| domain.key.allow.string | string | |
| domain.key.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| domain.key.deny | array | List of regex patterns to exclude from ingestion. Default: [] |
| domain.key.deny.string | string | |
| object_pattern | AllowDenyPattern | A class to store allow deny regexes |
| object_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| profile_pattern | AllowDenyPattern | A class to store allow deny regexes |
| profile_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| profiling | SalesforceProfilingConfig | |
| profiling.enabled | boolean | Whether profiling should be done. Supports only table-level profiling at this stage. Default: False |
| profiling.operation_config | OperationConfig | |
| profiling.operation_config.lower_freq_profile_enabled | boolean | Whether to do profiling at a lower frequency or not. This does not do any scheduling; it just adds additional checks for when not to run profiling. Default: False |
| profiling.operation_config.profile_date_of_month | One of integer, null | Number between 1 and 31 for date of month (both inclusive). If not specified, defaults to Nothing and this field does not take effect. Default: None |
| profiling.operation_config.profile_day_of_week | One of integer, null | Number between 0 and 6 for day of week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take effect. Default: None |
| stateful_ingestion | One of StatefulIngestionConfig, null | Stateful Ingestion Config. Default: None |
| stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingestion. Effective default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. Schema default: False |
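The `profiling.operation_config` fields describe a gate rather than a scheduler: the pipeline still runs on its own schedule, and profiling is skipped on runs that fall on other days. A rough sketch of such a check follows, assuming the day settings are consulted only when `lower_freq_profile_enabled` is true and that every configured setting must match the run date. `should_profile` is a hypothetical helper, and the connector's exact rule may differ. The weekday numbering matches Python's `date.weekday()`, where 0 is Monday and 6 is Sunday.

```python
from datetime import date
from typing import Optional


def should_profile(
    today: date,
    lower_freq_profile_enabled: bool = False,
    profile_day_of_week: Optional[int] = None,
    profile_date_of_month: Optional[int] = None,
) -> bool:
    """Decide whether a run on `today` should profile (hypothetical helper)."""
    if not lower_freq_profile_enabled:
        return True  # no extra checks: profile on every run
    if profile_day_of_week is not None and today.weekday() != profile_day_of_week:
        return False
    if profile_date_of_month is not None and today.day != profile_date_of_month:
        return False
    return True


monday = date(2024, 1, 1)  # 1 January 2024 was a Monday
tuesday = date(2024, 1, 2)
print(should_profile(monday, True, profile_day_of_week=0))  # True
print(should_profile(tuesday, True, profile_day_of_week=0))  # False
```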
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"OperationConfig": {
"additionalProperties": false,
"properties": {
"lower_freq_profile_enabled": {
"default": false,
"description": "Whether to do profiling at lower freq or not. This does not do any scheduling just adds additional checks to when not to run profiling.",
"title": "Lower Freq Profile Enabled",
"type": "boolean"
},
"profile_day_of_week": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number between 0 to 6 for day of week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take effect.",
"title": "Profile Day Of Week"
},
"profile_date_of_month": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number between 1 to 31 for date of month (both inclusive). If not specified, defaults to Nothing and this field does not take effect.",
"title": "Profile Date Of Month"
}
},
"title": "OperationConfig",
"type": "object"
},
"SalesforceAuthType": {
"enum": [
"USERNAME_PASSWORD",
"DIRECT_ACCESS_TOKEN",
"JSON_WEB_TOKEN"
],
"title": "SalesforceAuthType",
"type": "string"
},
"SalesforceProfilingConfig": {
"additionalProperties": false,
"properties": {
"enabled": {
"default": false,
"description": "Whether profiling should be done. Supports only table-level profiling at this stage",
"title": "Enabled",
"type": "boolean"
},
"operation_config": {
"$ref": "#/$defs/OperationConfig",
"description": "Experimental feature. To specify operation configs."
}
},
"title": "SalesforceProfilingConfig",
"type": "object"
},
"StatefulIngestionConfig": {
"additionalProperties": false,
"description": "Basic Stateful Ingestion Specific Configuration for any source.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
}
},
"title": "StatefulIngestionConfig",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulIngestionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Stateful Ingestion Config"
},
"platform": {
"default": "salesforce",
"title": "Platform",
"type": "string"
},
"auth": {
"$ref": "#/$defs/SalesforceAuthType",
"default": "USERNAME_PASSWORD"
},
"username": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Salesforce username",
"title": "Username"
},
"password": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Password for Salesforce user",
"title": "Password"
},
"consumer_key": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Consumer key for Salesforce JSON web token access",
"title": "Consumer Key"
},
"private_key": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Private key as a string for Salesforce JSON web token access",
"title": "Private Key"
},
"security_token": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Security token for Salesforce username",
"title": "Security Token"
},
"instance_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Salesforce instance url. e.g. https://MyDomainName.my.salesforce.com",
"title": "Instance Url"
},
"is_sandbox": {
"default": false,
"description": "Connect to Sandbox instance of your Salesforce",
"title": "Is Sandbox",
"type": "boolean"
},
"access_token": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Access token for instance url",
"title": "Access Token"
},
"ingest_tags": {
"default": false,
"description": "Ingest Tags from source. This will override Tags entered from UI",
"title": "Ingest Tags",
"type": "boolean"
},
"object_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for Salesforce objects to filter in ingestion."
},
"domain": {
"additionalProperties": {
"$ref": "#/$defs/AllowDenyPattern"
},
"default": {},
"description": "Regex patterns for tables/schemas to describe domain_key domain key (domain_key can be any string like \"sales\".) There can be multiple domain keys specified.",
"title": "Domain",
"type": "object"
},
"api_version": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "If specified, overrides default version used by the Salesforce package. Example value: '59.0'",
"title": "Api Version"
},
"profiling": {
"$ref": "#/$defs/SalesforceProfilingConfig",
"default": {
"enabled": false,
"operation_config": {
"lower_freq_profile_enabled": false,
"profile_date_of_month": null,
"profile_day_of_week": null
}
}
},
"profile_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for profiles to filter in ingestion, allowed by the `object_pattern`."
},
"use_referenced_entities_as_upstreams": {
"default": false,
"description": "(Experimental) If enabled, referenced entities will be treated as upstream entities.",
"title": "Use Referenced Entities As Upstreams",
"type": "boolean"
}
},
"title": "SalesforceConfig",
"type": "object"
}
Capabilities
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
Limitations
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
- This connector has only been tested with Salesforce Developer Edition.
- This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the Salesforce Record Count REST API.
- This integration does not support ingesting Salesforce External Objects.
Troubleshooting
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.
Code Coordinates
- Class Name: datahub.ingestion.source.salesforce.SalesforceSource
If you've got any questions on configuring ingestion for Salesforce, feel free to ping us on our Slack.
This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.