NiFi

Overview

Apache NiFi is a platform for automating and managing data flows between systems. Learn more in the official Apache NiFi documentation.

The DataHub integration for Apache NiFi ingests flow metadata such as process groups, processors, and ports. It can also extract table-level lineage from provenance events and supports stateful ingestion for detecting deleted entities.

Concept Mapping

| Source Concept | DataHub Concept | Notes |
| --- | --- | --- |
| "Nifi" | Data Platform | |
| Nifi flow | Data Flow | |
| Nifi Ingress / Egress Processor | Data Job | |
| Nifi Remote Port | Data Job | |
| Nifi Port with remote connections | Dataset | |
| Nifi Process Group | Container | Subtype `Process Group` |

Module nifi

Certified

Important Capabilities

| Capability | Status | Notes |
| --- | --- | --- |
| Detect Deleted Entities | Supported | Enabled by default via stateful ingestion. |
| Table-Level Lineage | Supported | See docs for limitations. |

Overview

The nifi module ingests metadata from Nifi into DataHub. It is intended for production ingestion workflows; module-specific capabilities are documented below.

Prerequisites

Before running ingestion, ensure network connectivity to the source, valid authentication credentials, and read permissions for metadata APIs required by this module.

Access Policies

This connector requires the following access policies to be set in Nifi for the ingestion user.

Global Access Policies

| Policy | Privilege | Resource | Action |
| --- | --- | --- | --- |
| view the UI | Allows users to view the UI | /flow | R |
| query provenance | Allows users to submit a Provenance Search and request Event Lineage | /provenance | R |

Component-level Access Policies (required to be set on the root process group)

| Policy | Privilege | Resource | Action |
| --- | --- | --- | --- |
| view the component | Allows users to view component configuration details | /\<component-type\>/\<component-UUID\> | R |
| view the data | Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events | /data/\<component-type\>/\<component-UUID\> | R |
| view provenance | Allows users to view provenance events generated by this component | /provenance-data/\<component-type\>/\<component-UUID\> | R |

Authentication

This connector supports the following authentication mechanisms:

Single User Authentication (auth: SINGLE_USER)

The connector passes the configured username and password (as used on the Nifi login page) to the /access/token REST endpoint. This mode also works when a Kerberos login identity provider is set up for Nifi.

Client Certificates Authentication (auth: CLIENT_CERT)

The connector uses client_cert_file (required), and optionally client_key_file and client_key_password, for mutual TLS authentication.
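A recipe fragment for this mode might look like the following sketch (the file paths and environment variable are placeholders):

```yaml
source:
  type: "nifi"
  config:
    site_url: "https://localhost:8443/nifi/"
    auth: CLIENT_CERT
    client_cert_file: "/path/to/client-cert.pem" # required for CLIENT_CERT
    client_key_file: "/path/to/client-key.pem" # optional
    client_key_password: "${CLIENT_KEY_PASSWORD}" # optional
```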

Kerberos Authentication via SPNEGO (auth: Kerberos)

If Nifi has been configured to use Kerberos SPNEGO, the connector passes the user's Kerberos ticket to Nifi via the /access/kerberos REST endpoint. It is assumed that the user's Kerberos ticket is already present on the machine on which ingestion runs, typically by installing krb5-user and then running kinit for the user.

```shell
sudo apt install krb5-user
kinit user@REALM
```

Basic Authentication (auth: BASIC_AUTH)

Connector will use HTTPBasicAuth with username and password.

No Authentication (auth: NO_AUTH)

This is useful for testing purposes.

Install the Plugin

pip install 'acryl-datahub[nifi]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

```yaml
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
```

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
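For example, the dotted field stateful_ingestion.enabled corresponds to this YAML nesting:

```yaml
source:
  type: "nifi"
  config:
    stateful_ingestion:
      enabled: true
```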

Fields
site_url
string
URL for Nifi, ending with /nifi/, e.g. https://mynifi.domain/nifi/
auth
Enum
One of: "NO_AUTH", "SINGLE_USER", "CLIENT_CERT", "KERBEROS", "BASIC_AUTH"
ca_file
One of boolean, string, null
Path to PEM file containing certs for the root CA(s) for the NiFi. Set to False to disable SSL verification.
Default: None
client_cert_file
One of string, null
Path to PEM file containing the public certificates for the user/client identity, must be set for auth = "CLIENT_CERT"
Default: None
client_key_file
One of string, null
Path to PEM file containing the client’s secret key
Default: None
client_key_password
One of string(password), null
The password to decrypt the client_key_file
Default: None
emit_process_group_as_container
boolean
Whether to emit Nifi process groups as container entities.
Default: False
incremental_lineage
boolean
When enabled, emits incremental/patch lineage for Nifi processors. When disabled, re-states lineage on each run.
Default: True
password
One of string(password), null
Nifi password, must be set for auth = "SINGLE_USER"
Default: None
provenance_days
integer
time window to analyze provenance events for external datasets
Default: 7
site_name
string
Site name to identify this site with, useful when using input and output ports receiving remote connections
Default: default
site_url_to_site_name
map(str,string)
username
One of string, null
Nifi username, must be set for auth = "SINGLE_USER"
Default: None
env
string
The environment that all assets produced by this connector belong to
Default: PROD
process_group_pattern
AllowDenyPattern
Allow/deny regex patterns used to filter process groups.
process_group_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
stateful_ingestion
One of StatefulIngestionConfig, null
Stateful Ingestion Config
Default: None
stateful_ingestion.enabled
boolean
Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False
Default: False
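
Putting several of these options together, a fuller recipe might look like this sketch (the pipeline_name, password variable, and deny regex are illustrative):

```yaml
pipeline_name: nifi-prod # required for stateful ingestion
source:
  type: "nifi"
  config:
    site_url: "https://localhost:8443/nifi/"
    auth: SINGLE_USER
    username: admin
    password: "${NIFI_PASSWORD}"
    provenance_days: 7
    emit_process_group_as_container: true
    process_group_pattern:
      deny:
        - "^sandbox.*"
    stateful_ingestion:
      enabled: true
```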

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

Limitations

Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.

  • Lineage extraction analyzes provenance events. Verify your NiFi provenance retention period and run ingestion frequently enough to capture events before they expire.

  • Only a limited set of ingress/egress processors is supported:

    • S3: ListS3, FetchS3Object, PutS3Object
    • SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP

Troubleshooting

If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.
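As a first connectivity check, you can verify that the Nifi REST API is reachable from the ingestion host. The hypothetical helper below assumes the standard Nifi layout, where the UI at https://host/nifi/ has its REST API under https://host/nifi-api/; it derives the URL of the /flow/about endpoint from your configured site_url:

```python
def nifi_api_about_url(site_url: str) -> str:
    """Derive the Nifi REST API 'about' endpoint from a site_url ending in /nifi/."""
    base = site_url.rstrip("/")
    if not base.endswith("/nifi"):
        raise ValueError("site_url should end with /nifi/")
    # The REST API lives under /nifi-api/ alongside the /nifi/ UI path.
    return base[: -len("/nifi")] + "/nifi-api/flow/about"

print(nifi_api_about_url("https://localhost:8443/nifi/"))
```

Querying the resulting URL with curl (or a browser) from the ingestion host is a quick way to separate network and TLS problems from permission or configuration errors.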

Code Coordinates

  • Class Name: datahub.ingestion.source.nifi.NifiSource
  • Browse on GitHub
Questions?

If you've got any questions on configuring ingestion for NiFi, feel free to ping us on our Slack.

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.

Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.