DataHub Integrations
Connect to 140+ data and AI systems
Database: 52
BI Tool: 22
ETL/ELT: 17
Metadata Systems: 12
Messaging: 9
Data Lake: 6
Internal: 6
AI+ML: 5
Orchestrator: 4
Data Catalog: 4
Data Quality: 4
Identity Provider: 3
Collaboration: 3
CRM: 2
Metadata: 1
Testing
Aerospike
Aerospike is a high-performance NoSQL database designed for real-time applications requiring low latency and high throughput at scale.
Airbyte
API
Open-source data integration platform for syncing data from APIs, databases, and files to warehouses and lakes.
✓ Certified
Airflow
Airflow is an open-source data orchestration tool used for scheduling, monitoring, and managing complex data pipelines.
✓ Certified
Aiven for Apache Kafka
Fully managed Apache Kafka service available across major cloud providers with built-in monitoring and security.
AlloyDB for PostgreSQL
API
Fully managed PostgreSQL-compatible database service on Google Cloud designed for demanding enterprise workloads.
Alteryx
API
Self-service data analytics platform for data preparation, blending, and advanced analytics workflows.
Amazon Kinesis
API
AWS real-time data streaming service for collecting, processing, and analyzing data streams at any scale.
✓ Certified
Amazon MSK
Fully managed Apache Kafka service on AWS for building and running streaming data applications.
Amazon QuickSight
API
AWS serverless business intelligence service for creating interactive dashboards and visualizations.
Anaplan
API
Cloud-based enterprise planning platform for financial planning, supply chain, and workforce planning.
Apache Beam
API
Unified programming model for defining and executing both batch and streaming data processing pipelines.
Incubating
Apache Doris
High-performance real-time analytical database based on MPP architecture for large-scale data analytics.
Apache Hudi
Apache Hudi is an open-source data lake framework that provides ACID transactions, efficient upserts, time travel queries, and incremental data processing for large-scale datasets.
Apache Impala
API
Open-source massively parallel processing SQL query engine for data stored in Apache Hadoop.
✓ Certified
AWS Athena
Athena is a serverless interactive query service that enables users to analyze data in Amazon S3 using standard SQL.
✓ Certified
Azure AD
Azure AD is a cloud-based identity and access management tool that provides secure authentication and authorization for users and applications.
Incubating
Azure Blob Storage
Azure Blob Storage is a cloud-based object storage service that allows users to store and manage large amounts of unstructured data.
Azure Cosmos DB
API
Globally distributed, multi-model NoSQL database service on Azure for mission-critical applications.
Incubating
Azure Data Factory
Cloud-based data integration service for creating, scheduling, and orchestrating data pipelines at scale.
✓ Certified
Azure Event Hubs
Cloud-native data streaming service on Azure with native Apache Kafka support for real-time event ingestion.
Azure Kusto
API
Fast and scalable data exploration service optimized for log and telemetry data analytics.
Azure SQL
API
Fully managed relational cloud database service built on SQL Server for modern application development.
Azure Synapse Analytics
API
Integrated analytics service on Azure that combines enterprise data warehousing with big data analytics.
✓ Certified
BigQuery
BigQuery is a cloud-based data warehousing and analytics tool that allows users to store, query, and analyze large datasets quickly and efficiently.
✓ Certified
Business Glossary
Ingest a comprehensive list of business terms, definitions, and term hierarchies into DataHub from a YAML-based business glossary file.
Incubating
Cassandra
Apache Cassandra is a distributed NoSQL database designed to handle large amounts of data across many commodity servers.
✓ Certified
ClickHouse
ClickHouse is an open-source column-oriented database management system designed for high-performance data processing and analytics.
Testing
CockroachDB
CockroachDB is a distributed SQL database that provides strong consistency, high availability, and horizontal scalability.
Incubating
Confluence
Atlassian's team workspace for creating, organizing, and collaborating on documentation and knowledge bases.
✓ Certified
Confluent Kafka
Enterprise-grade data streaming platform built on Apache Kafka with schema registry, governance, and managed cloud service.
Couchbase
API
Distributed NoSQL cloud database for enterprise applications with built-in full-text search and analytics.
CrateDB
API
Distributed SQL database built on a shared-nothing architecture for machine data and IoT workloads.
Incubating
CSV
A DataHub-provided ingestion source for enriching metadata from CSV files.
✓ Certified
Dagster
Dagster is a next-generation open source orchestration platform for the development, production, and observation of data assets.
✓ Certified
Databricks
Databricks is a cloud-based data and AI platform built on Apache Spark, used for data engineering, analytics, and machine learning workflows.
✓ Certified
DataHub
Integrate your open source DataHub instance with DataHub Cloud or other on-prem DataHub instances.
Testing
DataHub Apply
DataHub utility for applying metadata changes and configurations to your DataHub instance.
Testing
DataHub Debug
DataHub utility for debugging and diagnosing metadata issues in your DataHub instance.
Incubating
DataHub Documents
DataHub utility for ingesting and managing documentation assets, runbooks, and tribal-knowledge documents stored in DataHub.
Testing
DataHub GC
DataHub garbage collection utility for cleaning up stale metadata and soft-deleted entities.
Testing
DB2
IBM's enterprise relational database management system for transactional and analytical workloads.
✓ Certified
dbt
dbt is a data transformation tool that enables analysts and engineers to transform data in their warehouses through a modular, SQL-based approach.
Incubating
Delta Lake
Delta Lake is an open-source data lake storage layer that provides ACID transactions, schema enforcement, and data versioning for big data workloads.
Demo Data
Demo Data is a data tool that provides sample data sets for demonstration and testing purposes.
Incubating
dlt
Integration between DataHub and dlt (data load tool), an open-source Python library for loading data.
Domo
API
Cloud-based business intelligence platform with data integration, visualization, and collaboration capabilities.
✓ Certified
Dremio
Dremio is a data lake engine that provides fast, direct access to data lake storage with enterprise-grade security and governance.
Incubating
Druid
Druid is an open-source data store designed for real-time analytics on large datasets.
Incubating
DynamoDB
DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
✓ Certified
Elasticsearch
Elasticsearch is a distributed, open-source search and analytics engine designed for handling large volumes of data.
Exasol
API
High-performance in-memory analytics database optimized for data warehousing and analytics workloads.
Incubating
Excel
Microsoft Excel spreadsheet integration for importing metadata definitions and enrichment data.
Testing
Fabric Data Factory
Microsoft Fabric Data Factory is a cloud-based data integration service for building and orchestrating data pipelines within the Microsoft Fabric platform.
Testing
Fabric OneLake
Microsoft Fabric's unified data lake for storing and managing all organizational data in one place.
✓ Certified
Feast
Feast is an open-source feature store that enables teams to manage, store, and discover features for machine learning applications.
✓ Certified
File
Ingest metadata from a JSON file containing pre-generated DataHub MCEs/MCPs — useful for backfills, debugging, and offline pipelines.
✓ Certified
File Based Lineage
File Based Lineage is a data tool that tracks the lineage of data files and their dependencies.
✓ Certified
Fivetran
Fivetran is a cloud-based data integration platform that provides automated data pipelines for analytics and business intelligence.
Incubating
Flink
Real-time stream processing framework for stateful computations over bounded and unbounded data streams.
✓ Certified
Glue
Glue is a data integration service that allows users to extract, transform, and load data from various sources into a data warehouse.
Google Cloud Bigtable
API
Fully managed, scalable NoSQL database service for large analytical and operational workloads on Google Cloud.
Incubating
Google Cloud Knowledge Catalog (Dataplex)
Google Cloud's intelligent data fabric for unified data management, governance, and analytics across data lakes and warehouses.
Incubating
Google Cloud Storage
Google Cloud Storage is a unified object storage service that provides industry-leading durability and availability for data.
✓ Certified
Grafana
Grafana is an open-source analytics and monitoring platform that allows users to query, visualize, and alert on metrics and logs.
Great Expectations
Great Expectations is an open-source data validation and testing tool that helps data teams maintain data quality and integrity.
Greenplum
API
Open-source massively parallel processing data warehouse based on PostgreSQL for large-scale analytics.
Incubating
Hex
Hex is a collaborative data workspace that allows teams to explore, analyze, and share data insights through interactive notebooks.
Hightouch
API
Reverse ETL platform that syncs data from your warehouse to business tools like CRMs, ad platforms, and SaaS apps.
✓ Certified
Hive
Hive is a data warehousing tool that facilitates querying and managing large datasets stored in the Hadoop Distributed File System (HDFS).
✓ Certified
Hive Metastore
Hive Metastore (HMS) stores metadata for Hive, Presto, Trino, and Spark in a backend RDBMS such as MySQL or PostgreSQL.
IBM Cognos Analytics
API
Enterprise business intelligence suite for interactive dashboards, reports, and AI-powered data exploration.
Incubating
Iceberg
Apache Iceberg is an open table format for managing and querying large-scale analytic datasets.
Incubating
Informatica
Enterprise data integration and management platform for ETL, data quality, and master data management.
Incubating
JSON Schemas
JSON Schema is a vocabulary for defining the structure, format, and validation rules of JSON data.
✓ Certified
Kafka
Kafka is a distributed streaming platform for processing and storing large amounts of data in real time.
✓ Certified
Kafka Connect
Kafka Connect is an open-source data integration tool that enables the transfer of data between Apache Kafka and other data systems.
✓ Certified
LDAP
LDAP (Lightweight Directory Access Protocol) is a protocol for accessing and managing distributed directory information services over an IP network.
Lightdash
API
Open-source BI tool that connects to your dbt project to provide self-serve analytics and exploration.
✓ Certified
Looker
Looker is a business intelligence and data analytics platform that allows users to explore, analyze, and share data insights in real time.
✓ Certified
MariaDB
MariaDB is an open-source relational database management system that is a fork of MySQL.
Matillion
API
Cloud-native data integration platform for building ELT pipelines into cloud data warehouses.
✓ Certified
Metabase
Metabase is an open-source business intelligence and data visualization tool that allows users to easily query and visualize their data.
Microsoft Fabric
API
Unified analytics platform from Microsoft that integrates data engineering, data science, warehousing, and real-time analytics.
MicroStrategy
API
Enterprise analytics and mobility platform for business intelligence, dashboards, and reporting.
Incubating
MLflow
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle.
✓ Certified
Mode
Mode is a cloud-based data analysis and visualization platform that enables businesses to explore, analyze, and share data in a collaborative environment.
✓ Certified
MongoDB
MongoDB is a NoSQL database that stores data in flexible, JSON-like documents, making it easy to store and retrieve data for modern applications.
Monte Carlo
API
Data observability platform that detects, resolves, and prevents data quality issues across your data stack.
✓ Certified
MySQL
MySQL is an open-source relational database management system that allows users to store, organize, and retrieve data efficiently.
✓ Certified
Neo4j
Neo4j is a graph database management system that stores data in nodes and relationships, enabling complex graph queries and analytics.
✓ Certified
NiFi
NiFi is a data integration tool that allows users to automate the flow of data between systems and applications.
Incubating
Notion
Notion is an all-in-one workspace for notes, docs, wikis, and project management. Ingest pages and databases as DataHub datasets.
✓ Certified
Okta
Okta is a cloud-based identity and access management platform that enables secure single sign-on across applications and devices.
Incubating
Omni
Omni is a business intelligence and analytics platform that combines SQL, spreadsheets, and no-code exploration for collaborative data analysis.
Incubating
OpenAPI
OpenAPI is a specification for building and documenting RESTful APIs. Ingest endpoints and schemas as DataHub datasets and lineage.
✓ Certified
OpenLineage
Open standard for data lineage collection and analysis. DataHub supports OpenLineage via a native REST endpoint and Spark event listener plugin.
Incubating
Oracle
Oracle is a relational database management system that provides a comprehensive and integrated platform for managing and analyzing large amounts of data.
Incubating
Pinecone
Pinecone is a vector database designed for building fast, scalable similarity search and AI-powered applications.
Pinot
API
Real-time distributed OLAP datastore designed for low-latency analytics on large-scale datasets.
✓ Certified
PostgreSQL
PostgreSQL is an open-source relational database management system for storing, managing, and analyzing large amounts of data.
✓ Certified
Power BI
Power BI is a Microsoft business analytics service that provides interactive visualizations and self-service business intelligence dashboards.
Incubating
Power BI Report Server
On-premises report server for hosting and managing Power BI reports, paginated reports, and KPIs.
✓ Certified
Prefect
Prefect is a modern workflow orchestration tool for data and ML engineers.
✓ Certified
Preset
Preset is a cloud-native business intelligence platform that provides self-service analytics and data visualization capabilities.
✓ Certified
Presto
Presto is an open-source distributed SQL query engine designed for fast and interactive analytics on large-scale data sets.
Protobuf Schemas
Protobuf (Protocol Buffers) schemas define structured data for compact, efficient serialization.
Incubating
Pulsar
Pulsar is a real-time data processing and messaging platform that enables high-performance data streaming and processing.
Incubating
Qlik Sense
Qlik Sense is a business intelligence and data analytics platform that enables users to create interactive visualizations and dashboards.
Incubating
RDF
RDF (Resource Description Framework) is a standard model for data interchange. Ingest ontologies and knowledge graphs as DataHub glossary terms.
Incubating
Redash
Redash is a data visualization and collaboration platform for connecting to data sources and building interactive dashboards.
✓ Certified
Redpanda
Kafka-compatible streaming data platform built for mission-critical workloads without ZooKeeper.
✓ Certified
Redshift
Redshift is a cloud-based data warehousing tool that allows users to store and analyze large amounts of data in a scalable and cost-effective manner.
✓ Certified
S3 Data Lake
Amazon S3 is a cloud object storage service for storing and analyzing data files at scale, commonly used as a data lake foundation.
Testing
SAC
SAP Analytics Cloud (SAC) is a cloud-based business intelligence and planning platform that provides analytics and planning capabilities.
✓ Certified
SageMaker
SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
✓ Certified
Salesforce
Salesforce is a cloud-based CRM platform that helps businesses manage sales, marketing, customer service, and analytics activities.
Testing
SAP HANA
SAP HANA is an in-memory data platform that enables businesses to process large volumes of data in real time.
Incubating
Sigma
Sigma is a cloud-native analytics and business intelligence platform that provides a spreadsheet-like interface for data analysis.
SingleStore
API
Distributed SQL database designed for real-time analytics and transactional workloads in a single unified engine.
Sisense
API
Embedded analytics platform for building and integrating data-driven insights into products and workflows.
✓ Certified
Slack
Send notifications to Slack channels for entity changes, assertions, and incidents in DataHub. Includes user discovery from Slack workspaces.
Testing
SnapLogic
SnapLogic is an enterprise iPaaS that connects applications, data sources, and APIs across cloud and on-premises environments via low-code workflows.
✓ Certified
Snowflake
Snowflake is a cloud-based data warehousing platform that allows users to store, manage, and analyze large amounts of structured and semi-structured data.
Incubating
Snowplow
Open-source behavioral data platform for collecting, processing, and modeling event-level data.
Soda
API
Data quality platform for testing, monitoring, and profiling data with checks defined as code.
Spark
Spark is a data processing tool that enables fast and efficient processing of large-scale data sets using distributed computing.
Spline
API
Open-source data lineage tracking and visualization tool for Apache Spark pipelines.
Incubating
SQL Queries
DataHub utility for ingesting SQL query metadata and lineage from query log files, generating column-level lineage between datasets.
✓ Certified
SQL Server
Microsoft SQL Server is a relational database management system designed to store, manage, and retrieve data efficiently and securely.
Incubating
SQLAlchemy
SQLAlchemy is a Python SQL toolkit that provides high-level APIs for connecting to relational databases and performing SQL operations.
SSIS
API
SQL Server Integration Services — Microsoft's enterprise ETL platform for data migration, transformation, and workflow automation.
Testing
StarRocks
High-performance analytical database for real-time, multi-dimensional analytics at scale.
✓ Certified
Superset
Apache Superset is an open-source data exploration and visualization platform for building interactive dashboards and ad-hoc analysis.
✓ Certified
Tableau
Tableau is a data visualization and business intelligence tool that helps users analyze and present data in a visually appealing and interactive way.
Talend
API
Open-source and enterprise data integration platform for ETL, data quality, and cloud data management.
Testing
Teradata
Teradata is an enterprise data warehouse platform for storing, managing, and analyzing large volumes of structured data at scale.
ThoughtSpot
API
AI-powered analytics platform with natural language search for self-service business intelligence.
TimescaleDB
API
Open-source time-series database built on PostgreSQL for fast analytics on time-series data.
✓ Certified
Trino
Trino is an open-source distributed SQL query engine for fast analytics on large-scale data warehouses, lakes, and federated sources.
Incubating
Vertex AI
Vertex AI is Google Cloud's machine learning platform that provides tools for building, training, and deploying ML models at scale.
✓ Certified
Vertica
Vertica is a high-performance, column-oriented, relational database management system designed for large-scale data warehousing and analytics.
Don't see your data source?
Request a Connector | Build Your Own