Version: Next

MLflow

Testing

Important Capabilities

| Capability | Status | Notes |
| --- | --- | --- |
| Descriptions | | Extract descriptions for MLflow Registered Models and Model Versions |
| Detect Deleted Entities | | Optionally enabled via stateful_ingestion.remove_stale_metadata |
| Extract Tags | | Extract tags for MLflow Registered Model Stages |

Concept Mapping

This ingestion source maps the following MLflow Concepts to DataHub Concepts:

| Source Concept | DataHub Concept | Notes |
| --- | --- | --- |
| Registered Model | MlModelGroup | The name of a Model Group is the same as a Registered Model's name (e.g. my_mlflow_model) |
| Model Version | MlModel | The name of a Model is {registered_model_name}{model_name_separator}{model_version} (e.g. my_mlflow_model_1 for Registered Model named my_mlflow_model and Version 1, my_mlflow_model_2 for Version 2, etc.) |
| Model Stage | Tag | The mapping between Model Stages and generated Tags is the following: Production: mlflow_production; Staging: mlflow_staging; Archived: mlflow_archived; None: mlflow_none |
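
The naming and tagging rules above can be sketched in a few lines of Python. This is only an illustration of the documented mapping, not the connector's actual code: the helper names model_name and stage_tag are hypothetical, and the default separator mirrors the documented model_name_separator default of _.

```python
# Stage-to-tag mapping, copied from the Concept Mapping table.
STAGE_TO_TAG = {
    "Production": "mlflow_production",
    "Staging": "mlflow_staging",
    "Archived": "mlflow_archived",
    "None": "mlflow_none",
}


def model_name(registered_model_name: str, model_version, separator: str = "_") -> str:
    """Hypothetical helper: build the Model name from the documented pattern
    {registered_model_name}{model_name_separator}{model_version}."""
    return f"{registered_model_name}{separator}{model_version}"


def stage_tag(stage: str) -> str:
    """Hypothetical helper: look up the Tag generated for a Model Stage."""
    return STAGE_TO_TAG[stage]


if __name__ == "__main__":
    print(model_name("my_mlflow_model", 1))       # my_mlflow_model_1
    print(model_name("my_mlflow_model", 2, "-"))  # my_mlflow_model-2
    print(stage_tag("Staging"))                   # mlflow_staging
```

Changing model_name_separator in the recipe changes only the separator argument in this sketch; the Model Group name stays equal to the Registered Model's name.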

CLI based Ingestion

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: mlflow
  config:
    # Coordinates
    tracking_uri: tracking_uri

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
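
To make the dot notation concrete, the sketch below expands dotted field names into the nested structure that a YAML recipe's config block would contain. The function name nest is hypothetical and exists only for illustration; it is not part of DataHub.

```python
def nest(dotted: dict) -> dict:
    """Expand keys such as 'stateful_ingestion.enabled' into nested dicts."""
    nested: dict = {}
    for key, value in dotted.items():
        node = nested
        *parents, leaf = key.split(".")
        # Walk (or create) one dict level per parent segment.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


if __name__ == "__main__":
    fields = {"env": "PROD", "stateful_ingestion.enabled": True}
    print(nest(fields))
    # {'env': 'PROD', 'stateful_ingestion': {'enabled': True}}
```

In a recipe, stateful_ingestion.enabled therefore corresponds to an enabled key indented under stateful_ingestion inside the source's config block.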

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| base_external_url | string | Base URL to use when constructing external URLs to MLflow. If not set, tracking_uri is used if it's an HTTP URL. If neither is set, external URLs are not generated. | |
| model_name_separator | string | A string which separates model name from its version (e.g. model_1 or model-1) | _ |
| registry_uri | string | Registry server URI. If not set, an MLflow default registry_uri is used (value of tracking_uri or MLFLOW_REGISTRY_URI environment variable) | |
| tracking_uri | string | Tracking server URI. If not set, an MLflow default tracking_uri is used (local mlruns/ directory or MLFLOW_TRACKING_URI environment variable) | |
| env | string | The environment that all assets produced by this connector belong to | PROD |
| stateful_ingestion | StatefulIngestionConfig | Stateful Ingestion Config | |
| stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False | False |
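
The fallback described for base_external_url can be read as a three-step decision. The sketch below illustrates that documented behaviour and is not taken from the connector's source: the function name resolve_base_external_url is hypothetical, and treating both http and https schemes as "an HTTP URL" is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_base_external_url(
    base_external_url: Optional[str], tracking_uri: Optional[str]
) -> Optional[str]:
    """Hypothetical helper mirroring the documented fallback order."""
    # 1. An explicitly configured base_external_url always wins.
    if base_external_url:
        return base_external_url
    # 2. Otherwise fall back to tracking_uri, but only if it is an HTTP(S) URL
    #    (assumption: https counts as "an HTTP URL").
    if tracking_uri and urlparse(tracking_uri).scheme in ("http", "https"):
        return tracking_uri
    # 3. Neither applies: no external URLs are generated.
    return None


if __name__ == "__main__":
    print(resolve_base_external_url(None, "http://localhost:5000"))  # http://localhost:5000
    print(resolve_base_external_url(None, "mlruns/"))                # None
```

A local tracking_uri such as the mlruns/ directory has no HTTP scheme, so under this reading external URLs appear only when base_external_url is set explicitly.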

Code Coordinates

  • Class Name: datahub.ingestion.source.mlflow.MLflowSource

Questions

If you've got any questions on configuring ingestion for MLflow, feel free to ping us on our Slack.