DataHubMockData
This source is for generating mock data for testing purposes. Expect breaking changes as we iterate on the mock data source.
CLI-based Ingestion
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
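As a quick illustration of the dot notation, the sketch below converts dotted field names (as listed in the table that follows) into the nested mapping you would write in the YAML recipe. The helper expand_dotted_fields is hypothetical and not part of DataHub; it only shows that gen_1.lineage_hops means "lineage_hops nested under gen_1".

```python
from typing import Any, Dict


def expand_dotted_fields(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: turn dotted field names into nested dicts."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = nested
        # Create (or reuse) an intermediate dict for every part except the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


config = expand_dotted_fields(
    {
        "enabled": True,
        "gen_1.emit_lineage": True,
        "gen_1.lineage_fan_out": 3,
        "gen_1.lineage_hops": 2,
    }
)
print(config)
```

The printed dictionary has the same shape as the source's config section in YAML: enabled at the top level, and emit_lineage, lineage_fan_out and lineage_hops indented under gen_1.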
Field | Type | Description |
---|---|---|
enabled | boolean | Whether this source is enabled. Default: True |
gen_1 | LineageConfigGen1 | Configuration for lineage data generation. |
gen_1.emit_lineage | boolean | Whether to emit lineage data for testing purposes. When False, no lineage data is generated regardless of other settings. Default: False |
gen_1.level_subtypes | map(str,string) | Mapping of level to subtype, used by the level_based pattern. Default: {"0": "Table", "1": "View", "2": "Table"} |
gen_1.lineage_fan_out | integer | Number of downstream tables that each upstream table connects to (at every hop, unless lineage_fan_out_after_first_hop overrides it for levels 2 and beyond). This controls the width of the lineage graph: higher values create more parallel downstream tables per level. Default: 3 |
gen_1.lineage_fan_out_after_first_hop | integer | Optional limit on fan-out for hops after the first. When set, it caps the number of downstream tables per upstream table at levels 2 and beyond, preventing exponential growth. When unset (None), the standard exponential growth applies (lineage_fan_out^level tables at each level). |
gen_1.lineage_hops | integer | Number of hops (levels) in the lineage graph. This controls the depth of the lineage graph: level 0 is the root table and each subsequent level contains downstream tables, so higher values create deeper lineage chains. Default: 2 |
gen_1.subtype_pattern | Enum | Pattern for determining SubTypes. Options: 'alternating', 'all_table', 'all_view', 'level_based'. Default: alternating |
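To make the interaction of lineage_fan_out, lineage_hops and lineage_fan_out_after_first_hop concrete, the following sketch computes how many tables appear at each level. It models the behavior described in this page's configuration docs rather than calling the actual DataHubMockDataSource code, and the function name tables_per_level is hypothetical. It also assumes, as the documented examples suggest, that every non-root table has exactly one upstream table, so the number of lineage edges is the table count minus one.

```python
from typing import List, Optional


def tables_per_level(
    lineage_fan_out: int = 3,
    lineage_hops: int = 2,
    lineage_fan_out_after_first_hop: Optional[int] = None,
) -> List[int]:
    """Hypothetical model: number of tables at each level 0..lineage_hops."""
    counts = [1]  # Level 0: the single root table.
    for level in range(1, lineage_hops + 1):
        # Level 1 always uses lineage_fan_out; levels 2+ use the cap when it is set.
        if level >= 2 and lineage_fan_out_after_first_hop is not None:
            fan_out = lineage_fan_out_after_first_hop
        else:
            fan_out = lineage_fan_out
        # Each table on the previous level gets `fan_out` downstream tables.
        counts.append(counts[-1] * fan_out)
    return counts


# (lineage_fan_out, lineage_hops, lineage_fan_out_after_first_hop)
for args in [(2, 1, None), (3, 2, None), (4, 1, None), (3, 3, 2)]:
    counts = tables_per_level(*args)
    total = sum(counts)
    # Assumes a tree: every non-root table has exactly one upstream table.
    print(args, counts, "tables:", total, "lineage edges:", total - 1)
```

With the defaults (lineage_fan_out=3, lineage_hops=2) this gives 1 + 3 + 9 = 13 tables, and with lineage_fan_out_after_first_hop=2 and three hops it gives 1 + 3 + 6 + 12 = 22, matching the examples in the schema description. Without the cap, the last level alone holds lineage_fan_out^lineage_hops tables, which is why the cap is useful for deep graphs.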
The JSONSchema for this configuration is inlined below.
{
"title": "DataHubMockDataConfig",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "Whether this source is enabled",
"default": true,
"type": "boolean"
},
"gen_1": {
"title": "Gen 1",
"description": "Configuration for lineage data generation",
"allOf": [
{
"$ref": "#/definitions/LineageConfigGen1"
}
]
}
},
"additionalProperties": false,
"definitions": {
"SubTypePattern": {
"title": "SubTypePattern",
"description": "An enumeration.",
"enum": [
"alternating",
"all_table",
"all_view",
"level_based"
],
"type": "string"
},
"LineageConfigGen1": {
"title": "LineageConfigGen1",
"description": "Configuration for generating mock lineage data for testing purposes.\n\nThis configuration controls how the mock data source generates a hierarchical\nlineage graph with multiple levels of upstream/downstream relationships.\n\nThe lineage graph is structured as follows:\n- Level 0: 1 table (root)\n- Level 1: lineage_fan_out tables (each connected to the root)\n- Level 2+: If lineage_fan_out_after_first_hop is set, uses that value;\n otherwise uses lineage_fan_out^level tables (each connected to a level 1 table)\n- ... and so on for lineage_hops levels\n\nExamples:\n - With lineage_fan_out=2, lineage_hops=1: Creates 3 tables total\n (1 root + 2 downstream) with 2 lineage relationships\n - With lineage_fan_out=3, lineage_hops=2: Creates 13 tables total\n (1 + 3 + 9) with 12 lineage relationships\n - With lineage_fan_out=4, lineage_hops=1: Creates 5 tables total\n (1 + 4) with 4 lineage relationships\n - With lineage_fan_out=3, lineage_hops=3, lineage_fan_out_after_first_hop=2:\n Creates 1 + 3 + 6 + 12 = 22 tables total (prevents exponential growth)\n\nTable naming convention: \"hops_{lineage_hops}_f_{lineage_fan_out}_h{level}_t{table_index}\"",
"type": "object",
"properties": {
"emit_lineage": {
"title": "Emit Lineage",
"description": "Whether to emit lineage data for testing purposes. When False, no lineage data is generated regardless of other settings.",
"default": false,
"type": "boolean"
},
"lineage_fan_out": {
"title": "Lineage Fan Out",
"description": "Number of downstream tables that each upstream table connects to. This controls the 'width' of the lineage graph. Higher values create more parallel downstream tables per level.",
"default": 3,
"type": "integer"
},
"lineage_hops": {
"title": "Lineage Hops",
"description": "Number of hops (levels) in the lineage graph. This controls the 'depth' of the lineage graph. Level 0 is the root table, and each subsequent level contains downstream tables. Higher values create deeper lineage chains.",
"default": 2,
"type": "integer"
},
"lineage_fan_out_after_first_hop": {
"title": "Lineage Fan Out After First Hop",
"description": "Optional limit on fanout for hops after the first hop. When set, prevents exponential growth by limiting the number of downstream tables per upstream table at levels 2 and beyond. When None, uses the standard exponential growth (lineage_fan_out^level).",
"type": "integer"
},
"subtype_pattern": {
"description": "Pattern for determining SubTypes. Options: 'alternating', 'all_table', 'all_view', 'level_based'",
"default": "alternating",
"allOf": [
{
"$ref": "#/definitions/SubTypePattern"
}
]
},
"level_subtypes": {
"title": "Level Subtypes",
"description": "Mapping of level to subtype for level_based pattern",
"default": {
"0": "Table",
"1": "View",
"2": "Table"
},
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"additionalProperties": false
}
}
}
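The schema description above documents the table naming convention and the level_subtypes default used by the level_based pattern. The sketch below applies both to a small graph. The function names are hypothetical, and two details are assumptions not confirmed by this page: that table_index is 0-based and restarts at each level, and what happens for levels missing from level_subtypes (here the lookup simply returns None). It also uses uncapped growth, i.e. lineage_fan_out_after_first_hop unset.

```python
from typing import Dict, List, Optional, Tuple

# Default level_subtypes from the schema; keys are strings, as in the JSON default.
DEFAULT_LEVEL_SUBTYPES: Dict[str, str] = {"0": "Table", "1": "View", "2": "Table"}


def table_name(lineage_hops: int, lineage_fan_out: int, level: int, table_index: int) -> str:
    # Documented pattern: "hops_{lineage_hops}_f_{lineage_fan_out}_h{level}_t{table_index}".
    return f"hops_{lineage_hops}_f_{lineage_fan_out}_h{level}_t{table_index}"


def level_based_subtype(
    level: int, level_subtypes: Optional[Dict[str, str]] = None
) -> Optional[str]:
    # level_based pattern: look the level up in level_subtypes (string keys).
    # Assumption: unmapped levels return None; the real fallback is not documented here.
    mapping = DEFAULT_LEVEL_SUBTYPES if level_subtypes is None else level_subtypes
    return mapping.get(str(level))


def list_tables(lineage_fan_out: int, lineage_hops: int) -> List[Tuple[str, Optional[str]]]:
    # Assumption: table_index is 0-based and restarts at every level.
    # Uncapped growth: level k holds lineage_fan_out ** k tables.
    tables = []
    for level in range(lineage_hops + 1):
        for index in range(lineage_fan_out**level):
            name = table_name(lineage_hops, lineage_fan_out, level, index)
            tables.append((name, level_based_subtype(level)))
    return tables


for name, subtype in list_tables(lineage_fan_out=2, lineage_hops=1):
    print(name, subtype)
```

For lineage_fan_out=2 and lineage_hops=1 this lists hops_1_f_2_h0_t0 (Table) followed by hops_1_f_2_h1_t0 and hops_1_f_2_h1_t1 (both View). Because the configuration values are embedded in every name, tables generated with different settings do not collide.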
Code Coordinates
- Class Name: datahub.ingestion.source.mock_data.datahub_mock_data.DataHubMockDataSource
Questions
If you've got any questions on configuring ingestion for DataHubMockData, feel free to ping us on our Slack.