Metabase
Important Capabilities
Capability | Status | Notes |
---|---|---|
Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion.remove_stale_metadata |
Platform Instance | ✅ | Enabled by default |
Table-Level Lineage | ✅ | Supported by default |
This plugin extracts charts, dashboards, and associated metadata. It is in beta and has only been tested on PostgreSQL and H2 databases.
Collection
/api/collection endpoint is used to retrieve the available collections.
/api/collection/<COLLECTION_ID>/items?models=dashboard endpoint is used to retrieve a given collection and list its dashboards.
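As a rough sketch of the two calls above, the helpers below only build the request URLs and send nothing. The function names and the `base_url` parameter are illustrative assumptions, not part of the DataHub source.

```python
def collections_url(base_url: str) -> str:
    """URL that lists the available collections."""
    return f"{base_url.rstrip('/')}/api/collection"


def collection_dashboards_url(base_url: str, collection_id) -> str:
    """URL that lists the dashboards contained in one collection."""
    return (
        f"{base_url.rstrip('/')}/api/collection/"
        f"{collection_id}/items?models=dashboard"
    )
```

An ingestion run would typically call the first URL once, then the second URL once per collection id returned.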
Dashboard
/api/dashboard/<DASHBOARD_ID> endpoint is used to retrieve a given Dashboard and grab its information.
- Title and description
- Last edited by
- Owner
- Link to the dashboard in Metabase
- Associated charts
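The fields listed above could be picked out of a dashboard payload roughly as follows. This is a sketch only: every payload key used here (`name`, `description`, `last-edit-info`, `creator_id`, `dashcards`, `card_id`) and the `/dashboard/<id>` link layout are assumptions about the response shape, and `summarize_dashboard` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize_dashboard(payload: Dict[str, Any], display_uri: str) -> Dict[str, Any]:
    """Collect the dashboard fields listed above from an API payload.

    All payload keys are assumed; adjust them to the actual response.
    """
    dashboard_id = payload["id"]
    # Assumed: cards are nested under "dashcards", each carrying a "card_id".
    cards: List[Dict[str, Any]] = payload.get("dashcards", [])
    return {
        "title": payload.get("name", ""),
        "description": payload.get("description") or "",
        "last_edited_by": (payload.get("last-edit-info") or {}).get("email"),
        "owner_id": payload.get("creator_id"),
        "link": f"{display_uri.rstrip('/')}/dashboard/{dashboard_id}",
        # Entries without a card id (for example text-only tiles) are skipped.
        "chart_ids": [c["card_id"] for c in cards if c.get("card_id") is not None],
    }
```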
Chart
/api/card endpoint is used to retrieve the following information.
- Title and description
- Last edited by
- Owner
- Link to the chart in Metabase
- Datasource and lineage
The following properties for a chart are ingested in DataHub.
Name | Description |
---|---|
Dimensions | Column names |
Filters | Any filters applied to the chart |
Metrics | All columns that are being used for aggregation |
CLI based Ingestion
Install the Plugin
The metabase source works out of the box with acryl-datahub.
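For illustration, a recipe can also be assembled and run from Python. The sketch below assumes a `datahub-rest` sink and placeholder credentials. `build_recipe` and `run_recipe` are hypothetical helper names, and the server URL in the usage note is an assumed local address.

```python
from typing import Any, Dict


def build_recipe(connect_uri: str, username: str, password: str, sink_server: str) -> Dict[str, Any]:
    """Build a minimal recipe dictionary for the metabase source."""
    return {
        "source": {
            "type": "metabase",
            "config": {
                "connect_uri": connect_uri,
                "username": username,
                "password": password,
            },
        },
        # Assumption: metadata is sent to a DataHub REST endpoint.
        "sink": {"type": "datahub-rest", "config": {"server": sink_server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe; requires acryl-datahub to be installed."""
    # Imported lazily so build_recipe works without the package.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

A call such as `run_recipe(build_recipe("http://localhost:3000", "user@example.com", "secret", "http://localhost:8080"))` would start an ingestion run against those assumed endpoints.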
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
Field | Description |
---|---|
connect_uri string | Metabase host URL. Default: localhost:3000 |
database_alias_map object | Database name map to use when constructing dataset URN. |
database_id_to_instance_map map(str,string) | Custom mappings between Metabase database id and DataHub platform instance. |
default_schema string | Default schema name to use when a schema is not provided in an SQL query. Default: public |
display_uri string | Optional URL to use in links (if connect_uri is only for ingestion). |
engine_platform_map map(str,string) | Custom mappings between Metabase database engines and DataHub platforms. |
exclude_other_user_collections boolean | If true, collections belonging to other users are excluded. Default: False |
password string(password) | Metabase password. |
platform_instance_map map(str,string) | A holder for platform -> platform_instance mappings to generate correct dataset URNs. |
username string | Metabase username. |
env string | The environment that all assets produced by this connector belong to. Default: PROD |
stateful_ingestion StatefulStaleMetadataRemovalConfig | Base specialized config for Stateful Ingestion with stale metadata removal capability. |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Effective default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. Schema default: False |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
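The dotted field names in the table correspond to nested keys in the recipe. A small sketch of that correspondence follows; `expand_dotted` is a hypothetical helper written for illustration and is not part of DataHub.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b": 1} style keys into nested dictionaries {"a": {"b": 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        # Walk (and create) one dictionary level per parent segment.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

For example, `stateful_ingestion.enabled` becomes an `enabled` key nested under `stateful_ingestion` in the source config.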
The JSONSchema for this configuration is inlined below.
{
"title": "MetabaseConfig",
"description": "Any non-Dataset source that produces lineage to Datasets should inherit this class.\ne.g. Orchestrators, Pipelines, BI Tools etc.",
"type": "object",
"properties": {
"stateful_ingestion": {
"$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
},
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform_instance_map": {
"title": "Platform Instance Map",
"description": "A holder for platform -> platform_instance mappings to generate correct dataset urns",
"type": "object",
"additionalProperties": {
"type": "string"
}
},
"connect_uri": {
"title": "Connect Uri",
"description": "Metabase host URL.",
"default": "localhost:3000",
"type": "string"
},
"display_uri": {
"title": "Display Uri",
"description": "optional URL to use in links (if `connect_uri` is only for ingestion)",
"type": "string"
},
"username": {
"title": "Username",
"description": "Metabase username.",
"type": "string"
},
"password": {
"title": "Password",
"description": "Metabase password.",
"type": "string",
"writeOnly": true,
"format": "password"
},
"database_alias_map": {
"title": "Database Alias Map",
"description": "Database name map to use when constructing dataset URN.",
"type": "object"
},
"engine_platform_map": {
"title": "Engine Platform Map",
"description": "Custom mappings between metabase database engines and DataHub platforms",
"type": "object",
"additionalProperties": {
"type": "string"
}
},
"database_id_to_instance_map": {
"title": "Database Id To Instance Map",
"description": "Custom mappings between metabase database id and DataHub platform instance",
"type": "object",
"additionalProperties": {
"type": "string"
}
},
"default_schema": {
"title": "Default Schema",
"description": "Default schema name to use when schema is not provided in an SQL query",
"default": "public",
"type": "string"
},
"exclude_other_user_collections": {
"title": "Exclude Other User Collections",
"description": "Flag that if true, exclude other user collections",
"default": false,
"type": "boolean"
}
},
"additionalProperties": false,
"definitions": {
"DynamicTypedStateProviderConfig": {
"title": "DynamicTypedStateProviderConfig",
"type": "object",
"properties": {
"type": {
"title": "Type",
"description": "The type of the state provider to use. For DataHub use `datahub`",
"type": "string"
},
"config": {
"title": "Config",
"description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
"default": {},
"type": "object"
}
},
"required": [
"type"
],
"additionalProperties": false
},
"StatefulStaleMetadataRemovalConfig": {
"title": "StatefulStaleMetadataRemovalConfig",
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"default": false,
"type": "boolean"
},
"remove_stale_metadata": {
"title": "Remove Stale Metadata",
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
}
}
}
Metabase databases will be mapped to a DataHub platform based on the engine listed in the api/database response. This mapping can be customized by using the engine_platform_map config option. For example, to map databases using the athena engine to the underlying datasets in the glue platform, the following snippet can be used:
engine_platform_map:
  athena: glue
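The effect of such an override can be sketched as a simple lookup. The fallback used here, where an engine without an override maps to a platform of the same name, is an assumption made for illustration, and `resolve_platform` is a hypothetical helper.

```python
from typing import Dict, Optional


def resolve_platform(engine: str, engine_platform_map: Optional[Dict[str, str]] = None) -> str:
    """Return the DataHub platform for a Metabase engine name."""
    overrides = engine_platform_map or {}
    # Assumption: engines without an override keep their own name as platform.
    return overrides.get(engine, engine)
```

With the snippet above, `resolve_platform("athena", {"athena": "glue"})` yields `glue`, while other engines are left untouched.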
DataHub will try to determine the database name from the Metabase api/database payload. However, the name can be overridden via database_alias_map for a given database connected to Metabase.
If several platform instances with the same platform (e.g. from several distinct clickhouse clusters) are present in DataHub, the mapping between database id in Metabase and platform instance in DataHub may be configured with the following map:
database_id_to_instance_map:
  "42": platform_instance_in_datahub
The key in this map must be a string, not an integer, even though the Metabase API provides the id as a number.
If database_id_to_instance_map is not specified, platform_instance_map is used for platform instance mapping. If neither is specified, no platform instance is used when constructing the URN while searching for dataset relations.
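The precedence described above can be sketched as follows. Both helpers are hypothetical. The handling of a database id that is missing from a specified database_id_to_instance_map (no platform instance) is an assumption, and `dataset_urn` is a plain-string illustration of the dataset URN layout rather than the connector's own code.

```python
from typing import Dict, Optional


def resolve_platform_instance(
    database_id,
    platform: str,
    database_id_to_instance_map: Optional[Dict[str, str]] = None,
    platform_instance_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Pick a platform instance following the documented precedence."""
    if database_id_to_instance_map is not None:
        # Metabase returns the id as a number, but the map is keyed by strings.
        return database_id_to_instance_map.get(str(database_id))
    if platform_instance_map is not None:
        return platform_instance_map.get(platform)
    return None


def dataset_urn(platform: str, name: str, env: str = "PROD",
                platform_instance: Optional[str] = None) -> str:
    """Build a dataset URN, prefixing the name with the instance if present."""
    full_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"
```

Passing the integer id `42` together with the map shown earlier resolves to `platform_instance_in_datahub`, because the id is converted to a string before the lookup.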
If needed, collections belonging to other users can be excluded by setting the following configuration:
exclude_other_user_collections: true
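To make the intent of this flag concrete, here is one possible client-side interpretation. How the connector actually applies the flag is not described here, so the `personal_owner_id` field and the `visible_collections` helper are assumptions used purely for illustration.

```python
from typing import Any, Dict, List


def visible_collections(
    collections: List[Dict[str, Any]],
    current_user_id: int,
    exclude_other_user_collections: bool = False,
) -> List[Dict[str, Any]]:
    """Drop personal collections owned by other users when the flag is set."""
    if not exclude_other_user_collections:
        return list(collections)
    kept = []
    for collection in collections:
        owner = collection.get("personal_owner_id")  # assumed field name
        # Shared collections have no personal owner; keep them and our own.
        if owner is None or owner == current_user_id:
            kept.append(collection)
    return kept
```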
Compatibility
Metabase version v0.48.3
Code Coordinates
- Class Name: datahub.ingestion.source.metabase.MetabaseSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Metabase, feel free to ping us on our Slack.