Immuta Databricks Spark Integration with Unity Catalog Support Pre-Configuration Details

Prerequisites

  • Databricks Runtime 11.3.
  • Unity Catalog enabled on your Databricks cluster.
  • Unity Catalog metastore created and attached to a Databricks workspace.
  • The metastore owner you are using to manage permissions has been granted access to all catalogs, schemas, and tables that will be protected by Immuta. Data protected by Immuta should only be granted to privileged users in Unity Catalog so that the only view of that data is through an Immuta-enabled cluster.
  • You have generated a personal access token for the metastore owner that Immuta can use to read data in Unity Catalog.
  • You do not plan to use clusters without Unity Catalog enabled with Immuta data sources. Once this integration is enabled, all access to data source tables must go through Databricks clusters that run Runtime 11.3 with Unity Catalog enabled.
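
The grant prerequisite above covers three levels: catalog, schema, and table. The following is a minimal sketch of the statements a Databricks administrator might run for a single table. It assumes the Unity Catalog privilege names USE CATALOG, USE SCHEMA, and SELECT; the helper name grant_chain and the catalog, schema, table, and principal values are hypothetical and not part of Immuta or Databricks.

```python
from typing import List


def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, escaping any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def grant_chain(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Build one GRANT statement per level: catalog, schema, then table.

    Assumption: the privilege names follow Unity Catalog SQL
    (USE CATALOG, USE SCHEMA, SELECT). Adjust them if your workspace differs.
    """
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    tbl = f"{sch}.{quote(table)}"
    who = quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT SELECT ON TABLE {tbl} TO {who}",
    ]


if __name__ == "__main__":
    # Hypothetical names; replace with your own catalog, schema, table,
    # and the metastore owner whose token Immuta uses.
    for statement in grant_chain("main", "sales", "orders", "owner@example.com"):
        print(statement + ";")
```

The generated statements can be pasted into a Databricks SQL editor or notebook. If any one of the three levels is missing, the SELECT on the table fails even when the table-level grant exists.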

Feature Availability

Project Workspaces | Databricks Tag Ingestion | User Impersonation | Native Query Audit | Multiple Integrations
❌ | ❌ | ✅ | ✅ | ✅

For details about the supported features listed in the table above, see the pre-configuration details page for Databricks.

Supported Databricks Cluster Configurations

The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

Example cluster | Databricks Runtime | Unity Catalog in Databricks | Databricks Spark integration | Databricks Spark with Unity Catalog support | Databricks Unity Catalog integration
Cluster 1 | 9.1 | Unavailable | ✅ | ⛔ | Unavailable
Cluster 2 | 10.4 | Unavailable | ✅ | ⛔ | Unavailable
Cluster 3 | 11.3 | ⛔ | ✅ / ⛔ | ⛔ / ✅ | Unavailable
Cluster 4 | 11.3 | ✅ | ⛔ | ✅ | ⛔
Cluster 5 | 11.3 | ✅ | ✅ | ⛔ | ✅

Legend:

  • ✅ The feature or integration is enabled.
  • ⛔ The feature or integration is disabled.

Databricks Metastore Magic

Databricks metastore magic allows you to migrate your data from the Databricks legacy Hive metastore to the Unity Catalog metastore while protecting data and maintaining your current processes in a single Immuta tenant.

No configuration is necessary to enable this feature. For more details, see the Databricks metastore magic overview.

Caveats and Limitations

  • Native workspaces are not supported. Creating a native workspace on a Unity Catalog enabled host is undefined behavior and may cause data loss or crashes.
  • The Databricks metastore owner whose token is configured for the integration must be granted access to each table. For a table to be accessible to the user, the catalog, schema, and table must all carry the appropriate grants for this administrator user so that it can SELECT from the table.
  • Direct file access to Immuta data sources is not supported.
  • Limited Enforcement (called "available until protected by policy" on the App Settings page), which makes Immuta clusters available to all Immuta users until protected by a policy, is not supported. You must set IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS and IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES to false, either manually in your cluster policies or by selecting "Protected until made available by policy" in the Databricks integration section of the App Settings page.
  • R notebooks may have path-related errors accessing tables.
  • Databricks on Azure will return errors when creating a database in a scratch location when Unity Catalog is enabled.
  • Databricks accounts deployed on Google Cloud Platform are not supported.
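
For the Limited Enforcement caveat above, the two environment variables can be pinned to false in a cluster policy. The sketch below builds such a fragment and checks an existing policy for it. It assumes the Databricks cluster policy definition format, in which a spark_env_vars.<NAME> attribute path can be fixed to a value; the function names are hypothetical, and the fragment would need to be merged into the policy that Immuta generates for your cluster.

```python
import json
from typing import Any, Dict

# The two variables named in the caveat above.
ENV_VARS = (
    "IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS",
    "IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES",
)


def protected_policy_fragment() -> Dict[str, Dict[str, str]]:
    """Return policy rules that pin both variables to "false".

    Assumption: cluster policies accept "spark_env_vars.<NAME>" paths
    with a rule of type "fixed".
    """
    return {
        f"spark_env_vars.{name}": {"type": "fixed", "value": "false"}
        for name in ENV_VARS
    }


def is_protected(policy: Dict[str, Any]) -> bool:
    """Check that a policy definition fixes both variables to false."""
    for name in ENV_VARS:
        rule = policy.get(f"spark_env_vars.{name}")
        if not isinstance(rule, dict):
            return False
        if rule.get("type") != "fixed":
            return False
        if str(rule.get("value", "")).lower() != "false":
            return False
    return True


if __name__ == "__main__":
    # Print the fragment as JSON so it can be merged into a policy definition.
    print(json.dumps(protected_policy_fragment(), indent=2))
```

Running is_protected against an exported policy definition gives a quick check that neither variable was left unset or set to true.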

Next

Configure Databricks Spark integration with Unity Catalog support.