Section Contents
This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta applies policies to the plan that Spark builds for each user query, and the queries execute directly against Databricks tables.
The guides in this section outline how to integrate Databricks with Immuta to gain value from all three Immuta modules: Discover, Detect, and Secure.
How-to guides
- Databricks configuration: Configure the Databricks Spark integration.
- DBFS access: Access DBFS in Databricks for non-sensitive data.
- Limited enforcement in Databricks: Allow Immuta users to access tables that are not protected by Immuta.
- Hiding the Immuta database in Databricks: Hide the Immuta database from users in Databricks, since user queries do not need to reference it.
- Run spark-submit jobs on Databricks: Run R and Scala spark-submit jobs on your Databricks cluster.
- Project UDFs cache settings: Raise the caching on-cluster and lower the cache timeouts for the Immuta web service to allow use of project UDFs in Spark jobs.
- External metastores: Use an existing Hive external metastore instead of the built-in metastore.
Reference guides
- Databricks Spark integration reference guide: This guide describes the design and components of the integration.
- Configuration settings: These guides describe various integration settings that can be configured, including environment variables, cluster policies, and performance.
- Databricks change data feed: This guide describes Immuta's support of Databricks change data feed.
- Databricks libraries: The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. This guide describes the feature and its configuration.
- Delta Lake API: The Delta Lake API does not go through the normal Spark execution path, so Immuta's Spark extensions do not provide protection for it. To ensure that Immuta has control over what a user can access, the Delta Lake API is blocked. This reference guide outlines the Spark SQL options that can be substituted for the Delta Lake API.
- Spark direct file reads: Immuta allows direct file reads in Spark for file paths. This guide describes that process.