Enhanced Onboarding and Data Source Registration
Info
This feature is being gradually rolled out to customers and may not be available to your account yet.
The enhanced onboarding and data source registration workflow allows you to register your data at the host level, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your host for changes so that data sources are added and removed automatically to reflect the state of data on your host.
Once you register your host, Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in your data platform:
- Host: This first tier represents your server or data platform account.
- Folder: This second tier represents your database or schema (depending on the structure of your remote platform).
- Data source: This third tier represents individual tables within your schema or database.
For example, the following object hierarchy for Snowflake hosts would be displayed on the Immuta infrastructure page:
- Host
  - Database
    - Schema
      - Data source
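As a rough illustration of the hierarchy above, the following sketch models a Snowflake-style host as a tree and prints it with indentation. The `Node` class, the `render` function, and all object names are hypothetical and are not part of any Immuta API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One object in the hierarchy (hypothetical model, not an Immuta type)."""
    name: str
    tier: str  # "host", "database", "schema", or "data source"
    children: list[Node] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Return one indented line per object, listing parents before children."""
    lines = ["  " * depth + f"{node.tier}: {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative names only; a real host would contain whatever the crawl finds.
host = Node("example-account", "host", [
    Node("ANALYTICS", "database", [
        Node("PUBLIC", "schema", [
            Node("ORDERS", "data source"),
            Node("CUSTOMERS", "data source"),
        ]),
    ]),
])

print("\n".join(render(host)))
```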
Beyond making the registration of your data more intuitive, enhanced onboarding provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object discovery) at the host level.
Requirements
See the Snowflake or Databricks Unity Catalog host registration how-to guides for a list of requirements.
Host registration and crawls
In this enhanced onboarding workflow, you configure the integration and register data sources simultaneously. Once you save your configuration, Immuta manages and applies Snowflake or Unity Catalog governance features to data registered in Immuta.
Then, Immuta crawls your host to register all tables within every schema and database that the Snowflake role or Databricks account credentials you provided during configuration can access. The object metadata, user metadata, and policy definitions are stored in the Immuta metadata database, and this metadata is used to enforce policies for users accessing this data.
After initial registration, your host can be crawled in two ways:
- Periodic crawl: This crawl happens once every 24 hours. Currently, this schedule is not configurable.
- Manual crawl: You can manually trigger a crawl of your host.
During these subsequent crawls of your host, Immuta identifies tables, schemas, or databases that have been added or removed. If tables are added, new data sources are created in Immuta. If remote tables are deleted, the corresponding data sources will be disabled in Immuta.
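The add-and-disable behavior described above amounts to a set comparison between the tables found on the host and the data sources already registered. The sketch below is only a conceptual model of that comparison, not Immuta's implementation; `plan_crawl` and the table names are hypothetical.

```python
def plan_crawl(remote_tables: set[str], registered: dict[str, bool]) -> dict[str, list[str]]:
    """Compare tables found on the host with data sources already registered.

    `registered` maps a fully qualified table name to True (enabled) or
    False (disabled). This is an assumed data shape for illustration.
    """
    known = set(registered)
    enabled = {name for name, is_enabled in registered.items() if is_enabled}
    return {
        # Tables on the host with no data source yet: create data sources.
        "create": sorted(remote_tables - known),
        # Enabled data sources whose remote table is gone: disable them.
        "disable": sorted(enabled - remote_tables),
    }


remote = {"DB.PUBLIC.ORDERS", "DB.PUBLIC.RETURNS"}
registered = {"DB.PUBLIC.ORDERS": True, "DB.PUBLIC.CUSTOMERS": True}

plan = plan_crawl(remote, registered)
print(plan)
# {'create': ['DB.PUBLIC.RETURNS'], 'disable': ['DB.PUBLIC.CUSTOMERS']}
```

Note that the sketch disables, rather than deletes, data sources for removed tables, matching the behavior described above. It does not model what happens when a previously removed table reappears.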
For more information about the Snowflake or Databricks Unity Catalog integration and how policies are enforced, see the Snowflake integration reference guide or Databricks Unity Catalog integration reference guide.
Integration settings
When registering a host, Immuta sets the configuration to the recommended default settings to protect your data [1]. The recommended settings are described below:
- Infrastructure object discovery: This setting allows Immuta to monitor schemas for changes. When Immuta identifies a new table, a data source will automatically be created. Similarly, if remote tables are deleted, the corresponding data sources will be disabled. This setting is enabled by default.
- Default run schedule: This sets the time interval for Immuta to check for new objects. By default, this schedule is set to 24 hours.
- Sensitive data discovery: This setting enables sensitive data discovery and allows you to select the sensitive data discovery framework that Immuta will apply to your data objects. This setting is enabled by default to use the preconfigured or global framework.
- Impersonation: This setting enables user impersonation in Snowflake and defines the role used for it. User impersonation is not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
- Project workspaces: This setting enables Snowflake project workspaces. If you use Snowflake secure data sharing with Immuta, enable this setting, as project workspaces are required. If you use Snowflake table grants, disable this setting; project workspaces cannot be used when Snowflake table grants are enabled. Project workspaces are not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
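To summarize the defaults and compatibility rules listed above, here is a minimal sketch that encodes them as a settings object with a consistency check. The `HostSettings` class, its field names, the platform strings, and the `check` function are hypothetical and do not correspond to an Immuta API or configuration format; only the default values and the rules themselves come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSettings:
    """Recommended defaults from the list above (field names are illustrative)."""
    object_discovery: bool = True          # enabled by default
    run_schedule_hours: int = 24           # default run schedule
    sensitive_data_discovery: bool = True  # enabled by default
    impersonation: bool = False            # disabled by default
    project_workspaces: bool = False       # disabled by default


def check(settings: HostSettings, platform: str,
          table_grants_enabled: bool = False) -> list[str]:
    """Return the compatibility problems for a platform; empty if none.

    `platform` is an assumed label: "snowflake" or "databricks-unity-catalog".
    """
    problems = []
    if platform == "databricks-unity-catalog":
        if settings.impersonation:
            problems.append("impersonation is not supported in Databricks Unity Catalog")
        if settings.project_workspaces:
            problems.append("project workspaces are not supported in Databricks Unity Catalog")
    if platform == "snowflake" and settings.project_workspaces and table_grants_enabled:
        problems.append("project workspaces cannot be used with Snowflake table grants")
    return problems


defaults = HostSettings()
print(check(defaults, "snowflake"))                 # []
print(check(defaults, "databricks-unity-catalog"))  # []
```

With the defaults, both platforms pass the check, which is consistent with impersonation and project workspaces being disabled by default.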
Unregistering a host
Unregistering a host automatically deletes all of its child objects in Immuta. However, Immuta will not remove the objects in your Snowflake or Databricks account.
Limitations and known issues
- Currently, the only operations users can perform on a host are registering it, unregistering it, and updating its connection information.
- Snowflake and Databricks Unity Catalog are currently the only integrations that support the enhanced onboarding and data source registration workflow.
- Databricks Unity Catalog:
  - Only managed and external tables will be registered as data sources.
  - Delta shares are unsupported.
Related guides
How-to guide
Reference guides
- Snowflake integration reference guide
- Databricks Unity Catalog reference guide
- Policies in Immuta
- Data sources in Immuta
[1] Users cannot currently change the default settings. However, these settings can be adjusted using the legacy configuration workflow for Snowflake or Databricks Unity Catalog.