Migrate From Legacy to Native SDD
This guide provides information and best practices for migrating from the deprecated legacy sensitive data discovery (SDD) option to the improved native SDD. This guide is for users who have already enabled SDD on their tenant and have Discovered tags on their data sources.
Before you begin
Native vs legacy SDD
Legacy SDD is deprecated and will be removed and replaced by native SDD. Native SDD significantly improves on legacy SDD for discovering and tagging your data, with upgrades to the built-in identifiers. Its greatest benefit is that it respects data residency: native SDD doesn't move any of your data when it runs. Discovery happens in your data platform, and the platform returns only the matching identifiers and column names to Immuta.
See the Sensitive data discovery reference page for more information on native SDD.
Requirements
- Native SDD requires Snowflake, Databricks, Starburst (Trino), or Redshift data sources
- Legacy SDD enabled on your tenant
- Legacy SDD tags applied to your data sources: To find out if you have legacy SDD tags applied, create a governance report as described in the understand the context of your tags section.
Enable native SDD
Contact your Immuta representative to enable native SDD on your Immuta tenant. Many users already have native SDD enabled. If you want to check for yourself whether native SDD is already running and tagging your data before you reach out, proceed to understand the context of your tags.
This action will not change anything immediately on your tenant; however, anytime identification runs in the future, it will be native SDD instead of the legacy version.
To assess native SDD for your data, proceed with the steps below. If you do not review native SDD, the legacy SDD tags will all remain on your data source columns. However, when identification automatically runs on new data sources and columns, it will apply native SDD tags, and because of the improvements to SDD, it may tag different data than legacy SDD.
Understand the context of your tags
Requirement: Immuta permission GOVERNANCE
- Manually run identification globally to run native identification on your data sources.
- To check the tags on an individual data source, navigate to the data source data dictionary and select a Discovered tag. On the tag side sheet, you can determine the context of the tag. When identifiers match data, native SDD will apply tags, and their tag context will be Sensitive Data Discovery. Any tags with the context Legacy Sensitive Data Discovery were not matched by native SDD but will remain on the data source.
- To check your tags globally, navigate to the governance reports page and build a report for sensitive data discovery. This report will present the legacy tags on your data sources' columns and native SDD tags that are also on those columns. Use this report to assess the context of the Discovered tags and understand if native SDD is matching the data you want it to.
These actions will allow you to understand the differences between how native SDD and legacy SDD tag your data and whether your data is recognized as expected by native SDD or if legacy SDD was over-tagging your data. This way you can better tune SDD to your data.
If there are any legacy SDD tags that you want native SDD to catch, you need to tune native SDD so that this type of data is discovered in future tables and columns; see guidance on that in the next section.
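If you copy the report rows out for offline review, the comparison described above can be sketched in a few lines of Python. This is only an illustration: the row field names (`data_source`, `column`, `tag`, `context`) and the sample tag names are assumptions, not the actual layout of the governance report. The two context labels are the ones shown on the tag side sheet.

```python
from collections import defaultdict

# Context labels as shown on the tag side sheet.
LEGACY = "Legacy Sensitive Data Discovery"
NATIVE = "Sensitive Data Discovery"


def compare_tag_contexts(rows):
    """Group Discovered tags by column and keep columns that still carry legacy tags."""
    by_column = defaultdict(lambda: {"legacy": set(), "native": set()})
    for row in rows:
        key = (row["data_source"], row["column"])
        if row["context"] == LEGACY:
            by_column[key]["legacy"].add(row["tag"])
        elif row["context"] == NATIVE:
            by_column[key]["native"].add(row["tag"])
    # Only columns with at least one legacy-context tag need review.
    return {
        key: {kind: sorted(tags) for kind, tags in contexts.items()}
        for key, contexts in by_column.items()
        if contexts["legacy"]
    }


# Hypothetical report rows; field names and tag names are illustrative only.
report_rows = [
    {"data_source": "customers", "column": "email",
     "tag": "Discovered.Example.Email", "context": NATIVE},
    {"data_source": "customers", "column": "notes",
     "tag": "Discovered.Example.Email", "context": LEGACY},
    {"data_source": "customers", "column": "acct",
     "tag": "Discovered.Example.Name", "context": LEGACY},
    {"data_source": "customers", "column": "acct",
     "tag": "Discovered.Example.Account", "context": NATIVE},
]

for (source, column), tags in compare_tag_contexts(report_rows).items():
    print(f"{source}.{column}: legacy={tags['legacy']} native={tags['native']}")
```

Columns that list a legacy tag but no native tag are candidates for a new identifier. Columns that list both let you judge which tag describes the data better.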
Tune SDD
Requirement: Immuta permission GOVERNANCE
Using the report you built above, complete these actions to tune SDD:
- Focus on a legacy SDD tag that was properly applied to your data. Assess whether native SDD tagged that column and whether the native tag is more accurate than the legacy tag. If the native tag is missing or incorrect, proceed to the next step.
- Create a new regex or dictionary identifier in the framework to discover this data with the tag you want applied. Ensure the identifier is specific and matches your data with a confidence (match rate) of at least 90%.
- Complete the steps above for all legacy SDD tags.
- Retest your updated identifiers by re-running identification on the selected data sources, and continue refining until you reach the level of accuracy you want.
Completing the actions above will create parity between the tags legacy SDD applied to your data and the tags native SDD will apply in the future.
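Before you add a regex or dictionary identifier to the framework, it can help to estimate its match rate against a small sample of column values. The sketch below is a local approximation only. It assumes full-string matching and ignores nulls, which may differ from how identification evaluates data in your data platform. The pattern, the sample values, and the helper names are hypothetical.

```python
import re


def regex_match_rate(pattern, values):
    """Fraction of non-null values that the pattern matches in full."""
    regex = re.compile(pattern)
    candidates = [v for v in values if v is not None]
    if not candidates:
        return 0.0
    return sum(1 for v in candidates if regex.fullmatch(v)) / len(candidates)


def dictionary_match_rate(terms, values):
    """Fraction of non-null values found in a case-insensitive term list."""
    lookup = {t.lower() for t in terms}
    candidates = [v for v in values if v is not None]
    if not candidates:
        return 0.0
    return sum(1 for v in candidates if v.lower() in lookup) / len(candidates)


# Hypothetical sample taken from a column that a legacy tag covered.
account_ids = [
    "ACC-123456", "ACC-000001", "ACC-999999", "ACC-424242", "ACC-100200",
    "ACC-555555", "ACC-010101", "ACC-777777", "ACC-314159", "n/a", None,
]

THRESHOLD = 0.90  # the 90% target from the tuning steps
rate = regex_match_rate(r"ACC-\d{6}", account_ids)
print(f"match rate {rate:.0%}, meets threshold: {rate >= THRESHOLD}")
```

A pattern that clears the threshold on one sample may still be too broad, so also run it against columns that should not receive the tag and confirm the rate stays low there.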