Unity Catalog provides all the features required for your data governance & security:
- Table ACL
- Row-level access with dynamic views
- Secure access to external locations (blob storage)
- Lineage at column & table level for data traceability
- Traceability with audit logs
- Volumes
- Genie
- System Tables
- Delta Sharing
- Lakehouse Federation
- And many more
Because Unity Catalog is added as a supplement to your existing account, migrating your existing data to UC is very simple.
Unity Catalog works with 3 layers:
* CATALOG
* SCHEMA (or DATABASE)
* TABLE
Tables created without Unity Catalog are available under the default `hive_metastore` catalog, and they're scoped at the workspace level.
New tables created with Unity Catalog will be available at the account level, meaning that they're cross-workspace.
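To make the three layers concrete, here is a minimal Python sketch of how a table name maps onto `catalog.schema.table`. The `TableName` type, the `qualify` helper, and the `main` catalog are hypothetical names used for illustration only; they are not part of Unity Catalog or UCX, and the helper does not handle backtick-quoted identifiers that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A fully qualified table name: catalog.schema.table."""
    catalog: str
    schema: str
    table: str

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def qualify(name: str, default_catalog: str = "hive_metastore") -> TableName:
    """Expand a two-level `schema.table` name to three levels.

    Names without a catalog are assumed to live in the legacy
    `hive_metastore` catalog.
    """
    parts = name.split(".")
    if len(parts) == 2:
        return TableName(default_catalog, *parts)
    if len(parts) == 3:
        return TableName(*parts)
    raise ValueError(f"expected schema.table or catalog.schema.table, got {name!r}")


legacy = qualify("sales.orders")
# "main" is a hypothetical target catalog in Unity Catalog.
migrated = legacy._replace(catalog="main")
print(legacy)
print(migrated)
```

The only thing that changes between the legacy and the migrated name in this sketch is the first layer, which is why code that already uses `schema.table` references can often be updated by adding or switching the catalog.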
Introducing UCX
UCX is a toolkit for enabling Unity Catalog (UC) in your Databricks workspace. UCX provides commands and workflows for migrating tables and views to UC. UCX allows you to rewrite dashboards, jobs and notebooks to use the migrated data assets in UC. And there are many more features!
UCX is a public source project developed and maintained by Databricks Labs.
UCX documentation is available here: [https://databrickslabs.github.io/ucx/](https://databrickslabs.github.io/ucx/)
Migrating to Unity Catalog requires the following steps:
* Assessing your current Databricks deployment
* Creating and attaching a metastore
* Migrating workspace groups
* Upgrading tables
* Upgrading code
* Validation
Let's try it!
Using UCX to upgrade to Unity Catalog streamlines the process and reduces the risk.
To upgrade a workspace to Unity Catalog, follow these steps:
1. Installation
2. Assessment
3. Account and Cloud asset upgrade
4. Table Migration
5. Code linting/upgrade

Always refer to the [documentation](https://databrickslabs.github.io/ucx/) when using UCX, as the product evolves on a regular basis.
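As a rough illustration of how these steps chain together, the sketch below lists the CLI commands quoted in this guide in order and prints them in dry-run mode. The `UCX_STEPS` list and the `run_steps` helper are hypothetical and written for this example; step 3 is left out because its commands are not quoted here, and the installation step is interactive, so in practice you would likely run each command by hand.

```python
import shlex
import shutil
import subprocess

# Commands quoted in this guide, in the order the steps are listed.
# Step 3 (account and cloud assets) is intentionally omitted.
UCX_STEPS = [
    ("Installation", ["databricks", "labs", "install", "ucx"]),
    ("Assessment", ["databricks", "labs", "ucx", "ensure-assessment-run"]),
    ("Table Migration", ["databricks", "labs", "ucx", "migrate-tables"]),
    ("Code linting", ["databricks", "labs", "ucx", "lint-local-code"]),
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns the titles of the steps that were processed.
    """
    processed = []
    for title, cmd in steps:
        print(f"{title}: {shlex.join(cmd)}")
        if not dry_run:
            if shutil.which(cmd[0]) is None:
                raise RuntimeError("Databricks CLI not found on PATH")
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)
        processed.append(title)
    return processed


run_steps(UCX_STEPS)  # dry run: prints the commands without executing them
```

Keeping the default at `dry_run=True` is a deliberate choice: each step changes workspace state, so reviewing the command list before running anything is the safer default.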
1/ Installation
UCX (like other Databricks Labs projects) is now embedded in the [Databricks CLI](https://docs.databricks.com/aws/en/dev-tools/cli/install).
The up-to-date list of installation requirements is available on the [Documentation Site](https://databrickslabs.github.io/ucx/docs/installation/#installation-requirements).
Installing UCX is as simple as issuing the command `databricks labs install ucx`.
During the installation you'll be prompted with a number of questions pertaining to the installation and use of UCX.
2/ Assessment
The next step is running the assessment workflow. The assessment workflow produces a report with an inventory of all the assets that need to be upgraded to UC, along with the potential problems and incompatibilities.
Your assessment dashboard is available with a single command: `databricks labs ucx ensure-assessment-run`
More information is available on the [Documentation Site](https://databrickslabs.github.io/ucx/docs/reference/workflows/#assessment-workflow).
3/ Account and Cloud Assets
Before we can migrate tables, we have to complete the following steps to make sure your account and cloud policies match your organization's requirements:
- [Create Account Groups](https://databrickslabs.github.io/ucx/docs/reference/commands/)
- [Create Catalogs and Schemas](https://databrickslabs.github.io/ucx/docs/reference/commands/#create-catalogs-schemas)
UCX can perform all these operations. Check the documentation for more details!
4/ Table Migration
The Table Migration workflow migrates multiple kinds of tables and views. It can be run from the command line or by invoking the workflow.
The Table Migration workflow is deployed to the workspace when UCX is installed.
It can be invoked from the UI or using the `databricks labs ucx migrate-tables` command.
5/ Update your code to use the newly migrated tables
Now that the tables and views are migrated, we can migrate the code.
Code linting highlights all the changes we have to make in our code.
It is composed of three steps that are performed by the linter:
- Scanning Code Resources
  - Gathering code and snippets from:
    - Jobs and dashboards
    - Notebooks
    - Files
    - Wheels
    - Eggs
    - Cluster configurations
    - PyPI
- Graphing
  - Construct a dependency graph by resolving:
    - References from jobs to dependencies
    - References from notebooks to notebooks
    - References from files to files
    - Breaking jobs into tasks
    - Breaking notebooks into cells
- Linting
  - Lint the graph to find:
    - Direct file system access
    - Table references
    - JVM access
    - RDD API
    - SQL context
    - Spark logging
    - dbutils.notebook.run
    - And more!
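To give a feel for the linting step, here is a toy Python sketch that flags a few of the finding categories listed above with regular expressions. This is not how UCX is implemented: the `RULES` table, the rule names, and the `lint_source` helper are assumptions made for this example, and line-based pattern matching is far cruder than a real linter.

```python
import re

# Hypothetical patterns, loosely modeled on the finding categories above.
RULES = {
    "direct-filesystem-access": re.compile(r"""["'](dbfs:/|/dbfs/|/mnt/)"""),
    "rdd-api": re.compile(r"\.rdd\b|\bsc\.parallelize\("),
    "sql-context": re.compile(r"\bsqlContext\b|\bSQLContext\b"),
    "jvm-access": re.compile(r"\._jvm\b|\b_jsc\b"),
    "notebook-run": re.compile(r"\bdbutils\.notebook\.run\("),
}


def lint_source(source: str):
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


# A small, made-up notebook cell to run the toy linter against.
sample = '''df = spark.read.parquet("/mnt/raw/events")
rows = df.rdd.map(lambda r: r[0])
dbutils.notebook.run("./child", 60)
'''
for lineno, rule in lint_source(sample):
    print(f"line {lineno}: {rule}")
```

Each finding points at a line that would likely need attention before the code runs against Unity Catalog, for example replacing a mount path with a table or volume reference.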
UCX has multiple options for migrating code:
- Workflow Linting
- Static Code Linting
- Dashboard Migration
Analyzing your Workflow for UC updates
UCX can analyze the code related to all workflows or to a specific set of workflows.
Analyzing the workflows is performed as part of the assessment.
Analyzing the workflows involves the following steps:
1. Create a graph of all the dependent tasks/notebooks/scripts
2. Analyze all the code related to the workflows
3. Report the findings to the assessment tables
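The first of these steps, building a graph of dependent tasks, notebooks and scripts, can be pictured with a small Python sketch. The job, task, and notebook names below are made up, and the `reachable` helper is a plain breadth-first traversal written for this example; it only illustrates the idea and makes no claim about UCX internals.

```python
from collections import deque

# Hypothetical workflow: one job with two tasks whose notebooks share code.
DEPENDENCIES = {
    "job:nightly_etl": ["task:ingest", "task:report"],
    "task:ingest": ["notebook:/etl/ingest"],
    "task:report": ["notebook:/etl/report"],
    "notebook:/etl/ingest": ["notebook:/etl/common"],
    "notebook:/etl/report": ["notebook:/etl/common", "file:/etl/utils.py"],
}


def reachable(graph, root):
    """Breadth-first walk returning every node reachable from root, in visit order."""
    seen = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:  # shared dependencies are visited only once
                seen.append(dep)
                queue.append(dep)
    return seen


for node in reachable(DEPENDENCIES, "job:nightly_etl"):
    print(node)
```

Every node returned by the walk is a piece of code that would need to be analyzed for the job, which is why a shared notebook such as `/etl/common` appears once even though two notebooks depend on it.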
Static Code Linting
Static code linting crawls code accessible from the workstation that runs the CLI. Typically it is used to analyze code that is hosted in a Git repository and has been cloned or pulled locally.
Local code linting is invoked with the `databricks labs ucx lint-local-code` CLI command.
Running this command from a modern IDE (such as PyCharm/IntelliJ/VSCode) will yield hyperlinks to the code issues.
More information is available in the [documentation](https://databrickslabs.github.io/ucx/docs/reference/commands/#lint-local-code).
Congratulations, you're ready to upgrade your assets and benefit from the full power of Unity Catalog!
For more details, please use the [official UCX site](https://databrickslabs.github.io/ucx) or reach out to your account team.
Interested in an example of migrating your tables manually?
Check [01-manual-upgrade-to-UC]($./01-manual-upgrade-to-UC) for a custom example of upgrading your tables to UC!