Upgrade your tables to Databricks Unity Catalog



Unity Catalog provides the features you need for data governance and security:

- Table ACL
- Row level access with dynamic view
- Secure access to external location (blob storage)
- Lineage at column & table level for data traceability
- Traceability with audit logs
- Volumes
- Genie
- System Tables
- Delta Sharing
- Lakehouse federation
- And many more

Because Unity Catalog is added as a supplement to your existing account, migrating your existing data to UC is straightforward.




Unity Catalog works with 3 layers:

* CATALOG
* SCHEMA (or DATABASE)
* TABLE

Tables created without Unity Catalog are available under the default `hive_metastore` catalog and are scoped to a single workspace.

New tables created with Unity Catalog are available at the account level, meaning that they're cross-workspace.
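
As a minimal sketch of the three-level namespace, the Python helper below expands a table reference into `catalog.schema.table` form. The names `TableName` and `qualify` and the fallback schema `default` are assumptions made for illustration; only the `hive_metastore` catalog name comes from the text above. Which catalog an unqualified reference actually resolves to depends on your workspace configuration, and backtick-quoted identifiers are not handled.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A fully qualified table name: catalog.schema.table."""

    catalog: str
    schema: str
    table: str

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def qualify(name: str, catalog: str = "hive_metastore", schema: str = "default") -> TableName:
    """Expand a 1-, 2- or 3-part table reference into catalog.schema.table.

    Missing parts are filled from the `catalog` and `schema` arguments,
    which are hypothetical defaults chosen for this sketch.
    """
    parts = name.split(".")
    if not all(parts) or len(parts) > 3:
        raise ValueError(f"not a valid table reference: {name!r}")
    if len(parts) == 1:
        return TableName(catalog, schema, parts[0])
    if len(parts) == 2:
        return TableName(catalog, parts[0], parts[1])
    return TableName(*parts)
```

For example, `str(qualify("sales.orders"))` yields `hive_metastore.sales.orders`, while a reference that already has three parts is returned unchanged.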


Introducing UCX



UCX is a toolkit for enabling Unity Catalog (UC) in your Databricks workspace.
UCX provides commands and workflows for migrating tables and views to UC. UCX allows you to rewrite dashboards, jobs and notebooks to use the migrated data assets in UC. And there are many more features!

UCX is a public source project developed and maintained by Databricks Labs.

UCX documentation is available here: [https://databrickslabs.github.io/ucx/](https://databrickslabs.github.io/ucx/)




![UC Migration](https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/uc/ucx/ucx-migration-overview.gif?raw=true)

UC Migration - Overview


Migrating to Unity Catalog requires the following steps:
* Assessing your current Databricks deployment
* Creating and Attaching a metastore
* Migrating Workspace Groups
* Upgrading Tables
* Upgrading Code
* Validation

Let's try it!


Using UCX to upgrade to Unity Catalog streamlines the process and reduces the risk.

To upgrade a workspace to Unity Catalog, follow these steps:

1. Installation
2. Assessment
3. Account and Cloud asset upgrade
4. Table Migration
5. Code linting/upgrade

Always refer to the [documentation](https://databrickslabs.github.io/ucx/) when using UCX, as the product evolves on a regular basis.

1/ Installation





UCX (and other Databricks Labs projects) is now embedded in the [Databricks CLI](https://docs.databricks.com/aws/en/dev-tools/cli/install).

The up-to-date list of installation requirements is available on the [Documentation Site](https://databrickslabs.github.io/ucx/docs/installation/#installation-requirements).


Installing UCX is as simple as issuing the command `databricks labs install ucx`.

During the installation you'll be prompted with a number of questions pertaining to the installation and use of UCX.

2/ Assessment





The next step is running the assessment workflow. The assessment workflow produces a report with an inventory of all the assets that need to be upgraded to UC, along with the potential problems and incompatibilities.

Your assessment dashboard is available after a single command: `databricks labs ucx ensure-assessment-run`

More information is available on the [Documentation Site](https://databrickslabs.github.io/ucx/docs/reference/workflows/#assessment-workflow).





3/ Account and Cloud Assets


Before we can migrate tables, we have to complete the following steps to make sure your account and cloud policies match your organization's requirements:
- [Create Account Groups](https://databrickslabs.github.io/ucx/docs/reference/commands/#create-account-groups)
- [Assigning Metastore](https://databrickslabs.github.io/ucx/docs/reference/commands/#assign-metastore)
- [Migrate Permissions from Workspace Groups to Account Groups](https://databrickslabs.github.io/ucx/docs/reference/workflows/#group-migration-workflow)
- [Table Mapping](https://databrickslabs.github.io/ucx/docs/reference/commands/#create-table-mapping)
- [Create Uber Principal](https://databrickslabs.github.io/ucx/docs/reference/commands/#create-uber-principal)
- [Create Storage Credentials](https://databrickslabs.github.io/ucx/docs/reference/commands/#migrate-credentials)
- [Create External Locations](https://databrickslabs.github.io/ucx/docs/reference/commands/#migrate-locations)
- [Create Catalogs and Schemas](https://databrickslabs.github.io/ucx/docs/reference/commands/#create-catalogs-schemas)




UCX can perform all of these operations. Check the documentation for more details!

4/ Table Migration





The Table Migration workflow migrates multiple kinds of tables and views. It can be run from the command line or by invoking the workflow.

The Table Migration workflow is deployed to the workspace when UCX is installed.

It can be invoked from the UI or using the `databricks labs ucx migrate-tables` command.
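
The commands quoted so far can be collected into a small dry-run script. This is only a sketch: the helper `run_plan` and the list name `UCX_COMMANDS` are hypothetical, the commands are copied verbatim from this guide without any extra flags, and step 3 is left out because it depends on choices specific to your organization. By default nothing is executed; the commands are only printed.

```python
import shlex
import shutil
import subprocess

# Commands quoted in this guide, in the order the steps introduce them.
# Step 3 (account and cloud assets) is intentionally omitted.
UCX_COMMANDS = [
    "databricks labs install ucx",
    "databricks labs ucx ensure-assessment-run",
    "databricks labs ucx migrate-tables",
]


def run_plan(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns the list of commands that were actually executed.
    """
    executed = []
    for command in commands:
        argv = shlex.split(command)
        print("$", command)
        if dry_run:
            continue
        if shutil.which(argv[0]) is None:
            raise RuntimeError(f"{argv[0]} not found on PATH")
        # check=True stops the plan at the first failing step.
        subprocess.run(argv, check=True)
        executed.append(command)
    return executed


if __name__ == "__main__":
    run_plan(UCX_COMMANDS)  # dry run: prints the plan without running it
```

Keeping the default as a dry run is deliberate: the installation step is interactive, and each step should be reviewed against the UCX documentation before it is run for real.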

5/ Update your code to use the newly migrated tables


Now that the tables and views are migrated, we can migrate the code.

Code linting highlights all the changes we have to make in our code.

It is composed of three steps that are performed by the linter:
- Scanning Code Resources - Gathering Code and Snippets from:
  - Jobs and dashboards
  - Notebooks
  - Files
  - Wheels
  - Eggs
  - Cluster configurations
  - PyPI

- Graphing - Construct a dependency graph by resolving:
  - References from jobs to dependencies
  - References from notebooks to notebooks
  - References from files to files
  - Breaking jobs into tasks
  - Breaking notebooks into cells

- Linting - Lint the graph to find:
  - Direct file system access
  - Table references
  - JVM access
  - RDD API
  - SQL context
  - Spark logging
  - dbutils.notebook.run
  - And more!
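
To make the linting step concrete, here is a toy linter that flags a few of the categories listed above. It is a sketch under loud assumptions: the rule names, the regular expressions, and the `Finding` and `lint_source` names are all invented for illustration, and simple line-by-line pattern matching is a stand-in for whatever analysis UCX actually performs.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumptions), covering four of the categories above.
RULES = {
    "direct-filesystem-access": re.compile(r"""["'](?:dbfs:/|/dbfs/|/mnt/|s3a?://|abfss?://)"""),
    "legacy-table-reference": re.compile(r"\bhive_metastore\.\w+\.\w+"),
    "rdd-api": re.compile(r"\.rdd\b|\bsc\.parallelize\("),
    "notebook-run": re.compile(r"\bdbutils\.notebook\.run\("),
}


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    rule: str

    def __str__(self) -> str:
        # path:line is a format many IDE terminals turn into a clickable link.
        return f"{self.path}:{self.line}: {self.rule}"


def lint_source(path: str, source: str) -> list[Finding]:
    """Return one Finding per (line, rule) pair whose pattern matches."""
    findings = []
    for number, text in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(text):
                findings.append(Finding(path, number, rule))
    return findings
```

A regex approach like this produces false positives (for example, matches inside comments), which is one reason a real linter works on parsed code rather than raw text.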

UCX offers several options for migrating code:
- Workflow Linting
- Static Code Linting
- Dashboard Migration

Analyzing your Workflow for UC updates




UCX can analyze the code related to all workflows or to a specific set of workflows.

Analyzing the workflows is performed as part of the assessment.

Analyzing the workflows involves the following steps:
1. Create a graph of all the dependent tasks/notebooks/scripts
2. Analyze all the code related to the workflows
3. Report the findings to the assessment tables
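
The three steps above can be sketched as a graph walk. The code below is an illustration, not UCX's implementation: the graph is a plain dictionary, the node labels (`job:`, `task:`, `nb:`) are made up, and `check` is a hypothetical callback that returns the findings for one piece of source code.

```python
from collections import deque


def collect_dependencies(graph, root):
    """Breadth-first walk: every node reachable from root, root first.

    `graph` maps a node to the list of nodes it depends on. Nodes are
    visited once, so circular references between notebooks are safe.
    """
    order = [root]
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def analyze_workflow(graph, sources, root, check):
    """Run `check` on the source of each dependency of `root`.

    Returns (node, finding) rows, the shape one might write to a findings table.
    """
    rows = []
    for node in collect_dependencies(graph, root):
        for finding in check(sources.get(node, "")):
            rows.append((node, finding))
    return rows
```

Separating the graph walk from the check keeps the traversal reusable: the same walk can serve a single workflow or all of them, with only the root changing.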




Static Code Linting





Static code linting crawls code accessible from the workstation that runs the CLI. It is typically used to analyze code that is hosted in a Git repository and cloned or pulled locally.


Local code linting is invoked with the `databricks labs ucx lint-local-code` CLI command.

Running this command from a modern IDE (such as PyCharm/IntelliJ/VSCode) will yield hyperlinks to the code issues.


More information is available in the [documentation](https://databrickslabs.github.io/ucx/docs/reference/commands/#lint-local-code).


Congratulations, you're ready to upgrade your assets and benefit from the full power of Unity Catalog!



For more details, please use the [official UCX site](https://databrickslabs.github.io/ucx) or reach out to your account team.

Interested in an example of migrating your tables manually?



Check [01-manual-upgrade-to-UC]($./01-manual-upgrade-to-UC) for a custom example of upgrading your tables to UC!