Unity Catalog : Support for Identity Columns, Primary + Foreign Key Constraints







To simplify SQL operations and support migrations from on-prem and alternative warehouse, Databricks Lakehouse now give customers convenient ways to build Entity Relationship Diagrams that are simple to maintain and evolve.

These features offer:
- The ability to automatically generate auto-incrementing identify columns. Just insert data and the engine will automatically increment the ID.
- Support for defining primary key
- Support for defining foreign key constraints

Note that as of now, Primary Key and Foreign Key are informational only and then won’t be enforced.




Use case





Defining PK & FK helps the BI analyst to understand the entity relationships and how to join tables. It also offers more information to BI tools who can leverage this to perform further optimisation.

We'll define the following star schema:
* dim_store
* dim_product
* dim_customer

And the fact table containing our sales information pointing to our dimension tables:

* fact_sales

Requirements:
- PK/FK requires Unity Catalog enabled (Hive Metastore is not supported for FK/PK)
- DBR 11.1


Cluster setup for UC






To be able to run this demo, make sure you create a cluster with the security mode enabled.

Go in the compute page, create a new cluster.

Select "Single User" and your UC-user (the user needs to exist at the workspace and the account level)

1/ Create a Dimension & Fact Tables In Unity Catalog



The first step is to create a Delta Tables in Unity Catalog (see [documentation](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html)).

We want to do that in SQL, to show multi-language support (we could have done it in python too):


* Use the `CREATE TABLE` command
* Add generated identity column with `GENERATED ALWAYS AS IDENTITY`
* Define PK with `PRIMARY KEY`
* Define Foreign Keys with `FOREIGN KEY REFERENCES`

2/ Let's look at the table definition for DIM_CUSTOMER



The first step is to run DESCRIBE TABLE EXTENDED

Constraints are shown at the bottom of the results:


| col_name | data_type |
|----------------|--------------------------|
|

Constraints | |


| dim_customer_pk | PRIMARY KEY (`customer_id`) |

3/ Let's add some data to the Dimension Tables



We want to do that in SQL, to show multi-language support:

* Use the `INSERT INTO` command to insert some rows in the table
* Note that we don't specify the values for IDs as they'll be generated by the engine with auto-increment

4/ Let's add some data to the Fact Tables



We want to do that in SQL, to show multi-language support:
1. Use the `INSERT INTO` command to insert some rows in the table

Query the tables joining data



We can now imply query the tables to retrieve our data based on the FK:

5/ Primary Key and Foreign Key in Data Explorer






6/ Primary Key and Foreign Key in DBSQL - Code Completion






Summary





As you have seen Primary Keys and Foreign Keys help the BI analyst to understand the entity relationships and how to join tables and even better having code completion do the joins for you.

The best Datawarehouse is a Lakehouse!

Next Steps:
- Try DBSQL query & dashboard editors
- Plug your BI tools (Tableau, PowerBI ...) to query these tables directly!