%md-sandbox # Unity Catalog : Support for Identity Columns, Primary + Foreign Key Constraints <img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/UC-FK-PK-crop.png" style="float:right; margin:10px 0px 0px 10px" width="700"/> To simplify SQL operations and support migrations from on-prem and alternative warehouse, Databricks Lakehouse now give customers convenient ways to build Entity Relationship Diagrams that are simple to maintain and evolve. These features offer: - The ability to automatically generate auto-incrementing identify columns. Just insert data and the engine will automatically increment the ID. - Support for defining primary key - Support for defining foreign key constraints Note that as of now, Primary Key and Foreign Key are informational only and then won’t be enforced. <br /><br /><br /> ## Use case <img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/UC-star-schema.png" style="float:right; margin:10px 0px 0px 10px" width="700"/> Defining PK & FK helps the BI analyst to understand the entity relationships and how to join tables. It also offers more information to BI tools who can leverage this to perform further optimisation. We'll define the following star schema: * dim_store * dim_product * dim_customer And the fact table containing our sales information pointing to our dimension tables: * fact_sales Requirements: - PK/FK requires Unity Catalog enabled (Hive Metastore is not supported for FK/PK) - DBR 11.1 <!-- Collect usage data (view). Remove it to disable collection or disable tracker during installation. View README for more details. --> <img width="1px" src="https://ppxrzfxige.execute-api.us-west-2.amazonaws.com/v1/analytics?category=dbsql¬ebook=00-Identity_PK_FK&demo_name=identity-pk-fk&event=VIEW">
%md-sandbox ## Cluster setup for UC <img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-cluster-setup-single-user.png" style="float: right"/> To be able to run this demo, make sure you create a cluster with the security mode enabled. Go in the compute page, create a new cluster. Select "Single User" and your UC-user (the user needs to exist at the workspace and the account level)
%run ./_resources/00-setup
%md ## 1/ Create a Dimension & Fact Tables In Unity Catalog The first step is to create a Delta Tables in Unity Catalog (see [documentation](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html)). We want to do that in SQL, to show multi-language support (we could have done it in python too): * Use the `CREATE TABLE` command * Add generated identity column with `GENERATED ALWAYS AS IDENTITY` * Define PK with `PRIMARY KEY` * Define Foreign Keys with `FOREIGN KEY REFERENCES`
1/ Create a Dimension & Fact Tables In Unity Catalog
The first step is to create a Delta Tables in Unity Catalog (see documentation).
We want to do that in SQL, to show multi-language support (we could have done it in python too):
- Use the
CREATE TABLE
command - Add generated identity column with
GENERATED ALWAYS AS IDENTITY
- Define PK with
PRIMARY KEY
- Define Foreign Keys with
FOREIGN KEY REFERENCES
--STORE DIMENSION CREATE OR REPLACE TABLE dim_store( store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, store_name STRING, address STRING ); --PRODUCT DIMENSION CREATE OR REPLACE TABLE dim_product( product_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, sku STRING, description STRING, category STRING ); --CUSTOMER DIMENSION CREATE OR REPLACE TABLE dim_customer( customer_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 0 INCREMENT BY 10) PRIMARY KEY, customer_name STRING, customer_profile STRING, address STRING ); CREATE OR REPLACE TABLE fact_sales( sales_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, product_id BIGINT NOT NULL CONSTRAINT dim_product_fk FOREIGN KEY REFERENCES dim_product, store_id BIGINT NOT NULL CONSTRAINT dim_store_fk FOREIGN KEY REFERENCES dim_store, customer_id BIGINT NOT NULL CONSTRAINT dim_customer_fk FOREIGN KEY REFERENCES dim_customer, price_sold DOUBLE, units_sold INT, dollar_cost DOUBLE );
OK
%md ## 2/ Let's look at the table definition for DIM_CUSTOMER The first step is to run DESCRIBE TABLE EXTENDED Constraints are shown at the bottom of the results: | col_name | data_type | |----------------|--------------------------| | # Constraints | | | dim_customer_pk | PRIMARY KEY (`customer_id`) |
2/ Let's look at the table definition for DIM_CUSTOMER
The first step is to run DESCRIBE TABLE EXTENDED
Constraints are shown at the bottom of the results:
col_name | data_type |
---|---|
# Constraints | |
dim_customer_pk | PRIMARY KEY (customer_id ) |
DESCRIBE TABLE EXTENDED dim_customer;
Table
To pick up a draggable item, press the space bar.
While dragging, use the arrow keys to move the item.
Press space again to drop the item in its new position, or press escape to cancel.
%md ## 3/ Let's add some data to the Dimension Tables We want to do that in SQL, to show multi-language support: * Use the `INSERT INTO` command to insert some rows in the table * Note that we don't specify the values for IDs as they'll be generated by the engine with auto-increment
3/ Let's add some data to the Dimension Tables
We want to do that in SQL, to show multi-language support:
- Use the
INSERT INTO
command to insert some rows in the table - Note that we don't specify the values for IDs as they'll be generated by the engine with auto-increment
INSERT INTO dim_store (store_name, address) VALUES ('City Store', '1 Main Rd, Whoville'); INSERT INTO dim_product (sku, description, category) VALUES ('1000001', 'High Tops', 'Ladies Shoes'), ('7000003', 'Printed T', 'Ladies Fashion Tops'); INSERT INTO dim_customer (customer_name, customer_profile, address) VALUES ('Al', 'Al profile', 'Databricks - Queensland Australia'), ('Quentin', 'REDACTED_PROFILE', 'Databricks - Paris France');
Table
To pick up a draggable item, press the space bar.
While dragging, use the arrow keys to move the item.
Press space again to drop the item in its new position, or press escape to cancel.
SELECT * FROM dim_product;
Table
To pick up a draggable item, press the space bar.
While dragging, use the arrow keys to move the item.
Press space again to drop the item in its new position, or press escape to cancel.
%md ## 4/ Let's add some data to the Fact Tables We want to do that in SQL, to show multi-language support: 1. Use the `INSERT INTO` command to insert some rows in the table
4/ Let's add some data to the Fact Tables
We want to do that in SQL, to show multi-language support:
- Use the
INSERT INTO
command to insert some rows in the table
INSERT INTO fact_sales (product_id, store_id, customer_id, price_sold, units_sold, dollar_cost) VALUES (1, 1, 0, 100.99, 2, 2.99), (2, 1, 0, 10.99, 2, 2.99), (1, 1, 0, 100.99, 2, 2.99), (1, 1, 10, 100.99, 2, 2.99), (2, 1, 10, 10.99, 2, 2.99);
Table
To pick up a draggable item, press the space bar.
While dragging, use the arrow keys to move the item.
Press space again to drop the item in its new position, or press escape to cancel.
%md ### Query the tables joining data We can now imply query the tables to retrieve our data based on the FK:
Query the tables joining data
We can now imply query the tables to retrieve our data based on the FK:
SELECT * FROM fact_sales INNER JOIN dim_product USING (product_id) INNER JOIN dim_customer USING (customer_id) INNER JOIN dim_store USING (store_id)
Table
To pick up a draggable item, press the space bar.
While dragging, use the arrow keys to move the item.
Press space again to drop the item in its new position, or press escape to cancel.
%md ## 5/ Primary Key and Foreign Key in Data Explorer <br /> <img src="https://github.com/althrussell/databricks-demo/raw/main/product-demos/pkfk/images/data_explorer.gif" style="float:right; margin-left:100px" width="700"/>
5/ Primary Key and Foreign Key in Data Explorer

%md ## 6/ Primary Key and Foreign Key in DBSQL - Code Completion <br /> <img src="https://github.com/althrussell/databricks-demo/raw/main/product-demos/pkfk/images/code_completion.gif" style="float:center; margin-left:100px" width="700"/>
6/ Primary Key and Foreign Key in DBSQL - Code Completion

-- %python spark.sql(f"DROP DATABASE {catalog}.{db} CASCADE")
%md-sandbox # Summary <img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/UC-FK-PK.png" style="float:right; margin-left:10px" width="900"/> As you have seen Primary Keys and Foreign Keys help the BI analyst to understand the entity relationships and how to join tables and even better having code completion do the joins for you. The best Datawarehouse is a Lakehouse! Next Steps: - Try DBSQL query & dashboard editors - Plug your BI tools (Tableau, PowerBI ...) to query these tables directly!