databricks-logo

00-Identity_PK_FK

(SQL)
Loading...
%run ./_resources/00-setup

1/ Create a Dimension & Fact Tables In Unity Catalog

The first step is to create a Delta Tables in Unity Catalog (see documentation).

We want to do that in SQL, to show multi-language support (we could have done it in python too):

  • Use the CREATE TABLE command
  • Add generated identity column with GENERATED ALWAYS AS IDENTITY
  • Define PK with PRIMARY KEY
  • Define Foreign Keys with FOREIGN KEY REFERENCES
--STORE DIMENSION
CREATE OR REPLACE  TABLE dim_store(
  store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  store_name STRING,
  address STRING
);

--PRODUCT DIMENSION
CREATE OR REPLACE  TABLE dim_product(
  product_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  sku STRING,
  description STRING,
  category STRING
);

--CUSTOMER DIMENSION
CREATE OR REPLACE  TABLE dim_customer(
  customer_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 0 INCREMENT BY 10) PRIMARY KEY,
  customer_name STRING,
  customer_profile STRING,
  address STRING
);

CREATE OR REPLACE TABLE fact_sales(
  sales_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  product_id BIGINT NOT NULL CONSTRAINT dim_product_fk FOREIGN KEY REFERENCES dim_product,
  store_id BIGINT NOT NULL CONSTRAINT dim_store_fk FOREIGN KEY REFERENCES dim_store,
  customer_id BIGINT NOT NULL CONSTRAINT dim_customer_fk FOREIGN KEY REFERENCES dim_customer,
  price_sold DOUBLE,
  units_sold INT,
  dollar_cost DOUBLE
);

OK

2/ Let's look at the table definition for DIM_CUSTOMER

The first step is to run DESCRIBE TABLE EXTENDED

Constraints are shown at the bottom of the results:

col_name data_type
# Constraints
dim_customer_pk PRIMARY KEY (customer_id)
DESCRIBE TABLE EXTENDED dim_customer;

3/ Let's add some data to the Dimension Tables

We want to do that in SQL, to show multi-language support:

  • Use the INSERT INTO command to insert some rows in the table
  • Note that we don't specify the values for IDs as they'll be generated by the engine with auto-increment
INSERT INTO
  dim_store (store_name, address)
VALUES
  ('City Store', '1 Main Rd, Whoville');
  
INSERT INTO
  dim_product (sku, description, category)
VALUES
  ('1000001', 'High Tops', 'Ladies Shoes'),
  ('7000003', 'Printed T', 'Ladies Fashion Tops');
  
INSERT INTO
  dim_customer (customer_name, customer_profile, address)
VALUES
  ('Al', 'Al profile', 'Databricks - Queensland Australia'),
  ('Quentin', 'REDACTED_PROFILE', 'Databricks - Paris France');
SELECT * FROM dim_product;

4/ Let's add some data to the Fact Tables

We want to do that in SQL, to show multi-language support:

  1. Use the INSERT INTO command to insert some rows in the table
INSERT INTO
  fact_sales (product_id, store_id, customer_id, price_sold, units_sold, dollar_cost)
VALUES
  (1, 1, 0, 100.99, 2, 2.99),
  (2, 1, 0, 10.99, 2, 2.99),
  (1, 1, 0, 100.99, 2, 2.99),
  (1, 1, 10, 100.99, 2, 2.99),
  (2, 1, 10, 10.99, 2, 2.99);

Query the tables joining data

We can now imply query the tables to retrieve our data based on the FK:

SELECT * FROM fact_sales
  INNER JOIN dim_product  USING (product_id)
  INNER JOIN dim_customer USING (customer_id)
  INNER JOIN dim_store    USING (store_id)

5/ Primary Key and Foreign Key in Data Explorer


6/ Primary Key and Foreign Key in DBSQL - Code Completion


-- %python spark.sql(f"DROP DATABASE {catalog}.{db} CASCADE")
;