Lakehouse Federation

Lakehouse Federation#

This notebook are the examples from this video and gives the example of querying postgres via Databricks. It’s not the only thing you can query, as of right now your options are:

  • MySQL

  • PostgreSQL

  • Oracle

  • Amazon Redshift

  • Salesforce Data Cloud

  • Snowflake

  • Microsoft SQL Server

  • Azure Synapse (SQL Data Warehouse)

  • Google BigQuery

  • Other Databricks Workspaces

  • Hive metastore

Link to the formal docs here

Compute used: serverless warehouse or classic cluster with runtime 16.1

CREATE CONNECTION postgres_test TYPE postgrsql
OPTIONS (
  host 'k8s-hadronco-abcdef01-234567890abc-def0123456.elb.us-west-2.amazonaws.com', --not actually real
  port '5432',
  user 'holly',
  password 'password1' --use scopes for IRL code
)
CREATE FOREIGN CATALOG postgres_test_catalog
USING CONNECTION postgres_test
OPTIONS (databse 'holly'); --this is the database name in postgres
SELECT * FROM postgres_test_catalog.federated_schema.ham_types
ham_typeis_edible
sandwichtrue
Westfalse
CREATE MATERIALIZED VIEW postgresql_mv
SCHEDULE EVERY 1 HOUR
AS SELECT * FROM postgres_test_catalog.federated_schema.ham_types
result
The operation was successfully executed.
--create an empty table before inserting into
INSERT INTO postgres_bronze
SELECT *, --yes I know not best practice
current_timestamp() as ingest_time
FROM postgres_test_catalog.federated_schema.ham_types
num_affected_rowsnum_inserted_rows
22