# Lakehouse Federation
This notebook contains the examples from this video and shows how to query Postgres via Databricks. Postgres isn't the only source you can query; as of right now your options are:
MySQL
PostgreSQL
Oracle
Amazon Redshift
Salesforce Data Cloud
Snowflake
Microsoft SQL Server
Azure Synapse (SQL Data Warehouse)
Google BigQuery
Other Databricks Workspaces
Hive metastore
Link to the formal docs here
Compute used: a serverless SQL warehouse or a classic cluster on runtime 16.1
CREATE CONNECTION postgres_test TYPE postgresql
OPTIONS (
  host 'k8s-hadronco-abcdef01-234567890abc-def0123456.elb.us-west-2.amazonaws.com', --not actually real
  port '5432',
  user 'holly',
  password 'password1' --use secret scopes in real code (see the sketch below)
);
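Hard-coding the password is fine for a demo, but in real code you'd pull it from a secret scope with the secret() function. A minimal sketch, assuming a secret scope called pg_creds with a key pg_password (both names are made up):

CREATE CONNECTION postgres_test TYPE postgresql
OPTIONS (
  host 'k8s-hadronco-abcdef01-234567890abc-def0123456.elb.us-west-2.amazonaws.com', --same host as above
  port '5432',
  user 'holly',
  password secret('pg_creds', 'pg_password') --resolved from the secret scope instead of a literal
);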
CREATE FOREIGN CATALOG postgres_test_catalog
USING CONNECTION postgres_test
OPTIONS (database 'holly'); --this is the database name in postgres
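Once the foreign catalog exists, the remote schemas and tables show up like any other Unity Catalog objects, so you can sanity-check the connection before querying. Something like this should list them (schema name matches the query below):

SHOW SCHEMAS IN postgres_test_catalog;
SHOW TABLES IN postgres_test_catalog.federated_schema;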
SELECT * FROM postgres_test_catalog.federated_schema.ham_types
ham_type | is_edible |
---|---|
sandwich | true |
West | false |
CREATE MATERIALIZED VIEW postgresql_mv
SCHEDULE EVERY 1 HOUR
AS SELECT * FROM postgres_test_catalog.federated_schema.ham_types
result |
---|
The operation was successfully executed. |
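The schedule handles refreshes automatically; if you want to force one (for example right after creating the view), a manual refresh should look like this:

REFRESH MATERIALIZED VIEW postgresql_mv;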
--create an empty table before inserting into it (one possible definition is sketched after the results below)
INSERT INTO postgres_bronze
SELECT *, --yes I know not best practice
current_timestamp() as ingest_time
FROM postgres_test_catalog.federated_schema.ham_types
num_affected_rows | num_inserted_rows |
---|---|
2 | 2 |
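The INSERT above assumes postgres_bronze already exists. A minimal sketch of one way to define it, with column types guessed from the ham_types results plus the added ingest_time column:

CREATE TABLE IF NOT EXISTS postgres_bronze (
  ham_type STRING,
  is_edible BOOLEAN,
  ingest_time TIMESTAMP --populated by current_timestamp() in the INSERT
);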