Intro to Lakeflow Declarative Pipelines
Declarative Pipelines simplify batch and streaming ETL with automated reliability and built-in data quality. Let's give it a try!
Optimizing our bike rental business - ETL pipeline
Our fictional company operates bike rental stations across the city. The primary goal of this data pipeline is to transform raw operational data—such as ride logs, maintenance records, and weather information—into a structured and refined format, enabling comprehensive analytics.
This allows us to track key business metrics like total revenue, forecast future earnings, understand revenue contributions from members versus non-members, analyze customer behavior and lifetime value, and crucially, identify and quantify revenue loss due to maintenance issues.
By providing these insights, the pipeline empowers us to optimize operations, improve bike availability, and ultimately maximize profitability.
Our input is a raw dataset that combines data from our ride tracking system, our maintenance system, weather data, and customer CDC events. Our goal is to ingest this data in near real time and build tables for our analyst team while ensuring data quality.
Getting started with the new pipeline editor
Databricks provides a [rich editor](https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/declarative-pipelines/declarative-pipelines-0.png?raw=true) to help you build and navigate through your different pipeline steps!

2/ Get started with Streaming Tables and Materialized Views
Creating your pipeline is super simple! If you're new to Declarative Pipelines, it's best to start with the [UI introduction from the documentation](https://docs.databricks.com/aws/en/dlt/dlt-multi-file-editor)!
**Your Lakeflow Declarative Pipeline has been installed and started for you!** Open the Bike Rental Declarative Pipeline to see it in action.
*(Note: The pipeline will start automatically once the initialization job is completed. This might take a few minutes. Check the installation logs for more details.)*
3/ Ingesting and transforming your data
Now that we've reviewed the data available to us, it's time to start creating our pipeline! We'll do it one step at a time.
Open the [00-pipeline-tutorial notebook]($./transformations/00-pipeline-tutorial) if you want to start with the basics behind Streaming Tables and Materialized Views.
Bronze: Raw data ingested into Delta tables
Our bronze layer contains our raw data, loaded into tables with minimal schema changes using Auto Loader.
Tables in our bronze layer:
- maintenance_logs_raw
- rides_raw
- weather_raw
- customers_cdc_raw
Silver: Cleaned and enriched with data quality rules
The silver layer filters out invalid rides and maintenance logs, enriches data with ride revenue, categorizes maintenance issues, and processes customer CDC events using Auto CDC for SCD Type 2 (historical tracking).
Tables in our silver layer:
- maintenance_logs
- rides
- weather
- customers (SCD Type 2)
Gold: Curated for analytics & AI
The gold layer aggregates data for reporting by pre-calculating how much revenue each station generates as an origin and as a destination, and how much revenue is lost to each maintenance event.
Tables in our gold layer:
- maintenance_events
- stations
- bikes
Open transformations/01-bronze.sql
Open transformations/01-silver.sql
Open transformations/01-gold.sql
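To make the silver and gold steps for rides more concrete, here is a minimal sketch in plain Python. It only illustrates the logic (drop invalid rides, add a revenue column, then aggregate revenue per station as origin and as destination); it is not the pipeline's actual SQL. The field names (`ride_id`, `start_station`, `end_station`, `duration_min`), the validity rules, and the flat per-minute rate are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical pricing rule: a flat rate per minute (assumption, not taken from the demo).
RATE_PER_MINUTE = 0.25


def is_valid_ride(ride):
    """Silver-style quality rule: require an id, both stations, and a positive duration."""
    return (
        ride.get("ride_id") is not None
        and ride.get("start_station") is not None
        and ride.get("end_station") is not None
        and (ride.get("duration_min") or 0) > 0
    )


def to_silver(raw_rides):
    """Drop invalid rides and enrich the remaining ones with a revenue field."""
    silver = []
    for ride in raw_rides:
        if not is_valid_ride(ride):
            continue
        silver.append({**ride, "revenue": ride["duration_min"] * RATE_PER_MINUTE})
    return silver


def to_gold_stations(silver_rides):
    """Aggregate revenue per station, split by its role as origin or destination."""
    stations = defaultdict(
        lambda: {"revenue_as_origin": 0.0, "revenue_as_destination": 0.0}
    )
    for ride in silver_rides:
        stations[ride["start_station"]]["revenue_as_origin"] += ride["revenue"]
        stations[ride["end_station"]]["revenue_as_destination"] += ride["revenue"]
    return dict(stations)


# Made-up sample rows standing in for the bronze rides table.
raw_rides = [
    {"ride_id": 1, "start_station": "A", "end_station": "B", "duration_min": 20},
    {"ride_id": 2, "start_station": "B", "end_station": "A", "duration_min": 8},
    {"ride_id": None, "start_station": "A", "end_station": "B", "duration_min": 5},  # missing id
    {"ride_id": 4, "start_station": "A", "end_station": "B", "duration_min": -3},  # bad duration
]

silver_rides = to_silver(raw_rides)
gold_stations = to_gold_stations(silver_rides)
print(len(silver_rides))   # 2
print(gold_stations["A"])  # {'revenue_as_origin': 5.0, 'revenue_as_destination': 2.0}
```

In the real pipeline these rules would presumably be expressed declaratively in the silver and gold SQL files, so the engine can track how many rows each rule drops.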
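SCD Type 2 keeps every historical version of a customer row instead of overwriting it. The sketch below illustrates that idea in plain Python. It is not how Auto CDC is implemented, and the event shape (`customer_id`, `operation`, `sequence`, `plan`) and the `start_at` / `end_at` column names are assumptions chosen for this example.

```python
def apply_scd2(events):
    """Build an SCD Type 2 history from CDC events.

    Every key gets one row per version; the current version has end_at = None.
    """
    history = []  # all versions, closed and open
    current = {}  # customer_id -> index of the open row in history

    # Sort by sequence so out-of-order events in the batch are applied in the right order.
    for event in sorted(events, key=lambda e: e["sequence"]):
        key = event["customer_id"]
        open_idx = current.get(key)

        # Any new event for a key closes its currently open version.
        if open_idx is not None:
            history[open_idx]["end_at"] = event["sequence"]
            del current[key]

        # A delete only closes the open version; it adds no new row.
        if event["operation"] == "DELETE":
            continue

        # Inserts and updates open a new version.
        history.append({
            "customer_id": key,
            "plan": event["plan"],
            "start_at": event["sequence"],
            "end_at": None,
        })
        current[key] = len(history) - 1

    return history


# Made-up CDC events, deliberately out of order.
events = [
    {"customer_id": 1, "operation": "UPDATE", "sequence": 3, "plan": "member"},
    {"customer_id": 1, "operation": "INSERT", "sequence": 1, "plan": "casual"},
    {"customer_id": 2, "operation": "INSERT", "sequence": 2, "plan": "member"},
    {"customer_id": 2, "operation": "DELETE", "sequence": 4, "plan": None},
]

customer_history = apply_scd2(events)
for row in customer_history:
    print(row)
```

Customer 1 ends up with two versions (a closed `casual` row and an open `member` row), while customer 2 has a single closed row because it was deleted. This history is what lets analysts ask what a customer's plan was at the time of a given ride.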