Simple ETL with DLT
DLT makes data engineering accessible for all. Simply declare your transformations in SQL or Python, and DLT handles the data engineering complexity for you.

**Accelerate ETL development**
Enable analysts and data engineers to innovate rapidly with simple pipeline development and maintenance
**Remove operational complexity**
By automating complex administrative tasks and gaining broader visibility into pipeline operations
**Trust your data**
With built-in quality controls and quality monitoring to ensure accurate and useful BI, Data Science, and ML
**Simplify batch and streaming**
With self-optimization and auto-scaling data pipelines for batch or streaming processing
Simple ingestion with Lakeflow Connect
Lakeflow Connect offers built-in data ingestion connectors for popular SaaS applications, databases, and file sources (such as Salesforce, Workday, and SQL Server), letting you build incremental data pipelines at scale, fully integrated with Databricks.
To give it a try, check out our [Lakeflow Connect Product Tour](https://www.databricks.com/resources/demos/tours/platform/discover-databricks-lakeflow-connect-demo)
Our DLT pipeline
We'll use as input a raw dataset containing information on our customers' loans and historical transactions.
Our goal is to ingest this data in near real time and build tables for our analyst team while ensuring data quality.
**Your DLT Pipeline is ready!** Your pipeline was started using this notebook and is
available here.

Our datasets come from 3 different systems and are saved under a cloud storage folder (S3/ADLS/GCS):
* `loans/raw_transactions` (loan transactions uploaded here every few minutes)
* `loans/ref_accounting_treatment` (reference table, mostly static)
* `loans/historical_loans` (loans from a legacy system, new data added every week)
Let's ingest this data incrementally, and then compute a couple of aggregates that we'll need for our final dashboard to report our KPIs.
Bronze layer: incrementally ingest data leveraging Databricks Autoloader

Our raw data is being sent to blob storage.
Autoloader simplifies this ingestion, handling schema inference and schema evolution while scaling to millions of incoming files.
Autoloader is available in SQL using the `read_files` function and can be used with a variety of formats (JSON, CSV, Avro...):
For more details on Autoloader, you can run `dbdemos.install('auto-loader')`.
STREAMING LIVE TABLE
Defining tables as `STREAMING` guarantees that you only consume new incoming data. Without `STREAMING`, you will scan and ingest all the data available at once. See the [documentation](https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-incremental-data.html) for more details.
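As an illustration, here is a minimal sketch of a Bronze table definition; the storage path below is an assumption for this demo, so adjust it to the folder where your raw files land:

```sql
-- Bronze: incrementally ingest raw loan transactions with Autoloader.
-- The storage path is a placeholder for this demo; replace it with your own folder.
CREATE OR REFRESH STREAMING LIVE TABLE BZ_raw_txs
  COMMENT "New raw loan transactions, ingested incrementally from cloud storage"
AS SELECT * FROM STREAM read_files(
  '/demos/dlt/loans/raw_transactions',
  format => 'json'
);
```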
Silver layer: joining tables while ensuring data quality

Once the bronze layer is defined, we'll create the silver layers by joining data. Note that bronze tables are referenced using the `LIVE` namespace.
To consume only the increments from a Bronze layer table like `BZ_raw_txs`, we'll use the `stream` keyword: `stream(LIVE.BZ_raw_txs)`.
Note that we don't have to worry about compaction; DLT handles that for us.
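A minimal sketch of a Silver table, assuming the Bronze tables are called `BZ_raw_txs` and `ref_accounting_treatment` and share a hypothetical `accounting_treatment_id` key (adapt the names to your Bronze schema):

```sql
-- Silver: enrich incoming transactions with the (mostly static) reference table.
-- Table and column names are illustrative.
CREATE OR REFRESH STREAMING LIVE TABLE cleaned_new_txs
  COMMENT "Raw transactions enriched with accounting treatment reference data"
AS SELECT txs.*, ref.accounting_treatment
  FROM stream(LIVE.BZ_raw_txs) txs
  INNER JOIN LIVE.ref_accounting_treatment ref
    ON txs.accounting_treatment_id = ref.id;
```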
Expectations
By defining expectations (`CONSTRAINT ... EXPECT`), you can enforce and track your data quality. See the [documentation](https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-expectations.html) for more details.
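For instance, a sketch showing the three expectation modes (track only, drop row, fail update); the table and column names below are illustrative and not part of the generated pipeline:

```sql
-- Expectations: track null ids, drop rows with a negative balance, fail on unknown cost centers.
CREATE OR REFRESH STREAMING LIVE TABLE quality_checked_txs (
  CONSTRAINT valid_id          EXPECT (id IS NOT NULL),                                   -- track only
  CONSTRAINT valid_balance     EXPECT (balance >= 0) ON VIOLATION DROP ROW,               -- drop bad rows
  CONSTRAINT known_cost_center EXPECT (cost_center_code IS NOT NULL) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM stream(LIVE.cleaned_new_txs);
```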

Gold layer

Our last step is to materialize the Gold layer.
Because these tables will be queried at scale using a SQL endpoint, we'll add Z-Ordering at the table level with `pipelines.autoOptimize.zOrderCols` to ensure faster queries, and DLT will handle the rest.
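As an illustration, a sketch of a Gold aggregate table using this table property; the aggregation and Z-Order columns are assumptions for this demo:

```sql
-- Gold: aggregate balances per location, Z-Ordered on the column the dashboard filters on.
CREATE OR REFRESH LIVE TABLE total_loan_balances
  COMMENT "Combined loan balances, optimized for dashboard queries"
  TBLPROPERTIES ("pipelines.autoOptimize.zOrderCols" = "location_code")
AS SELECT sum(balance) AS total_balance, count(*) AS nb_loans, location_code
  FROM LIVE.cleaned_new_txs
  GROUP BY location_code;
```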
Next steps
Your DLT pipeline is ready to be started.
Click here to access the pipeline created for you using this notebook.
To create a new one, open the DLT menu, create a pipeline, and select this notebook to run it. To generate sample data, please run the [companion notebook]($./_resources/00-Loan-Data-Generator) (make sure the paths where you read and write the data are the same!).
Data analysts can start using DBSQL to analyze the data and track our loan metrics. Data scientists can also access the data to start building models to predict payment defaults or other more advanced use cases.
Tracking data quality
Expectation statistics are automatically available as a system table.
This information lets you monitor your data ingestion quality.
You can leverage DBSQL to query these tables and build custom alerts based on the metrics your business is tracking.
See [how to access your DLT metrics]($./03-Log-Analysis)
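As a sketch, assuming the `event_log()` table-valued function is available in your workspace, the expectation metrics can be queried like this (the pipeline id is a placeholder to replace with your own):

```sql
-- Read expectation metrics from the DLT event log.
-- '<your-pipeline-id>' is a placeholder for the id of the pipeline created above.
SELECT
  timestamp,
  details:flow_progress.data_quality.expectations AS expectations
FROM event_log('<your-pipeline-id>')
WHERE event_type = 'flow_progress'
  AND details:flow_progress.data_quality.expectations IS NOT NULL;
```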
Data Quality Dashboard example