End-to-End MLOps demo with MLFlow, and Models in Unity Catalog

Challenges moving ML project into production

Moving an ML project from a standalone notebook to a production-grade data pipeline is complex and requires multiple competencies.

Having a model up and running in a notebook isn't enough. We need to cover the end to end ML Project life cycle and solve the following challenges:

* Update data over time (production-grade ingestion pipeline)
* How to save, share, and re-use ML features in the organization
* How to ensure a new model version that respects quality standards and won't break the pipeline
* Model governance: what is deployed, how is it trained, by whom, and which data?
* How to monitor and re-train the model...

In addition, these projects typically involve multiple teams, creating friction and potential silos

* Data Engineers in charge of ingesting, preparing, and exposing the data
* Data Scientist, expert in data analysis, building ML model
* ML engineers, setup the ML infrastructure pipelines (similar to DevOps)

This has a real impact on the business, slowing down projects and preventing them from being deployed in production and bringing ROI.

What's MLOps?

MLOps is a set of standards, tools, processes, and methodology that aims to optimize time, efficiency, and quality while ensuring governance in ML projects.

MLOps orchestrate a project life-cycle between the project and the teams to implement such ML pipelines smoothly.

Databricks is uniquely positioned to solve this challenge with the Lakehouse pattern. Not only do we bring Data Engineers, Data Scientists, and ML Engineers together in a unique platform, but we also provide tools to orchestrate ML projects and accelerate the go to production.

MLOps process walkthrough

In this quickstart demo, we'll go through a few common steps in the MLOps process. The result of this process is a model used to power a dashboard for downstream business stakeholders, which is:
* preparing features
* training a model for deployment
* registering the model for its use to be governed
* validating the model in a champion-challenger analysis
* invoking a trained ML model as a pySpark UDF

In this first quickstart, we'll cover the foundation of MLOps.

The advanced section will go into more detail, including:
- Model Serving
- Realtime Feature serving with Online Tables
- A/B testing
- Infra setup and hooks with Databricks MLOps Stack
- ...

Customer churn detection

To explore MLOps, we'll implement a customer churn model.

Our marketing team asked us to create a Dashboard tracking Churn risk evolution. In addition, we need to provide our renewal team with a daily list of customers at Churn risk to increase our final revenue.

Our Data Engineer team provided us with a dataset collecting information on our customer base, including churn information. That's where our implementation starts.

Let's see how we can implement such a model and provide our marketing and renewal team with Dashboards to track and analyze our Churn prediction.

Ultimately, you'll be able to build a complete DBSQL Churn Dashboard containing all our customer & churn information but also start a Genie space to ask any question using plain English!

Feature Engineering

Our first job is to analyze the data and prepare a set of features.

Next: [Analyze the data and prepare features]($./01_feature_engineering)