Scaling Your Business with a GenAI-Powered Assistant



LLMs are disrupting the way we interact with information, from internal knowledge bases to external, customer-facing documentation or support.



While ChatGPT democratized LLM-based chatbots for consumer use, companies need to deploy personalized models that address their own needs:

- Meeting privacy requirements on sensitive information
- Preventing hallucinations
- Serving specialized content that isn't available on the Internet
- Tailoring behavior to specific customer tasks
- Controlling speed and cost
- Deploying models on private infrastructure for security reasons

Introducing Databricks AI



To solve these challenges, custom knowledge bases and models need to be deployed. However, doing so at scale isn't simple and requires:

- Ingesting and transforming massive amounts of data
- Ensuring privacy and security across your data pipeline
- Deploying systems such as a Vector Search index
- Having access to GPUs and deploying efficient LLMs for inference serving
- Training and deploying custom models

This is where Databricks AI comes in. Databricks simplifies all of these steps so that you can focus on building your final model, with the best prompts and performance.

GenAI & Maturity curve





Deploying GenAI can be done in multiple ways:

- **Prompt engineering on public APIs (e.g., Llama 2, OpenAI)**: answers based on public information only (think ChatGPT)
- **Retrieval Augmented Generation (RAG)**: specialize your model with additional content. *This is what we'll focus on in this demo*
- **OSS model fine-tuning**: when you have a large corpus of custom data and need specific model behavior (e.g., executing a task)
- **Train your own LLM**: for full control over the model's underlying data sources (biomedical, code, finance...)



What is Retrieval Augmented Generation (RAG) for LLMs?





RAG is a powerful and efficient GenAI technique that allows you to improve model performance by leveraging your own data (e.g., documentation specific to your business), without the need to fine-tune the model.

This is done by providing your custom information as context to the LLM. It reduces hallucinations and allows the LLM to produce answers that incorporate company-specific data, without making any changes to the original LLM.

RAG has shown success in chatbots and Q&A systems that need to maintain up-to-date information or access domain-specific knowledge.
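
To make the idea concrete, here is a minimal, illustrative sketch of prompt augmentation. The `retrieve_top_docs` helper is hypothetical and stands in for the vector search retrieval we set up later in this demo:

```python
# Illustrative sketch only: how RAG injects retrieved context into the prompt.
# `retrieve_top_docs` is a hypothetical placeholder for a vector search lookup.
def build_rag_prompt(question: str, context_docs: list) -> str:
    # Retrieved documents are passed as context so the LLM answers from your
    # data instead of relying only on what it saw during training.
    context = "\n\n".join(context_docs)
    return (
        "You are an assistant for Databricks users. "
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

question = "How do I create a Delta Lake table?"
docs = retrieve_top_docs(question)          # hypothetical retriever (see next section)
prompt = build_rag_prompt(question, docs)   # this augmented prompt is sent to the LLM
```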

RAG and Vector Search



To provide additional context to our LLM, we need to search for the documents or articles most likely to contain the answer to the user's question.
To do so, a common solution is to deploy a vector database. This involves creating document embeddings: fixed-size vectors that represent the content of your documents.

The vectors will then be used to perform real-time similarity search during inference.
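
As a preview, here is a minimal sketch of such a similarity query using the `databricks-vectorsearch` client; the endpoint and index names are placeholders for the ones we create later in this demo:

```python
# Minimal sketch: retrieve the chunks most similar to a user question.
# Endpoint and index names below are placeholders, not the demo's actual values.
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="dbdemos_vs_endpoint",        # placeholder Vector Search endpoint
    index_name="main.rag_chatbot.docs_index",   # placeholder index name
)

results = index.similarity_search(
    query_text="How can I track billing usage on my workspace?",
    columns=["url", "content"],  # columns returned with each match
    num_results=3,               # top-k most similar chunks
)
print(results)
```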

Implementing RAG with Databricks AI Foundation Models



In this demo, we will show you how to build and deploy your custom chatbot, answering questions on any custom or private information.

As an example, we will specialize this chatbot to answer questions about Databricks, feeding it databricks.com documentation articles so it can return accurate answers.

Here is the flow we will implement:


1/ Ingest data and create your Vector Search index





The first step is to ingest and prepare the data before we can make use of our Vector Search index.

We'll use the Data Engineering Lakehouse capabilities to ingest our documentation pages, split them into smaller chunks, compute embeddings for each chunk, and save the result as a Delta Lake table.

**What you will learn:**
- Use langchain and your LLM tokenizer to create chunks from your documents
- Compute embeddings (array<float>) representing our chunks with Databricks Foundation Models
- Create a Vector Search Index on top of your data to provide real-time similarity search
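
As a condensed preview of these steps, here is a minimal sketch assuming the `databricks-bge-large-en` embedding Foundation Model endpoint and illustrative content; the notebook walks through the full pipeline on real documentation pages:

```python
# Condensed sketch of the ingestion flow; content and names are illustrative.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from mlflow.deployments import get_deploy_client

doc_page_content = "Databricks documentation page text goes here..."  # placeholder

# 1. Split each documentation page into smaller chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(doc_page_content)

# 2. Compute embeddings (array<float>) for the chunks with a Foundation Model endpoint.
deploy_client = get_deploy_client("databricks")
response = deploy_client.predict(
    endpoint="databricks-bge-large-en",
    inputs={"input": chunks},
)
embeddings = [e["embedding"] for e in response["data"]]  # OpenAI-style embeddings response

# 3. Chunks and embeddings are then saved to a Delta Lake table, and a managed
#    Vector Search index is created on top of it for real-time similarity search.
```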

Start the data ingestion and create a Vector Search Index: open the [01-Data-Preparation-and-Index]($./01-Data-Preparation-and-Index) notebook.

2/ Deploying a RAG chatbot endpoint with the databricks-llama-2-70b-chat Foundation Model





Our data is ready and our Vector Search Index can answer similarity queries, finding documentation related to our user question.

We can now create a langchain model with an augmented prompt, accessing the Llama 2 70B model to answer advanced Databricks questions.

**What you will learn:**
- Search documents with the Databricks langchain retriever
- Build a langchain chain with a custom prompt
- Deploy your chain as a serverless endpoint model and answer customer questions!
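
As a preview of the chain itself, here is a minimal sketch assuming the langchain Databricks integrations (`DatabricksEmbeddings`, `DatabricksVectorSearch`, `ChatDatabricks`); the endpoint and index names are placeholders, and the notebook builds the full, deployable version:

```python
# Minimal RAG chain sketch; endpoint and index names are placeholders.
from databricks.vector_search.client import VectorSearchClient
from langchain_community.chat_models import ChatDatabricks
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.vectorstores import DatabricksVectorSearch
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Retriever backed by the Vector Search index built in step 1.
vs_index = VectorSearchClient().get_index(
    endpoint_name="dbdemos_vs_endpoint",        # placeholder
    index_name="main.rag_chatbot.docs_index",   # placeholder
)
embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
retriever = DatabricksVectorSearch(
    vs_index, text_column="content", embedding=embedding_model
).as_retriever()

# Custom prompt injecting the retrieved documentation as context.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are an assistant for Databricks users. Use the context to answer.\n\n"
        "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    ),
)

# Chat model served by the databricks-llama-2-70b-chat Foundation Model endpoint.
chat_model = ChatDatabricks(endpoint="databricks-llama-2-70b-chat", max_tokens=200)

chain = RetrievalQA.from_chain_type(
    llm=chat_model,
    chain_type="stuff",            # "stuff" the retrieved chunks into the prompt
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)
print(chain.invoke({"query": "How do I start a Databricks cluster?"}))
```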

Conclusion



We've seen how Databricks AI is uniquely positioned to help you solve your GenAI challenges:

- Simplify Data Ingestion and preparation with Databricks Data Engineering capabilities
- Accelerate Vector Search Index deployment with fully managed indexes
- Leverage open models that are easy to fine-tune for custom requirements
- Access a Databricks AI LLama2-70B endpoint
- Deploy real-time model endpoints to generate answers which leverage your custom data

Interested in deploying your own models? Reach out to your account team!