# Using the Foundation Model API Outside of Databricks

## Introduction

While Databricks provides a great environment for working with AI tools, including the Foundation Model API, you can easily call the API from outside of Databricks. This is useful if you want to use the Foundation Model API in your app or while writing code in another editor such as VSCode, Vim, or Emacs. This notebook shows how to do so. The workflow will be familiar to users of other AI inference products, such as those offered by OpenAI or Anthropic: you generate an API key and pass it with your API calls in order to query the models. Additionally, you will need the URL of your Databricks workspace.
## Setup

### 1. Generate a Personal Access Token (PAT)

First, generate a personal access token:

1. In your workspace, navigate to “User Settings” in the dropdown menu on the top right of the screen.
2. Click “Developer” in the menu on the left, and then click “Manage” under “Access Tokens.”
3. Click “Generate new token” to create a new PAT.
4. Copy the token and paste it to a secure location such as a `.env` file.
**Warning:** If you are tracking your code with git, make sure to add this file to your `.gitignore` file to avoid inadvertently sharing your token with others.
**Note:** If you lose your PAT, you will need to generate a new one; you cannot view the token from this menu again.
### 2. Identify your Databricks host URL

To use the Foundation Model API, you need to identify the URL of your Databricks workspace. The URL starts with `https://` and includes your Databricks instance name, for example: `https://cust-success.cloud.databricks.com/`.
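For reference, the per-model REST endpoints used later in this notebook are built from this host URL. Here is a minimal sketch of how the invocations URL is composed (the host and model name below are placeholders, not real values):

```python
# Compose the REST endpoint for a served model from the workspace host URL.
# Placeholder values; substitute your own workspace URL.
DATABRICKS_HOST = "https://<instance_name>.cloud.databricks.com"
model = "databricks-llama-2-70b-chat"

endpoint = f"{DATABRICKS_HOST.rstrip('/')}/serving-endpoints/{model}/invocations"
print(endpoint)
```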
## Query the API with the Python SDK

Once you have obtained your PAT and Databricks host URL, you can query the API with the Python SDK.

### Install the SDK

You can install the Python SDK with pip:

```bash
pip install databricks-genai-inference
```
### Query

Now you can query the API with the Python SDK. This works much the same as using the Python SDK in a Databricks notebook, except that you need to provide the `databricks_host` and `databricks_token` parameters. There are a number of ways to do this. Two common patterns for handling these parameters are:

**Export the environment variables.** This makes the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables available to your code, so you will not need to pass them explicitly to your API calls:

```bash
export DATABRICKS_HOST="https://<instance_name>.cloud.databricks.com"
export DATABRICKS_TOKEN="<your_personal_access_token>"
```
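Before querying the API, you can quickly confirm that both variables are visible to your Python process. This is a simple standard-library check, not part of the SDK:

```python
import os

# Both variables must be set in the shell session that launched Python.
for var in ("DATABRICKS_HOST", "DATABRICKS_TOKEN"):
    assert os.environ.get(var), f"{var} is not set"
```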
**Load the variables from a `.env` file.** You can load the variables from a `.env` file using the `load_dotenv` function from the `python-dotenv` library (`pip install python-dotenv`). The `.env` file should look something like this:

```
DATABRICKS_HOST="https://<instance_name>.cloud.databricks.com"
DATABRICKS_TOKEN="<your_personal_access_token>"
```

Remember: do not commit this file to your git project; add it to your `.gitignore` file to avoid sharing your credentials with others.
You can load the variables from your `.env` file in Python as follows:

```python
from dotenv import load_dotenv  # for loading environment variables from a .env file

load_dotenv()  # returns True if the .env file was found and loaded
```

```
True
```
And now the hostname and token from the `.env` file are accessible to the Python SDK:

```python
from databricks_genai_inference import ChatCompletion

response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
    ],
    max_tokens=128,
)
response.message
```

```
"\nWho's there? I'm happy to help with anything you need!"
```
## Query the REST API with curl

To query the REST API with `curl`, we again must be mindful of the host and token. The request below expects the `DATABRICKS_TOKEN` and `DATABRICKS_HOST` environment variables to be set. As a reminder, you can do this with:

```bash
export DATABRICKS_HOST="https://<instance_name>.cloud.databricks.com"
export DATABRICKS_TOKEN="<your_personal_access_token>"
```

With these variables set, you can query the REST API with:
```bash
%%bash
curl -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant. Keep your responses short and concise."
      },
      {
        "role": "user",
        "content": "What is MLflow autologging?"
      }
    ],
    "max_tokens": 128
  }' \
  $DATABRICKS_HOST/serving-endpoints/databricks-llama-2-70b-chat/invocations
```
{"id":"6f7b2001-583b-4554-9bd6-80a8984f64c4","object":"chat.completion","created":1707860729,"model":"llama-2-70b-chat","choices":[{"index":0,"message":{"role":"assistant","content":"\nMLflow autologging is a feature that automatically logs and tracks experiments, runs, and models in a centralized database, providing a seamless way to manage and track the end-to-end machine learning lifecycle."},"finish_reason":"stop"}],"usage":{"prompt_tokens":44,"completion_tokens":48,"total_tokens":92}}
## Conclusion

This notebook has shown the basics of using the Foundation Model API outside of a Databricks environment. You can now start using the API to build AI applications or to develop other projects in the development environment of your choice.