# Using the Foundation Model API Outside of Databricks

## Introduction

While Databricks provides a great environment for working with AI tools, including the Foundation Model API, you can easily call the API from outside of Databricks. This is useful if you want to use the Foundation Model API in your app or while writing code in another editor such as VSCode, Vim, or Emacs. This notebook shows how to do so. The workflow will be familiar to users of other AI inference products, such as those offered by OpenAI or Anthropic: you generate an API key and pass it with your API calls in order to query the models. Additionally, you will need the URL of your Databricks workspace.
## Setup

### 1. Generate a Personal Access Token (PAT)

First, generate a personal access token:

1. In your workspace, navigate to “User Settings” in the dropdown menu on the top right of the screen.
2. Click “Developer” in the menu on the left, and then click “Manage” under “Access Tokens.”
3. Click “Generate new token” to create a new PAT.
4. Copy the token and paste it to a secure location such as a `.env` file.
**Warning:** If you are tracking your code with git, make sure to add this file to your `.gitignore` file to avoid inadvertently sharing your token with others.
**Note:** If you lose your PAT, you will need to generate a new one; you cannot view the token from this menu again.
### 2. Identify your Databricks host URL

To use the Foundation Model API, you need to identify the URL of your Databricks workspace. The URL starts with `https://` and includes your Databricks instance name, for example: `https://cust-success.cloud.databricks.com/`.
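For reference, the per-model REST endpoints used later in this notebook are built from this host URL. Here is a minimal sketch of how the invocations URL is composed (the host and model name below are placeholders, not real values):

```python
# Compose the REST endpoint for a served model from the workspace host URL.
# Placeholder values; substitute your own workspace URL.
DATABRICKS_HOST = "https://<instance_name>.cloud.databricks.com"
model = "databricks-llama-2-70b-chat"

endpoint = f"{DATABRICKS_HOST.rstrip('/')}/serving-endpoints/{model}/invocations"
print(endpoint)
```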
## Query the API with the Python SDK

Once you have obtained your PAT and Databricks host URL, you can query the API with the Python SDK.

### Install the SDK

You can install the Python SDK with pip:

```bash
pip install databricks-genai-inference
```
### Query

Now you can query the API with the Python SDK. This works much the same as using the Python SDK in a Databricks notebook, except that you need to provide the `databricks_host` and `databricks_token` parameters. There are a number of ways to do this. Two common patterns for handling these parameters are:

**Export the environment variables.** This makes the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables available to your code, so you will not need to pass them explicitly to your API calls:

```bash
export DATABRICKS_HOST="https://<instance_name>.cloud.databricks.com"
export DATABRICKS_TOKEN="<your_personal_access_token>"
```
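Before querying the API, you can quickly confirm that both variables are visible to your Python process. This is a simple standard-library check, not part of the SDK:

```python
import os

# Both variables must be set in the shell session that launched Python.
for var in ("DATABRICKS_HOST", "DATABRICKS_TOKEN"):
    assert os.environ.get(var), f"{var} is not set"
```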
**Load the variables from a `.env` file.** You can load the variables from a `.env` file using the `load_dotenv` function from the `python-dotenv` library (`pip install python-dotenv`). The `.env` file should look something like this:

```
DATABRICKS_HOST="https://<instance_name>.cloud.databricks.com"
DATABRICKS_TOKEN="<your_personal_access_token>"
```

Remember: do not commit this file to your git project; add it to your `.gitignore` file to avoid sharing your credentials with others.
You can load the variables from your `.env` file in Python as follows:

```python
from dotenv import load_dotenv  # for loading environment variables from a .env file

load_dotenv()  # returns True if the .env file was found and loaded
```

```
True
```
And now the hostname and token from the `.env` file are accessible to the Python SDK:

```python
from databricks_genai_inference import ChatCompletion

response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
    ],
    max_tokens=128,
)
response.message
```

```
"\nWho's there? I'm happy to help with anything you need!"
```
## Query the REST API with curl

To query the REST API with `curl`, we again must be mindful of the host and token. The request below expects the `DATABRICKS_TOKEN` and `DATABRICKS_HOST` environment variables to be set. As a reminder, you can do this with:

```bash
export DATABRICKS_HOST="https://<instance_name>.cloud.databricks.com"
export DATABRICKS_TOKEN="<your_personal_access_token>"
```

With these variables set, you can query the REST API with:
```bash
%%bash
curl -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant. Keep your responses short and concise."
      },
      {
        "role": "user",
        "content": "What is MLflow autologging?"
      }
    ],
    "max_tokens": 128
  }' \
  $DATABRICKS_HOST/serving-endpoints/databricks-llama-2-70b-chat/invocations
```
{"id":"6f7b2001-583b-4554-9bd6-80a8984f64c4","object":"chat.completion","created":1707860729,"model":"llama-2-70b-chat","choices":[{"index":0,"message":{"role":"assistant","content":"\nMLflow autologging is a feature that automatically logs and tracks experiments, runs, and models in a centralized database, providing a seamless way to manage and track the end-to-end machine learning lifecycle."},"finish_reason":"stop"}],"usage":{"prompt_tokens":44,"completion_tokens":48,"total_tokens":92}}
## Conclusion

This notebook has shown the basics of using the Foundation Model API outside of a Databricks environment. You can now start using the API to build AI applications or to develop other projects in the development environment of your choice.