(nbs:streaming_outputs)=
# Streaming Outputs with the Foundation Model API

This notebook covers how to stream responses from [chat completion](https://docs.databricks.com/en/machine-learning/foundation-models/query-foundation-model-apis.html#query-a-chat-completion-model) or [text completion](https://docs.databricks.com/en/machine-learning/foundation-models/query-foundation-model-apis.html#query-a-text-completion-model) models. Streaming returns tokens as they are generated instead of waiting for the entire response to complete, which can dramatically reduce the time before users start to see a response.
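To make the latency benefit concrete, here is a small simulation that needs no endpoint. It is only a sketch: `fake_token_stream` and `measure_latency` are hypothetical helpers, and the fixed per-token delay is an assumption standing in for real generation time.

```python
import time

def fake_token_stream(tokens, delay_s=0.01):
    """Hypothetical stand-in for a model: yields one token at a time after a fixed delay."""
    for token in tokens:
        time.sleep(delay_s)
        yield token

def measure_latency(tokens, delay_s=0.01):
    """Return (seconds until the first token, seconds until the last token)."""
    start = time.perf_counter()
    first_token_at = None
    for _ in fake_token_stream(tokens, delay_s):
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
    total = time.perf_counter() - start
    return first_token_at, total

tokens = ["Pre", "heat", " the", " oven", " to", " 350", "°F", "."]
first, total = measure_latency(tokens)
print(f"first token after {first:.3f}s, full response after {total:.3f}s")
```

A streaming client can show output once the first token arrives, while a non-streaming client waits for the full duration.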

In [None]:
%pip install --upgrade --force-reinstall databricks-genai-inference
dbutils.library.restartPython()

## Streaming with the Python SDK

To enable streaming, include the argument `stream=True` in calls to chat completion or text completion endpoints. Here's how to do it with the `ChatCompletion.create` method in the Python SDK:

In [None]:
from databricks_genai_inference import ChatCompletion

def stream_chat_reply(model, user_message, max_tokens=2048):
    messages = [{"role": "user", "content": user_message}]
    generator = ChatCompletion.create(
        model=model,
        max_tokens=max_tokens,
        stream=True,
        messages=messages,
    )

    for response in generator:
        # Using `end=''` to avoid adding a new line after each print statement
        print(response.message, end='')


With streaming enabled, the `create` method returns a [generator](https://docs.python.org/3/glossary.html#term-generator) that yields a sequence of `ChatCompletionChunkObject`s. Each chunk carries its piece of the completion in the `ChatCompletionChunkObject.message` attribute, so we can iterate over the generator and print each `message` in order to stream the results to the user.

In [None]:
stream_chat_reply(model="mixtral-8x7b-instruct", user_message="Tell me in detail how to make chocolate chip cookies.")

1. Gather ingredients: 
   - 2 1/4 cups all-purpose flour
   - 1/2 teaspoon baking soda
   - 1 cup unsalted butter, room temperature
   - 1/2 cup granulated sugar
   - 1 cup packed light-brown sugar
   - 1 teaspoon salt
   - 2 teaspoons pure vanilla extract
   - 2 large eggs
   - 2 cups semisweet and/or milk chocolate chips

2. Preheat oven to 350°F (180°C). In a small bowl, whisk together flour and baking soda; set aside.

3. In the bowl of an electric mixer fitted with the paddle attachment, combine butter with both sugars; beat on medium speed until light and fluffy.

4. Reduce speed to low; add salt, vanilla, and eggs. Beat until well mixed, about 1 minute.

5. Add flour mixture; mix until just combined. Stir in chocolate chips.

6. Drop heaping tablespoon-size balls of dough about 2 inches apart on baking sheets lined with parchment paper.

7. Bake until cookies are golden around the edges, but still soft in the center, 8 to 10 minutes.

8. Remove from oven, and let cookies cool o
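Applications often need the complete text as well as the live stream, for example to log it or append it to a conversation history. The sketch below shows one way to do both while iterating. `FakeChunk` and `fake_stream` are hypothetical stand-ins that model only the `message` attribute described above, so the example runs without calling an endpoint.

```python
from dataclasses import dataclass

@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `message` is modeled."""
    message: str

def fake_stream():
    """Yield a few fragments the way a streaming generator would."""
    for piece in ["1", ".", " Gather", " ingredients", ":"]:
        yield FakeChunk(message=piece)

def stream_and_collect(chunks):
    """Print each fragment as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        # flush=True pushes each fragment to the screen immediately
        print(chunk.message, end="", flush=True)
        parts.append(chunk.message)
    print()  # finish the streamed line
    return "".join(parts)

full_text = stream_and_collect(fake_stream())
```

Passing the generator returned by `ChatCompletion.create(..., stream=True)` in place of `fake_stream()` should behave the same way, assuming its chunks expose `message` as described above.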

This works in much the same way for the [text completion](https://docs.databricks.com/en/machine-learning/foundation-models/query-foundation-model-apis.html#query-a-text-completion-model) endpoint.

In [None]:
from databricks_genai_inference import Completion

def stream_text_reply(model, user_prompt, max_tokens=2048):
    generator = Completion.create(
        model=model,
        max_tokens=max_tokens,
        stream=True,
        prompt=user_prompt,
    )

    for response in generator:
        # Using `end=''` to avoid adding a new line after each print statement
        print(response.text, end='')

stream_text_reply(model="mpt-30b-instruct",
                  user_prompt="Tell me in detail how to make chocolate chip cookies.")

Ah, this is a very popular recipe.  Here are the basic ingredients, which you can always modify to your own tastes:
1 cup of butter, softened
1 cup of brown sugar
2 eggs
1 tsp vanilla
2 1/4 cups of flour
1 tsp of baking soda
1/4 tsp of salt
2 cups of chocolate chips

Some additional tips:  be sure to use softened butter and not melted butter, and also be sure to use a high-quality chocolate chip.  Also, be sure to cream the butter and sugar together well, and also be sure to mix in the eggs and vanilla well.  For the flour, baking soda, and salt, you want to mix them together dry, and then add them to the wet ingredients.  Finally, I recommend using a spoon to mix in the chocolate chips, since that gives a nice, even mixture.

Now, for baking, you want to preheat the oven to 375, and then spoon cookie dough onto a baking sheet, leaving about 2 inches of space between cookies.  You may want to use a cookie scoop for this, or you may want to just eyeball it.  You want to bake these cooki

## Streaming responses with the REST API

Including `"stream": true` in the REST API request results in a series of [`ChatCompletionChunk`](https://docs.databricks.com/en/machine-learning/foundation-models/api-reference.html#chatcompletionchunk) objects when calling a chat model endpoint.

In [None]:
%sh
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me in detail how to make chocolate chip cookies."
    }
  ],
  "max_tokens": 2048,
  "stream": true
}' \
https://e2-dogfood.staging.cloud.databricks.com/serving-endpoints/databricks-mixtral-8x7b-instruct/invocations
  




data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.chunk","created":1707146947,"model":"mixtral-8x7b-instruct-v0.1","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}],"usage":{"prompt_tokens":28,"completion_tokens":1,"total_tokens":29}}

data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.chunk","created":1707146947,"model":"mixtral-8x7b-instruct-v0.1","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}],"usage":{"prompt_tokens":28,"completion_tokens":2,"total_tokens":30}}

data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.chunk","created":1707146947,"model":"mixtral-8x7b-instruct-v0.1","choices":[{"index":0,"delta":{"role":"assistant","content":" G"},"finish_reason":null}],"usage":{"prompt_tokens":28,"completion_tokens":3,"total_tokens":31}}

data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.chunk","created":17

data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.chunk","created":1707146947,"model":"mixtral-8x7b-instruct-v0.1","choices":[{"index":0,"delta":{"role":"assistant","content":" they"},"finish_reason":null}],"usage":{"prompt_tokens":28,"completion_tokens":722,"total_tokens":750}}

data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.chunk","created":1707146947,"model":"mixtral-8x7b-instruct-v0.1","choices":[{"index":0,"delta":{"role":"assistant","content":" are"},"finish_reason":null}],"usage":{"prompt_tokens":28,"completion_tokens":723,"total_tokens":751}}

data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.chunk","created":1707146947,"model":"mixtral-8x7b-instruct-v0.1","choices":[{"index":0,"delta":{"role":"assistant","content":" soft"},"finish_reason":null}],"usage":{"prompt_tokens":28,"completion_tokens":724,"total_tokens":752}}

data: {"id":"604bbef0-44ac-4835-be63-70a4c0870278","object":"chat.completion.
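Each `data:` line in the output above carries one JSON chunk, with the generated text under `choices[0].delta.content`. The following is a minimal sketch of reassembling the reply in Python. The helper names `iter_sse_payloads` and `chat_deltas` are hypothetical, the sample lines are abridged copies of the first three chunks shown above, and the `[DONE]` check is an assumption about a terminator that some streaming servers send, not something confirmed for this endpoint.

```python
import json

def iter_sse_payloads(lines):
    """Yield the decoded JSON payload of each `data:` line in an event stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data event
        body = line[len("data:"):].strip()
        if body == "[DONE]":  # assumption: optional end-of-stream sentinel
            break
        yield json.loads(body)

def chat_deltas(lines):
    """Yield the text fragment carried by each chat completion chunk."""
    for payload in iter_sse_payloads(lines):
        for choice in payload.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content

# Abridged copies of the first three chunks from the output above.
sample_lines = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":" G"},"finish_reason":null}]}',
]

reply = "".join(chat_deltas(sample_lines))
print(reply)
```

With the `requests` library, the lines could come from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`, in place of `sample_lines`.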

Calling a text completion model endpoint instead returns a series of [`text_completion`](https://docs.databricks.com/en/machine-learning/foundation-models/api-reference.html#completion-response) objects.

In [None]:
%sh
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{"prompt": "Tell me in detail how to make chocolate chip cookies.",
      "stream": true}' \
https://e2-dogfood.staging.cloud.databricks.com/serving-endpoints/databricks-mpt-30b-instruct/invocations

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":"Sure","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":1,"total_tokens":40}}

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":",","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":2,"total_tokens":41}}

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":" you","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":3,"total_tokens":42}}

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":"’","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":4,"total_tokens":43}}

da

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":" and","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":82,"total_tokens":121}}

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":" bake","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":83,"total_tokens":122}}

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":" for","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":84,"total_tokens":123}}

data: {"id":"74b9c840-934a-490f-a542-4191bff313da","object":"text_completion","model":"mpt-30b-instruct","choices":[{"text":" 8","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":39,"completion_tokens":85,"total_toke
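Text completion chunks place the generated fragment directly under `choices[0].text`, and each chunk in the output above also reports running token counts in `usage`. The sketch below joins the fragments and keeps the most recent usage record. `collect_text_completion` is a hypothetical helper, and the sample lines are abridged copies of the first three chunks shown above.

```python
import json

def collect_text_completion(lines):
    """Join the `text` fragments and return them with the last reported usage."""
    pieces = []
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # ignore blank separator lines
        payload = json.loads(line[len("data:"):])
        for choice in payload["choices"]:
            pieces.append(choice["text"])
        # Keep the latest usage record; it grows as tokens are generated.
        usage = payload.get("usage", usage)
    return "".join(pieces), usage

# Abridged copies of the first three chunks from the output above.
sample_lines = [
    'data: {"object":"text_completion","choices":[{"text":"Sure","index":0}],"usage":{"prompt_tokens":39,"completion_tokens":1,"total_tokens":40}}',
    "",
    'data: {"object":"text_completion","choices":[{"text":",","index":0}],"usage":{"prompt_tokens":39,"completion_tokens":2,"total_tokens":41}}',
    "",
    'data: {"object":"text_completion","choices":[{"text":" you","index":0}],"usage":{"prompt_tokens":39,"completion_tokens":3,"total_tokens":42}}',
]

text, usage = collect_text_completion(sample_lines)
print(text)
print(usage)
```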

## Conclusion
You are now ready to integrate streaming outputs into your next project with the Databricks Foundation Model API!

See the [Databricks Foundation Model APIs Documentation](https://docs.databricks.com/en/machine-learning/foundation-models/index.html) for more detail.