JSON Mode (Structured Output)

What is JSON Mode?

JSON mode lets you provide a JSON schema that forces any Fireworks language model to respond in JSON conforming to that schema.

Why JSON Responses?

  1. Clarity and Precision: Responding in JSON ensures that the output from the LLM is clear, precise, and easy to parse. This is particularly beneficial in scenarios where the response needs to be further processed or analyzed by other systems.

  2. Ease of Integration: JSON, being a widely-used format, allows for easy integration with various platforms and applications. This interoperability is essential for developers looking to incorporate AI capabilities into their existing systems without extensive modifications.

End-to-end Example

This guide provides a step-by-step example of how to create a structured output response using the Fireworks.ai API. The example uses Python and the pydantic library to define the schema for the output.

Prerequisites

Before you begin, ensure you have the following:

  • Python installed on your system.

  • openai and pydantic libraries installed. You can install them using pip:

    pip install openai pydantic
    

Next, select the model you want to use. In this example, we use mixtral-8x7b-instruct, but all Fireworks models support this feature, so you can pick your favorite model and get a JSON response out of it.

Step 1: Import Libraries

Start by importing the required libraries:

import openai
from pydantic import BaseModel, Field

Step 2: Configure the Fireworks.ai Client

You can use either the Fireworks.ai SDK or the OpenAI SDK with this feature. To use the OpenAI SDK, pass your API key and the Fireworks base URL:

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="Your_API_Key",
)

Replace "Your_API_Key" with your actual API key.
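
To avoid hard-coding the key in source files, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name FIREWORKS_API_KEY and the helper client_kwargs are hypothetical names chosen for this example, not something the API requires.

```python
import os

FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"


def client_kwargs(env_var="FIREWORKS_API_KEY"):
    """Return keyword arguments for openai.OpenAI, reading the key from the environment.

    The environment variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"base_url": FIREWORKS_BASE_URL, "api_key": api_key}


# Usage, with the openai package installed:
#   client = openai.OpenAI(**client_kwargs())
```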

Step 3: Define the Output Schema

Define a Pydantic model to specify the schema of the output. For example:

class Result(BaseModel):
    winner: str

This model defines a simple schema with a single field, winner. If you are not familiar with Pydantic, see its documentation. Pydantic emits JSON Schema, which is described in more detail in the JSON Schema documentation.

Step 4: Specify Your Output Schema in your Chat Completion Request

Make a request to the Fireworks.ai API to get a JSON response. In the request, pass the output schema you defined in Step 3. For example, to ask who won the US presidential election in 2012:

chat_completion = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    # Pass the schema as a dict. model_json_schema() is Pydantic v2; on v1 use Result.schema().
    response_format={"type": "json_object", "schema": Result.model_json_schema()},
    messages=[
        {
            "role": "user",
            "content": "Who won the US presidential election in 2012? Reply just in one JSON.",
        },
    ],
)

Step 5: Display the Result

Finally, print the result:

print(repr(chat_completion.choices[0].message.content))

This displays the response in the format defined by the Result schema. We get a single JSON response:

'{\n "winner": "Barack Obama"\n}'

You can parse that as plain JSON and hook it up to the rest of your system. Currently, we enforce the structure with a grammar-based state machine, which ensures that the LLM always generates all the fields in the schema. If the output schema you provide is not a valid JSON schema, the request fails.
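
As a minimal sketch of that parsing step, the snippet below feeds the exact string shown above to Python's standard json module. The variable names are chosen only for illustration.

```python
import json

# The raw message content returned in the example above.
content = '{\n "winner": "Barack Obama"\n}'

result = json.loads(content)  # a regular Python dict
print(result["winner"])       # prints: Barack Obama
```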

Using Structured Response Modes

Fireworks supports the following variants:

  • Arbitrary JSON. Similar to OpenAI, you can force the model to produce any valid JSON by providing {"type": "json_object"} as response_format in the request. This forces the model to output JSON but does not specify which JSON schema to use.
  • JSON with the given schema. To impose a specific JSON schema on the model generation, provide a schema that follows the JSON Schema specification. See the supported constructs in the next section.

Important: when using JSON mode, it is also crucial to instruct the model to produce JSON and to describe the desired schema via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request.

To get the best outcome, include the schema in both the prompt and the response_format field.

Technically, it means that when using "JSON with the given schema" mode, the model doesn't automatically "see" the schema passed in the response_format field. Adherence to the schema is forced upon the model during sampling. So for best results, you need to include the desired schema in the prompt in addition to specifying it as response_format. You may need to experiment with the best way to describe the schema in the prompt depending on the model: besides JSON schema, describing it in plain English might work well too, e.g. "extract name and address of the person in JSON format".
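
One way to keep the prompt and response_format in sync is to build both from the same schema dict. The following is a sketch under assumptions: build_request is a hypothetical helper, and the wording of the system prompt is only one possible phrasing that you may need to tune per model.

```python
import json


def build_request(question, schema):
    """Return chat-completion arguments that carry the schema twice:
    once in the prompt (so the model can see it) and once in
    response_format (so it is enforced during sampling)."""
    system_prompt = (
        "Reply with a single JSON object that matches this JSON Schema:\n"
        + json.dumps(schema, indent=2)
    )
    return {
        "response_format": {"type": "json_object", "schema": schema},
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }


schema = {
    "type": "object",
    "properties": {"winner": {"type": "string"}},
    "required": ["winner"],
}
request = build_request("Who won the US presidential election in 2012?", schema)
```

The resulting dict can be unpacked into the request, for example client.chat.completions.create(model=..., **request).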

Note: the message content may be partially cut off if finish_reason="length", which indicates that the generation exceeded max_tokens or the conversation exceeded the maximum context length. In this case, the returned content might not be valid JSON.
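
A simple guard is to check finish_reason before parsing. The helper below is a hypothetical sketch, not part of any SDK. It assumes you pass in the content and finish_reason taken from the first choice of the response.

```python
import json


def parse_json_content(content, finish_reason):
    """Parse message content as JSON, refusing output that was cut off."""
    if finish_reason == "length":
        raise ValueError(
            "generation was truncated; raise max_tokens or shorten the prompt"
        )
    return json.loads(content)


# Usage with the OpenAI SDK response object:
#   choice = chat_completion.choices[0]
#   data = parse_json_content(choice.message.content, choice.finish_reason)
```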

Structured response modes work for both Completions and Chat Completions APIs.

If you use Function Calling, JSON mode is enabled automatically and the function schema is added to the prompt, so none of the caveats above apply.

Supported JSON schema constructs

Fireworks supports a subset of the JSON Schema specification.

Supported:

  • Nested schema composition, including anyOf and $ref
  • type: string, number, integer, boolean, object, array, null
  • properties and required for objects
  • items for arrays

The Fireworks API does not return an error for unsupported constructs; they are simply not enforced. Constraints that are not yet supported include:

  • Sophisticated composition with oneOf
  • Length/size constraints for objects and arrays
  • Regular expressions via pattern
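
Because unsupported constructs are silently ignored, it can help to scan a schema locally before sending it. The sketch below is an illustration only: the keyword set is an assumption derived from the list above (oneOf, pattern, and the array/object size keywords), not an official list, and find_unenforced is a hypothetical helper.

```python
# Keywords assumed (from the list above) not to be enforced.
UNENFORCED = {
    "oneOf", "pattern",
    "minItems", "maxItems", "minProperties", "maxProperties",
}


def find_unenforced(schema, path="#"):
    """Return paths of keywords in `schema` that would not be enforced."""
    found = []
    if not isinstance(schema, dict):
        return found
    # Keywords used directly at this level of the schema.
    found += [f"{path}/{key}" for key in schema if key in UNENFORCED]
    # Recurse into subschemas. Property *names* are not keywords,
    # so only the property values are visited.
    for name, sub in schema.get("properties", {}).items():
        found += find_unenforced(sub, f"{path}/properties/{name}")
    if isinstance(schema.get("items"), dict):
        found += find_unenforced(schema["items"], f"{path}/items")
    for keyword in ("anyOf", "oneOf"):
        for index, sub in enumerate(schema.get(keyword, [])):
            found += find_unenforced(sub, f"{path}/{keyword}/{index}")
    return found


example = {
    "type": "object",
    "properties": {
        "zip": {"type": "string", "pattern": "^[0-9]{5}$"},
        "tags": {"type": "array", "items": {"type": "string"}, "maxItems": 3},
    },
}
print(find_unenforced(example))
# ['#/properties/zip/pattern', '#/properties/tags/maxItems']
```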

Note: the JSON Schema specification allows arbitrary field names to appear in an object that has a properties constraint, unless "additionalProperties": false or "unevaluatedProperties": false is provided. This is a poor default for LLM constrained generation, since any hallucinated field would be accepted. Fireworks therefore treats any schema with a properties constraint as if it had "unevaluatedProperties": false.

Here is an example of a response_format field whose schema accepts an object with two fields, a required string and an optional integer:

{
  "type": "json_object",
  "schema": {
    "type": "object",
    "properties": {
      "foo": {"type": "string"},
      "bar": {"type": "integer"}
    },
    "required": ["foo"]
  }
}
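
To make the semantics of this example concrete, the sketch below checks values against the supported subset locally (type, properties, required, items), rejecting extra fields as described in the note above. It is a simplified illustration, not the Fireworks implementation: conforms is a hypothetical helper that assumes "type" is a single string and ignores anyOf and $ref.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
}


def conforms(value, schema):
    """Return True if `value` matches `schema` under the simplified rules."""
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](value):
        return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False  # a required field is missing
        if "properties" in schema:
            props = schema["properties"]
            if any(key not in props for key in value):
                return False  # extra fields are rejected
            return all(conforms(value[key], props[key]) for key in value)
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


schema = {
    "type": "object",
    "properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}},
    "required": ["foo"],
}
print(conforms({"foo": "a"}, schema))            # True: bar is optional
print(conforms({"bar": 1}, schema))              # False: foo is required
print(conforms({"foo": "a", "baz": 2}, schema))  # False: baz is not declared
```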

Similar Features

Check out our function calling model if you're interested in use cases like:

  • Multi-turn capabilities: For example, the ability for the model to ask for clarifying information about parameters
  • Routing: The ability for the model to route across multiple different options or models. Instead of having just one possible JSON schema, you have many different JSON schemas to choose from.

Check out grammar mode if you want structured output specified not through JSON, but rather through an arbitrary grammar (limit output to specific words, character limits, or character types, etc).