Grammar mode

What is grammar-based structured output?

Grammar mode lets you force the output of any Fireworks model to follow a schema that you specify as a formal grammar in an extended BNF notation (the GBNF format). This method is popularly used to constrain model outputs in llama.cpp.

What is a formal grammar? It is a set of rules that declares which strings are valid and which are invalid. See "Syntax for Describing Grammars" below for more info.

Similar to our JSON mode, you provide a response_format field in the request, like {"type": "grammar", "grammar": <your BNF grammar string>}.

For best results, we still recommend that you do some prompt engineering and describe the desired output to the model to guide decision-making.
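As a rough sketch of the request shape described above, the snippet below builds a chat-completions request body with a grammar in response_format. The helper name build_grammar_request, the tiny yes/no grammar, and the model ID are illustrative assumptions, not part of the Fireworks API; the snippet only constructs the payload and does not send it.

```python
import json


def build_grammar_request(model, grammar, messages):
    """Return a request body that constrains the output with a grammar (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        # Same shape as described above: {"type": "grammar", "grammar": <grammar string>}
        "response_format": {"type": "grammar", "grammar": grammar},
    }


# A deliberately tiny grammar: the whole output must be "yes" or "no".
yes_no_grammar = 'root ::= "yes" | "no"'

payload = build_grammar_request(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",  # assumed model ID
    grammar=yes_no_grammar,
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
)
print(json.dumps(payload, indent=2))
```

Note that the prompt still describes the expected answer, in line with the recommendation above.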

Why grammar-based structured output?

  • Relying solely on system prompt engineering is finicky and time-consuming. It can be difficult to coerce the model to do certain things, for example:
    • Behave like a classifier and only output values from a predefined list
    • Output only Japanese, Chinese, or a specified programming language, or otherwise prevent the model from generating a large set of tokens
  • Sometimes JSON is not what you need (e.g. it may be finicky with string escaping) and you need some other structured output
  • Small models may have difficulty in following instructions

End-to-end examples

This guide provides a step-by-step example of creating a structured output response with grammar using the API. The example uses Python and the OpenAI library to define the schema for the output.


Before you begin, ensure you have the following:

  • Python installed on your system.

  • The openai library installed. You can install it using pip:

    pip install openai

Next, select the model you want to use. In this example, we use mixtral-8x7b-instruct, but all Fireworks models support this feature, so you can pick your favorite model and get a structured response out of it.

Step 1: Configure the client

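You can use either the Fireworks SDK or the OpenAI SDK with this feature. To use the OpenAI SDK, configure it with your API key and the Fireworks base URL:

import openai
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="Your_API_Key",
)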

Replace "Your_API_Key" with your actual API key.

Step 2: Define the output grammar

Define a grammar that restricts the output. Let's say you want the model to act as a classifier that sorts patient requests into a few predefined classes:

root      ::= diagnosis
diagnosis ::= "arthritis" | "dengue" | "urinary tract infection" | "impetigo" | "cervical spondylosis"

Then you can ask the model to only respond within these classes.
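If the class list already lives in code, the grammar can be generated instead of written by hand. The sketch below assumes a hypothetical helper, make_choice_grammar, that turns a list of labels into a two-rule grammar like the one above; it escapes backslashes and double quotes so that each label stays a valid quoted terminal.

```python
def make_choice_grammar(rule_name, choices):
    """Build a grammar whose output must be exactly one of `choices` (hypothetical helper)."""

    def quote(text):
        # Escape backslashes and double quotes so the label is a valid quoted terminal.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    alternatives = " | ".join(quote(choice) for choice in choices)
    return f"root ::= {rule_name}\n{rule_name} ::= {alternatives}\n"


labels = [
    "arthritis",
    "dengue",
    "urinary tract infection",
    "impetigo",
    "cervical spondylosis",
]
diagnosis_grammar = make_choice_grammar("diagnosis", labels)
print(diagnosis_grammar)
```

The generated string can be passed as the "grammar" value of response_format in the same way as a handwritten grammar.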

Step 3: Specify your output grammar in your chat completions request

from fireworks.client import Fireworks

client = Fireworks(api_key="Your_API_Key")

diagnosis_grammar = """
root      ::= diagnosis
diagnosis ::= "arthritis" | "dengue" | "urinary tract infection" | "impetigo" | "cervical spondylosis"
"""

chat_completion = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    # Constrain the output to strings accepted by the grammar
    response_format={"type": "grammar", "grammar": diagnosis_grammar},
    messages=[
        {
            "role": "system",
            "content": "Given the symptoms try to guess the possible diagnosis. Possible choices: arthritis, dengue, urinary tract infection, impetigo, cervical spondylosis. Answer with a single word",
        },
        {
            "role": "user",
            "content": "I have been having trouble with my muscles and joints. My neck is really tight and my muscles feel weak. I have swollen joints and it is hard to move around without becoming stiff. It is also really uncomfortable to walk.",
        },
    ],
)
print(chat_completion.choices[0].message.content)

The response will be exactly one of the five classes we specified. In this case, the model output is:


Note that we still did some prompt engineering to tell the model about the possible diagnoses in free form. Alternatively, we could have used a model fine-tuned for the medical domain.

Advanced examples

Japanese and Chinese

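Grammars can also restrict the output to a particular script. As in step 3, pass the grammar in the response_format field of your request. In this example, the grammar only allows Japanese characters separated by whitespace, and the prompt asks the model to act as a Japanese tour guide.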

cjk_grammar = """
root        ::= jp-char+ ([ \t\n] jp-char+)*
jp-char     ::= hiragana | katakana | punctuation | cjk
hiragana    ::= [ぁ-ゟ]
katakana    ::= [ァ-ヿ]
punctuation ::= [、-〾]
cjk         ::= [一-鿿]
"""

chat_completion = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    response_format={"type": "grammar", "grammar": cjk_grammar},
    messages=[
        {
            "role": "user",
            "content": "You are a Japanese tour guide who speaks fluent Japanese. Please tell me what are some good places for me to visit in Kyoto",
        },
    ],
)

The model will reply in Japanese:


And since the grammar is actually more lenient than Japanese and covers Chinese as well, we can also just prompt the model to be a fluent Chinese speaker.
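To see locally why the grammar accepts Chinese as well as Japanese, the character ranges can be mirrored in a Python regular expression. This is only an approximation for illustration (the function name matches_cjk_grammar is hypothetical, and this is not how the API enforces the grammar), but it uses the same four ranges and the same whitespace separators.

```python
import re

# Same ranges as the grammar: hiragana, katakana, punctuation, and CJK ideographs.
JP_CHAR = "[ぁ-ゟァ-ヿ、-〾一-鿿]"

# root ::= jp-char+ ([ \t\n] jp-char+)*
CJK_PATTERN = re.compile(rf"{JP_CHAR}+(?:[ \t\n]{JP_CHAR}+)*")


def matches_cjk_grammar(text):
    """Return True if the whole string fits the regex mirror of the grammar."""
    return CJK_PATTERN.fullmatch(text) is not None


for sample in ["こんにちは 京都", "你好", "Hello", "京都 Kyoto"]:
    print(repr(sample), matches_cjk_grammar(sample))
```

Chinese text passes because the cjk range covers ideographs shared by both languages, while any Latin letter causes a mismatch.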

"content": "You are a Japanese tour guide who speaks fluent Chinese. Please tell me what are some good places for me to visit in Shanghai?",

Here we are trying something a little difficult: asking a Japanese tour guide to speak Chinese. With the same grammar specified, the model replied in Chinese:


Without the grammar, the model replies in a mix of Chinese and English:

你好!uming as a Japanese tour guide who speaks fluent Chinese, I would be happy to recommend some places for you to visit in Shanghai!\\n\\n1. The Bund: This is a famous waterfront area in Shanghai that offers stunning views of the city's skyline, including the iconic Oriental Pearl Tower. You can take a leisurely stroll along the promenade and enjoy the beautiful scenery.\\n\\n2. Yuyuan Garden: This is a beautiful classical Chinese garden that dates back to the Ming Dynasty. It features pavilions, halls, rockeries, ponds, and cloisters, and is a great place to experience traditional Chinese architecture and garden design.\\n\\n3. Shanghai Tower: This is the tallest building in China and the second-tallest building in the world. You can take the elevator up to the observation deck on the 128th floor for breatht

C code generation

This grammar is one of the community contributions to llama.cpp. You can combine it with our Mixtral model to try to generate a solution to a coding problem you have.

c_grammar = """
root ::= (declaration)*

declaration ::= dataType identifier "(" parameter? ")" "{" statement* "}"

dataType  ::= "int" ws | "float" ws | "char" ws
identifier ::= [a-zA-Z_] [a-zA-Z_0-9]*

parameter ::= dataType identifier

statement ::=
    ( dataType identifier ws "=" ws expression ";" ) |
    ( identifier ws "=" ws expression ";" ) |
    ( identifier ws "(" argList? ")" ";" ) |
    ( "return" ws expression ";" ) |
    ( "while" "(" condition ")" "{" statement* "}" ) |
    ( "for" "(" forInit ";" ws condition ";" ws forUpdate ")" "{" statement* "}" ) |
    ( "if" "(" condition ")" "{" statement* "}" ("else" "{" statement* "}")? ) |
    ( singleLineComment ) |
    ( multiLineComment )

forInit ::= dataType identifier ws "=" ws expression | identifier ws "=" ws expression
forUpdate ::= identifier ws "=" ws expression

condition ::= expression relationOperator expression
relationOperator ::= ("<=" | "<" | "==" | "!=" | ">=" | ">")

expression ::= term (("+" | "-") term)*
term ::= factor(("*" | "/") factor)*

factor ::= identifier | number | unaryTerm | funcCall | parenExpression
unaryTerm ::= "-" factor
funcCall ::= identifier "(" argList? ")"
parenExpression ::= "(" ws expression ws ")"

argList ::= expression ("," ws expression)*

number ::= [0-9]+

singleLineComment ::= "//" [^\n]* "\n"
multiLineComment ::= "/*" ( [^*] | ("*" [^/]) )* "*/"

ws ::= ([ \t\n]+)"""
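
chat_completion = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    response_format={"type": "grammar", "grammar": c_grammar},
    messages=[
        {
            "role": "user",
            "content": "You are an expert in writing C code. Can you write a program that prints hello world?",
        },
    ],
)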

In this case, we get a cute little valid C program as the output:

char\nc(int a){return 2*a;}



Syntax for Describing Grammars

Backus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. The Fireworks API uses an extension of BNF with a few modern regex-like features, inspired by llama.cpp's implementation.


In BNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence....

Consider an example of a small chess notation grammar:

# `root` specifies the pattern for the overall output
root ::= (
    # it must start with the characters "1. " followed by a sequence
    # of characters that match the `move` rule, followed by a space, followed
    # by another move, and then a newline
    "1. " move " " move "\n"

    # it's followed by one or more subsequent moves, numbered with one or two digits
    ([1-9] [0-9]? ". " move " " move "\n")+
)

# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?

pawn ::= ...
nonpawn ::= ...
castle ::= ...

Non-terminals and terminals

Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move, castle, or check-mate.

Terminals are actual characters (code points). They can be specified as a sequence like "1" or "O-O" or as ranges like [1-9] or [NBKQR].

Characters and character ranges

Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ], or with escapes: 8-bit (\xXX), 16-bit (\uXXXX) or 32-bit (\UXXXXXXXX).
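When a grammar file has to stay ASCII-only, the escape forms listed above can be produced programmatically. The sketch below uses a hypothetical helper, char_range_rule, that writes a character range using the shortest of the three escape widths for each endpoint.

```python
def char_range_rule(name, first, last):
    """Return a rule matching one code point in [first, last], written with escapes."""

    def escape(ch):
        code_point = ord(ch)
        if code_point <= 0xFF:
            return f"\\x{code_point:02X}"   # 8-bit escape
        if code_point <= 0xFFFF:
            return f"\\u{code_point:04X}"   # 16-bit escape
        return f"\\U{code_point:08X}"       # 32-bit escape

    return f"{name} ::= [{escape(first)}-{escape(last)}]"


print(char_range_rule("hiragana", "ぁ", "ゟ"))  # hiragana ::= [\u3041-\u309F]
print(char_range_rule("digit", "0", "9"))       # digit ::= [\x30-\x39]
```

The first printed rule describes the same range as the literal form hiragana ::= [ぁ-ゟ].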

Character ranges can be negated with ^:

single-line ::= [^\n]+ "\n"
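For readers more familiar with regular expressions, the negated range in the single-line rule behaves like a negated character class. The following sketch mirrors the rule in Python's re module as an analogy only; the function name is_single_line is made up for this example.

```python
import re

# Regex mirror of: single-line ::= [^\n]+ "\n"
SINGLE_LINE = re.compile(r"[^\n]+\n")


def is_single_line(text):
    """Return True if text is one non-empty line terminated by a newline."""
    return SINGLE_LINE.fullmatch(text) is not None


print(is_single_line("hello world\n"))  # True
print(is_single_line("hello world"))    # False: missing the trailing newline
print(is_single_line("a\nb\n"))         # False: two lines
```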

The dot symbol . matches any character:

any-three-symbol-sequence ::= ...

Sequences and alternatives

The order of symbols in a sequence matters. For example, in "1. " move " " move "\n", the "1. " must come before the first move, etc.

Alternatives, denoted by |, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle, move can be a pawn move, a nonpawn move, or a castle.

Parentheses () can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.

Repetition and optional symbols

  • * after a symbol or sequence means that it can be repeated zero or more times.
  • + denotes that the symbol or sequence should appear one or more times.
  • ? makes the preceding symbol or sequence optional.
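These three operators behave like their regular-expression counterparts. As an informal illustration, the table below pairs a few grammar fragments with equivalent Python regexes; the accepts helper and the EXAMPLES table are assumptions made for this sketch, not part of any API.

```python
import re

# Grammar fragment -> equivalent regular expression (for illustration only).
EXAMPLES = {
    '"a"*': r"a*",                           # zero or more
    '"a"+': r"a+",                           # one or more
    '"a"?': r"a?",                           # optional
    '[1-9] [0-9]? ". "': r"[1-9][0-9]?\. ",  # move number with one or two digits
}


def accepts(fragment, text):
    """Return True if the regex mirror of the grammar fragment matches all of text."""
    return re.fullmatch(EXAMPLES[fragment], text) is not None


print(accepts('"a"*', ""))                    # True: zero repetitions are allowed
print(accepts('"a"+', ""))                    # False: at least one "a" is required
print(accepts('[1-9] [0-9]? ". "', "12. "))   # True
print(accepts('[1-9] [0-9]? ". "', "123. "))  # False: at most two digits
```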

Comments and newlines

Comments can be specified with #:

# defines whitespace: one or more spaces, tabs, or newlines
ws ::= [ \t\n]+

Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker | will continue the current rule, even outside of parentheses.

The root rule

In a full grammar, the root rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.

# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
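Because root describes the entire output, a response produced under this list grammar can be parsed with very little code. The sketch below assumes the response text is already available as a string and uses a hypothetical parse_list helper; the regex mirrors the two rules above and is used only as a local sanity check.

```python
import re

# Regex mirror of: root ::= ("- " item)+   and   item ::= [^\n]+ "\n"
LIST_PATTERN = re.compile(r"(?:- [^\n]+\n)+")


def parse_list(text):
    """Split output shaped like the list grammar into its items."""
    if LIST_PATTERN.fullmatch(text) is None:
        raise ValueError("text does not match the list grammar")
    # Every line ends with "\n", so the final split element is empty and is dropped.
    lines = text.split("\n")[:-1]
    # Remove the leading "- " marker from each line.
    return [line[2:] for line in lines]


print(parse_list("- milk\n- eggs\n"))  # ['milk', 'eggs']
```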