Enabling the chat completions API

By default, text models only have the completions API enabled. To enable the chat completions API, the model must define a conversation config, which tells the API how to format the list of messages into a prompt.

Setting a conversation config

The conversation config is set in your model's fireworks.json and is immutable after the model is uploaded. It has one required field, style, which selects the conversation style. See below for a list of valid styles.

Here is an example configuration for a Llama chat model:

  "base_model": "meta-llama/Llama-2-70b-chat-hf",
  "conversation_config": {
    "style": "llama-chat"
  }

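As a rough illustration, the fragment above could be generated and sanity-checked with a short Python script. This is only a sketch: the helper name `build_config` is hypothetical, and it assumes fireworks.json is ordinary JSON containing the two fields shown in the example. Any other fields your model needs are left out.

```python
import json


def build_config(base_model: str, style: str) -> dict:
    """Return a dict mirroring the example fireworks.json fragment.

    Hypothetical helper: only the two fields shown in the example are set.
    """
    if not style:
        # `style` is the one required field of the conversation config.
        raise ValueError("conversation_config.style is required")
    return {
        "base_model": base_model,
        "conversation_config": {"style": style},
    }


config = build_config("meta-llama/Llama-2-70b-chat-hf", "llama-chat")
# Serialize the way the fragment would appear inside fireworks.json.
config_text = json.dumps(config, indent=2)
print(config_text)
```

Because the conversation config is immutable after upload, a check like this is best run before the model is uploaded.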

messages must contain exactly one message from the "user" role. The generated prompt will be

### Instruction:
{messages[0].content}

### Response:
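The single-message rule and the instruction/response layout can be sketched in Python as follows. The function name `format_instruction_prompt` is hypothetical. The placement of the user content between the two markers and the exact newlines are assumptions based on the common instruction/response template, not confirmed server behavior.

```python
def format_instruction_prompt(messages: list[dict]) -> str:
    """Build an instruction/response prompt from exactly one user message."""
    if len(messages) != 1:
        raise ValueError("messages must contain exactly one message")
    message = messages[0]
    if message.get("role") != "user":
        raise ValueError('the message must be from the "user" role')
    # Assumption: the user content sits between the two markers,
    # followed by a blank line before the response marker.
    return f"### Instruction:\n{message['content']}\n\n### Response:\n"


prompt = format_instruction_prompt(
    [{"role": "user", "content": "Name three primary colors."}]
)
print(prompt)
```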


This style uses the ChatML system prompt: "You are a helpful assistant."

The completion prompt template for the following messages

"messages": [
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": {{ question_1 }}
  },
  {
    "role": "assistant",
    "content": {{ answer_1 }}
  },
  {
    "role": "user",
    "content": {{ question_2 }}
  }
]

will be

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{{ question_1 }}<|im_end|>
<|im_start|>assistant
{{ answer_1 }}
<|im_start|>user
{{ question_2 }}<|im_end|>
<|im_start|>assistant

The messages must adhere to the ChatML format. The first message must be a system prompt and the last message must be a user prompt. User and assistant prompts must alternate.
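The ordering rules above can be checked on the client before sending a request. The following is a minimal sketch. The function name `validate_chatml_messages` is hypothetical, and it only encodes the three rules stated here (system first, user last, user and assistant alternating). The server may enforce more.

```python
def validate_chatml_messages(messages: list[dict]) -> None:
    """Raise ValueError if messages break the ordering rules stated above."""
    if len(messages) < 2:
        raise ValueError("need at least a system message and a user message")
    if messages[0]["role"] != "system":
        raise ValueError("the first message must be a system prompt")
    if messages[-1]["role"] != "user":
        raise ValueError("the last message must be a user prompt")

    turns = messages[1:]  # everything after the system prompt
    for message in turns:
        if message["role"] not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {message['role']!r}")
    # Adjacent turns must come from different roles.
    for previous, current in zip(turns, turns[1:]):
        if previous["role"] == current["role"]:
            raise ValueError("user and assistant messages must alternate")


validate_chatml_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is ChatML?"},
])
```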


By default, this style uses the default LLaMA system prompt:

You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

A custom (possibly empty) system prompt may be specified by setting the role of the first message in the request to system. The first non-system message must be from the "user" role. Subsequent messages must alternate between the "assistant" and "user" roles, and the last message must be from the "user" role.

See generation.py for a reference implementation.
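To make the system-prompt and ordering rules concrete, here is a small Python sketch that resolves the effective system prompt and validates the remaining turns. The helper name `split_system_prompt` is hypothetical. It does not reproduce the token-level template from generation.py, only the message rules described in this section.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. Always answer as "
    "helpfully as possible, while being safe. Your answers should not include "
    "any harmful, unethical, racist, sexist, toxic, dangerous, or illegal "
    "content. Please ensure that your responses are socially unbiased and "
    "positive in nature."
)


def split_system_prompt(messages: list[dict]) -> tuple[str, list[dict]]:
    """Return (system_prompt, turns) after checking the role ordering."""
    if messages and messages[0]["role"] == "system":
        # A custom system prompt, which may be the empty string.
        system_prompt, turns = messages[0]["content"], messages[1:]
    else:
        system_prompt, turns = DEFAULT_SYSTEM_PROMPT, messages

    if not turns:
        raise ValueError("at least one user message is required")
    for index, message in enumerate(turns):
        # Even positions are user turns, odd positions are assistant turns.
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {index} must be from the {expected!r} role")
    if len(turns) % 2 == 0:
        raise ValueError('the last message must be from the "user" role')
    return system_prompt, turns


system, turns = split_system_prompt([{"role": "user", "content": "Hello"}])
print(system == DEFAULT_SYSTEM_PROMPT)
```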


messages must contain exactly two messages, the first from the "prefix" role and the second from the "suffix" role. The generated prompt will be

<PRE> {messages[0].content} <SUF>{messages[1].content} <MID>

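The infill template above maps directly onto a short Python function. This is a sketch with a hypothetical name, `format_infill_prompt`. It follows the template string exactly as written, including the spacing around the markers.

```python
def format_infill_prompt(messages: list[dict]) -> str:
    """Build the infill prompt from a prefix message and a suffix message."""
    if len(messages) != 2:
        raise ValueError("messages must contain exactly two messages")
    prefix, suffix = messages
    if prefix["role"] != "prefix" or suffix["role"] != "suffix":
        raise ValueError('roles must be "prefix" then "suffix"')
    # Spacing mirrors the template: a space after <PRE> and before <SUF>/<MID>.
    return f"<PRE> {prefix['content']} <SUF>{suffix['content']} <MID>"


prompt = format_infill_prompt([
    {"role": "prefix", "content": "def add(a, b):\n    return "},
    {"role": "suffix", "content": "\n\nprint(add(1, 2))"},
])
print(prompt)
```

The model is then expected to generate the code that belongs between the prefix and the suffix.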

See https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1#instruction-format


messages must contain exactly one message whose content is passed to the model directly. role and name are ignored.
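In other words, the prompt is the message content, unchanged. A minimal sketch with a hypothetical function name, `passthrough_prompt`:

```python
def passthrough_prompt(messages: list[dict]) -> str:
    """Return the content of the single message unchanged."""
    if len(messages) != 1:
        raise ValueError("messages must contain exactly one message")
    # `role` and `name` are deliberately not inspected.
    return messages[0]["content"]


print(passthrough_prompt([{"role": "user", "content": "Once upon a time"}]))
```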