Enabling the chat completions API
By default, text models only have the completions API enabled. To enable the chat completions API, the model must define a conversation config, which tells the API how to format the list of messages into a prompt.
Setting a conversation config
The conversation config is set in your model's fireworks.json and is immutable after the model is uploaded. It has one required field, style, which configures the conversation style. See below for a list of valid styles.
Here is an example configuration for a Llama chat model:
{
"base_model": "meta-llama/Llama-2-70b-chat-hf",
"conversation_config": {
"style": "llama-chat"
}
}
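Because the config cannot be changed after upload, it can help to generate the file programmatically and reject a misspelled style up front. The following is a minimal sketch only: the helper name build_fireworks_config and the VALID_STYLES set are illustrative (the style names are taken from the list on this page), and none of it is part of any Fireworks tooling.

```python
import json

# Style names as listed in the "Styles" section of this page.
VALID_STYLES = {
    "alpaca",
    "chatml",
    "llama-chat",
    "llama-infill",
    "mistral-chat",
    "passthrough",
}


def build_fireworks_config(base_model: str, style: str) -> str:
    """Return JSON text for fireworks.json (hypothetical helper)."""
    if style not in VALID_STYLES:
        raise ValueError(f"unknown conversation style: {style!r}")
    config = {
        "base_model": base_model,
        "conversation_config": {"style": style},
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_fireworks_config("meta-llama/Llama-2-70b-chat-hf", "llama-chat"))
```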
Styles
alpaca
messages must contain exactly one message from the "user" role. The generated prompt will be
### Instruction:
{messages[0].content.strip()}
### Response:
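The alpaca rule can be sketched client-side as below. The function name format_alpaca is hypothetical, and the exact newline placement is an assumption read off the template as printed; the server's actual whitespace may differ.

```python
def format_alpaca(messages: list[dict]) -> str:
    """Build an alpaca-style prompt from a single user message (sketch)."""
    if len(messages) != 1 or messages[0].get("role") != "user":
        raise ValueError('alpaca requires exactly one message from the "user" role')
    content = messages[0]["content"].strip()
    # Assumed layout: one newline after each line of the template.
    return f"### Instruction:\n{content}\n### Response:\n"


if __name__ == "__main__":
    print(format_alpaca([{"role": "user", "content": "  Name three primes.  "}]))
```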
chatml
This style uses the ChatML system prompt: "You are a helpful assistant."
The completion prompt template for the following messages
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": {{ question_1 }}
},
{
"role": "assistant",
"content": {{ answer_1 }}
},
{
"role": "user",
"content": {{ question_2 }}
}
]
will be
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{{ question_1 }}<|im_end|>
<|im_start|>assistant
{{ answer_1 }}<|im_end|>
<|im_start|>user
{{ question_2 }}<|im_end|>
<|im_start|>assistant
The messages must adhere to the ChatML format specification. The first message must be a system prompt and the last message must be a user prompt. User and assistant messages must alternate.
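The ordering rules and template for chatml can be sketched as follows. The function name format_chatml is hypothetical. The sketch closes every message, assistant turns included, with <|im_end|>, following the usual ChatML convention; that detail and the exact newlines are assumptions about the server's behavior.

```python
def format_chatml(messages: list[dict]) -> str:
    """Validate message order and render a ChatML prompt (sketch)."""
    if not messages or messages[0]["role"] != "system":
        raise ValueError("the first message must be a system prompt")
    if messages[-1]["role"] != "user":
        raise ValueError("the last message must be a user prompt")
    # After the system prompt, roles must alternate user, assistant, user, ...
    for i, message in enumerate(messages[1:]):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {i + 1} must be from the {expected!r} role")
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_chatml([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is ChatML?"},
    ]))
```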
llama-chat
Unless overridden, this style uses the default LLaMA system prompt:
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
A custom (possibly empty) system prompt may be specified in the first message of the request by setting its role to "system". The first non-system message must be from the "user" role; subsequent messages must alternate between the "assistant" and "user" roles, and the last message must be from the "user" role.
See generation.py for a reference implementation.
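The rules above can be sketched as below. The string layout ([INST], <<SYS>> markers, and the system prompt folded into the first user message) follows Meta's Llama 2 generation.py as I understand it; the function name format_llama_chat is hypothetical, BOS/EOS handling is omitted because it happens at the token level, and how the server treats an empty system prompt is an assumption.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. Always answer as "
    "helpfully as possible, while being safe. Your answers should not include "
    "any harmful, unethical, racist, sexist, toxic, dangerous, or illegal "
    "content. Please ensure that your responses are socially unbiased and "
    "positive in nature."
)

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def format_llama_chat(messages: list[dict]) -> list[str]:
    """Return one text segment per turn; BOS/EOS tokens are not added (sketch)."""
    if messages and messages[0]["role"] == "system":
        system, turns = messages[0]["content"], messages[1:]
    else:
        system, turns = DEFAULT_SYSTEM_PROMPT, messages
    if not turns:
        raise ValueError("at least one user message is required")
    for i, message in enumerate(turns):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {i} must be from the {expected!r} role")
    if turns[-1]["role"] != "user":
        raise ValueError('the last message must be from the "user" role')
    contents = [m["content"] for m in turns]
    # The system prompt is folded into the first user message.
    contents[0] = B_SYS + system + E_SYS + contents[0]
    # Completed (user, assistant) pairs, then the final open user turn.
    segments = [
        f"{B_INST} {question.strip()} {E_INST} {answer.strip()} "
        for question, answer in zip(contents[0::2], contents[1::2])
    ]
    segments.append(f"{B_INST} {contents[-1].strip()} {E_INST}")
    return segments


if __name__ == "__main__":
    for segment in format_llama_chat([{"role": "user", "content": "Hello"}]):
        print(segment)
```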
llama-infill
messages must contain exactly two messages, the first from the "prefix" role and the second from the "suffix" role. The generated prompt will be
<PRE> {messages[0].content} <SUF>{messages[1].content} <MID>
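A minimal sketch of the infill template, useful for checking requests before sending them. The function name format_llama_infill is hypothetical; the spacing is copied from the template above.

```python
def format_llama_infill(messages: list[dict]) -> str:
    """Render the infill prompt from a prefix and a suffix message (sketch)."""
    roles = [m.get("role") for m in messages]
    if roles != ["prefix", "suffix"]:
        raise ValueError('expected exactly two messages with roles "prefix" then "suffix"')
    prefix = messages[0]["content"]
    suffix = messages[1]["content"]
    # Note: there is a space after <PRE> and before <SUF> and <MID>, but none after <SUF>.
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


if __name__ == "__main__":
    print(format_llama_infill([
        {"role": "prefix", "content": "def add(a, b):\n    "},
        {"role": "suffix", "content": "\n\nprint(add(1, 2))"},
    ]))
```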
mistral-chat
This style follows the Mistral instruction format. See https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1#instruction-format
passthrough
messages must contain exactly one message, whose content is passed to the model directly. role and name are ignored.
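In effect, passthrough lets the caller supply a fully formatted prompt through the chat completions API. A minimal sketch of the rule, with the hypothetical function name format_passthrough:

```python
def format_passthrough(messages: list[dict]) -> str:
    """Return the single message's content unchanged (sketch)."""
    if len(messages) != 1:
        raise ValueError("passthrough requires exactly one message")
    # role and name are deliberately not inspected.
    return messages[0]["content"]


if __name__ == "__main__":
    print(format_passthrough([{"role": "user", "content": "Once upon a time"}]))
```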