Querying embedding models

Fireworks hosts several embedding models (see the list of available models below). Our flagship instruction-tuned model is the E5 Mistral model, intfloat/e5-mistral-7b-instruct, a highly adaptable language model that is a top performer on the Hugging Face MTEB leaderboard. It has 32 layers and an embedding size of 4096, making it well suited for complex embedding tasks.

Embedding documents

Our embeddings service is OpenAI-compatible. Refer to OpenAI's embeddings guide and embeddings API reference for more detailed usage information.

The embedding model takes text as input and outputs a vector (a list of floating-point numbers) that can be used for tasks like similarity comparison and search.

import openai

# Point the OpenAI client at the Fireworks API
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",  # replace with your Fireworks API key
)

response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="Spiderman was a particularly entertaining movie with...",
)

print(response)

This code embeds the text "Spiderman was a particularly entertaining movie with..." and returns a response like the following:

CreateEmbeddingResponse(data=[Embedding(embedding=[0.006380197126418352, 0.011841800063848495,...], index=0, object='embedding')], model='nomic-ai/nomic-embed-text-v1.5', object='list', usage=Usage(prompt_tokens=12, total_tokens=12))
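The returned vectors can be compared directly for similarity tasks. The sketch below (the helper and the short example vectors are illustrative, not part of the Fireworks API) shows a cosine-similarity comparison; in practice you would pass `response.data[0].embedding` from two separate requests:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative short vectors; real embeddings are much longer
emb_a = [0.1, 0.3, -0.2]
emb_b = [0.1, 0.25, -0.15]
print(cosine_similarity(emb_a, emb_b))
```

Higher values (closer to 1.0) indicate more semantically similar texts.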

Querying documents

Unlike most embedding models, the E5 Mistral model accepts an instruction alongside the query, which lets you use the same document store for multiple tasks with higher accuracy. Let's say I previously used the embedding model to embed many movie reviews and stored them in a vector database. I now want to build a movie recommender that takes a user query and returns recommendations based on this data. The code below demonstrates how to embed the user query together with a task instruction.

import openai

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

query = "I love superhero movies, any recommendations?"
task_description = "Given a user query for movies, retrieve the relevant movie that can fulfill the query."
# E5 Mistral expects the instruction and query combined in this format
query_emb = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input=f"Instruct: {task_description}\nQuery: {query}",
)
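Once the query is embedded, recommendations come from ranking the stored review embeddings by similarity to it. The sketch below substitutes small illustrative in-memory vectors for a real vector database (the titles, vectors, and helper are hypothetical; in practice the query vector would be `query_emb.data[0].embedding`):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Illustrative stored review embeddings; these would come from your vector database
review_embeddings = {
    "The Dark Knight": [0.9, 0.1, 0.0],
    "Titanic": [0.0, 0.2, 0.9],
    "Spider-Man": [0.8, 0.3, 0.1],
}

# Illustrative query vector; in practice: query_emb.data[0].embedding
query_vector = [0.85, 0.2, 0.05]

# Rank stored reviews by similarity to the query, most similar first
ranked = sorted(
    review_embeddings.items(),
    key=lambda item: cosine_similarity(query_vector, item[1]),
    reverse=True,
)
print(ranked[0][0])  # the closest-matching title
```

A production system would delegate this ranking to the vector database's own nearest-neighbor search rather than scanning every embedding in Python.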

To see this example end to end, including how to use a MongoDB vector store and a Fireworks-hosted generation model for RAG, see our full guide.

List of available models

Model name                                     Model size
nomic-ai/nomic-embed-text-v1.5 (recommended)   137M
nomic-ai/nomic-embed-text-v1                   137M
WhereIsAI/UAE-Large-V1                         335M
thenlper/gte-large                             335M
thenlper/gte-base                              109M