Querying embedding models

Fireworks hosts many embedding models. In this guide, we will walk through an example using nomic-ai/nomic-embed-text-v1.5 to show how to query the Fireworks embeddings API.

Embedding documents

Our embeddings service is OpenAI-compatible. Refer to OpenAI's embeddings guide and OpenAI's embeddings documentation for more detailed information on usage.

An embedding model takes text as input and outputs a vector (a list of floating-point numbers) that can be used for tasks like similarity comparison and search.

import openai

# Point the OpenAI client at Fireworks' OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Embed a document; note the "search_document: " prefix (explained below).
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="search_document: Spiderman was a particularly entertaining movie with...",
)

print(response)

This code embeds the text "search_document: Spiderman was a particularly entertaining movie with..." and returns a response like the following:

CreateEmbeddingResponse(data=[Embedding(embedding=[0.006380197126418352, 0.011841800063848495,...], index=0, object='embedding')], model='nomic-ai/nomic-embed-text-v1.5', object='list', usage=Usage(prompt_tokens=12, total_tokens=12))
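
The raw vector lives in the response's data field. As a quick sketch using the OpenAI SDK's response object:

embedding = response.data[0].embedding
print(len(embedding))   # dimensionality of the output vector
print(embedding[:3])    # first few floats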

However, you might have noticed the search_document: prefix in the input text. What is it for?

Embedding queries and documents

Nomic models have been fine-tuned to expect task prefixes: user queries must be prefixed with search_query: , and documents with search_document: . What does that mean in practice?

  • Let's say I previously used the embedding model to embed many movie reviews and stored them in a vector database. All of those documents were embedded with the search_document: prefix.
  • I now want to build a movie recommender that takes a user query and returns recommendations based on this data. The code below demonstrates how to embed the user query.
import openai

# Same Fireworks client as before.
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Queries get the "search_query: " prefix; stored documents used "search_document: ".
query = "I love superhero movies, any recommendations?"
query_emb = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=f"search_query: {query}",
)
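
To make the retrieval step concrete, here is a minimal sketch that scores a couple of in-memory documents against the query embedding with cosine similarity. The example documents and the numpy scoring are illustrative, not part of the Fireworks API; in practice the document vectors would come from your vector store, and this assumes the endpoint accepts a batched input list, as OpenAI's API does.

import numpy as np

# Hypothetical corpus; each document embedded with the "search_document: " prefix.
documents = [
    "search_document: Spiderman was a particularly entertaining movie with...",
    "search_document: A quiet drama about a small-town baker finding purpose...",
]
doc_resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=documents,
)

doc_vectors = np.array([d.embedding for d in doc_resp.data])
query_vector = np.array(query_emb.data[0].embedding)

# Cosine similarity: higher scores mean a document is closer to the query.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(documents[int(np.argmax(scores))])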

To see this example end-to-end, including how to use a MongoDB vector store and a Fireworks-hosted generation model for RAG, see our full guide. For more information on the prefixes Nomic models support, please check out this guide from Nomic.

Variable dimensions

The model also supports variable embedding dimensions. To use this feature, pass a dimensions parameter to the embeddings.create request:

import openai

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Request a truncated 128-dimensional embedding.
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="search_document: I like Christmas movies, can you make any recommendations?",
    dimensions=128,
)
print(len(response.data[0].embedding))

You will see that the returned embedding has dimension 128.
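
As a quick sketch, you can loop over a few sizes to see the dimensions parameter in action (the specific sizes below are assumptions about what the hosted model accepts); smaller vectors are cheaper to store and compare, at some cost in retrieval quality:

for dim in (64, 128, 256, 768):
    resp = client.embeddings.create(
        model="nomic-ai/nomic-embed-text-v1.5",
        input="search_document: I like Christmas movies, can you make any recommendations?",
        dimensions=dim,
    )
    print(dim, len(resp.data[0].embedding))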

List of available models

Model name                                     Model size
nomic-ai/nomic-embed-text-v1.5 (recommended)   137M
nomic-ai/nomic-embed-text-v1                   137M
WhereIsAI/UAE-Large-V1                         335M
thenlper/gte-large                             335M
thenlper/gte-base                              109M