Deploying LoRA models

This guide will walk you through uploading and deploying your own fine-tuned LoRA models.

Installing firectl

The firectl command-line interface (CLI) is used to manage your LLM models. Download the binary for your platform:

macOS (Apple Silicon):

curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-arm64.gz -o firectl.gz
gzip -d firectl.gz && chmod a+x firectl
sudo mv firectl /usr/local/bin/firectl
sudo chown root: /usr/local/bin/firectl

macOS (Intel):

curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-amd64.gz -o firectl.gz
gzip -d firectl.gz && chmod a+x firectl
sudo mv firectl /usr/local/bin/firectl
sudo chown root: /usr/local/bin/firectl

Linux (x86-64):

wget -O firectl.gz https://storage.googleapis.com/fireworks-public/firectl/stable/linux-amd64.gz
gunzip firectl.gz
sudo install -o root -g root -m 0755 firectl /usr/local/bin/firectl

Signing in

Run the following command to sign in to Fireworks:

firectl signin

Confirm that you have successfully signed in by listing your account:

firectl list accounts

You should see your account ID.

Uploading a fine-tuned model

Make sure to review the requirements for a fine-tuned model. Sample configs for supported models are available here.

To upload a fine-tuned model located at /tmp/falcon-7b-addon/, run:

firectl create model my-model /tmp/falcon-7b-addon/

Once uploaded, you can see your model with:

firectl list models

Deploying your model

To deploy the model for inference, run:

firectl deploy my-model

Testing your model

Once your model is deployed, you can query it on the model page.

  1. Visit the list of your models.
  2. Click on the model you deployed.
  3. Enter your text prompt and click "Generate Completion".

You should see your model's response streamed below.

Using the API

You can also directly query the model using the /v1/completions API:

curl \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "accounts/<ACCOUNT_ID>/models/my-model", "prompt": "hello, the sky is"}' \
  https://api.fireworks.ai/inference/v1/completions
Or with the Python client library:

import fireworks.client

fireworks.client.configure(
    api_base="https://api.fireworks.ai/inference",
    api_key="<API_KEY>",
)

fireworks.client.Completion.create(
    model="accounts/<ACCOUNT_ID>/models/my-model",
    prompt="Say this is a test",
    max_tokens=7,
    temperature=0,
)

Cleaning up

Now that you are finished with the guide, you can undeploy the model to avoid accruing charges on your account:

firectl undeploy my-model

You can also delete the model from your account:

firectl delete model my-model

Deployment limits

Non-enterprise accounts are limited to a maximum of 100 deployed models.