Text embeddings are the foundation of semantic search and retrieval-augmented generation (RAG) pipelines. Unlike text generation models, many open-source, state-of-the-art (SOTA) embedding models are small enough to run on-device, which can significantly cut AI spend:

Cost of OpenAI text-embedding-3-large vs. Function

Check out other cloud vs. on-device AI cost comparisons on fxn.ai.

Installing Function LLM

We have created a tiny utility library, Function LLM, which patches the OpenAI client to generate embeddings on-device. First, install the library:

# Run this in Terminal
$ pip install --upgrade fxn-llm

Generate Embeddings Locally

We will be using Nomic’s @nomic/nomic-embed-text-v1.5-quant embedding model, which boasts better performance than OpenAI’s text-embedding-3-small:

from fxn_llm import locally
from openai import OpenAI

# 💥 Create your OpenAI client
openai = OpenAI()

# 🔥 Make it local
openai = locally(openai)

# 🚀 Generate embeddings
embeddings = openai.embeddings.create(
    model="@nomic/nomic-embed-text-v1.5-quant",
    input="search_query: Hello world!"
)

Nomic’s embedding model requires prefixing the input string with the embedding task (e.g. search_query for queries, search_document for documents being indexed). See the predictor card for more info.
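
To see the prefixes in action, here is a minimal semantic-search sketch: documents are embedded with the search_document prefix, the query with search_query, and results are ranked by cosine similarity. The document strings and query below are illustrative, and we assume the patched client returns an OpenAI-style embeddings response (data[0].embedding); check the Function LLM docs for the exact return type.

from fxn_llm import locally
from openai import OpenAI
import numpy as np

# Create and patch the client as above
openai = locally(OpenAI())

# 📚 Embed a few documents with the `search_document` prefix
documents = [
    "search_document: Function runs AI models on-device.",
    "search_document: Text embeddings power semantic search.",
]
doc_vectors = np.array([
    openai.embeddings.create(
        model="@nomic/nomic-embed-text-v1.5-quant",
        input=doc
    ).data[0].embedding  # assumes an OpenAI-style response shape
    for doc in documents
])

# 🔍 Embed the query with the matching `search_query` prefix
query_vector = np.array(openai.embeddings.create(
    model="@nomic/nomic-embed-text-v1.5-quant",
    input="search_query: How do I run models on my device?"
).data[0].embedding)

# 📈 Rank documents by cosine similarity and print the best match
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(documents[int(np.argmax(scores))])

Because queries and documents are embedded with different task prefixes, the model can place short questions and longer passages in a shared space, which is what makes the cosine-similarity ranking above meaningful.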