Text embeddings are the foundation of semantic search and retrieval augmented generation (RAG) pipelines. Unlike text generation models, many open-source, state-of-the-art (SOTA) embedding models are small enough to run on-device, which can save a significant amount on AI spend:

Cost of OpenAI text-embedding-3-large vs. Function

Check out other cloud vs. on-device AI cost comparisons on fxn.ai.

Installing Function LLM

We have created a tiny utility library, Function LLM, that patches the OpenAI client to generate embeddings on-device. First, install the library:

# Run this in Terminal
$ npm install fxn-llm

Generate Embeddings Locally

We will use Nomic’s @nomic/nomic-embed-text-v1.5-quant embedding model, which boasts better performance than OpenAI’s text-embedding-3-small:

import { locally } from "fxn-llm"
import { OpenAI } from "openai"

// 💥 Create your OpenAI client
let openai = new OpenAI();

// 🔥 Make it local
openai = locally(openai);

// 🚀 Generate embeddings
const embeddings = await openai.embeddings.create({
  model: "@nomic/nomic-embed-text-v1.5-quant",
  input: "search_query: Hello world!"
});

Nomic’s embedding model requires a prefix on the input string indicating the embedding task (e.g. search_query). See the predictor card for more info.
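
To put the task prefixes into practice, here is a minimal sketch of a semantic search flow: each passage is embedded with Nomic’s search_document prefix, the query with the search_query prefix, and the passages are ranked by cosine similarity. It assumes the patched client returns the standard OpenAI embeddings response shape (data[0].embedding); the cosineSimilarity helper and the sample documents are ours for illustration, not part of fxn-llm.

import { locally } from "fxn-llm"
import { OpenAI } from "openai"

const openai = locally(new OpenAI());
const model = "@nomic/nomic-embed-text-v1.5-quant";

// Cosine similarity helper (defined here for illustration; not part of fxn-llm)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed each document with the `search_document` task prefix
const documents = ["The Eiffel Tower is in Paris.", "The Colosseum is in Rome."];
const documentEmbeddings = [];
for (const document of documents) {
  const response = await openai.embeddings.create({
    model,
    input: `search_document: ${document}`
  });
  documentEmbeddings.push(response.data[0].embedding);
}

// Embed the query with the `search_query` task prefix
const queryResponse = await openai.embeddings.create({
  model,
  input: "search_query: Where is the Eiffel Tower?"
});
const queryEmbedding = queryResponse.data[0].embedding;

// Rank documents by similarity to the query
const ranked = documents
  .map((document, index) => ({
    document,
    score: cosineSimilarity(queryEmbedding, documentEmbeddings[index])
  }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);

Because the whole flow runs through the familiar OpenAI client interface, the same code can be pointed back at the cloud API by simply removing the locally() call.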