Embeddings API

The Embeddings API provides a simple way to retrieve embeddings for a given text, which can then be used for Retrieval-Augmented Generation (RAG), fine-tuning, semantic search, clustering, recommendations, anomaly detection, classification, and many other AI applications.
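
Most of these applications come down to comparing embedding vectors, most often with cosine similarity: texts with similar meanings produce embeddings whose cosine similarity is close to 1. As a minimal sketch in plain JavaScript (illustration only, not part of the Embedefy API):

// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}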

Quick Start

Step 1: Get an access token

Sign up for an Embedefy account. It is free. Once you sign in, go to the Access Tokens page and click the "New Token" button to create a new access token.

Step 2: Get embeddings

Once you have an access token, open a terminal and export it for the current session:

export EMBEDEFY_ACCESS_TOKEN=<your access token>

Now you can get embeddings for a given text:

curl "https://api.embedefy.com/v1/embeddings" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $EMBEDEFY_ACCESS_TOKEN" \
  --data '{
      "model": "e5-small-v2",
      "inputs": ["hello there"]
  }'
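
You can make the same request from code. Below is a minimal Node.js (18+) sketch using the built-in fetch; it mirrors the curl example above, and since the exact response schema is not covered here, it simply prints the parsed body:

// Request embeddings from Embedefy (assumes EMBEDEFY_ACCESS_TOKEN is exported)
const res = await fetch("https://api.embedefy.com/v1/embeddings", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.EMBEDEFY_ACCESS_TOKEN}`,
  },
  body: JSON.stringify({
    model: "e5-small-v2",
    inputs: ["hello there"],
  }),
})

// The response body contains the embedding(s) for the inputs
console.log(await res.json())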

Step 3: Use embeddings

There are many ways to use embeddings, and it is impossible to cover them all in this quick start guide. Therefore, we will implement a basic app to demonstrate RAG.


Let's build an API service called FoodTruckGPT:

We are going to build a simple API service that recommends food trucks based on a user prompt. The API service will (see the sketch after the list below):

  1. Take a user prompt as input and retrieve the most relevant food trucks using embeddings.
  2. Generate a combined prompt from the user prompt and the retrieved food truck data.
  3. Send the combined prompt to an LLM and return the response to the user.
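
In pseudocode-like JavaScript, the flow looks roughly like this (getEmbedding, findRelevantTrucks, buildPrompt, and askLLM are placeholder names for illustration, not functions from the repository):

// High-level flow of FoodTruckGPT (placeholder helper names, for illustration only)
async function handleQuery(userQuery) {
  // 1. Embed the user prompt and retrieve the most relevant food trucks
  const queryEmbedding = await getEmbedding(userQuery)
  const trucks = await findRelevantTrucks(queryEmbedding)

  // 2. Build a combined prompt from the user prompt and the retrieved data
  const prompt = buildPrompt(userQuery, trucks)

  // 3. Send the combined prompt to an LLM and return its response
  return await askLLM(prompt)
}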

For the following steps, you will need:

  1. An Embedefy access token (from Step 1)
  2. An OpenAI API key
  3. Git and Docker installed on your machine

To simplify this document, we will clone a GitHub repository containing code for an API service.

git clone https://github.com/embedefy/docs.git

The API service in this repository uses OpenAI's GPT-3.5 Large Language Model (LLM) to generate responses. That being said, you can also use open-source models via Hugging Face or run them on your own device using tools like llama.cpp.

Go to the examples/foodtruck directory and run the following commands. They will:

  1. Launch a PostgreSQL database server in a container.
  2. Import San Francisco food trucks data into the database.
  3. Generate embeddings for the food trucks data using Embedefy.
  4. Launch a REST API service in a container and expose it on port 3003.

export EMBEDEFY_ACCESS_TOKEN=<your Embedefy access token>
export OPENAI_API_KEY=<your OpenAI API key>

docker compose --file docker-compose.yml up

If the command fails with an error, try running it again using the --build argument. For a successful result, you should see an output in the terminal similar to the following:

...
 ✔ Network foodtruck_default         Created                                                                                                                                                                     0.0s
 ✔ Volume "foodtruck_postgres_data"  Created                                                                                                                                                                     0.0s
 ✔ Container foodtruck-postgres      Created                                                                                                                                                                     0.0s
 ✔ Container foodtruck-init          Created                                                                                                                                                                     0.0s
 ✔ Container foodtruck-app           Created
...
foodtruck-postgres  | server started
...
foodtruck-app       | server listening on port 0.0.0.0:3003
...
foodtruck-init      | initializing database schema...
foodtruck-init      | importing locations...
foodtruck-init      | importing trucks...
foodtruck-init      | importing foods...
foodtruck-init      | importing schedules...
foodtruck-init      | generating embeddings...
foodtruck-init      | done
...

Now open a new terminal window and make a request to your API service using the following command (note that it may take a little time):

curl "http://localhost:3003/" \
  --header "Content-Type: application/json" \
  --data '{"query": "Where can I eat chicken quesadilla?"}'

The response should be similar to the following:

{
  "response": "You can enjoy chicken quesadillas at the following food trucks:\n\n1. Food Truck: Buenafe\n   Location: 220 RANKIN ST\n   Schedule: Tuesday 09:00 AM - 05:00 PM\n\n2. Food Truck: CARDONA'S FOOD TRUCK\n   Location: 1800 MISSION ST\n   Schedule: Tuesday 08:00 AM - 05:00 PM\n\n3. Food Truck: Plaza Garibaldy\n   Location: 540 HOWARD ST\n   Schedule: Tuesday 03:00 PM - 08:00 PM\n\n4. Food Truck: El Alambre\n   Location: 1800 FOLSOM ST\n   Schedule: Tuesday 10:00 AM - 06:00 PM\n\nPlease note that the schedules provided are subject to change. It is advisable to check with the food trucks directly for any updates. Enjoy your chicken quesadilla!"
}

What we observe here is that although the OpenAI GPT-3.5 model does not have up-to-date information about food trucks in San Francisco (its knowledge cutoff date is January 2022, while the food truck data updates daily), it was able to generate an accurate and current response regarding locations and schedules. We achieved this by retrieving the related data using embeddings, combining it with the user prompt, and then sending the result to the LLM.

You can clean up the containers with the following command:

docker compose --file docker-compose.yml down

To keep this document short, we will not go into the details of the code. However, we will point out a few things:

Generating embeddings from user queries

We use the Embedefy API to get embeddings for a user query, such as "Where can I eat chicken quesadilla?". Note that it is possible to chunk the user query into multiple parts, obtain embeddings for each chunk separately, and then query the database for each chunk. However, for simplicity, we obtain embeddings for the entire query. Afterward, we query the database to find the top five most relevant food items. See the following code snippet:

// Get the embedding for the user query from the Embedefy API
const userEmbedding = JSON.stringify(await embedefyRequest(embedefyAccessToken, userQuery))

// Query the database for the most similar food items
const { rows: foodRows } = await pool.query(
  `
  SELECT id, name, 1 - (embedding <=> $1) AS cosine_similarity
  FROM foods
  ORDER BY cosine_similarity DESC
  LIMIT 5
  `,
  [userEmbedding]
)
// Collect the matched food items (the foods array is declared elsewhere in the service)
for (const row of foodRows) {
  foods.push({
    id: row.id,
    name: row.name,
    cosine_similarity: row.cosine_similarity,
  })
}
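
For the embedding <=> $1 comparison to work, the foods table stores a pgvector column with the embeddings generated during initialization. The actual schema is defined in the repository; purely as an illustration (the column layout and the 384-dimension size assumed for e5-small-v2 are ours, not copied from the repo), it could look like this:

// Illustrative only: a foods table with a pgvector embedding column
await pool.query(`
  CREATE EXTENSION IF NOT EXISTS vector;
  CREATE TABLE IF NOT EXISTS foods (
    id BIGSERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    embedding vector(384)
  )
`)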

Once we have the relevant food trucks, locations, and schedules, we combine them with the user query and send the result to the LLM. See the following code snippet:

// Build the combined prompt: current date, the retrieved truck data, then the user query
let content = `Current date: ${now}\n`
for (const truck of truckList) {
  content = `${content}-\nFood Truck: ${truck.name}`
  content = `${content}\nMenu: ${truck.foodItems}`
  for (const location of truck.locations) {
    content = `${content}\nLocation: ${location.address}`
    for (const schedule of location.schedules) {
      content = `${content} - ${schedule.day_of_week} ${schedule.start_time} - ${schedule.end_time}`
    }
  }
  content = `${content}\n`
}
content = `${content}-\n\nUser query: ${userQuery}`
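
Finally, the combined content is sent to the LLM. A minimal sketch of that call using the official openai Node.js client and the gpt-3.5-turbo model (the system prompt wording is illustrative, not taken from the repository):

import OpenAI from "openai"

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

// Send the combined prompt to the LLM and read back the answer text
const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "Recommend food trucks using only the provided data." },
    { role: "user", content },
  ],
})

const answer = completion.choices[0].message.content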


If you have any questions, visit our GitHub repo and open an issue.