Creating an AI powered search using OpenAI text embeddings and Supabase

May 14, 2024


Here's the steps I took to get AI-powered search locally using OpenAI text embeddings and Supabase. Next steps, get it working in production.

Step 0: Enable pgvector and setup your column

The Supabase docs recommend enabling on the extensions schema, but it didn't show up as a data type for me until I enabled it on public. 🤷

create extension if not exists "vector" with schema "public" version '0.4.0';

On my recipes table, I created an embeddings column.

alter table "public"."recipes" add column "embeddings" vector;

This is the column we'll store the OpenAI embeddings in after we prepare our data in the next step.

Step 1: Prepare your data

I was using my in progress recipe website as an example. This site contains a bunch of recipes and I thought it would be cool to add an AI-powered search. My recipes table contains columns for the name, ingredients, instructions, and a bunch more, but primarily I'm interested in those three columns.

I created a super quick and dirty Postgres function to prepare the data that would be used to send to the OpenAI API to create embeddings.


CREATE OR REPLACE FUNCTION public.get_cleaned_recipes()
RETURNS TABLE(recipe_id integer, cleaned_recipe_text text)
LANGUAGE plpgsql
STABLE
AS $function$
BEGIN
RETURN QUERY
SELECT
r.recipe_id,
trim(both ' ,' from
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
'Recipe Name: ' || name || '. Ingredients:' || ingredients || ' Instructions:' || instructions,
E'<[^>]+>', ', ', 'g'
-- Replaces HTML tags with commas and a space
),
E'[\\x80-\\xFF]', '', 'g'
-- Removes non-basic ASCII characters
),
E'[\\n\\r]+', ' ', 'g'
-- Replaces new lines and carriage returns with space
),
E'(,\\s*)+', ', ', 'g'
-- Reduces multiple commas and spaces to a single comma and a space
)
) AS cleaned_recipe_text
FROM
public.recipe as r;
END;
$function$
;

There's probably a better way to do this but I wanted to just get something working ASAP. I may revisit this. This function does what I need though, so I'm running with it for now. I have a workspace in my project called embeddings that contains a node.js script that calls this function and, gets the embeddings and adds them to the embeddings column.

import { createClient } from "@supabase/supabase-js";
import OpenAI from "openai";

const openai = new OpenAI({
apiKey: "<secret-openAI-key>",
});
let recipes = [];

const supabase = createClient(
"http://localhost:54321",
"<secret-supabase-key>"
);

async function getRecipes() {
const { data, error } = await supabase.rpc("get_cleaned_recipes");

if (error) {
console.error(error.message);
throw new Error("Failed to fetch recipes");
}

return data;
}

async function main() {
recipes = await getRecipes();

for (const recipe of recipes) {
const embedding = await openai.embeddings.create({
model: "text-embedding-3-small",
input: recipe.cleaned_recipe_text,
encoding_format: "float",
});

const { data, error } = await supabase
.from("recipes")
.upsert({
recipe_id: recipe.recipe_id,
embeddings: embedding.data[0].embedding,
})
.select();

if (error) {
console.error(error.message);
throw new Error("Failed to update recipe embedding");
}
}
}

main();

When I run this script, it successfully populates my database with embeddings for each row. Next, I need to create a function that can actually do the embedding search.

This is the function that does the actual search.

CREATE OR REPLACE FUNCTION public.search_by_vector(query_vector vector)
RETURNS TABLE(recipe_id integer, name character varying, distance double precision)
LANGUAGE plpgsql
AS $function$
BEGIN
RETURN QUERY
SELECT r.recipe_id, r.name, r.embeddings <-> query_vector AS distance
FROM public.recipes as r
ORDER BY distance;
END;
$function$
;

This function will be called after I get the query embeddings back from OpenAI (the text embeddings from the user inputted search).

Step 3: Add the search to the website

Note: This is obviously not something you'd do in production. This is just for local tinkering and experimentation. This method would expose your OpenAI API key to the world if you did this on a public server.

<script setup lang="ts">
import BaseLayout from "@layouts/base-layout.vue";
import Button from "primevue/button";
import InputText from "primevue/inputtext";
import { ref } from "vue";
import OpenAI from "openai";
import { supabase } from "../supabase";
import { useToast } from "primevue/usetoast";
import { useRecipeStore } from "@stores/recipe-store";
import { storeToRefs } from "pinia";
import RecipeCard from "@components/recipe/recipe-card.vue";

const openai = new OpenAI({
apiKey: "<secret-OpenAI-key>",
dangerouslyAllowBrowser: true, // OpenAI makes you set this 🤪
});

const recipeStore = useRecipeStore();
const { recipes: storeRecipes } = storeToRefs(recipeStore);
const toast = useToast();
const searchText = ref("");
const searchResults = ref([]);
const searchForResults = async () => {
// get query embedding
const queryEmbedding = await getQueryEmbedding();

// get search results
const { data, error } = await supabase.rpc("search_by_vector", {
query_vector: queryEmbedding,
});
if (error) {
toast.add({ severity: "error", summary: "Error", detail: error.message });
}
searchResults.value = data;
};

const getQueryEmbedding = async () => {
const embedding = await openai.embeddings.create({
model: "text-embedding-3-small",
input: searchText.value,
encoding_format: "float",
});

return embedding.data[0].embedding;
};

const getRecipe = (recipe_id: number) => {
let result;
storeRecipes.value.forEach((recipe, _key) => {
if (recipe.recipe_id === recipe_id) {
result = recipe;
}
});
return result;
};

await recipeStore.loadRecipes();
</script>

<template>
<BaseLayout>
<div>
<div>
<div class="flex p-inputgroup my-4">
<InputText
type="text"
placeholder="Search recipe names"
v-model="searchText"
class="w-full"
/>

<button
class="flex-shrink-0"
@click="searchForResults"
icon="i-prime-lightbulb"
label="Magic AI Search"
/>

<button
class="flex-shrink-0"
@click="searchText = ''"
icon="i-prime-times"
label="Clear"
/>

</div>
</div>
<div class="grid grid-cols-12 gap-4">
<div
class="card col-span-12 sm:col-span-6 md:col-span-4 lg:col-span-3"
v-for="recipe in searchResults"
:key="`${recipe.recipe_id}recipe`"
>

<RecipeCard :recipe="getRecipe(recipe.recipe_id)" />
</div>
</div>
</div>
</BaseLayout>
</template>

In this application, I have my recipes stored in a Map<string, Recipe> . The string is a page slug so loading data for specific routes is easy. I can check the slug of the page trying to load from the router, look at the recipe map, if no recipe data exists for that slug, I can then fetch it. Perhaps I should change my vector search to return the slug instead of recipe_id, this would make getRecipe() way more efficient.

Step 4: Profit