Vector search with Rails and SQLite

Sergio Alvarez
October 22, 2024

In this two-part blog post series, we will use Rails 8 and SQLite to search over a collection of movies using full-text and vector search. You can find the code in this repository. In this second part, we're searching for movies using semantic search.

Why SQLite?

Much work has gone into making the SQLite adapter and Ruby driver suitable for real production use in Rails 8, turning SQLite from a lightweight development tool into a reliable production choice. SQLite can now also power Action Cable, Rails.cache, and Active Job.

If you are interested in learning more about running SQLite in production, I suggest taking a look at Stephen Margheim's Rails World talk, SQLite on Rails.

The Application

The app is going to show a catalog of movies I got from the Movie Database (TMDB). Here's the Movie model that we will use.

# == Schema Information
#
# Table name: movies
#
#  id         :integer          not null, primary key
#  overview   :text
#  poster_url :string
#  title      :string           not null
#  created_at :datetime         not null
#  updated_at :datetime         not null
#  tmdb_id    :bigint           not null
#
class Movie < ApplicationRecord
  include FullTextSearch
  include VectorSearch
end


What is vector search?

Vector search is a technique for finding similar items or data points, typically represented as vectors. In vector search, a vector is a fixed-length array of numerical values that represent the features or characteristics of the data in a way that makes it possible to compare it with other data items. Vectors are typically generated using machine learning models. Vectors capture the semantic relationships between elements, which we can use to understand the meaning and intent behind a query rather than simply matching keywords.
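
To make the idea of "comparing vectors" concrete, here is a minimal sketch in plain Ruby that computes the cosine similarity between a few made-up vectors. The three-dimensional vectors and their names are hypothetical; real embeddings come from a model and have far more dimensions.

```ruby
# Cosine similarity between two equal-length vectors:
# close to 1.0 means "pointing the same way", close to 0.0 means unrelated.
# Zero-length vectors are not handled in this sketch.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Hypothetical toy "embeddings" for three movies.
space_opera  = [0.9, 0.1, 0.0]
sci_fi_drama = [0.8, 0.3, 0.1]
rom_com      = [0.0, 0.2, 0.9]

puts cosine_similarity(space_opera, sci_fi_drama) # the higher of the two scores
puts cosine_similarity(space_opera, rom_com)      # the lower of the two scores
```

The two science-fiction vectors score much higher against each other than against the romantic comedy, which is the property vector search relies on.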

Implementing vector search

Overall, the steps that we'll take are:
 1. Install gems
 2. Create the virtual table
 3. Generate embeddings
 4. Write the search query

1. Install gems

To enable SQLite to perform vector search, we'll use the sqlite-vec extension. The neighbor gem loads the sqlite-vec extension for us and makes it easier to perform the KNN (k-nearest neighbors) search. Finally, we'll use ruby-openai to generate the embeddings.

gem "sqlite-vec", platform: :ruby_33
gem "neighbor"
gem "ruby-openai"

# config/initializers/neighbor.rb
Neighbor::SQLite.initialize!

# config/initializers/openai.rb
OpenAI.configure do |config|
  config.access_token = ENV["OPENAI_ACCESS_TOKEN"]
end

2. Create the virtual table

sqlite-vec offers two ways to perform KNN queries: using vec0 virtual tables or manually with regular tables. We'll use the virtual table approach because it's faster and more compact. To enable joins with the "movies" table, we'll set the primary key to the movie ID. Since our embedding model outputs vectors with 1536 dimensions, we'll use that as the embedding size. For the distance metric, we'll opt for cosine distance (one minus cosine similarity), which works well for text-based searches.

class CreateVirtualMovieVectors < ActiveRecord::Migration[8.0]
  def change
    create_virtual_table :movie_vectors, :vec0, [
      "movie_id integer primary key",
      "embedding float[1536] distance_metric=cosine"
    ]
  end
end

# app/models/movie_vector.rb
class MovieVector < ApplicationRecord
  self.primary_key = "movie_id"

  has_neighbors :embedding, dimensions: 1536

  def movie = Movie.find(movie_id)
end

3. Generate embeddings

We'll iterate over every movie and create an embedding based on the movie's description (the overview column). I'm using OpenAI's "text-embedding-3-small", but you can use any model you want, as long as its output size matches the dimensions declared on the virtual table. The embedding will be stored in the virtual table.

# db/seeds.rb
Movie.find_each do |movie|
  movie.find_or_create_movie_vector
end

# app/models/movie/vector_search.rb
module Movie::VectorSearch
  extend ActiveSupport::Concern

  included do
    has_one :movie_vector
  end

  def find_or_create_movie_vector
    MovieVector.find_or_create_by!(movie_id: id) do |movie_vector|
      movie_vector.embedding = Embedding.create(overview).embedding
    end
  end
end

# app/models/embedding.rb
class Embedding
  attr_reader :embedding

  def initialize(embedding)
    @embedding = embedding
  end

  def self.create(input)
    embedding = OpenAI::Client.new
      .embeddings(parameters: {model: "text-embedding-3-small", input:})
      .fetch("data")[0]["embedding"]
    new(embedding)
  end

  def to_s = embedding.to_s
end
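
The seed script above makes one API request per movie. The embeddings endpoint also accepts an array of inputs, so a batched variant is possible. The following is only a sketch under assumptions: embed_batch and StubEmbeddingsClient are hypothetical names that are not part of the app, and the stub stands in for OpenAI::Client so the example runs without a network call.

```ruby
# Hypothetical helper: embeds several texts with a single request.
# `client` is anything that responds to #embeddings the way OpenAI::Client does.
def embed_batch(client, texts, model: "text-embedding-3-small")
  response = client.embeddings(parameters: { model: model, input: texts })

  # Each item in "data" carries the index of the input it belongs to,
  # so we sort defensively before extracting the vectors.
  response.fetch("data")
    .sort_by { |item| item.fetch("index") }
    .map { |item| item.fetch("embedding") }
end

# Stand-in client for illustration only. It returns one-dimensional fake
# "embeddings" (the text length), deliberately out of order.
class StubEmbeddingsClient
  def embeddings(parameters:)
    data = parameters.fetch(:input).each_with_index.map do |text, index|
      { "index" => index, "embedding" => [text.length.to_f] }
    end
    { "data" => data.reverse }
  end
end

vectors = embed_batch(StubEmbeddingsClient.new, ["Alien", "Up"])
puts vectors.inspect
```

Passing OpenAI::Client.new instead of the stub should send the whole batch in one request; how many texts fit into a single call depends on the provider's limits, so check those before relying on this.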

4. Write the search query

In this last step, we'll take the user's query, create an embedding, and use it to perform the vector search. The vector_search method will query the vec0 virtual table to find the closest records to the search embedding. We'll order the results by distance to get the most similar results at the top.

class MoviesController < ApplicationController
  def index
    @movies =
      if params[:search_type] == "vector" && params[:search].present?
        embedding = Embedding.create(params[:search])
        Movie.vector_search(embedding: embedding, limit: 40)
      elsif params[:search].present?
        Movie.full_text_search(input: params[:search], limit: 40)
      else
        Movie.all.limit(40)
      end
  end
end

# app/models/movie/vector_search.rb
module Movie::VectorSearch
  extend ActiveSupport::Concern

  class_methods do
    def vector_search(embedding:, limit: 10)
      where("embedding MATCH ? AND k = ?", embedding.to_s, limit)
        .joins(:movie_vector)
        .order(:distance)
        .distinct
    end
  end
end
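
To build some intuition for what the MATCH ... AND k = ? query returns, here is a brute-force equivalent in plain Ruby. It is a sketch for illustration, not how sqlite-vec is implemented: the nearest helper and the toy vectors are hypothetical, and cosine distance is assumed to be one minus cosine similarity.

```ruby
# Cosine distance, assumed here to be 1 - cosine similarity.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / norm
end

# Brute-force k-nearest-neighbours over a hash of id => vector.
# Returns [id, distance] pairs, closest first.
def nearest(vectors, query, k)
  vectors
    .map { |id, vector| [id, cosine_distance(query, vector)] }
    .min_by(k) { |_id, distance| distance }
end

# Hypothetical toy data: movie id => embedding.
vectors = {
  1 => [0.9, 0.1, 0.0],
  2 => [0.8, 0.3, 0.1],
  3 => [0.0, 0.2, 0.9]
}

nearest(vectors, [1.0, 0.0, 0.0], 2).each do |id, distance|
  puts "movie #{id}: distance #{distance.round(3)}"
end
```

The k argument plays the role of the k = ? constraint, and sorting by distance mirrors the order(:distance) call.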

Similar Movies

As a bonus, we can implement a simple recommendation system by leveraging embeddings. By finding the nearest neighbors of a movie's embedding, we can find movies similar to the one the user is currently viewing.

class MoviesController < ApplicationController
  def show
    @movie = Movie.find(params[:id])
  end
end

# app/models/movie/vector_search.rb
module Movie::VectorSearch
  extend ActiveSupport::Concern

  included do
    has_one :movie_vector
  end

  def similar(limit = 8)
    Movie
      .vector_search(embedding: movie_vector.embedding, limit: limit + 1)
      .where.not("movies.id = ?", id)
  end
end
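
The reason for asking for limit + 1 results is that a movie's own embedding is its nearest neighbor (the distance to itself is roughly zero), so one slot is lost once we exclude the current movie. Here is a small in-memory sketch of that reasoning; similar_ids and the toy vectors are hypothetical and only mimic what the query does.

```ruby
# Cosine distance, assumed here to be 1 - cosine similarity.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  1.0 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Hypothetical in-memory stand-in for Movie#similar.
def similar_ids(vectors, id, limit)
  query = vectors.fetch(id)

  vectors
    .min_by(limit + 1) { |_other_id, vector| cosine_distance(query, vector) } # one extra slot
    .map(&:first)                                                             # keep only the ids
    .reject { |other_id| other_id == id }                                     # drop the movie itself
    .first(limit)
end

# Hypothetical toy data: movie id => embedding.
vectors = {
  1 => [0.9, 0.1, 0.0],
  2 => [0.8, 0.3, 0.1],
  3 => [0.0, 0.2, 0.9]
}

puts similar_ids(vectors, 1, 1).inspect
```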

That's it! I hope you found this article helpful. If you have any questions or feedback, you can find me at sergio@teloslabs.co.
