In this two-part blog post series, we will use Rails 8 and SQLite to search over a collection of movies using full-text and vector search. You can find the code in this repository. In this second part, we're searching for movies using semantic search.
Why SQLite?
Much work has been done to make the SQLite adapter and Ruby driver suitable for real production use in Rails 8. Rails 8 transforms SQLite from a lightweight development tool to a reliable choice for production use. Moreover, SQLite now has the capability to effectively power Action Cable, Rails.cache, and Active Job.
If you are interested in learning more about running SQLite in production, I suggest taking a look at Stephen Margheim's Rails World talk, SQLite on Rails.
The Application
The app is going to show a catalog of movies I got from the Movie Database (TMDB). Here's the Movie model that we will use.
# == Schema Information
#
# Table name: movies
#
#  id         :integer          not null, primary key
#  overview   :text
#  poster_url :string
#  title      :string           not null
#  created_at :datetime         not null
#  updated_at :datetime         not null
#  tmdb_id    :bigint           not null
#
class Movie < ApplicationRecord
  include FullTextSearch
  include VectorSearch
end
What is vector search?
Vector search is a technique for finding similar items or data points, typically represented as vectors. In vector search, a vector is a fixed-length array of numerical values that represent the features or characteristics of the data in a way that makes it possible to compare it with other data items. Vectors are typically generated using machine learning models. Vectors capture the semantic relationships between elements, which we can use to understand the meaning and intent behind a query rather than simply matching keywords.
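To make "comparing vectors" concrete, here is a minimal plain-Ruby sketch of cosine similarity, one common way to measure how closely two vectors point in the same direction. The three-dimensional vectors and their names are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from a model.

```ruby
# Cosine similarity between two equal-length vectors:
# values near 1.0 mean "pointing the same way", values near 0.0 mean unrelated.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Hypothetical toy "embeddings", not produced by any real model.
space_opera  = [0.9, 0.1, 0.0]
sci_fi_epic  = [0.8, 0.2, 0.1]
romantic_com = [0.0, 0.2, 0.9]

cosine_similarity(space_opera, sci_fi_epic)  # high: the two vectors are close
cosine_similarity(space_opera, romantic_com) # low: the two vectors are far apart
```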
Implementing vector search
Overall, the steps that we'll take are:
1. Install gems
2. Create the virtual table
3. Generate embeddings
4. Write the search query
1. Install gems
To enable SQLite to perform vector search, we'll use the sqlite-vec extension. The neighbor gem loads the sqlite-vec extension for us and makes KNN queries easier to write. Finally, we'll use ruby-openai to generate the embeddings.
# Gemfile
gem "sqlite-vec", platform: :ruby_33
gem "neighbor"
gem "ruby-openai"
# config/initializers/neighbor.rb
Neighbor::SQLite.initialize!
# config/initializers/openai.rb
OpenAI.configure do |config|
  config.access_token = ENV["OPENAI_ACCESS_TOKEN"]
end
2. Create the virtual table
sqlite-vec offers two ways to perform KNN queries: with vec0 virtual tables or manually with regular tables. We'll use the virtual table approach because it's faster and more compact. To enable joins with the movies table, we'll use the movie ID as the primary key. Since our embedding model outputs vectors with 1536 dimensions, we'll use that as the embedding size. For the distance metric, we'll opt for cosine distance, which works well for text-based searches.
class CreateVirtualMovieVectors < ActiveRecord::Migration[8.0]
  def change
    create_virtual_table :movie_vectors, :vec0, [
      "movie_id integer primary key",
      "embedding float[1536] distance_metric=cosine"
    ]
  end
end
# app/models/movie_vector.rb
class MovieVector < ApplicationRecord
  self.primary_key = "movie_id"
  has_neighbors :embedding, dimensions: 1536
  def movie = Movie.find(movie_id)
end
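To get a feel for what float[1536] means in practice, the sketch below serializes a placeholder vector in two forms: the bracketed text form (what Array#to_s produces) and a compact blob of 32-bit floats. It also estimates the raw size of the float data. The 10,000-movie catalog is a hypothetical figure, and the real on-disk size will differ because the virtual table adds its own overhead.

```ruby
DIMENSIONS = 1536 # matches float[1536] in the migration

# Placeholder values; real values come from the embedding model.
vector = Array.new(DIMENSIONS) { |i| i / DIMENSIONS.to_f }

# Text form: a bracketed, comma-separated list of numbers.
as_text = vector.to_s

# Binary form: little-endian 32-bit floats, 4 bytes per dimension.
as_blob = vector.pack("e*")

bytes_per_vector = as_blob.bytesize # 1536 * 4 = 6144 bytes

# Assumption for illustration only: a catalog of 10,000 movies.
raw_bytes_for_catalog = bytes_per_vector * 10_000 # about 61 MB of raw floats
```

The text form is usually several times larger than the blob, which is one reason to prefer the binary form when moving many vectors around.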
3. Generate embeddings
We'll iterate over every movie and create an embedding from the movie's overview. I'm using OpenAI's "text-embedding-3-small", but you can use any model you want, as long as its output size matches the dimensions of the virtual table. The embedding will be stored in the virtual table.
# db/seeds.rb
Movie.find_each do |movie|
  movie.find_or_create_movie_vector
end
# app/models/movie/vector_search.rb
module Movie::VectorSearch
  extend ActiveSupport::Concern
  included do
    has_one :movie_vector
  end
  def find_or_create_movie_vector
    MovieVector.find_or_create_by!(movie_id: id) do |movie_vector|
      movie_vector.embedding = Embedding.create(overview).embedding
    end
  end
end
# app/models/embedding.rb
class Embedding
  attr_reader :embedding
  def initialize(embedding)
    @embedding = embedding
  end
  def self.create(input)
    embedding = OpenAI::Client.new
      .embeddings(parameters: {model: "text-embedding-3-small", input:})
      .fetch("data")[0]["embedding"]
    new(embedding)
  end
  def to_s = embedding.to_s
end
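The seed script above makes one API request per movie. For a large catalog, one option is to send several overviews per request, since the embeddings endpoint also accepts an array of inputs. The helper below is a hypothetical sketch and not part of the original app: embed_in_batches and its batch_size default are names and values I made up, and the client is passed in so it can be stubbed.

```ruby
# Hypothetical helper (not from the original app): embeds texts in batches.
# `client` is anything that responds to `embeddings(parameters:)` the way
# OpenAI::Client does, which also makes it easy to stub in tests.
def embed_in_batches(texts, client:, model: "text-embedding-3-small", batch_size: 100)
  texts.each_slice(batch_size).flat_map do |batch|
    response = client.embeddings(parameters: {model: model, input: batch})

    # Each returned item carries the index of the input it belongs to;
    # sorting by it keeps embeddings aligned with the order of `batch`.
    response.fetch("data")
      .sort_by { |item| item.fetch("index") }
      .map { |item| item.fetch("embedding") }
  end
end

# Usage sketch, assuming the OpenAI initializer shown earlier is in place:
#   overviews = Movie.where.not(overview: nil).pluck(:overview)
#   vectors   = embed_in_batches(overviews, client: OpenAI::Client.new)
```

Skipping movies without an overview, as in the usage sketch, is an assumption on my part; the overview column is nullable, so the app needs some policy for those rows.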
4. Write the search query
In this last step, we'll take the user's query, create an embedding from it, and use that embedding to perform the vector search. The vector_search method queries the vec0 virtual table for the k records closest to the search embedding, where k is the limit we pass in. We'll order the results by distance so the most similar movies come first.
class MoviesController < ApplicationController
  def index
    @movies =
      if params[:search_type] == "vector" && params[:search].present?
        embedding = Embedding.create(params[:search])
        Movie.vector_search(embedding: embedding, limit: 40)
      elsif params[:search].present?
        Movie.full_text_search(input: params[:search], limit: 40)
      else
        Movie.all.limit(40)
      end
  end
end
# app/models/movie/vector_search.rb
module Movie::VectorSearch
  extend ActiveSupport::Concern
  class_methods do
    def vector_search(embedding:, limit: 10)
      where("embedding MATCH ? AND k = ?", embedding.to_s, limit)
        .joins(:movie_vector)
        .order(:distance)
        .distinct
    end
  end
end
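Conceptually, the query above asks for the k stored vectors with the smallest distance to the query vector. The plain-Ruby sketch below mimics that over an in-memory hash so the idea is easy to follow. It is an illustration under my own assumptions, not how sqlite-vec is implemented; nearest_neighbors and the toy vectors are hypothetical.

```ruby
# Cosine distance: 0.0 for vectors pointing the same way, larger when they diverge.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / norms
end

# `vectors` is a hypothetical stand-in for the movie_vectors table:
# a hash of movie_id => embedding.
def nearest_neighbors(vectors, query, k:)
  vectors
    .map { |movie_id, embedding| [movie_id, cosine_distance(query, embedding)] }
    .sort_by { |_movie_id, distance| distance } # closest first, like ORDER BY distance
    .first(k)                                   # keep only k rows, like k = ?
end

toy_vectors = {
  1 => [1.0, 0.0],
  2 => [0.0, 1.0],
  3 => [0.9, 0.1]
}

nearest_neighbors(toy_vectors, [1.0, 0.0], k: 2).map(&:first) # => [1, 3]
```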
Similar Movies
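As a bonus, we can implement a simple recommendation system on top of the same embeddings. By finding the nearest neighbors of a movie's embedding, we get the movies most similar to the one the user is currently viewing. Because a movie is always its own nearest neighbor, we ask for one extra result and exclude the current movie.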
class MoviesController < ApplicationController
  def show
    @movie = Movie.find(params[:id])
  end
end
# app/models/movie/vector_search.rb
module Movie::VectorSearch
  extend ActiveSupport::Concern
  included do
    has_one :movie_vector
  end
  def similar(limit = 8)
    Movie
      .vector_search(embedding: movie_vector.embedding, limit: limit + 1)
      .where.not("movies.id = ?", id)
  end
end
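The limit + 1 in the similar method is easy to miss, so here is a small plain-Ruby sketch of the reasoning. It works on a made-up list of movie IDs that is already ordered by distance; similar_ids is a hypothetical helper, not part of the app.

```ruby
# `ordered_ids` are movie IDs sorted by distance to the current movie.
# The current movie has distance 0 to itself, so it normally comes first.
def similar_ids(ordered_ids, current_id, limit)
  ordered_ids
    .first(limit + 1)                    # fetch one extra neighbor
    .reject { |id| id == current_id }    # drop the movie being viewed
    .first(limit)                        # never return more than requested
end

# Hypothetical neighbors of movie 7, closest first.
neighbors = [7, 12, 3, 44, 9]

similar_ids(neighbors, 7, 3) # => [12, 3, 44]
```

Without the extra neighbor, excluding the current movie would leave only limit - 1 recommendations.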
That's it! I hope you found this article helpful. If you have any questions or feedback, you can find me at sergio@teloslabs.co.