Building Semantic Search in Ruby on Rails Using the Neighbor Gem
Text by Takafumi Endo
The rise of AI-driven applications has transformed the way modern apps interact with data, moving beyond simple keyword searches to intelligent, meaning-based search. This shift is where vector embeddings—a set of floating-point numbers representing the characteristics of text or images—become essential. Embeddings power semantic search by enabling systems to interpret and match content based on meaning rather than exact keyword matches, which allows applications to deliver highly relevant search results even for complex queries. In Ruby on Rails applications, the Neighbor gem stands out as a game-changer for integrating vector-based semantic search seamlessly with PostgreSQL.
Neighbor supports pgvector, a PostgreSQL extension for storing and querying high-dimensional vectors, so Rails applications can store and query embeddings efficiently. With it, developers can implement features like personalized recommendations, contextual search, and content similarity analysis. In this article, we’ll look at how the Neighbor gem brings semantic search to Rails applications with minimal friction.
Why Use the Neighbor Gem in Ruby on Rails?
The Neighbor gem was created to address a fundamental challenge in AI-driven applications: integrating high-dimensional vector data into a relational database like PostgreSQL. Traditionally, Rails applications using PostgreSQL required custom SQL queries to handle vector data, which often led to brittle and complex code. Neighbor transforms this experience by providing native support for vector data within Rails, allowing developers to use vector embeddings as a core part of their data models.
Seamless Integration with PostgreSQL and pgvector
Neighbor leverages the pgvector extension in PostgreSQL, which enables storing vectors in a dedicated column type. This integration allows for operations such as nearest neighbor search, which is pivotal for semantic applications. For instance, a Rails application can store embeddings generated by OpenAI or similar AI services, then perform searches to find the most similar records based on cosine similarity or Euclidean distance. This functionality is particularly powerful for apps that need real-time search or recommendation features, like suggesting related products in an e-commerce store or providing relevant articles in a content app.
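To make the two distance measures mentioned above concrete, here is a minimal plain-Ruby sketch. pgvector performs these calculations inside PostgreSQL; the method names below are illustrative and are not part of Neighbor or pgvector.

```ruby
# Plain-Ruby versions of the two distance measures, for intuition only.

# Euclidean (L2) distance: straight-line distance between two vectors.
def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Cosine distance: 1 - cosine similarity, so 0.0 means "same direction".
# Undefined for zero-length vectors (the division yields NaN).
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]

puts euclidean_distance(query, same_direction) # 1.0
puts cosine_distance(query, same_direction)    # 0.0
puts cosine_distance(query, orthogonal)        # 1.0
```

Cosine distance ignores vector length, which is why the first two vectors count as identical under it but not under Euclidean distance.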
The Neighbor gem also simplifies the development workflow. Without Neighbor, developers needed to work around limitations in Rails’ ActiveRecord schema dumps, which could not handle vector data types, leading to potential schema mismatches or errors. Neighbor mitigates this by ensuring vector columns are included in schema files, aligning with Rails’ conventions and simplifying database migrations.
Benefits for AI-Driven Rails Applications
For Rails applications, Neighbor and pgvector offer unique advantages. First, they bring high-dimensional data storage into the familiar Rails ecosystem. By incorporating vector-based search directly in Rails, developers can use ActiveRecord, Rails’ ORM layer, to query and manipulate vector data just like any other data type. This reduces the learning curve and maintenance burden, allowing teams to stay productive within Rails’ conventions while building sophisticated AI-driven features.
Moreover, Neighbor opens up Rails applications to a broad spectrum of AI possibilities. By supporting large vector embeddings—such as the 1536-dimension vectors from OpenAI’s models—applications can implement semantic search, enhance recommendations, and personalize user experiences based on complex data characteristics. This approach is not only effective but also scalable, making Neighbor a valuable asset for Rails applications that aim to leverage AI and machine learning.
Setting Up Neighbor Gem in a Rails Project
Integrating the Neighbor gem into a Rails project is straightforward but involves a few essential steps to ensure that your environment is fully optimized for handling vector embeddings. Let’s break down the setup process.
Step 1: Installing Neighbor and Enabling pgvector in PostgreSQL
To start, you’ll need to add the Neighbor gem to your Rails application. This gem simplifies working with vector embeddings by creating a seamless bridge between Rails and PostgreSQL’s pgvector extension.
- Add Neighbor to the Gemfile: Add the neighbor gem to your Gemfile, then run bundle install to install it.
- Enable pgvector in PostgreSQL: The Neighbor gem requires the pgvector extension to store and query vector embeddings efficiently. You can enable pgvector with a CREATE EXTENSION statement in SQL, or use the Rails migration generator provided by Neighbor to add the extension through a migration.
With this setup, PostgreSQL is now capable of storing high-dimensional vector data, laying the groundwork for semantic search.
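The setup in Step 1 might look like the following sketch. The migration file name, class name, and the [7.1] version tag are placeholders. Neighbor's README documents a generator, rails generate neighbor:vector, that creates a comparable migration for you.

```ruby
# Gemfile
gem "neighbor"

# db/migrate/20240101000000_enable_vector_extension.rb
# (file name, class name, and the [7.1] version tag are placeholders)
class EnableVectorExtension < ActiveRecord::Migration[7.1]
  def change
    # Same effect as running this SQL by hand:
    #   CREATE EXTENSION IF NOT EXISTS vector;
    enable_extension "vector"
  end
end
```

Run bundle install and then rails db:migrate. pgvector itself must already be installed on the PostgreSQL server, since the migration only enables it for the current database.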
Step 2: Generating Vector Embeddings Using AI APIs
Once the Neighbor gem and pgvector are configured, the next step is to generate the embeddings that will be stored in your database. These embeddings typically represent text, images, or other data types in a high-dimensional vector format that enables similarity searches. Popular options include embedding APIs from providers such as OpenAI.
To generate embeddings:
- Choose an API Provider: OpenAI’s models like text-embedding-ada-002 offer high-dimensional vectors (1536 dimensions) suited for capturing semantic nuances in text data.
- Create a Client to Fetch Embeddings: Set up a service class in Rails that sends text to the API and returns the embedding.
- Store Embeddings in Your Model: Using ActiveRecord, create a model with a vector column to store these embeddings. Neighbor lets you specify the vector field and the number of dimensions your data requires.
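The service class from the second step above could be sketched as follows, using only Ruby's standard library. The class name EmbeddingClient and its transport option are assumptions made for illustration. The endpoint and response shape follow OpenAI's embeddings API, so check the current API reference before relying on them.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical service class; the name and constructor options are illustrative.
class EmbeddingClient
  ENDPOINT = URI("https://api.openai.com/v1/embeddings")

  # transport: any callable taking (uri, headers, body_json) and returning a
  # JSON string. Injectable so the class can be exercised without network access.
  def initialize(api_key:, model: "text-embedding-ada-002", transport: nil)
    @api_key = api_key
    @model = model
    @transport = transport || method(:post_json)
  end

  # Returns the embedding for `text` as an Array of Floats.
  def embed(text)
    headers = {
      "Authorization" => "Bearer #{@api_key}",
      "Content-Type" => "application/json"
    }
    body = JSON.generate(model: @model, input: text)
    parsed = JSON.parse(@transport.call(ENDPOINT, headers, body))
    parsed.fetch("data").first.fetch("embedding")
  end

  private

  # Default transport: a real HTTPS POST via Net::HTTP.
  def post_json(uri, headers, body)
    response = Net::HTTP.post(uri, body, headers)
    raise "Embedding request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    response.body
  end
end

# Example with a fake transport, so no API key or network is needed.
fake = ->(_uri, _headers, _body) { JSON.generate(data: [{ embedding: [0.1, 0.2, 0.3] }]) }
client = EmbeddingClient.new(api_key: "test-key", transport: fake)
p client.embed("lightweight running shoes") # prints [0.1, 0.2, 0.3]
```

In a real application you would omit the transport argument and read the key from configuration, for example ENV.fetch("OPENAI_API_KEY").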
Step 3: Storing and Querying Vector Embeddings in Rails with Neighbor
With your model set up to handle embeddings, the final step is to store the embeddings generated in Step 2 and query them as needed.
- Saving Embeddings: After generating an embedding, save it on the appropriate model record. Neighbor’s integration with ActiveRecord lets you assign it like any other attribute.
- Querying for Similar Embeddings: With Neighbor, you can perform nearest neighbor searches directly within Rails, for example to find the documents most similar to a given embedding.
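A sketch of the model, storage, and query pieces is shown below. It is a fragment meant to live inside a Rails app with Neighbor and pgvector set up, not a standalone script. The Document model, its content column, and the two helper methods are assumed names. The calls has_neighbors, nearest_neighbors, and neighbor_distance come from Neighbor's README.

```ruby
# db/migrate/20240101000001_add_embedding_to_documents.rb (placeholder name)
# Assumes a documents table with a text `content` column already exists.
class AddEmbeddingToDocuments < ActiveRecord::Migration[7.1]
  def change
    # limit is the number of dimensions (1536 for text-embedding-ada-002)
    add_column :documents, :embedding, :vector, limit: 1536
  end
end

# app/models/document.rb
class Document < ApplicationRecord
  has_neighbors :embedding
end

# Saving: `embedding` is an Array of 1536 Floats from your embedding client.
def store_document(content, embedding)
  Document.create!(content: content, embedding: embedding)
end

# Querying: the documents closest to a query embedding by cosine distance.
def similar_documents(query_embedding, limit: 5)
  Document.nearest_neighbors(:embedding, query_embedding, distance: "cosine").first(limit)
end

# Each result exposes the computed distance as `neighbor_distance`:
#   similar_documents(vec).each { |doc| puts "#{doc.id}: #{doc.neighbor_distance}" }
```

Given an existing record, document.nearest_neighbors(:embedding, distance: "cosine").first(5) returns the records most similar to that one, which is convenient for "related items" features.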
Neighbor makes it easy to handle the complexity of querying vector embeddings by providing built-in methods to perform these similarity searches. With this setup, you are ready to implement semantic search and other AI-driven functionalities in your Rails application.
Implementing Semantic Search with Neighbor
After setting up Neighbor, you can leverage it to build semantic search and recommendation systems. Here’s how you can use Neighbor to create a robust recommendation feature in an e-commerce app.
Step-by-Step Guide for Semantic Search
- Generate Product Embeddings: For each product in the catalog, generate an embedding based on its description or specifications. Store these embeddings in a Product model, which has an embedding column.
- Configure the Search Query: When a user searches for a product, convert the search term to an embedding using the same AI API. This embedding represents the user’s query in a high-dimensional space.
- Perform Similarity Search: With Neighbor, find products whose embeddings are closest to the query embedding. This returns results based on semantic relevance rather than exact keyword matches.
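The three steps above can be tied together in a small search object. This is a sketch under stated assumptions: ProductSearch, the embedder and finder callables, and the tiny in-memory catalog are all illustrative stand-ins so that the flow runs without Rails or an API key.

```ruby
# Hypothetical search object. Both collaborators are injected:
#   embedder - callable: String -> Array<Float>   (e.g. an embeddings API client)
#   finder   - callable: (Array<Float>, Integer) -> results
class ProductSearch
  def initialize(embedder:, finder:)
    @embedder = embedder
    @finder = finder
  end

  def call(query, limit: 5)
    text = query.to_s.strip
    return [] if text.empty?

    query_embedding = @embedder.call(text) # same model as the product embeddings
    @finder.call(query_embedding, limit)   # nearest neighbors to the query
  end
end

# In a Rails app using Neighbor, the finder would typically be:
#   ->(vec, limit) { Product.nearest_neighbors(:embedding, vec, distance: "cosine").first(limit) }
# Below, an in-memory stand-in ranks by cosine distance instead.
CATALOG = {
  "trail shoes" => [0.9, 0.1],
  "rain jacket" => [0.1, 0.9],
  "road shoes"  => [0.8, 0.3]
}.freeze

cosine_distance = lambda do |a, b|
  dot = a.zip(b).sum { |x, y| x * y }
  1.0 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

in_memory_finder = lambda do |vec, limit|
  CATALOG.min_by(limit) { |_name, emb| cosine_distance.call(vec, emb) }.map(&:first)
end

fake_embedder = ->(_text) { [1.0, 0.0] } # stands in for the API call

search = ProductSearch.new(embedder: fake_embedder, finder: in_memory_finder)
p search.call("hiking shoes", limit: 2) # prints ["trail shoes", "road shoes"]
```

Injecting the embedder and finder keeps the search flow testable and makes it explicit that queries and products must be embedded with the same model.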
Neighbor in Rails transforms traditional search capabilities by allowing semantic understanding, bringing applications closer to the AI-driven experiences that today’s users expect.
Performance Optimization and Best Practices
With the Neighbor gem enabling high-dimensional data storage and similarity searches in Rails, optimizing performance is crucial for maintaining speed and reliability, especially as the volume of data scales. Here’s a closer look at database query optimization techniques and caching strategies to keep AI-based searches responsive.
Database Query Optimization Techniques for High-Dimensional Data
Handling vector embeddings can be computationally intensive. Here are some techniques to optimize database queries for high-dimensional data:
- Use Approximate Nearest Neighbor (ANN) Indexing: PostgreSQL’s pgvector extension supports approximate nearest neighbor indexes (IVFFlat and HNSW), which speed up queries by trading a small amount of accuracy for much less computation. With Neighbor, these indexes can be added in a regular Rails migration, reducing the number of distance calculations per query without a noticeable effect on user experience.
- Optimize Distance Calculations: Distance metrics, such as cosine similarity or Euclidean distance, are crucial in semantic search but can be resource-intensive. To reduce overhead, request only the top results you need rather than ranking every row. An index built with pgvector’s L2 operator class (vector_l2_ops) helps when you query by Euclidean distance.
- Efficient Memory Management with Connection Pooling: Because similarity searches can run concurrently and take longer than simple lookups, size your Rails application’s database connection pool accordingly. A well-sized pool improves responsiveness by avoiding unnecessary reconnections and managing concurrent queries effectively.
- Leverage Partitioning for Large Datasets: If your Rails application handles vast amounts of data (such as a large product catalog), consider partitioning the tables that store embeddings. Partitioning distributes data across smaller, more manageable tables, which can shorten query times, particularly for high-frequency searches.
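As an illustration of the ANN indexing point, a migration along these lines adds an HNSW index. The table name, class name, and version tag are assumptions. The using: and opclass: options follow the examples in Neighbor's README, and HNSW requires a pgvector version that supports it (0.5.0 or later).

```ruby
# db/migrate/20240101000002_add_embedding_index_to_products.rb (placeholder name)
class AddEmbeddingIndexToProducts < ActiveRecord::Migration[7.1]
  def change
    # HNSW is one of pgvector's approximate index types (IVFFlat is the other).
    # The operator class must match the distance used in queries:
    #   vector_l2_ops     -> distance: "euclidean"
    #   vector_cosine_ops -> distance: "cosine"
    add_index :products, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end
```

The index is only used for queries that order by the matching distance and include a limit, which is one more reason to always ask for the top results, for example with .first(5).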
Effective Caching Strategies for AI-Based Search Functionalities
Caching is essential to reduce redundant computations and response times for frequent queries. Here’s how to implement caching effectively in a Rails application using Neighbor:
- Query Caching for Repeated Searches: For popular or high-frequency queries, use Rails’ built-in query caching or an external cache (e.g., Redis) to store the results of similarity searches. By caching embeddings and their nearest neighbors, you avoid recalculating distances for repeated queries, improving response times significantly.
- Session-Based Caching for Personalized Results: In applications like e-commerce, where search results are personalized, store session-based results to improve the user experience. With session-based caching, users receive faster response times when revisiting or refining their previous searches.
- Layered Caching for Embedding Operations: Vector embeddings often go through preprocessing and transformation stages before storage. Cache intermediate results (e.g., transformed embeddings) to minimize the need for reprocessing, especially if these embeddings are reused across multiple queries or sessions.
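The query-caching idea can be sketched in plain Ruby as below. SearchCache is a hypothetical in-memory stand-in. In a Rails app the same pattern is usually written with Rails.cache.fetch(key, expires_in: ...) backed by Redis or another cache store.

```ruby
require "digest"

# Minimal in-memory cache with expiry, standing in for Rails.cache / Redis.
class SearchCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds: 300, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Returns the cached value for `query`, computing it with the block on a miss.
  def fetch(query)
    key = cache_key(query)
    entry = @store[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @store[key] = Entry.new(value, @clock.call + @ttl)
    value
  end

  private

  # Normalize so "Running  Shoes" and "running shoes" share one entry.
  def cache_key(query)
    normalized = query.to_s.downcase.strip.squeeze(" ")
    "search:#{Digest::SHA256.hexdigest(normalized)}"
  end
end

cache = SearchCache.new(ttl_seconds: 60)
calls = 0
expensive_search = -> { calls += 1; ["trail shoes", "road shoes"] }

cache.fetch("Running  Shoes") { expensive_search.call }
cache.fetch("running shoes")  { expensive_search.call }
puts calls # 1: the second lookup was served from the cache
```

Caching on the normalized query text saves both the embedding API call and the database search for repeated queries. The expiry keeps results from going stale as the catalog changes.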
Case Study Example: AI-Powered Recommendation Engine in Rails
To illustrate how the Neighbor gem powers a recommendation engine, let’s consider a hypothetical e-commerce platform using Neighbor for personalized recommendations.
Scenario: Enhanced Product Recommendations
Imagine an e-commerce platform where users frequently search for or browse products based on specific features, such as "lightweight running shoes" or "organic skincare products." Using the Neighbor gem, the platform stores vector embeddings of each product based on its features and descriptions.
- Embedding Generation: For each product, an embedding is generated using OpenAI’s API, capturing the unique characteristics of the product in a high-dimensional vector format.
- Similarity Search: When a user searches for “water-resistant hiking shoes,” the platform converts the query into a vector embedding, and Neighbor finds the nearest matches in the database. By leveraging the pgvector extension, the platform retrieves the most semantically relevant products, presenting them as recommendations.
- Real-Time Personalized Recommendations: As the user interacts with the platform, browsing specific categories or favoriting items, the system uses these actions to refine recommendations further, leveraging cached embeddings for faster responses.
Measuring Success: Metrics for User Engagement and Development Efficiency
In this hypothetical scenario, the team would measure success along the following lines:
- User Engagement: Improved search relevance should lead to longer session durations and higher click-through rates on recommended products.
- Conversion Rates: With better product matches, users are more likely to find products that meet their needs, which should translate to higher purchase rates.
- Development Time Savings: The Neighbor gem simplifies embedding storage and search setup, allowing the engineering team to focus on other high-impact features. The scenario assumes a reduction in development time of up to 30% compared to building a custom semantic search solution.
Conclusion
The Neighbor gem transforms Rails applications, enabling them to deliver AI-powered, highly relevant search results through seamless integration with pgvector. By bringing vector embeddings into the Rails ecosystem, Neighbor supports developers in building applications that go beyond keyword search to understand user intent, enhancing personalization and search accuracy.
Looking to the future, advancements in pgvector and Rails are likely to continue, with potential improvements in indexing, embedding support, and ANN search techniques. These developments will make it easier for Rails applications to leverage AI-driven features, setting new standards for search capabilities and user experience across industries. As organizations increasingly recognize the value of semantic search, tools like Neighbor will play a pivotal role in making these sophisticated features accessible to Rails developers everywhere.
Please Note: This article reflects information available at the time of writing. Some code examples and implementation methods may have been created with the support of AI assistants. All implementations should be appropriately customized to match your specific environment and requirements. We recommend regularly consulting official resources and community forums for the latest information and best practices.
Takafumi Endo, CEO of ROUTE06. After earning his MSc from Tohoku University, he founded and led an e-commerce startup acquired by a major retail company. He also served as an EIR at a venture capital firm.