
Building Semantic Search in Ruby on Rails Using the Neighbor Gem

Text by Takafumi Endo


Explore semantic search in Rails with the Neighbor gem. Learn how vector embeddings and PostgreSQL enable intelligent, meaning-based search in modern apps.

The rise of AI-driven applications has transformed the way modern apps interact with data, moving beyond simple keyword searches to intelligent, meaning-based search. This shift is where vector embeddings—a set of floating-point numbers representing the characteristics of text or images—become essential. Embeddings power semantic search by enabling systems to interpret and match content based on meaning rather than exact keyword matches, which allows applications to deliver highly relevant search results even for complex queries. In Ruby on Rails applications, the Neighbor gem stands out as a game-changer for integrating vector-based semantic search seamlessly with PostgreSQL.

Initially developed to support pgvector, a PostgreSQL extension tailored for handling high-dimensional data, Neighbor enables Rails applications to store and query embeddings efficiently. With its capabilities, developers can tap into the power of vector search to implement features like personalized recommendations, contextual search, and content similarity analysis. In this article, we’ll dive into how the Neighbor gem can revolutionize search functionalities within Rails applications, giving teams an efficient way to implement semantic search with minimal friction.

Why Use the Neighbor Gem in Ruby on Rails?

The Neighbor gem was created to address a fundamental challenge in AI-driven applications: integrating high-dimensional vector data into a relational database like PostgreSQL. Traditionally, Rails applications using PostgreSQL required custom SQL queries to handle vector data, which often led to brittle and complex code. Neighbor transforms this experience by providing native support for vector data within Rails, allowing developers to use vector embeddings as a core part of their data models.

Seamless Integration with PostgreSQL and pgvector

Neighbor leverages the pgvector extension in PostgreSQL, which enables storing vectors in a dedicated column type. This integration allows for operations such as nearest neighbor search, which is pivotal for semantic applications. For instance, a Rails application can store embeddings generated by OpenAI or similar AI services, then perform searches to find the most similar records based on cosine similarity or Euclidean distance. This functionality is particularly powerful for apps that need real-time search or recommendation features, like suggesting related products in an e-commerce store or providing relevant articles in a content app.

The Neighbor gem also simplifies the development workflow. Without Neighbor, developers needed to work around limitations in Rails’ ActiveRecord schema dumps, which could not handle vector data types, leading to potential schema mismatches or errors. Neighbor mitigates this by ensuring vector columns are included in schema files, aligning with Rails’ conventions and simplifying database migrations.

Benefits for AI-Driven Rails Applications

For Rails applications, Neighbor and pgvector offer unique advantages. First, they bring high-dimensional data storage into the familiar Rails ecosystem. By incorporating vector-based search directly in Rails, developers can use ActiveRecord, Rails’ ORM layer, to query and manipulate vector data just like any other data type. This reduces the learning curve and maintenance burden, allowing teams to stay productive within Rails’ conventions while building sophisticated AI-driven features.

Moreover, Neighbor opens up Rails applications to a broad spectrum of AI possibilities. By supporting large vector embeddings—such as the 1536-dimension vectors from OpenAI’s models—applications can implement semantic search, enhance recommendations, and personalize user experiences based on complex data characteristics. This approach is not only effective but also scalable, making Neighbor a valuable asset for Rails applications that aim to leverage AI and machine learning.

Setting Up Neighbor Gem in a Rails Project

Integrating the Neighbor gem into a Rails project is straightforward but involves a few essential steps to ensure that your environment is fully optimized for handling vector embeddings. Let’s break down the setup process.

Step 1: Installing Neighbor and Enabling pgvector in PostgreSQL

To start, you’ll need to add the Neighbor gem to your Rails application. This gem simplifies working with vector embeddings by creating a seamless bridge between Rails and PostgreSQL’s pgvector extension.

  1. Add Neighbor to the Gemfile:

    gem 'neighbor'

    After adding it, run bundle install to install the gem.

  2. Enable pgvector in PostgreSQL:
    The Neighbor gem requires the pgvector extension to store and query vector embeddings efficiently. You can enable pgvector by running the following SQL command:

    CREATE EXTENSION IF NOT EXISTS "vector";

    Alternatively, use the Rails migration generator provided by Neighbor to automatically add the extension:

    rails generate neighbor:vector
    rails db:migrate
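
    The generator creates a migration that enables the extension for you; conceptually it is equivalent to the sketch below (the migration class name and the Rails version in brackets will vary with your setup):

    class InstallNeighborVector < ActiveRecord::Migration[7.1]
      def change
        # Enables the pgvector extension so PostgreSQL understands the vector column type
        enable_extension "vector"
      end
    end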

With this setup, PostgreSQL is now capable of storing high-dimensional vector data, laying the groundwork for semantic search.

Step 2: Generating Vector Embeddings Using AI APIs

Once the Neighbor gem and pgvector are configured, the next step is to generate embeddings that will be stored in your database. These embeddings typically represent text, images, or other data types in a high-dimensional vector format that enables similarity searches. A popular option for generating embeddings is a hosted API such as OpenAI’s embeddings endpoint.

To generate embeddings:

  1. Choose an API Provider: OpenAI’s models like text-embedding-ada-002 offer high-dimensional vectors (1536 dimensions) suited for capturing semantic nuances in text data.

  2. Create a Client to Fetch Embeddings: Set up a service class in Rails to interact with the API. For instance:

    # Minimal wrapper around the OpenAI embeddings endpoint
    class OpenAIClient
      def initialize(api_key)
        @api_key = api_key
      end

      def generate_embedding(text)
        response = Faraday.post("https://api.openai.com/v1/embeddings") do |req|
          req.headers['Authorization'] = "Bearer #{@api_key}"
          req.headers['Content-Type'] = 'application/json'
          req.body = { model: "text-embedding-ada-002", input: text }.to_json
        end
        # The API returns { "data" => [{ "embedding" => [...] }, ...] }
        JSON.parse(response.body)["data"].first["embedding"]
      end
    end
  3. Store Embeddings in Your Model: Using ActiveRecord, create a model with a vector column to store these embeddings. The model declares the vector field with has_neighbors, while the column type and dimensions are defined in a migration (a sample migration follows the snippet below):

    class Document < ApplicationRecord
      has_neighbors :embedding
    end
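
The has_neighbors declaration assumes the documents table already has a vector column. A minimal migration sketch for adding one is shown below; the table name, column name, and the 1536-dimension limit (matching text-embedding-ada-002) are assumptions to adapt to your own schema and Rails version:

    class AddEmbeddingToDocuments < ActiveRecord::Migration[7.1]
      def change
        # limit sets the number of dimensions for the pgvector column
        add_column :documents, :embedding, :vector, limit: 1536
      end
    end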

Step 3: Storing and Querying Vector Embeddings in Rails with Neighbor

With your model set up to handle embeddings, the final step is to store the embeddings generated in Step 2 and query them as needed.

  1. Saving Embeddings: After generating an embedding, save it in the appropriate model. Neighbor’s integration with ActiveRecord simplifies this process:

    doc = Document.new(title: "AI in Rails", content: "Exploring AI and embeddings")
    doc.embedding = OpenAIClient.new(api_key).generate_embedding(doc.content)
    doc.save
  2. Querying for Similar Embeddings: With Neighbor, you can perform nearest neighbor searches directly within Rails. For example, to find documents similar to a specific embedding, use:

    similar_docs = doc.nearest_neighbors(:embedding, distance: "cosine")
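
The results come back as ordinary ActiveRecord objects ordered by distance. As a usage sketch, assuming the Document model above, Neighbor also exposes the computed distance on each returned record:

    doc.nearest_neighbors(:embedding, distance: "cosine").first(5).each do |neighbor|
      # neighbor_distance is populated by Neighbor for each result
      puts "#{neighbor.title} (distance: #{neighbor.neighbor_distance})"
    end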

Neighbor makes it easy to handle the complexity of querying vector embeddings by providing built-in methods to perform these similarity searches. With this setup, you are ready to implement semantic search and other AI-driven functionalities in your Rails application.

Implementing Semantic Search with Neighbor

After setting up Neighbor, you can leverage it to build semantic search and recommendation systems. Here’s how you can use Neighbor to create a robust recommendation feature in an e-commerce app.

  1. Generate Product Embeddings: For each product in the catalog, generate an embedding based on its description or specifications. Store these embeddings in a Product model, which has an embedding column.

  2. Configure the Search Query: When a user searches for a product, convert the search term to an embedding using the same AI API. This embedding represents the user’s query in a high-dimensional space.

  3. Perform Similarity Search: With Neighbor, find products whose embeddings are closest to the query embedding. This enables results based on semantic relevance rather than exact keyword matches:

    similar_products = Product.nearest_neighbors(:embedding, query_embedding, distance: "cosine")
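
End to end, steps 2 and 3 can be wired together in a single action. The sketch below is illustrative: it reuses the hypothetical OpenAIClient class from earlier, reads the key from an assumed OPENAI_API_KEY environment variable, and caps results at 20:

    class ProductSearchController < ApplicationController
      def index
        # Step 2: convert the user's query into an embedding
        query_embedding = OpenAIClient.new(ENV["OPENAI_API_KEY"]).generate_embedding(params[:q])

        # Step 3: rank products by cosine distance to the query embedding
        @products = Product.nearest_neighbors(:embedding, query_embedding, distance: "cosine").first(20)
      end
    end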

Neighbor in Rails transforms traditional search capabilities by allowing semantic understanding, bringing applications closer to the AI-driven experiences that today’s users expect.

Performance Optimization and Best Practices

With the Neighbor gem enabling high-dimensional data storage and similarity searches in Rails, optimizing performance is crucial for maintaining speed and reliability, especially as the volume of data scales. Here’s a closer look at database query optimization techniques and caching strategies to keep AI-based searches responsive.

Database Query Optimization Techniques for High-Dimensional Data

Handling vector embeddings can be computationally intensive. Here are some techniques to optimize database queries for high-dimensional data:

  1. Use Approximate Nearest Neighbor (ANN) Indexing:
    PostgreSQL’s pgvector extension supports approximate indexes (HNSW and IVFFlat), which speed up queries by trading a small amount of recall for a much lower computational load. With Neighbor, these indexes can be added through standard Rails migrations, reducing the work each query performs without noticeably degrading result quality (a sample index migration follows this list).

  2. Optimize Distance Calculations:
    Distance metrics such as cosine similarity and Euclidean distance are central to semantic search but can be resource-intensive. To reduce computational overhead, limit queries to the top-k results you actually need rather than ranking the entire table. Indexing with the operator class that matches your distance metric (for example, vector_l2_ops for Euclidean or vector_cosine_ops for cosine distance) helps PostgreSQL streamline these calculations.

  3. Efficient Memory Management with Connection Pooling:
    Since similarity queries tend to hold database connections longer than typical CRUD queries, configure your Rails application’s database connection pool so concurrent searches don’t exhaust it. Proper pool sizing improves responsiveness by avoiding unnecessary reconnections and managing concurrent queries effectively.

  4. Leverage Partitioning for Large Datasets:
    If your Rails application handles vast amounts of data (such as a large product catalog), consider partitioning tables that store embeddings. Partitioning distributes data across smaller, more manageable tables, resulting in faster query times, particularly for high-frequency searches.
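
As a concrete example of the indexing advice in point 1, pgvector’s approximate indexes can be added from an ordinary Rails migration. The sketch below assumes a products table with an embedding vector column and uses an HNSW index with the cosine operator class (swap in vector_l2_ops if you query by Euclidean distance):

    class AddHnswIndexToProducts < ActiveRecord::Migration[7.1]
      def change
        # Approximate nearest neighbor index for cosine-distance queries
        add_index :products, :embedding, using: :hnsw, opclass: :vector_cosine_ops
      end
    end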

Effective Caching Strategies for AI-Based Search Functionalities

Caching is essential to reduce redundant computations and response times for frequent queries. Here’s how to implement caching effectively in a Rails application using Neighbor:

  1. Query Caching for Repeated Searches:
    For popular or high-frequency queries, use Rails’ low-level cache (Rails.cache) backed by a store such as Redis to hold the results of similarity searches. By caching embeddings and their nearest neighbors, you avoid recalculating distances for repeated queries, improving response times significantly (a caching sketch follows this list).

  2. Session-Based Caching for Personalized Results:
    In applications like e-commerce, where search results are personalized, store session-based results to improve the user experience. With session-based caching, users receive faster response times when revisiting or refining their previous searches.

  3. Layered Caching for Embedding Operations:
    Vector embeddings often go through preprocessing and transformation stages before storage. Cache intermediate results (e.g., transformed embeddings) to minimize the need for reprocessing, especially if these embeddings are reused across multiple queries or sessions.
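
As a minimal sketch of the query caching described in point 1, the snippet below fingerprints the query embedding, caches the matching product ids for an hour, and rehydrates the records on later hits; the key prefix, expiry, and result limit are illustrative values:

    require "digest"

    cache_key = ["similar_products", Digest::SHA256.hexdigest(query_embedding.to_json)]
    product_ids = Rails.cache.fetch(cache_key, expires_in: 1.hour) do
      Product.nearest_neighbors(:embedding, query_embedding, distance: "cosine").first(10).map(&:id)
    end
    # in_order_of (Rails 7+) preserves the cached similarity ranking
    similar_products = Product.in_order_of(:id, product_ids)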

Case Study Example: AI-Powered Recommendation Engine in Rails

To illustrate how the Neighbor gem powers a recommendation engine, let’s consider a hypothetical e-commerce platform using Neighbor for personalized recommendations.

Scenario: Enhanced Product Recommendations

Imagine an e-commerce platform where users frequently search for or browse products based on specific features, such as "lightweight running shoes" or "organic skincare products." Using the Neighbor gem, the platform stores vector embeddings of each product based on its features and descriptions.

  1. Embedding Generation: For each product, an embedding is generated using OpenAI’s API, capturing the unique characteristics of the product in a high-dimensional vector format.

  2. Similarity Search: When a user searches for “water-resistant hiking shoes,” the platform converts the query into a vector embedding, and Neighbor finds the nearest matches in the database. By leveraging the pgvector extension, the platform retrieves the most semantically relevant products, presenting them as recommendations.

  3. Real-Time Personalized Recommendations: As the user interacts with the platform, browsing specific categories or favoriting items, the system uses these actions to refine recommendations further, leveraging cached embeddings for faster responses.

Measuring Success: Metrics for User Engagement and Development Efficiency

Implementing this recommendation engine with Neighbor yields valuable success metrics:

  • User Engagement: Improved search relevance leads to longer session durations and higher click-through rates on recommended products.
  • Conversion Rates: By providing better product matches, users are more likely to find products that meet their needs, translating to higher purchase rates.
  • Development Time Savings: The Neighbor gem simplifies complex embedding and search setups, allowing the engineering team to focus on other high-impact features, reducing development time by up to 30% compared to building custom semantic search solutions.

Conclusion

The Neighbor gem transforms Rails applications, enabling them to deliver AI-powered, highly relevant search results through seamless integration with pgvector. By bringing vector embeddings into the Rails ecosystem, Neighbor supports developers in building applications that go beyond keyword search to understand user intent, enhancing personalization and search accuracy.

Looking to the future, advancements in pgvector and Rails are likely to continue, with potential improvements in indexing, embedding support, and ANN search techniques. These developments will make it easier for Rails applications to leverage AI-driven features, setting new standards for search capabilities and user experience across industries. As organizations increasingly recognize the value of semantic search, tools like Neighbor will play a pivotal role in making these sophisticated features accessible to Rails developers everywhere.



Please Note: This article reflects information available at the time of writing. Some code examples and implementation methods may have been created with the support of AI assistants. All implementations should be appropriately customized to match your specific environment and requirements. We recommend regularly consulting official resources and community forums for the latest information and best practices.


Text by Takafumi Endo

Takafumi Endo, CEO of ROUTE06. After earning his MSc from Tohoku University, he founded and led an e-commerce startup acquired by a major retail company. He also served as an EIR at a venture capital firm.


Categories

  • Knowledge

Tags

  • Semantic Search
  • Ruby on Rails