Image of PostgreSQL Materialized Views: From Basics to Practical Examples in Mastodon

PostgreSQL Materialized Views: From Basics to Practical Examples in Mastodon

Text by Hirotaka Miyagi

Published

Explore PostgreSQL Materialized Views from basics to real-world applications, featuring how Mastodon leverages them for performance optimization.
Table of Contents

Materialized views in PostgreSQL provide a powerful mechanism for caching the results of complex queries, offering significant performance benefits for read-heavy applications. This article explores materialized views from fundamental concepts to practical implementation, using Mastodon's database schema as a real-world example.

Basics of Materialized Views

What is a Materialized View?

A materialized view is a database object that stores the result set of a query. Unlike regular views, which compute their results each time they're queried, materialized views cache the results physically on disk. This makes them particularly useful for complex queries that are expensive to compute but don't require real-time data.

Advantages of Materialized Views

  1. Performance: Queries against materialized views return cached results, eliminating the need to recompute complex calculations or joins
  2. Resource Efficiency: Reduces database load by avoiding repeated execution of resource-intensive queries
  3. Data Denormalization: Allows strategic denormalization without duplicating data maintenance code
  4. Query Simplification: Complex queries can be encapsulated in materialized views, simplifying application code

Drawbacks and Differences from Regular Views

  1. Data Freshness: Materialized views don't automatically reflect source table changes
  2. Storage Requirements: They consume disk space to store the cached results
  3. Maintenance Overhead: Require explicit refreshing to update their data
  4. Atomic Updates: Unlike regular views, updates aren't instantaneous and require a complete refresh

Utilizing Materialized Views in PostgreSQL

Creating a Materialized View

Basic syntax for creating a materialized view:

CREATE MATERIALIZED VIEW view_name
AS
  SELECT columns
  FROM tables
  WHERE conditions
WITH [NO] DATA;

The WITH DATA clause populates the view immediately, while WITH NO DATA creates an empty view that can be populated later.

Updating and Refreshing

To refresh a materialized view:

-- Complete refresh
REFRESH MATERIALIZED VIEW view_name;
 
-- Concurrent refresh (if supported)
REFRESH MATERIALIZED VIEW CONCURRENTLY view_name;
Note: Concurrent refresh requires a unique index on the materialized view.

Optimizing Performance

  1. Indexing: Create appropriate indexes on the materialized view
  2. Refresh Strategy: Choose between complete and concurrent refresh based on your needs
  3. Refresh Timing: Schedule refreshes during low-traffic periods
  4. Query Planning: Use EXPLAIN ANALYZE to verify the materialized view is being used effectively

Practical Examples of Materialized Views in Mastodon

What is Mastodon

Mastodon is an open-source decentralized social networking platform that implements the ActivityPub protocol. As a federated service, it allows users to create accounts on different servers (instances) while still being able to interact with users from other instances. This architectural choice creates unique challenges for data management and query optimization, which is where materialized views come into play.

Mastodon's Schema and Use of Materialized Views

Mastodon's database schema demonstrates sophisticated use of materialized views to optimize complex queries that would otherwise be computationally expensive to execute in real-time. Let's examine three key materialized views from Mastodon's schema:

account_summaries

CREATE MATERIALIZED VIEW public.account_summaries AS
 SELECT accounts.id AS account_id,
    mode() WITHIN GROUP (ORDER BY t0.language) AS language,
    mode() WITHIN GROUP (ORDER BY t0.sensitive) AS sensitive
   FROM (public.accounts
     CROSS JOIN LATERAL ( SELECT statuses.account_id,
            statuses.language,
            statuses.sensitive
           FROM public.statuses
          WHERE ((statuses.account_id = accounts.id)
            AND (statuses.deleted_at IS NULL)
            AND (statuses.reblog_of_id IS NULL))
          ORDER BY statuses.id DESC
         LIMIT 20) t0)
  WHERE ((accounts.suspended_at IS NULL)
    AND (accounts.silenced_at IS NULL)
    AND (accounts.moved_to_account_id IS NULL)
    AND (accounts.discoverable = true)
    AND (accounts.locked = false))
  GROUP BY accounts.id;

This materialized view serves several important purposes:

  1. Content Classification: It determines the predominant language and content sensitivity level for each account based on their recent posts.
  2. Performance Optimization: Instead of analyzing all posts, it focuses on the 20 most recent original posts (excluding reblogs and deleted content).
  3. Account Filtering: It only includes active, discoverable accounts that haven't been suspended or silenced.

The use of mode() aggregate function is particularly clever here, as it identifies the most common language and sensitivity settings for an account's content.

global_follow_recommendations

CREATE MATERIALIZED VIEW public.global_follow_recommendations AS
 SELECT t0.account_id,
    sum(t0.rank) AS rank,
    array_agg(t0.reason) AS reason
   FROM ( SELECT account_summaries.account_id,
            ((count(follows.id))::numeric /
             (1.0 + (count(follows.id))::numeric)) AS rank,
            'most_followed'::text AS reason
           FROM follows
             JOIN account_summaries ON account_summaries.account_id = follows.target_account_id
             JOIN users ON users.account_id = follows.account_id
          WHERE users.current_sign_in_at >= (now() - '30 days'::interval)
            AND account_summaries.sensitive = false
          GROUP BY account_summaries.account_id
         HAVING count(follows.id) >= 5
        UNION ALL
         SELECT account_summaries.account_id,
            (sum(status_stats.reblogs_count + status_stats.favourites_count) /
             (1.0 + sum(status_stats.reblogs_count + status_stats.favourites_count))) AS rank,
            'most_interactions'::text AS reason
           FROM status_stats
             JOIN statuses ON statuses.id = status_stats.status_id
             JOIN account_summaries ON account_summaries.account_id = statuses.account_id
          WHERE statuses.id >= ((date_part('epoch'::text, (now() - '30 days'::interval))
            * (1000)::double precision))::bigint << 16)
            AND account_summaries.sensitive = false
          GROUP BY account_summaries.account_id
         HAVING sum(status_stats.reblogs_count + status_stats.favourites_count) >= 5) t0
  GROUP BY t0.account_id
  ORDER BY (sum(t0.rank)) DESC;

This complex materialized view implements Mastodon's follow recommendation system:

  1. Dual Ranking System: It combines two ranking methods:

    • Follow-based ranking: Accounts with many followers from active users
    • Interaction-based ranking: Accounts whose posts receive high engagement
  2. Smart Normalization: Both ranking methods use a normalization formula (n/(1+n)) to prevent extreme values from dominating the recommendations.

  3. Recent Activity Focus: Only considers data from the last 30 days to maintain relevance.

instances

CREATE MATERIALIZED VIEW public.instances AS
 WITH domain_counts(domain, accounts_count) AS (
         SELECT accounts.domain,
            count(*) AS accounts_count
           FROM public.accounts
          WHERE accounts.domain IS NOT NULL
          GROUP BY accounts.domain
        )
 SELECT domain_counts.domain,
    domain_counts.accounts_count
   FROM domain_counts
UNION
 SELECT domain_blocks.domain,
    COALESCE(domain_counts.accounts_count, (0)::bigint) AS accounts_count
   FROM domain_blocks
     LEFT JOIN domain_counts ON domain_counts.domain = domain_blocks.domain
UNION
 SELECT domain_allows.domain,
    COALESCE(domain_counts.accounts_count, (0)::bigint) AS accounts_count
   FROM domain_allows
     LEFT JOIN domain_counts ON domain_counts.domain = domain_allows.domain;

This materialized view provides a comprehensive overview of Mastodon's federation landscape:

  1. Domain Statistics: Tracks the number of accounts from each domain.
  2. Federation Status: Includes both blocked and allowed domains, even if they currently have no accounts.
  3. Efficient Lookups: Enables quick access to instance statistics without complex joins.

These materialized views demonstrate how Mastodon optimizes complex queries that would be expensive to compute in real-time, particularly for features like content discovery and federation management that are central to the platform's functionality.

The implementation shows careful consideration of performance trade-offs, utilizing techniques like:

  • Limiting the amount of data processed (e.g., 20 most recent posts)
  • Smart normalization of ranking factors
  • Efficient use of joins and aggregations
  • Careful filtering of sensitive or inactive accounts

Conclusion

In this article, we introduced the basics of PostgreSQL Materialized Views and explored practical examples in Mastodon. While Materialized Views can be powerful tools for improving performance and handling complex data aggregation, they should not be used in every case. By carefully evaluating when to apply them and managing refresh strategies and resource usage, you can leverage Materialized Views effectively.

References

Text byHirotaka Miyagi

Hirotaka Miyagi is Tech Lead at ROUTE06. A full-stack engineer specializing in web development, he has built extensive experience through multiple startups before joining ROUTE06 to lead its technical initiatives.

Last edited on

Categories

  • studies
Glossary

There are currently no glossary entries for this blog post.