
PostgreSQL Materialized Views: From Basics to Practical Examples in Mastodon
Text by Hirotaka Miyagi
Published
Materialized views in PostgreSQL provide a powerful mechanism for caching the results of complex queries, offering significant performance benefits for read-heavy applications. This article explores materialized views from fundamental concepts to practical implementation, using Mastodon's database schema as a real-world example.
Basics of Materialized Views
What is a Materialized View?
A materialized view is a database object that stores the result set of a query. Unlike regular views, which compute their results each time they're queried, materialized views cache the results physically on disk. This makes them particularly useful for complex queries that are expensive to compute but don't require real-time data.
Advantages of Materialized Views
- Performance: Queries against materialized views return cached results, eliminating the need to recompute complex calculations or joins
- Resource Efficiency: Reduces database load by avoiding repeated execution of resource-intensive queries
- Data Denormalization: Allows strategic denormalization without duplicating data maintenance code
- Query Simplification: Complex queries can be encapsulated in materialized views, simplifying application code
Drawbacks and Differences from Regular Views
- Data Freshness: Materialized views don't automatically reflect source table changes
- Storage Requirements: They consume disk space to store the cached results
- Maintenance Overhead: Require explicit refreshing to update their data
- Atomic Updates: Unlike regular views, updates aren't instantaneous and require a complete refresh
Utilizing Materialized Views in PostgreSQL
Creating a Materialized View
Basic syntax for creating a materialized view:
CREATE MATERIALIZED VIEW view_name
AS
SELECT columns
FROM tables
WHERE conditions
WITH [NO] DATA;
The WITH DATA
clause populates the view immediately, while WITH NO DATA
creates an empty view that can be populated later.
Updating and Refreshing
To refresh a materialized view:
-- Complete refresh
REFRESH MATERIALIZED VIEW view_name;
-- Concurrent refresh (if supported)
REFRESH MATERIALIZED VIEW CONCURRENTLY view_name;
Optimizing Performance
- Indexing: Create appropriate indexes on the materialized view
- Refresh Strategy: Choose between complete and concurrent refresh based on your needs
- Refresh Timing: Schedule refreshes during low-traffic periods
- Query Planning: Use
EXPLAIN ANALYZE
to verify the materialized view is being used effectively
Practical Examples of Materialized Views in Mastodon
What is Mastodon
Mastodon is an open-source decentralized social networking platform that implements the ActivityPub protocol. As a federated service, it allows users to create accounts on different servers (instances) while still being able to interact with users from other instances. This architectural choice creates unique challenges for data management and query optimization, which is where materialized views come into play.
Mastodon's Schema and Use of Materialized Views
Mastodon's database schema demonstrates sophisticated use of materialized views to optimize complex queries that would otherwise be computationally expensive to execute in real-time. Let's examine three key materialized views from Mastodon's schema:
account_summaries
CREATE MATERIALIZED VIEW public.account_summaries AS
SELECT accounts.id AS account_id,
mode() WITHIN GROUP (ORDER BY t0.language) AS language,
mode() WITHIN GROUP (ORDER BY t0.sensitive) AS sensitive
FROM (public.accounts
CROSS JOIN LATERAL ( SELECT statuses.account_id,
statuses.language,
statuses.sensitive
FROM public.statuses
WHERE ((statuses.account_id = accounts.id)
AND (statuses.deleted_at IS NULL)
AND (statuses.reblog_of_id IS NULL))
ORDER BY statuses.id DESC
LIMIT 20) t0)
WHERE ((accounts.suspended_at IS NULL)
AND (accounts.silenced_at IS NULL)
AND (accounts.moved_to_account_id IS NULL)
AND (accounts.discoverable = true)
AND (accounts.locked = false))
GROUP BY accounts.id;
This materialized view serves several important purposes:
- Content Classification: It determines the predominant language and content sensitivity level for each account based on their recent posts.
- Performance Optimization: Instead of analyzing all posts, it focuses on the 20 most recent original posts (excluding reblogs and deleted content).
- Account Filtering: It only includes active, discoverable accounts that haven't been suspended or silenced.
The use of mode()
aggregate function is particularly clever here, as it identifies the most common language and sensitivity settings for an account's content.
global_follow_recommendations
CREATE MATERIALIZED VIEW public.global_follow_recommendations AS
SELECT t0.account_id,
sum(t0.rank) AS rank,
array_agg(t0.reason) AS reason
FROM ( SELECT account_summaries.account_id,
((count(follows.id))::numeric /
(1.0 + (count(follows.id))::numeric)) AS rank,
'most_followed'::text AS reason
FROM follows
JOIN account_summaries ON account_summaries.account_id = follows.target_account_id
JOIN users ON users.account_id = follows.account_id
WHERE users.current_sign_in_at >= (now() - '30 days'::interval)
AND account_summaries.sensitive = false
GROUP BY account_summaries.account_id
HAVING count(follows.id) >= 5
UNION ALL
SELECT account_summaries.account_id,
(sum(status_stats.reblogs_count + status_stats.favourites_count) /
(1.0 + sum(status_stats.reblogs_count + status_stats.favourites_count))) AS rank,
'most_interactions'::text AS reason
FROM status_stats
JOIN statuses ON statuses.id = status_stats.status_id
JOIN account_summaries ON account_summaries.account_id = statuses.account_id
WHERE statuses.id >= ((date_part('epoch'::text, (now() - '30 days'::interval))
* (1000)::double precision))::bigint << 16)
AND account_summaries.sensitive = false
GROUP BY account_summaries.account_id
HAVING sum(status_stats.reblogs_count + status_stats.favourites_count) >= 5) t0
GROUP BY t0.account_id
ORDER BY (sum(t0.rank)) DESC;
This complex materialized view implements Mastodon's follow recommendation system:
-
Dual Ranking System: It combines two ranking methods:
- Follow-based ranking: Accounts with many followers from active users
- Interaction-based ranking: Accounts whose posts receive high engagement
-
Smart Normalization: Both ranking methods use a normalization formula (n/(1+n)) to prevent extreme values from dominating the recommendations.
-
Recent Activity Focus: Only considers data from the last 30 days to maintain relevance.
instances
CREATE MATERIALIZED VIEW public.instances AS
WITH domain_counts(domain, accounts_count) AS (
SELECT accounts.domain,
count(*) AS accounts_count
FROM public.accounts
WHERE accounts.domain IS NOT NULL
GROUP BY accounts.domain
)
SELECT domain_counts.domain,
domain_counts.accounts_count
FROM domain_counts
UNION
SELECT domain_blocks.domain,
COALESCE(domain_counts.accounts_count, (0)::bigint) AS accounts_count
FROM domain_blocks
LEFT JOIN domain_counts ON domain_counts.domain = domain_blocks.domain
UNION
SELECT domain_allows.domain,
COALESCE(domain_counts.accounts_count, (0)::bigint) AS accounts_count
FROM domain_allows
LEFT JOIN domain_counts ON domain_counts.domain = domain_allows.domain;
This materialized view provides a comprehensive overview of Mastodon's federation landscape:
- Domain Statistics: Tracks the number of accounts from each domain.
- Federation Status: Includes both blocked and allowed domains, even if they currently have no accounts.
- Efficient Lookups: Enables quick access to instance statistics without complex joins.
These materialized views demonstrate how Mastodon optimizes complex queries that would be expensive to compute in real-time, particularly for features like content discovery and federation management that are central to the platform's functionality.
The implementation shows careful consideration of performance trade-offs, utilizing techniques like:
- Limiting the amount of data processed (e.g., 20 most recent posts)
- Smart normalization of ranking factors
- Efficient use of joins and aggregations
- Careful filtering of sensitive or inactive accounts
Conclusion
In this article, we introduced the basics of PostgreSQL Materialized Views and explored practical examples in Mastodon. While Materialized Views can be powerful tools for improving performance and handling complex data aggregation, they should not be used in every case. By carefully evaluating when to apply them and managing refresh strategies and resource usage, you can leverage Materialized Views effectively.
References
Text byHirotaka Miyagi
Hirotaka Miyagi is Tech Lead at ROUTE06. A full-stack engineer specializing in web development, he has built extensive experience through multiple startups before joining ROUTE06 to lead its technical initiatives.
Last edited on
There are currently no glossary entries for this blog post.