Cheat Sheet for Rails + PostgreSQL Count Performance

2024/07/31 · 1680 words · 8 minutes to read

The convenience and power of Active Record allows you to interact with your database in a more Ruby-like way, making it easier to write efficient and readable code. However, as with any complex system, performance can sometimes take a hit when using Active Record. One common scenario where performance may suffer is when counting records in your database. In any Rails app, A simple count method call can quickly become a bottleneck, especially for larger datasets. In this post, we’ll dive into some optimization techniques for using Active Record’s count method in your Rails app.

So, what happens behind the scenes when you call count on an Active Record query? In short, count is translated into a SQL query that returns the total number of rows that match your conditions. While this might seem simple, it can quickly become performance-intensive, especially for larger datasets. When you call count, the database has to scan every row in the table to determine the total count.

Here are a few tips that are helpful when doing count:

1. Never Use `count` For Checking Existence 🔗

In many rails applications I have worked on, I found this repeated pattern. Using count to check if a record exists. It is inefficient and can lead to performance issues, Here’s why:

Unnecessary full table scan: .count typically performs a full table scan, counting all matching records.
Overkill for existence check: You only need to know if at least one record exists, not the total count.
Slower performance: .count is slower than alternatives, particularly on large tables.

Instead, use exists?:

exists? is optimized for checking existence, It stops after finding the first matching record. Also, it won’t select any value for DB. It generates an SQL similar to this:

SELECT 1 AS one FROM "users" WHERE status = 'active' LIMIT 1;
copy

This looks very efficient.

# Inefficient
User.where(status: 'active').count > 0

# Efficient
User.where(status: 'active').exists?
copy

2. Know When To Use `size` 🔗

The size method is very similar to count. It will also return the count. There is one difference,

The size is implemented as follows.

If the collection is loaded, it returns the size of the loaded array.
If the collection is not loaded, it executes a COUNT query.

Whereas count always queries the database, which can be slower, especially for large tables.

Use count when you specifically want to get the count from the database and not rely on any pre-loaded data. Use size when you want to avoid unnecessary database queries and prefer to use the already loaded data if available. Also, note that size is beneficial only after loading the items to count.

# Don't do this
posts = Post.includes(:comments)
posts.each do |post|
  puts post.comments.count
end

# Instead use this
posts = Post.includes(:comments)
posts.each do |post|
  puts post.comments.size
end
copy

In this case, since comments have been already loaded. Using size will return the count without an additional database query. Here, we prevented the N+1 query issue by using size. By contrast, #count always performs a new SQL COUNT query, regardless of whether the collection is loaded or not. This can lead to unnecessary database hits if used carelessly.

3. Counter Cache 🔗

Counter caching is a technique used in Rails to improve performance when dealing with frequent count operations on associations. It involves storing the count of associated objects in a dedicated column of the parent model, eliminating the need for expensive COUNT queries. When you have a has_many association and frequently need to count the number of associated records, counter caching can significantly reduce database load. Instead of executing a COUNT query each time, Rails can simply read the cached value from the parent record.

To implement counter caching:

Add a counter cache column to the parent model in a migration:

add_column :authors, :books_count, :integer, default: 0
copy

Enable counter_cache from the child model:

class Book < ApplicationRecord
 belongs_to :author, counter_cache: true
end
copy

Rails will automatically increment or decrement the books_count column when books are added or removed from an author. This is commonly used for tasks like displaying counts in lists or index pages and ordering or when you need filtering based on association counts. It is an effective way to reduce database load in high-traffic applications.

However, be cautious when using counter caches with large datasets or frequently updated associations, as they can introduce additional write operations. Also know that since it relies on rails callbacks to increment and decrement counts, You have to manually update when doing operations like update_all.

4. Estimated count 🔗

You can use this trick when you want to get a total number of items in a table and you only want an estimate count that only needs to be eventually consistent. PostgreSQL stores stats about your tables for various reasons. Here we are making use of that.

class Post < ApplicationRecord
  def self.estimated_count
    query = <<-SQL
      SELECT
      (reltuples / relpages) *
      (pg_relation_size('#{table_name}') / current_setting('block_size')::integer)
      AS count
       FROM
      pg_class
        WHERE
      relname = '#{table_name}';
    SQL

    result = ActiveRecord::Base.connection.execute(query).first
    result.first['count'].to_i
  end
end
copy

Here, we use data stored by PostgreSQL pg_class system catalog table, which stores statistics about the tables. reltuples column in pg_class provides an estimate of the number of rows in a table, and relpages will give that of pages.

Here’s a breakdown of the query:

current_setting('block_size')::integer: Fetches the PostgreSQL block size (usually 8KB) and converts it to an integer.
pg_relation_size('#{table_name}'): Returns the size of the table in bytes
reltuples / relpages: This ratio gives the average number of tuples per page.
pg_relation_size('#{table_name}') / current_setting('block_size')::integer: This calculates the total number of pages in the table by dividing the total size of the table by the block size.

Multiplying these two values provides an estimated total row count.

You can also make it a concern:

module FastCountable
  extend ActiveSupport::Concern

 class_methods do
    def estimated_count
     query = <<-SQL
      SELECT
        (reltuples / relpages) *
        (pg_relation_size('#{table_name}') / current_setting('block_size')::integer)
      AS count
      FROM
      pg_class
      WHERE
      relname = '#{table_name}';
     SQL

     result = ActiveRecord::Base.connection.execute(query).first
     result['count'].to_i
    end
  end
end
copy

5. Making Use of Index Only Scan for Faster Counting 🔗

Index-Only Scans in PostgreSQL is a query optimization technique that can significantly improve the performance of certain types of queries. An Index-Only Scan is a method where PostgreSQL can retrieve the required information directly from the index without needing to access the actual table data. This is possible when all the columns needed for the query are present in the index itself.

When a query can be satisfied entirely by using the information stored in an index, PostgreSQL uses an Index-Only Scan. So that the database only needs to read the index pages, avoiding the need to fetch data from the table pages. This reduces I/O operations, resulting in faster query execution.

Suppose we have a Sale model with a column product_id and the following index.

class AddIndexToSalesProductId < ActiveRecord::Migration[7.0]
  def change
    add_index :sales, :product_id
  end
end
copy

When we do Sale.distinct.count(:product_id) It will generate the following SQL: SELECT COUNT(DISTINCT product_id) FROM sales;

If our table has a large enough amount of data. If we do an EXPLAIN ANALYZE on this query, It will be slow since it will do a sequential scan.

Now consider this,

Sale.select("DISTINCT product_id").size # SELECT COUNT(*) FROM (SELECT DISTINCT product_id FROM "sales") subquery_for_count
copy

This will only work if we use the #size method. (not #count). See the generated query. Now the database will be able to do an index only scan resulting in better performance. This is because the database engine was able to identify it could leverage the above index to find all of the distinct product_ids.

6. Cache Infrequently Updated Counts 🔗

For counts that don’t change frequently, caching can significantly improve performance. Instead of querying the database every time a count is needed, you can store the count in the cache and update it only when necessary.

class Post < ApplicationRecord
 has_many :comments

  def cached_comments_count
    Rails.cache.fetch("post/#{id}/comments_count", expires_in: 1.hour) do
      comments.count
    end
  end

  def update_cached_comments_count
    Rails.cache.write("post/#{id}/comments_count", comments.count)
  end
end
copy

In this example, cached_comments_count retrieves the count from the cache if it exists. If not, it calculates the count and stores it in the cache for 1 hour. The update_cached_comments_count method can be called whenever a comment is added or removed to keep the cache up-to-date. We can update the cached count in a controller like this.

class CommentsController < ApplicationController
  def create
    @post = Post.find(params[:post_id])
    @comment = @post.comments.create(comment_params)
    @post.update_cached_comments_count
  # ...
copy

This approach is helpful for counts that are frequently accessed but infrequently updated. It reduces database load and improves response times, especially for pages that display multiple counts.

7. Don’t Count Separately When Using `find_in_batches` 🔗

When doing find_in_batches, Don’t forget you can count in parallel. You don’t have to count separately. Also, make sure you use the #size method here and not #count.

Here is how you should do it:

total_count = 0
Model.find_in_batches(batch_size: 1000) do |batch|
 total_count += batch.size
end
copy

8. Make Use of Grouped Count 🔗

Suppose you have to count by different enum values of a DB column. Usually, you do this.


draft_count = Article.where(status: "draft").count
published_count = Article.where(status: "published").count
unspecified_count = Article.where(status: nil).count
copy

Here instead of having multiple queries, You can make use #group method in active record.

Article.group("COALESCE(status, 'unspecified')").count # { "draft" => 2, "published" => 1, "unspecified" => 0 }
copy

This way, we are only sending one query into the DB and the database doesn’t have to load everything multiple times.

Conclusion 🔗

Optimizing Active Record’s count operations is crucial for maintaining the performance of your Rails application, especially as your dataset grows. From using exists? for existence checks, leveraging size appropriately, implementing counter caches, to utilizing PostgreSQL’s estimated counts and index-only scans - you can significantly reduce database load and improve response times. Remember, the key is to choose the right tool for each specific scenario. Sometimes, a simple cached value or an estimated count can be more beneficial than an exact, real-time count. In other cases, using group counts or batched operations can help you efficiently process large datasets. As with all optimizations, it’s important to profile your application and identify where count operations are causing bottlenecks. Use these techniques judiciously, always considering the trade-offs between accuracy, performance, and implementation complexity.

Your Feedback Matters!

I'd love to hear your thoughts on this. Connect with me on LinkedIn Or Twitter

Cheat Sheet for Rails + PostgreSQL Count Performance

1. Never Use count For Checking Existence 🔗

2. Know When To Use size 🔗

3. Counter Cache 🔗

4. Estimated count 🔗

5. Making Use of Index Only Scan for Faster Counting 🔗

6. Cache Infrequently Updated Counts 🔗

7. Don’t Count Separately When Using find_in_batches 🔗

8. Make Use of Grouped Count 🔗

Conclusion 🔗

Your Feedback Matters!

Recent Posts

1. Never Use `count` For Checking Existence 🔗

2. Know When To Use `size` 🔗

7. Don’t Count Separately When Using `find_in_batches` 🔗