The convenience and power of Active Record allow you to interact with your database in a more Ruby-like way, making it easier to write efficient and readable code. However, as with any complex system, performance can sometimes take a hit when using Active Record. One common scenario where performance suffers is counting records in your database. In any Rails app, a simple `count` call can quickly become a bottleneck, especially for larger datasets. In this post, we’ll dive into some optimization techniques for using Active Record’s `count` method in your Rails app.
So, what happens behind the scenes when you call `count` on an Active Record query? In short, `count` is translated into a SQL COUNT query that returns the number of rows matching your conditions. While this might seem simple, it can quickly become performance-intensive, especially for larger datasets: without a suitable index, the database has to scan every matching row in the table to determine the total.
Here are a few tips that are helpful when counting records:
1. Never Use `count` For Checking Existence
In many Rails applications I have worked on, I’ve found the same repeated pattern: using `count` to check whether a record exists. This is inefficient and can lead to performance issues. Here’s why:
- Unnecessary full table scan: `.count` typically performs a full table scan, counting all matching records.
- Overkill for an existence check: you only need to know whether at least one record exists, not the total count.
- Slower performance: `.count` is slower than the alternatives, particularly on large tables.
Instead, use `exists?`:

`exists?` is optimized for checking existence: it stops after finding the first matching record, and it doesn’t select any column values from the database. It generates SQL similar to this:

```sql
SELECT 1 AS one FROM "users" WHERE status = 'active' LIMIT 1;
```

This is about as cheap as an existence check gets.
```ruby
# Inefficient
User.where(status: 'active').count > 0

# Efficient
User.where(status: 'active').exists?
```
2. Know When To Use `size`
The `size` method is very similar to `count`; it also returns the number of records. The difference lies in how `size` is implemented:
- If the collection is already loaded, it returns the length of the loaded array.
- If the collection is not loaded, it executes a COUNT query.
`count`, on the other hand, always queries the database, which can be slower, especially for large tables. Use `count` when you specifically want a fresh count from the database rather than any pre-loaded data. Use `size` when you want to avoid unnecessary database queries and prefer to use the already loaded data if available. Note that `size` is only beneficial after the items to count have been loaded.
```ruby
# Don't do this
posts = Post.includes(:comments)
posts.each do |post|
  puts post.comments.count
end

# Instead, do this
posts = Post.includes(:comments)
posts.each do |post|
  puts post.comments.size
end
```
In this case, since the comments have already been loaded by `includes`, calling `size` returns the count without an additional database query; this is how we prevent the N+1 query issue here. By contrast, `count` always performs a new SQL COUNT query, regardless of whether the collection is loaded, which can lead to unnecessary database hits if used carelessly.
3. Counter Cache
Counter caching is a technique used in Rails to improve performance when dealing with frequent count operations on associations. It involves storing the count of associated objects in a dedicated column of the parent model, eliminating the need for expensive COUNT queries.
When you have a `has_many` association and frequently need to count the number of associated records, counter caching can significantly reduce database load. Instead of executing a COUNT query each time, Rails can simply read the cached value from the parent record.
To implement counter caching:
- Add a counter cache column to the parent model in a migration:

```ruby
add_column :authors, :books_count, :integer, default: 0
```

- Enable `counter_cache` on the child model:

```ruby
class Book < ApplicationRecord
  belongs_to :author, counter_cache: true
end
```
Rails will automatically increment or decrement the `books_count` column when books are added to or removed from an author. This is commonly used for displaying counts in lists or index pages, and for ordering or filtering based on association counts. It is an effective way to reduce database load in high-traffic applications.
However, be cautious when using counter caches with large datasets or frequently updated associations, as they introduce additional write operations. Also be aware that since counter caching relies on Rails callbacks to increment and decrement the count, you have to update it manually after operations that skip callbacks, such as `update_all`.
4. Estimated Count
You can use this trick when you need the total number of rows in a table and an estimate that is only eventually consistent is good enough. PostgreSQL stores statistics about your tables for its query planner, and here we make use of them.
```ruby
class Post < ApplicationRecord
  def self.estimated_count
    query = <<-SQL
      SELECT
        (reltuples / relpages) *
        (pg_relation_size('#{table_name}') / current_setting('block_size')::integer)
        AS count
      FROM pg_class
      WHERE relname = '#{table_name}';
    SQL
    result = ActiveRecord::Base.connection.execute(query).first
    result['count'].to_i
  end
end
```
Here, we use data stored in PostgreSQL’s `pg_class` system catalog, which holds statistics about tables. The `reltuples` column provides an estimate of the number of rows in a table, and `relpages` an estimate of its number of pages. These statistics are refreshed by `VACUUM` and `ANALYZE`, so the estimate is only as fresh as the last time the table was analyzed.
Here’s a breakdown of the query:
- `current_setting('block_size')::integer`: fetches the PostgreSQL block size (usually 8KB) and converts it to an integer.
- `pg_relation_size('#{table_name}')`: returns the size of the table in bytes.
- `reltuples / relpages`: the average number of tuples per page.
- `pg_relation_size('#{table_name}') / current_setting('block_size')::integer`: the total number of pages in the table, calculated by dividing the total size of the table by the block size.
Multiplying these two values provides an estimated total row count.
You can also make it a concern:
```ruby
module FastCountable
  extend ActiveSupport::Concern

  class_methods do
    def estimated_count
      query = <<-SQL
        SELECT
          (reltuples / relpages) *
          (pg_relation_size('#{table_name}') / current_setting('block_size')::integer)
          AS count
        FROM pg_class
        WHERE relname = '#{table_name}';
      SQL
      result = ActiveRecord::Base.connection.execute(query).first
      result['count'].to_i
    end
  end
end
```
5. Making Use of Index-Only Scans for Faster Counting
An Index-Only Scan is a PostgreSQL query optimization in which the database retrieves all the required information directly from an index, without touching the table data at all. This is possible when every column the query needs is present in the index itself. When a query can be satisfied entirely from an index, the database only reads index pages and avoids fetching table pages, which reduces I/O operations and results in faster query execution.
Suppose we have a `Sale` model with a `product_id` column and the following index:
```ruby
class AddIndexToSalesProductId < ActiveRecord::Migration[7.0]
  def change
    add_index :sales, :product_id
  end
end
```
When we do `Sale.distinct.count(:product_id)`, it generates the following SQL: `SELECT COUNT(DISTINCT product_id) FROM sales;`. If the table holds a large enough amount of data, an `EXPLAIN ANALYZE` on this query will show that it is slow because it performs a sequential scan.
Now consider this:

```ruby
Sale.select("DISTINCT product_id").size
# SELECT COUNT(*) FROM (SELECT DISTINCT product_id FROM "sales") subquery_for_count
```

This only works with the `size` method (not `count`); see the generated query. Now the database can perform an index-only scan, resulting in better performance, because the query planner recognizes that it can find all of the distinct `product_id`s using the index we added above.
6. Cache Infrequently Updated Counts
For counts that don’t change frequently, caching can significantly improve performance. Instead of querying the database every time a count is needed, you can store the count in the cache and update it only when necessary.
```ruby
class Post < ApplicationRecord
  has_many :comments

  def cached_comments_count
    Rails.cache.fetch("post/#{id}/comments_count", expires_in: 1.hour) do
      comments.count
    end
  end

  def update_cached_comments_count
    Rails.cache.write("post/#{id}/comments_count", comments.count)
  end
end
```
In this example, `cached_comments_count` retrieves the count from the cache if it exists; if not, it calculates the count and stores it in the cache for one hour. The `update_cached_comments_count` method can be called whenever a comment is added or removed to keep the cache up to date. We can update the cached count in a controller like this:
```ruby
class CommentsController < ApplicationController
  def create
    @post = Post.find(params[:post_id])
    @comment = @post.comments.create(comment_params)
    @post.update_cached_comments_count
    # ...
  end
end
```
This approach is helpful for counts that are frequently accessed but infrequently updated. It reduces database load and improves response times, especially for pages that display multiple counts.
7. Don’t Count Separately When Using `find_in_batches`
When using `find_in_batches`, don’t forget that you can count as you go; you don’t have to issue a separate count. Also, make sure you use the `size` method here and not `count`.
Here is how you should do it:
```ruby
total_count = 0
Model.find_in_batches(batch_size: 1000) do |batch|
  total_count += batch.size
end
```
8. Make Use of Grouped Count
Suppose you have to count records by the different enum values of a column. Usually, you would do this:
```ruby
draft_count = Article.where(status: "draft").count
published_count = Article.where(status: "published").count
unspecified_count = Article.where(status: nil).count
```
Instead of issuing multiple queries, you can make use of the `group` method in Active Record:
```ruby
Article.group("COALESCE(status, 'unspecified')").count
# => { "draft" => 2, "published" => 1, "unspecified" => 1 }
```
This way, we send only one query to the database, and the database doesn’t have to scan the table multiple times.
Conclusion
Optimizing Active Record’s `count` operations is crucial for maintaining the performance of your Rails application, especially as your dataset grows. From using `exists?` for existence checks, leveraging `size` appropriately, and implementing counter caches, to utilizing PostgreSQL’s estimated counts and index-only scans, you can significantly reduce database load and improve response times. Remember, the key is to choose the right tool for each specific scenario. Sometimes a simple cached value or an estimated count is more useful than an exact, real-time count. In other cases, `group` counts or batched operations can help you process large datasets efficiently. As with all optimizations, it’s important to profile your application and identify where count operations are causing bottlenecks. Use these techniques judiciously, always weighing the trade-offs between accuracy, performance, and implementation complexity.