PostgreSQL is one of the most popular OLTP (Online Transaction Processing) databases in the world. It is optimized for handling large numbers of small, concurrent transactions with short response times. This kind of workload sits at the heart of most businesses today.
However, the real world isn’t neat and tidy. Most organizations inevitably find themselves needing to run long-running OLAP (Online Analytical Processing) queries against their primary OLTP database for various purposes, such as powering live dashboards, generating scheduled reports, feeding ETL pipelines and batch jobs, and running ad-hoc analytics.
These long-running queries represent a fundamentally different workload than the one PostgreSQL was optimized for, creating tension between the needs of your operational application and those analytical demands.
When long-running queries share resources with your primary OLTP workload, several problems quickly emerge.
Long-running analytical queries often need to read large portions of your database, which can lead to lock contention with your application's normal operations. By default, Postgres uses row-level locking, a strategy that is ideal for the individual row accesses that make up most OLTP workloads.
However, acquiring a large number of row-level locks to service a long-running query has two problems. First, acquiring so many locks becomes expensive and slow. Second, holding these locks can create lock contention with ongoing mutations to the table.
Lock contention results in queuing delays, which in turn result in degraded performance for your primary application.
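If you suspect lock contention is at play, PostgreSQL exposes enough metadata to see it directly. Below is a minimal sketch (Python with psycopg2, placeholder connection string) that uses the standard pg_stat_activity view and the pg_blocking_pids() function to list sessions currently waiting on locks, along with the sessions holding them and how long the blockers have been running.

```python
import psycopg2

BLOCKED_QUERY = """
SELECT blocked.pid                  AS blocked_pid,
       blocked.query                AS blocked_query,
       blocking.pid                 AS blocking_pid,
       blocking.query               AS blocking_query,
       now() - blocking.query_start AS blocking_for
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
"""

def find_blocked_sessions(dsn):
    """Return one row per blocked session: who is blocked, by whom, and for how long."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(BLOCKED_QUERY)
        return cur.fetchall()

if __name__ == "__main__":
    # Placeholder DSN; point this at your primary database.
    for row in find_blocked_sessions("postgresql://app:secret@primary-db:5432/app"):
        print(row)
```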
Beyond lock contention, these long-running queries can also consume substantial CPU, memory, and I/O resources.
As a single-node database, Postgres has a limit on the total resources it has available, meaning any CPU cycles used by long-running queries are a trade-off against servicing queries for your primary application. In periods of peak utilization, this can result in increased queueing delays while transactions wait for a CPU to run on.
Memory is another concern, as long-running queries tend to examine a lot of data, wreaking havoc on most OLTP caching strategies. This results in more of the OLTP queries needing to go to disk for their data, dramatically slowing those queries.
Disk I/O is also a bottleneck. As disk utilization increases, it becomes a contended resource like CPU and memory, introducing similar delays and constraints.
Long-running queries strain all of the key contended resources of a database, and in smaller or already-constrained instances where resources are limited, the effect can be dramatic.
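A reasonable first step is simply identifying which statements are consuming these resources. Here is a small sketch, again with psycopg2, a placeholder connection string, and an arbitrary five-minute threshold, that lists currently active statements by runtime using the standard pg_stat_activity view.

```python
import psycopg2

LONG_RUNNING = """
SELECT pid,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;
"""

# Placeholder DSN; point this at your primary database.
with psycopg2.connect("postgresql://app:secret@primary-db:5432/app") as conn, conn.cursor() as cur:
    cur.execute(LONG_RUNNING)
    for pid, runtime, query in cur.fetchall():
        print(f"pid {pid} has been running for {runtime}: {query[:80]}")
```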
As your business grows, so does the number of long-running queries and batch jobs. This creates a scheduling puzzle.
Ideally, long-running queries would run when the primary workload is at its lightest (typically nights and weekends). Unfortunately, as a business grows, the demand for long-running queries increases while the window of reduced utilization decreases (especially as a business goes global).
This tension inevitably creates a no-win situation: whenever the long-running queries run, they contend with the primary workload for resources.
Organizations have traditionally addressed these challenges in a few ways, each with their own tradeoffs.
The simplest solution to resource contention is to get more resources, which is why vertical scaling is inevitably the first solution that everyone uses. Modern cloud providers have made upgrading your database to a machine with more CPU, RAM and storage as easy as a reboot, making it the fastest and easiest way to solve resource contention.
But upgrading your database comes at a cost — of literal dollars and cents.
Costs scale linearly with instance size. And because it requires downtime, you can’t simply upgrade and downgrade at will to save money.
Often, businesses end up paying for oversized database instances just to handle peak workloads, even though those resources sit idle most of the time. Eventually, the cost of scaling up becomes hard to justify. And beyond a certain point, you start to hit the physical limits of how far a single machine can go.
A more robust solution to long-running queries is to create read-only copies of your database, often referred to as read replicas. By running your long-running queries on read replica nodes, you isolate them from your existing OLTP workload, removing all of the contention and scheduling challenges discussed above.
However, read replicas come with their own set of challenges. Traditional Postgres read replicas must match the hardware of the primary database. This means that your unit of scaling is fixed to the hardware required to handle your peak OLTP workload, regardless of your long-running query needs. That mismatch often leads to over-provisioning and higher-than-necessary costs.
Starting replicas is slow. When a replica starts, it must get a full snapshot of your primary database. Depending on the size of your database, that can be measured in minutes, hours, or days.
Once a replica is registered with the primary, it must stay on and continue to consume the write-ahead log (WAL) messages from the primary. If it doesn’t, you run the risk of filling the primary’s storage, as it can’t expire records from the WAL. If you de-register the replica and start it again later then you must reacquire a fresh snapshot of the primary.
The end result is that, once provisioned, replicas must stay on at all times, regardless of their utilization.
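The WAL retention risk in particular is worth keeping an eye on. The sketch below, assuming PostgreSQL 10 or later and a placeholder connection string, queries the standard pg_replication_slots view to show how much WAL the primary is retaining for each replication slot; an inactive slot whose retained WAL keeps growing is exactly how a paused replica fills the primary's disk.

```python
import psycopg2

SLOT_LAG = """
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
"""

# Placeholder DSN; point this at your primary database.
with psycopg2.connect("postgresql://postgres:secret@primary-db:5432/postgres") as conn, conn.cursor() as cur:
    cur.execute(SLOT_LAG)
    for slot_name, active, retained_wal in cur.fetchall():
        print(f"slot={slot_name} active={active} retained_wal={retained_wal}")
```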
Applications also need to be aware of where they should run queries. Long-running queries must be explicitly sent to a replica node, which means developers must keep a close eye on the requirements of each transaction. Retrofitting this logic to an existing application can be complex and error-prone, since every transaction path needs to be reviewed manually.
Some ORMs can handle read/write splitting for simple cases, but for anything more complex, it’s easy to make subtle mistakes that only surface at run-time. A load-balancing proxy can help distribute long-running queries among a set of replicas, but the application still needs to know which queries are safe to offload to a replica and which must run on the primary.
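To make that fragility concrete, here is a minimal sketch of hand-rolled read/write splitting in application code. The DSNs and table names are placeholders; the point is that every call site must correctly declare whether it only reads, and a wrong declaration only surfaces as a run-time failure on the replica.

```python
import psycopg2

PRIMARY_DSN = "postgresql://app:secret@primary-db:5432/app"  # placeholder
REPLICA_DSN = "postgresql://app:secret@replica-db:5432/app"  # placeholder

def run_query(sql, params=None, *, read_only=False):
    """Route read-only statements to the replica and everything else to the primary."""
    dsn = REPLICA_DSN if read_only else PRIMARY_DSN
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall() if cur.description else None

# Safe: a pure read that can be offloaded to the replica.
monthly_totals = run_query(
    "SELECT date_trunc('month', created_at), sum(total) FROM orders GROUP BY 1",
    read_only=True,
)

# Fragile: if a developer marks this read_only=True by mistake, the INSERT
# only fails at run time, on the replica, under real traffic.
run_query("INSERT INTO audit_log (event) VALUES (%s)", ("report_generated",))
```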
Another common approach to dealing with long-running queries is to offload them entirely from your production environment to a data warehouse.
Data warehousing databases are tuned to handle long-running OLAP queries over extremely large data sets, and can be a great way to deal with that class of queries. Many businesses set up a data warehouse to extract, transform and load (ETL) data not only from their existing OLTP databases, but other business systems and tools, providing a central place to run reports.
If such a system is in place, it may be a great place to offload long-running queries. However, this only works if your queries can tolerate running on stale data.
Because OLAP systems are tuned for operating on large sets of data, they struggle to keep up with the many small mutations that OLTP systems handle. As a result, ETL systems, even those leveraging CDC, batch mutations and apply them on a periodic basis so that the OLAP system can keep up, resulting in data staleness.
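As a rough illustration of where that staleness comes from, here is a simplified sketch of a watermark-based batch ETL loop. The table, columns, interval, and load_to_warehouse() function are all hypothetical, and real pipelines are far more sophisticated, but the periodic batching, and therefore the lag, is the same.

```python
import time
import psycopg2

OLTP_DSN = "postgresql://app:secret@primary-db:5432/app"  # placeholder
BATCH_INTERVAL_SECONDS = 15 * 60  # between runs, the warehouse is up to ~15 minutes stale

def load_to_warehouse(rows):
    """Hypothetical bulk load into the analytical store."""
    print(f"loading {len(rows)} changed rows")

def run_etl_loop():
    watermark = "1970-01-01"
    while True:
        with psycopg2.connect(OLTP_DSN) as conn, conn.cursor() as cur:
            cur.execute(
                "SELECT id, status, total, updated_at FROM orders "
                "WHERE updated_at > %s ORDER BY updated_at",
                (watermark,),
            )
            rows = cur.fetchall()
        if rows:
            load_to_warehouse(rows)
            watermark = rows[-1][3]  # advance to the newest updated_at we saw
        time.sleep(BATCH_INTERVAL_SECONDS)

if __name__ == "__main__":
    run_etl_loop()
```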
If such a central data warehouse is not in place, setting up a new OLAP database, a new ETL pipeline, and potentially updating your queries to work within the new database all involve increased engineering effort and costs, which may or may not make sense given the volume and nature of the queries you’re dealing with.
Another solution for dealing with long-running queries in your database is to use a database designed for Hybrid Transactional/Analytical Processing (HTAP). These databases usually offer a scale-out approach to growth and are designed to handle a mix of OLTP and OLAP workloads, but come with a new set of trade-offs.
Because HTAP databases are more oriented toward OLAP workloads, the latency of individual small transactions tends to increase; since these systems scale out, they can maintain overall transaction throughput, but only by adding resources. As a result, you must be sure that your primary applications can accept increased transaction latencies, and understand that achieving the same transaction throughput will cost more.
HTAP databases are also not completely immune to the caching impacts seen when running OLAP queries on your OLTP database. You must either dedicate resources specifically to the OLAP workload, or accept further reductions in individual transaction performance. Again, this can be absorbed through additional scale-out resources, but this comes with additional cost.
Finally, taking advantage of an HTAP database likely involves a migration off of your existing OLTP database. This can be a massive undertaking, involving months (or even years) of planning, preparation, and testing beforehand.
Overall, HTAP databases offer a possible solution for applications that are less latency-sensitive, but they come with increased costs, increased management overhead, and what is often a costly and complex data migration.
Similar to HTAP solutions, there are a growing number of database extensions that offer OLAP-like capabilities within your OLTP database.
The best-in-breed solutions here offer resource isolation between the OLTP and OLAP workloads, removing the concerns around resource contention. They also simplify management since you maintain the database infrastructure you already have.
The challenge with these systems is that they share the vertically scaled resources of the existing database, meaning you will often have underutilized resources separately provisioned for the peak of both your OLTP and OLAP workloads, resulting in a larger overall total cost of ownership.
Also, because the OLAP capabilities usually require alternate storage layouts (e.g., columnar data layout), you may find yourself performing internal ETL operations to maintain those OLAP tables, creating challenges similar to a more traditional ETL + DW solution.
Although there are a wide array of possible solutions to dealing with long-running queries, all of them come with significant trade-offs involving a mix of increased management overhead, increased costs, increased data staleness, increased query latencies, and increased application complexity.
If you have spent time wrestling with read replica provisioning, dealing with stale ETL pipelines, or struggling to schedule batch jobs during off-hours, you're not alone. These are common and frustrating parts of trying to make analytical queries coexist with OLTP workloads.
Springtail offers a different approach.
Springtail provides seamless scale-out replicas for your PostgreSQL database. It connects to your existing PostgreSQL instance using logical replication and maintains a real-time copy of your data.
Unlike data warehousing databases, Springtail is designed to process small mutations at high volume, making it ideal for replicating an OLTP database. This design keeps data fresh, similar to what you get from a traditional read replica, without the lag and staleness that often come with ETL pipelines into a warehouse.
Instead of storing a full copy of the data at each replica node, Springtail maintains a single shared copy across all nodes. This allows it to horizontally scale replica nodes up and down on-demand.
This approach allows you to isolate long-running queries from your primary database without permanently running underutilized replicas. If you only need extra read capacity for a few hours each day, Springtail can support that without wasting resources the rest of the time.
Springtail also comes with a built-in read-write splitting proxy. The Springtail proxy tracks the state of each connection. If a mutation occurs, even mid-transaction, it automatically reroutes the query to the primary database. This unique ability allows Springtail to offload not just long-running queries, but also a wide array of short-lived read-only transactions.
In practice, this means you can point your application at Springtail’s endpoint and let it handle the rest. You don’t need to set up replicas or proxies, determine instance sizes, schedule queries, or devise resizing schemes.
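For illustration only, here is roughly what that looks like from the application's side, assuming a placeholder endpoint hostname and a hypothetical orders table: the connection string changes, and the rest of the code does not.

```python
import psycopg2

# Before: the application connects directly to the primary.
# conn = psycopg2.connect("postgresql://app:secret@primary-db:5432/app")

# After: the application connects to the proxy endpoint (placeholder hostname).
# Read-only work can be served by scale-out replicas; any mutation is routed
# back to the primary, even if it happens mid-transaction.
conn = psycopg2.connect("postgresql://app:secret@my-springtail-endpoint:5432/app")

with conn, conn.cursor() as cur:
    # A long-running analytical read (hypothetical orders table)...
    cur.execute(
        "SELECT count(*) FROM orders WHERE created_at > now() - interval '30 days'"
    )
    print(cur.fetchone())
    # ...and a mutation on the same connection, which goes to the primary.
    cur.execute(
        "UPDATE orders SET status = 'archived' "
        "WHERE created_at < now() - interval '1 year'"
    )
```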
Long-running queries in PostgreSQL create real challenges for growing organizations. While traditional solutions offer viable paths forward, they often come with significant trade-offs in cost, complexity, or data freshness.
Springtail takes a different approach by combining the best aspects of each solution. You get the simplicity of vertical scaling, the workload isolation of read replicas, and the cost efficiency of on-demand infrastructure, all without needing to change your application architecture.
Whether you're dealing with live dashboards, scheduled reports, ETL processes, or ad-hoc analytics, Springtail ensures your PostgreSQL database can handle both your existing operational workload and your long-running queries without compromise.
Check out our docs and request access to get started.