We put together this report because, as software developers, we noticed a lack of information about the current state of startup data infrastructure. While plenty of reports look at which databases developers use or admire, we couldn't find much that delved deeper into specifics like read replication, sharding, or workloads.
So, we decided to run our own database survey and share the results with the startup community. We're excited to present our findings and hope they provide insights for anyone interested in database infrastructure trends.
— The Springtail Team
This report is based on a database infrastructure survey we conducted from late 2023 through early 2024.
The survey was distributed through various developer communities, social media platforms, and professional networks. It was shared with hundreds of startup professionals, including software developers, technical founders, and DevOps engineers, across a wide range of early-stage and growth-stage companies.
Although we made every effort to gather a diverse group of respondents, the voluntary nature of the survey might have introduced selection bias.
A significant proportion of respondents use PostgreSQL as their primary production database, with most opting for managed cloud services.
Most respondents operate with a single primary database. However, replication practices vary, with many having 2-5 read replicas or replicating for failover.
Workloads are typically either read-heavy with some write operations or a roughly even mix of reads and writes, and there is a clear preference for short-lived queries.
PostgreSQL was the most popular production database, likely driven by its open-source community and extensive features suitable for various applications.
MySQL and SQL Server also had significant usage, reflecting the preference for traditional relational databases in production settings.
A variety of other databases had single-digit representation, indicating a smaller contingent of startups favoring niche solutions for additional flexibility or specialized use cases like massive data volumes or unstructured data.
Our survey results highlight a clear trend towards cloud-based database deployment. The preference for managed cloud services underscores the importance of convenience, scalability, and security for startup organizations.
Self-hosting in the cloud and on-premises deployments were notably less common, suggesting that for most respondents the benefits of managed hosting outweigh customization, data sovereignty, or compliance concerns.
Most respondents favored simple database configurations with no sharding or only minimal sharding. Understandably, a single primary or a small number of primaries is easier to manage and often sufficient for many applications.
Few respondents reported complex sharding configurations, which may indicate that early-stage companies have not yet needed that level of scale or lack the database optimization and infrastructure expertise to operate it.
Some respondents were unsure how their database is configured, which may reflect a reliance on managed services, where the underlying architecture is less visible or less well understood.
Our survey results highlighted a fairly balanced approach to read replication.
Many organizations prioritized failover replication to ensure that their databases remain available during an unforeseen event. Maintaining a moderate number of read replicas was also common, reflecting a need to distribute the read load and improve performance for read-heavy applications.
Interestingly, fourteen percent of respondents indicated they did not have read replicas in place, relying solely on their primary database to keep up with demand.
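To make the idea of distributing read load concrete, here is a minimal sketch of a read/write router, assuming a hypothetical setup with one primary and a list of read replicas. The class name, the host names, and the naive "starts with SELECT" check are all illustrative assumptions, not something reported by survey respondents; real deployments usually rely on a driver, proxy, or connection pooler for this.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._read_targets = itertools.cycle(replicas or [primary])

    def target_for(self, sql):
        # Naive classification for illustration only: treat SELECT as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


# Host names below are placeholders.
router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM users"))
print(router.target_for("UPDATE users SET name = 'a'"))
```

With an empty replica list the same router sends every query to the primary, which corresponds to the respondents who run without read replicas.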
Most respondents connect relatively few applications to their primary database, with over half having only 1 to 2 applications connected.
A smaller but significant number indicated 3 to 5 applications connected to their database, and nearly a quarter had 6 or more. These higher counts point to a more integrated environment where multiple services or applications interact with the database, likely reflecting a more extensive and distributed application infrastructure.
The survey results show an equal split between respondents who describe their workloads as "mostly read with some write/update" and those with a "roughly 50/50 mix of read and write." This indicates that many organizations have balanced workloads or tend towards read-heavy operations, which is typical in many database applications.
The uncertainty among some respondents regarding primary workloads might suggest a need for better monitoring and analysis tools to understand database usage patterns.
The majority of respondents run only short-lived queries on their primary database. This approach helps maintain performance and responsiveness by keeping resource-intensive operations off the primary.
Another prevalent strategy involves using ETL processes to offload analytic workloads from the primary database to a dedicated analytics warehouse.
A smaller contingent of respondents run heavy analytics directly on their primary database, which can require robust database infrastructure to handle the additional load without affecting transactional performance.
We want to extend our heartfelt gratitude to all the readers, participants, and supporters who contributed to this report. We sincerely appreciate the time and energy you invested in sharing our database survey with your networks — your generous efforts have greatly enriched our research.