Have you seen people affiliated with NoSQL databases aggressively advertising them as a solution for everything? Maybe early adopters swearing allegiance to the NoSQL and claiming to pass the point of no return to SQL? Unsurprisingly, one may think of NoSQL as the silver bullet which solves all the problems. Is it? Should we always use NoSQL?
Being a DDD worshipper and active NoSQL user/developer for years, I have no illusions (OK, maybe a little :)) and see that SQL is not obsolete, it’s still a better tool in some cases, and here they are.
Reason #1. Small project / Simple DB structure
For a start, a simple non-hierarchical key-value structure with well-defined data types would fit better in the traditional SQL environment. Though, there are some exceptions, like if you are not persisting data on disk, then Redis or Apache Ignite could be a way to go.
You may have come across many NoSQL key-value databases (Amazon DynamoDB, Oracle NoSQL Database, Couchbase, etc.), which provide heaps of features and flexibility. But for simple needs in a small project, you’re less likely to gain any benefits. And if we get down to SQLite, it just doesn’t get simpler and cheaper from here.
Reason #2. No need to scale
While any manager would claim that scaling is important for the project, in reality, not all projects need to consider it (at least, upfront).
Quote from Rick Houlihan (ex. Amazon DynamoDB and later, MongoDB guy ):
I hear a lot: ‘We use NoSQL, because it’s so flexible’. I’ve done thousands of NoSQL applications, I can tell you that nothing can be further from the truth. NoSQL is not flexible, it’s efficient to use at scale.
I would add — scaling at a lower price.
SQL databases are optimised for a smaller storage size when NoSQL — for less CPU and RAM on the most important requests. And now CPU and RAM is the most expensive resource in data centres when storage is the cheapest. At the same time, the most often operation against the DB is querying data, where all the
GROUP BYs on a normalised DB are hammering RAM with the CPU on finding matching rows (especially in a hash join).
But the difference in hosting costs is not linear and a clear NoSQL win only happens at a bigger scale. So, no need to scale in the cloud — avoid the hustle with NoSQL.
Reason #3. Immediate consistency
Denormalised data structure implies duplicated data in multiple collections. Subsequently, it leads to a higher data volume and longer time to update all the indexes, synchronising nodes, etc. It wouldn’t be an issue, as NoSQL servers were designed for Eventual Consistency with no need to wait for the change to propagate through indexes and nodes. But what if the project requires Immediate Consistency (aka Strong Consistency)?
Firstly, double check if it’s the case, as most of the domains don’t mind dealing with Eventual Consistency outside the Aggregate boundaries (check out this chapter from ”Implementing Domain-Driven Design” book by Vaughn Vernon).
Secondly, consider the performance requirements on creating/updating the records. Most of the NoSQL servers have a way to provide Immediate Consistency (usually with limitations). CosmosDB even provides a spectrum of Consistency levels. However, the trade-off is always higher latency of write requests. If it’s not acceptable, then SQL is your choice.
Reason #4. Ad hoc queries
The traditional SQL DBs make adding ad hoc queries quite a trivial task. Thanks to normalised data and flexible indexes (plus the internal query optimiser).
It’s a different story in the NoSQL realm. First, denormalised data structure makes it difficult to choose the right source of data. The structure is optimised for your business processes, which might be far off the needs of your ad hoc query. If you overcome the first hurdle, then the implementation depends on the NoSQL server.
Some impose “smart” behaviour and index every property for all items by default (e.g. CosmosDB and PostgreSQL JSONB indexes). It makes simpler to run ad hoc queries, but such flexibility comes with a cost — either more limited querying language (e.g. try nested groupings in CosmosDB) or slower performance (e.g. groupings in JSON fields of PostgreSQL).
Other NoSQL servers are designed to run a query against a specified index. And limitation of using only one index per query is common (e.g. MongoDB, RavenDB, Amazon DynamoDB). If your ad hoc query happens to filter on a non-indexed field, then the administrator needs to create/change the index. If your query has aggregates, then a new index is the only way to go. And building an index takes time. Though, worth adding that some members of the NoSQL family can disguise building new indexes (e.g. RavenDB creates auto indexes), others can scan all data without an index (e.g. dynamic queries in MongoDB). Of course, performance will suffer.
Reason #5. Unclear requirements / access patterns
Usually, devs/architects/whoever can invest time into understanding the domain, business processes, etc. and apply the Domain-Driven Design (DDD). But in some projects, we are mere mortals without much of the decision-making power, who have been forced to develop & ship a product in a hostile environment with no collaboration between the domain experts and the dev team. My condolences if it’s your case, hope you get out of this mess soon.
A NoSQL database has to be designed to meet the access patterns. If they remain unknown or tend to change often, it would lead to reconsidering the DB structure and may become a very expensive exercise. Putting things upside down is much less of an issue in a normalised RDBMS.
See ”NoSQL Data Model through the DDD prism” for best practices in designing NoSQL schema with examples.
Reason #6. Inexperienced developers
Talking about an imperfect world, sadly, most of the devs still don’t have experience of working with NoSQL and live in a world of MS SQL / MySQL / etc. surrounded by the Iron Curtain.
Two most common felonies of NoSQL devs:
- Not considering the data access patterns in the DB design
- Dishonouring the Eventual Consistency.
To make it harder, NoSQL is not just a new shiny tool. It demands knowledge of DDD, CQRS and other important concepts to design and maintain a database (at least for Key-Value and Document-based categories of NoSQL).
Those concepts aren’t new. Eric Evans coined the DDD term and published ”Domain Driven Design” book in 2003. Ages ago for the IT world. And still this concept is a novelty for the majority of developers. Along with CQRS, introduced by Greg Young in 2010 based on the CQS (Command–Query Separation) described by Bertrand Meyer in the 1980s.
Even if a NoSQL expert designs the DB, but its maintenance is given to inexperienced (can we say incompetent?) devs, be sure the DB will be butchered. Along with the back-end code. Eventually, it’ll have queries full of
JOINs, deteriorated performance and a frustrated team blaming the NoSQL for all their sins.
So if hiring is not an option, then either inspire and educate devs (”Domain-Driven Design Distilled” by Vaughn Vernon can be a good start) or go with the SQL. Hope, the general awareness will change soon, as we’re moving along the Technology Adoption Curve.
Reason #7. OLAP/OLTP/HTAP
Usually, Document-oriented NoSQL databases are tailored to your access patterns. And as a rule of thumb, they are not intended to be consumed at the atomic level, which OLAP system requires to slice and dice your data.
The traditional enterprise-level RDBMS have impressive On-Line Analytical Processing (OLAP) tools developed over the last 2+ decades:
- integrated in MS SQL Server since 1998,
- available in Oracle Database since 1996,
- open-source citus extension for PostgreSQL appeared in 2016.
The big NoSQL players usually provide links to the OLAP world, like CosmosDB promoting Synapse Analytics (via Synapse Link) and MongoDB providing Connector for BI allowing users to create queries with SQL and run Relational On-Line Analytical Processing (ROLAP). And it’s as far as it gets.
Of course, there’s a choice between Columnar NoSQL and RDBMS + columnar index, but it requires tons of considerations which I hope to blog about later.
What’s left outside
ACID (Atomicity, Consistency, Isolation, Durability) is way less important in Document-based NoSQL with properly constructed entities and aggregates. Heavy transaction use may flag a need to change the data model to get the best out of using documents in a non-relational database.
Nonetheless, all main NoSQL vendors support full ACID compliant transactions (CosmosDB, MongoDB, RavenDB, etc.). Amazon DynamoDB was a bit late to the party, as it has become ACID compliant only last year.
- correctly designed DDD aggregates, so the number of references outside the aggregate boundaries is drastically reduced;
- update the outside references in ACID transactions to enforce all-or-nothing execution.
To Be Continue
After all of that if you are curious about why one would use a document-oriented NoSQL database, check out my next post 2 reasons to NoSQL.