Have you seen people affiliated with NoSQL databases aggressively advertising them as a solution for everything? Or early adopters swearing allegiance to NoSQL and claiming to have passed the point of no return to SQL? Unsurprisingly, one may start to think of NoSQL as a silver bullet that solves all problems. Is it? Should we always use NoSQL?
Having been a DDD worshipper and an active NoSQL user/developer for years, I have no illusions (OK, maybe a few :)) and can see that SQL is not obsolete. It is still the better tool in some cases, and here they are.
Reason #1. OLAP/OLTP
The traditional enterprise-level relational database servers come with impressive On-Line Analytical Processing (OLAP) tools developed over the last two-plus decades (integrated into MS SQL Server since 1998 and available in Oracle Database since 1996).
The NoSQL world has wide-column databases like Apache Cassandra that can serve OLAP-like workloads (and arguably little else). But for other types of NoSQL the outlook for OLAP capabilities is rather bleak. There are workarounds for some document-based NoSQL servers. For instance, MongoDB provides the MongoDB Connector for BI, allowing users to write queries in SQL and run Relational On-Line Analytical Processing (ROLAP). And that is about as far as it goes.
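To make the distinction concrete, here is the kind of OLAP-style rollup that relational engines (and ROLAP layers such as MongoDB's BI Connector) are built around. This is only a toy sketch: sqlite3 stands in for a real analytical server, and the table and column names are invented.

```python
import sqlite3

# In-memory DB as a stand-in for a real analytical server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", "widget", 100.0), ("EU", "gadget", 50.0),
     ("US", "widget", 200.0), ("US", "widget", 30.0)],
)

# A typical OLAP slice: total revenue and order count per region.
rollup = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rollup)  # [('EU', 150.0, 2), ('US', 230.0, 2)]
```

A real OLAP engine adds cubes, rollups and drill-downs on top of this, but the relational model is what makes such slicing cheap to express in the first place.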
Reason #2. Small project / Simple DB structure
At the opposite end of the spectrum from OLAP complexity, very simple non-hierarchical key-value structures with well-defined data types also fit better in the traditional SQL environment. There are exceptions, though: if you don't need to persist data on disk, then Redis or Apache Ignite could be the way to go.
You may have come across many NoSQL key-value databases (Amazon DynamoDB, Oracle NoSQL Database, Couchbase, etc.) that provide heaps of features and flexibility. But for the simple needs of a small project you would not gain any benefit, and the hosting costs of NoSQL infrastructure are usually a bit higher.
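For scale, here is how little it takes to host a key-value workload in any relational database. A minimal sketch using sqlite3; the table and key names are made up, and `put`/`get` are just illustrative wrappers mimicking a NoSQL client's API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A two-column table is all a simple key-value store needs.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

def put(k, v):
    # INSERT OR REPLACE keeps the API as simple as a NoSQL client's put().
    conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (k, v))

def get(k):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (k,)).fetchone()
    return row[0] if row else None

put("user:42:name", "Alice")
put("user:42:name", "Alicia")  # overwrite, like any key-value store
print(get("user:42:name"))  # Alicia
```

If this covers the project's needs, a managed NoSQL cluster adds operational cost without adding capability.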
Reason #3. No need to scale
A quote from Rick Houlihan of the Amazon DynamoDB team:
I hear a lot: 'We use NoSQL because it's so flexible.' I've done thousands of NoSQL applications, and I can tell you that nothing could be further from the truth. NoSQL is not flexible; it's efficient to use at scale.
I would add — scaling at a lower price.
SQL databases optimise storage usage, while NoSQL optimises CPU. And nowadays CPU is the most expensive resource in data centres, while storage is the cheapest. At the same time, the most frequent operation against a DB is querying data, where all the GROUP BYs on a normalised DB hammer the CPU.
But the difference in hosting costs is not linear, and NoSQL wins only at a bigger scale. So, if there is no need to scale in the cloud, avoid the hassle of NoSQL.
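The CPU trade-off can be shown with a toy example: the normalised model recomputes an aggregate with a JOIN + GROUP BY on every read, while a denormalised (NoSQL-style) model pays at write time to precompute it, so reading is a cheap key lookup. The schema and numbers are invented, and the document store is modelled as a plain dict:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, total REAL);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (1, 15.0), (1, 5.0)])

# Normalised read: the engine burns CPU on JOIN + GROUP BY every time.
normalised = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchone()

# Denormalised read: the aggregate was maintained at write time,
# so reading it is a single key lookup (modelled here as a dict).
denormalised_store = {"customer:1": {"name": "Alice", "order_total": 30.0}}
doc = denormalised_store["customer:1"]

print(normalised, doc["order_total"])  # ('Alice', 30.0) 30.0
```

At small scale the JOIN is negligible; at millions of reads per hour the precomputed document starts paying for its storage overhead.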
Reason #4. Ad hoc queries
Traditional SQL DBs make ad hoc queries quite a trivial task, thanks to normalised data and flexible indexes (plus the internal query optimiser).
It's a different story in the NoSQL realm. First, the denormalised data structure makes it difficult to choose the right source of data. The structure is optimised for your business processes, which might be far off the needs of your ad hoc query. If you overcome that first hurdle, the implementation then depends on the NoSQL server.
Some impose "smart" behaviour and index every property of every item (e.g. CosmosDB by default, or PostgreSQL JSONB with a GIN index). This makes it simpler to run ad hoc queries, but such flexibility comes at a cost: either a more limited query language (e.g. try nested groupings in CosmosDB) or slower performance (e.g. groupings over JSON fields in PostgreSQL).
Other NoSQL servers are designed to run a query against a specified index, and the limitation of using only one index per query is common (e.g. MongoDB, RavenDB, Amazon DynamoDB). If your ad hoc query happens to filter on a non-indexed field, the administrator needs to create or change an index. If your query has aggregates, a new index is the only way to go. And building an index takes time. It's worth adding, though, that some members of the NoSQL family can hide the index-building step (e.g. RavenDB creates auto indexes), and others can scan all data without an index (e.g. dynamic queries in MongoDB). Of course, performance will suffer.
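By contrast, in a relational engine an ad hoc filter works whether or not an index exists; the optimiser simply picks a plan. A sketch with sqlite3 (table and index names are invented), using EXPLAIN QUERY PLAN to peek at the choice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, country TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [("click", "DE", 1.0), ("view", "DE", 2.0), ("click", "FR", 3.0)])
conn.execute("CREATE INDEX ix_kind ON events(kind)")

# Indexed ad hoc query: the optimiser picks the index automatically.
plan_indexed = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE kind = 'click'").fetchall()

# Non-indexed ad hoc query: still runs, falling back to a full table scan.
plan_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE country = 'FR'").fetchall()

rows = conn.execute("SELECT amount FROM events WHERE country = 'FR'").fetchall()
print(plan_indexed, plan_scan, rows)
```

The non-indexed query degrades gracefully to a scan instead of failing or demanding a schema change, which is exactly what makes ad hoc exploration cheap in SQL.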
Reason #5. Immediate consistency
A denormalised data structure implies duplicated records in multiple collections. That leads to a higher data volume and a longer time to update all the indexes, synchronise nodes, etc. This wouldn't be an issue, as NoSQL servers were designed for Eventual Consistency, with no need to wait for a change to propagate through indexes and nodes. But what if the project requires Immediate Consistency (aka Strong Consistency)?
Firstly, double-check that this is really the case, as most domains don't mind dealing with Eventual Consistency outside the Aggregate boundaries (check out the relevant chapter of Vaughn Vernon's "Implementing Domain-Driven Design").
Secondly, consider the performance requirements for creating/updating records. Most NoSQL servers have a way to provide Immediate Consistency (usually with limitations); CosmosDB even provides a whole spectrum of consistency levels. However, the trade-off is always a higher latency of write requests. If that's not acceptable, then SQL is your choice.
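The trade-off can be illustrated with a toy model (not any vendor's API): an eventually consistent read may return stale data right after a write, while an immediately consistent read waits for replication to catch up, paying with latency. Everything here, the `TinyReplicaSet` class and the delays, is invented for illustration:

```python
import time
import threading

class TinyReplicaSet:
    """Toy primary/replica pair with asynchronous replication."""

    def __init__(self, lag=0.05):
        self.primary, self.replica, self.lag = {}, {}, lag

    def write(self, key, value):
        self.primary[key] = value
        # Replication happens in the background after a delay.
        threading.Timer(self.lag, self.replica.__setitem__, (key, value)).start()

    def read_eventual(self, key):
        # Fast, but may serve stale data from the replica.
        return self.replica.get(key)

    def read_immediate(self, key):
        # Blocks until the replica has caught up: higher read latency.
        while self.replica.get(key) != self.primary.get(key):
            time.sleep(0.005)
        return self.replica.get(key)

rs = TinyReplicaSet()
rs.write("order:1", "paid")
stale = rs.read_eventual("order:1")   # likely still missing: not replicated yet
fresh = rs.read_immediate("order:1")  # waits out the lag, then returns 'paid'
print(stale, fresh)
```

Real servers hide this machinery behind write/read concerns or consistency levels, but the latency cost of waiting for propagation is the same in principle.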
Reason #6. Unclear requirements
Usually, devs/architects/whoever can invest time into understanding the domain and business processes and apply Domain-Driven Design (DDD). But in some projects we are mere mortals without much decision-making power, forced to develop and ship a product in a hostile environment with no collaboration between the domain experts and the dev team. My condolences if that's your case; I hope you get out of this mess soon.
In the NoSQL world, drastic changes in business requirements are likely to cause a big overhaul of the whole DB structure and become a very expensive exercise. Turning things upside down is much less of an issue in a normalised database.
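A toy sketch of why changes ripple further in a denormalised store: in a normalised model a customer's name lives in one row, so a rename is a single UPDATE; in a denormalised model every document that embedded the name must be rewritten. All names here are invented, and the document collection is modelled as a list of dicts:

```python
import sqlite3

# Normalised: the name is stored once; one UPDATE fixes everything.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
conn.execute("UPDATE customers SET name = 'Acme GmbH' WHERE id = 1")

# Denormalised: the name was copied into every order document,
# so a rename means rewriting them all.
orders = [
    {"id": 1, "customer": {"id": 1, "name": "Acme Ltd"}},
    {"id": 2, "customer": {"id": 1, "name": "Acme Ltd"}},
]
for doc in orders:
    if doc["customer"]["id"] == 1:
        doc["customer"]["name"] = "Acme GmbH"

name = conn.execute("SELECT name FROM customers WHERE id = 1").fetchone()[0]
print(name, [d["customer"]["name"] for d in orders])
```

Scale the second half up to millions of documents across several collections and a "simple" requirement change becomes a migration project.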
Reason #7. Inexperienced developers
Still on the subject of an imperfect world: sadly, most devs don't yet have experience working with NoSQL and live in a world of MS SQL / MySQL / etc., surrounded by the Iron Curtain.
To make it harder, NoSQL is not just a shiny new tool. It demands knowledge of DDD, CQRS and other important concepts to design and maintain the databases (at least for the key-value and document-based categories of NoSQL).
Those concepts aren't new. Eric Evans coined the DDD term and published his "Domain-Driven Design" book in 2003, ages ago in the IT world, and still the concept is a novelty for the majority of developers. The same goes for CQRS, introduced by Greg Young around 2010 and based on CQS (Command-Query Separation), described by Bertrand Meyer in the 1980s.
Even if a NoSQL expert designs the DB, if its maintenance is handed to inexperienced (can we say incompetent?) devs, be sure the DB will be butchered, along with the back-end code. Eventually it'll be full of application-side JOINs, deteriorating performance, and a frustrated team blaming NoSQL for all their sins.
So if hiring is not an option, then either inspire and educate your devs ("Domain-Driven Design Distilled" by Vaughn Vernon can be a good start) or go with SQL. Hopefully the general awareness will improve soon as we move along the Technology Adoption Curve.
What’s left outside
ACID (Atomicity, Consistency, Isolation, Durability) is far less important in document-based NoSQL with properly constructed entities and aggregates. Heavy transaction use may flag a need to change the data model to get the best out of documents in a non-relational database.
Nonetheless, all the main NoSQL vendors now support fully ACID-compliant transactions (CosmosDB, MongoDB, RavenDB, etc.). Amazon DynamoDB was a bit late to the party, becoming ACID-compliant only in 2018. The practical advice is still to aim for:
- correctly designed DDD aggregates, so the number of references outside the aggregate boundaries is drastically reduced;
- updating the outside references in ACID transactions to enforce all-or-nothing execution.
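The second point can be sketched with a plain relational transaction: both updates either commit together or roll back together. A minimal sketch with sqlite3; the accounts table, the CHECK constraint and the overdraw failure are all contrived for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts "
    "(id INTEGER PRIMARY KEY, balance REAL NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 0.0)")

def transfer(conn, src, dst, amount):
    # All-or-nothing: both UPDATEs commit together or neither does.
    try:
        with conn:  # opens a transaction; commits or rolls back on exit
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint fired; everything was rolled back

ok = transfer(conn, 1, 2, 60.0)    # succeeds as a unit
bad = transfer(conn, 1, 2, 500.0)  # would overdraw: rolled back as a unit
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, bad, balances)  # True False {1: 40.0, 2: 60.0}
```

Multi-document transactions in NoSQL servers offer the same all-or-nothing guarantee, but if you reach for them often, the aggregate boundaries probably need rethinking.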