Two reasons to NoSQL

10 November 2019

This post is a follow-up on my 7 reasons not to NoSQL. Hope, the last one wasn’t too discouraging and didn’t shake believes in NoSQL.

While relational databases remain to be the default choice, here are my two reasons to go with NoSQL in enterprise projects (here I focus on document-oriented NoSQL databases only).

But first, some popular points often presented as NoSQL benefits.

Dismissed reasons

I bet that NoSQL advocates and vendors would come up with tons of reasons supporting their products, starting from benefits like

Having little to no structure.
No need to decompose the objects, just store them.
High speed of development.

Call me a sceptic, but there is a very narrow niche in the enterprise world where devs don’t need to care about the structure of the persistence layer. Most of the time it’s required to define the interface/contract for external consumers (e.g. API end-points) or indexing the data, etc. Structure is good.

Object decomposition… Technically, yes, you can store the object “as is” in the DB, but most of the time you should NOT. The database is designed to respond quickly on the most often operation(s), which usually is querying data. Hence, to persist a new object, devs not only have to validate the incoming data, but also process/decompose it and store in a denormalised way tailored to the requirements.

Regarding the speed of development… At its best it’s the same as for relational databases. For instance, let’s compare the DB design stages:

#	SQL	NoSQL
1.	Understand the business problem	Learn the domain. Document all workflows; Understand every access pattern: read/write patterns; query dimensions and aggregations.
2.	Define the Entity-Relationship Model	Design the Data Model, which is tuned to the discovered access patterns
3.	Use normalisation and produce the Data Model	Review -> Repeat -> Review, as it’s unlikely to get it right on the first attempt

Just understanding access patterns gives a handicap to NoSQL. Once the design has settled, other processes can go smoother with NoSQL, but it would be overoptimistic to expect a higher development speed in the short run.

However, there are a couple of stronger reasons in support of NoSQL.

Reason #1. Cheaper to scale

Scaling a big database either requires a super expensive machine (scaling up) or a cluster of smaller ones (scaling out). The later ends up being cheaper.

So, what are the problems with scaling SQL clusters and what are the options?

SQL option 1. Shared-disk system

Shared-disk file system uses a SAN (Storage Area Network) to allow cluster nodes to gain direct disk access. The distributed storage appears as a mounted disk in the eyes of the SQL server.

Example in the realm of MS SQL Server would be FCI (Failover Cluster Instance) running in WSFC (Windows Server Failover Clustering), which leverages Cluster Shared Volumes under the hood. It can be quite a beast to manage (check out most often issues).

Problem: Shared storage is a single point of failure. If your data is corrupted, it’s going to be corrupted no matter which node you access it from.

SQL option 2. Sharding

Sharding puts different data on separate nodes, each of which does its own reads and writes. Ideally, different users would talk to different server nodes, but such cases are pretty rare.

Usually, sharding is used to allocate data either based on a physical location (close to where it’s being accessed) or to evenly distribute data across nodes to keep equal load on each server. It affects the application logic and complicates the programming model, as the app must know in what database a given piece of data resides.

Problem: Sharding makes more difficult to query the data (e.g. joining data across shards) and to perform transactional operations to control data integrity across shards.

Not to mention a need in rebalancing the sharding from time to time and the fact that sharding does little to improve resilience, as a node failure makes that shard’s data unavailable.

BTW, there are solutions to make sharding a bit easier for devs. For example, Azure SQL offers Elastic queries and Elastic transactions to simplify querying and transactional operations in sharded databases, but their adoption requires investments.

If only we had a better clustering solution without all these downsides…

Clustering in NoSQL

First of all, if you want sharding, then many NoSQL databases provide it as an option (e.g. see sharding in MongoDB). And sharding makes more sense in NoSQL, where aggregate orientation really comes in handy. The whole point of aggregates is to design data structures in a way to combine data that’s commonly accessed together. So, aggregates become a natural unit of distribution, significantly reducing the number of cross-node JOINs and the necessity in cross-nodes transactional operations.

But way more interesting solution provided by NoSQL vendors is their Replication implementations. In the early days, it was either Master-Slave Replication or Peer-to-Peer Replication, but now the leaders of the industry supply out-of-the-box custom-build complex solutions completely transparent to the devs.

The implementation details differ from vendor to vendor (e.g. see descriptions in Azure Cosmos DB, MongoDb or RavenDB), but what’s in common is that modern NoSQL clusters transparently replicate your data, distribute workload across all the nodes and handle failures with automatic failover.

If you’re curious, here is a video illustrating an auto-recovery of a RavenDB cluster. And below is a screenshot of Raven’s administration console showing a cluster status.

The key feature is that each NoSQL node is self-sufficient. The application code will look exactly the same if the DB is running on one node or a big cluster. And not involving expensive resources (aka developers) to scale a DB, brings financial benefits to the business owners. It comes on the top of relatively cheap costs of a resilient cluster.

NoSQL hosting costs

I already mentioned hosting costs in the previous post:

SQL databases are optimising storage usage, when NoSQL — CPU and RAM. And now CPU and RAM is the most expensive resource in data centres, when storage is the cheapest. At the same time, the most often operation against the DB is querying data, where all the JOINs and GROUP BYs on a normalised DB are hammering RAM with the CPU on finding matching rows.

Of course, CPU & RAM are heavily involved in building indexes for NoSQL data, but if the DB is mostly used for querying, you’ll get cost benefits on hosting it in the cloud.

And here, if you combine costs of hosting a highly resilient database, plus maintenance and scaling, then NoSQL options can be very lucrative in the long run overcoming all the cons.

Reason #2. Convenience of development

The old-school perception of databases for enterprise projects is that they often acted as integration databases used by multiple applications (even developed by separate teams). It led to many issues, like shared DBs had very complex structures, which was almost impossible to refactor due to difficulties in coordination between the teams.

The modern approach is using application database — a database used by a single application developed by one team. It simplifies maintenance, makes it scalable and fit well into the microservices architecture.

And once there is a single consumer of the DB, it can be crafted to meet the access patterns of that single app and responsibility for database integrity can be put in the application code. This opens the gates to NoSQL.

Convenience of using NoSQL is subjective and here I’ll mention just two most common reasons, why devs like NoSQL.

No Impedance Mismatch

Impedance Mismatch is a difference between the relational model (structure of tables and rows) and the in-memory data structures (rich objects used in the application, containing hierarchical structures, arrays, etc.).

Normalised database forces devs to translate rich objects to relational representation to store on disk and translate back when reading. Sure, the ORM frameworks help a lot here, but they bring other issues well-described in posts like OrmHate by Martin Fowler or ORM Is an Offensive Anti-Pattern by Yegor Bugayenko:

Complex data mapping leads to expensive maintenance (changes on either side need to be reflected properly in the mapping).
One extra gruelling learning curve, as devs need to understand in depth how the ORM works on the top of how the DB works.
Poor performance as the under-the-hood interactions with the database aren’t pretty.

NoSQL data model suggest using aggregates or graphs, which eliminate the necessity of using ORM and allow the devs to operate with the same structures in memory as they are stored on disk. Aggregates make a lot of sense to an application programmer.

Unlock best architectural patterns and practices

Talking about aggregates in document-oriented database brings us to Domain-Driven Design (DDD), which without doubts we all need to embrace.

Sure, relational databases are also capable to persist DDD’s Aggregates, Entities and Value Objects, but only through the pain of complex mapping of hierarchical structures to/from normalised tables and rows. When NoSQL solutions provides if not painless then at least less troublesome path.

It feels that document-oriented NoSQL comes in a bundle with architectural patterns like DDD, CQRS, etc. NoSQL doesn’t require, but highly encourage to use them. For example, underlining cluster-friendly store forces to consider the consistency model and perhaps using eventual consistency, as the CAP theorem states that on a distributed data store you have to trade off availability of data vs consistency. In its turn, eventual consistency leads to implementation of Command and Query Responsibility Segregation (CQRS) pattern. And NoSQL vendors spend a lot of time popularising the DDD and CQRS.

Sometimes, a nudge towards those patterns is enough to encourage learning and applying them in the project, and also improving internal processes.

Better integration with your tech stack

Nowadays we rarely work with the database in isolation. Usually, the database is an integral part of a complex environment where a back-end or full-stack developer performs various operations leveraging tools and technologies at his/her disposal. And the database integration with those used tools and technologies becomes very important.

This idea was well recognised by many NoSQL vendors:

MongoDb was marketed along with the MEAN stack promoting integration between the database and NodeJS.
CosmosDb got a boost by releasing a provider in Entity Framework Core 3.
RavenDb got famous among the .NET folks because of its Client API library with LINQ support.

The ability to communicate with the database using the tech stack of a dev’s preference and less thinking about the DB topology, structure, etc. does make a difference. How much of a difference? It’s subjective, one gotta try it first.

That’s all

For a deeper dive in the subject, check out these books:

NoSQL Distilled by Pramod Sadalage and Martin Fowler.
Domain-Driven Design Distilled by Vaughn Vernon.

Any thoughts? Please share in the comments below, on Twitter, or join the Reddit discussion.

Share the post