Multi-tenant applications with NoSQL

15 September 2021

This series is dedicated to building enterprise applications leveraging the .NET + RavenDB bundle. Check out the source code on GitHub and the application live at yabt.ravendb.net.

For SaaS applications, multi-tenancy might seem like a simple problem at first glance. But just scratch the surface and it reveals itself as one of those topics where technical architecture concerns, business needs, security risks and user experience create a delicate area that requires thoroughly choosing the tenancy model and only then challenge the devs to implement it.

Choosing multi-tenancy model

So, how come it’s so complicated? You’ve got an app and a database, so it boils down to three options:

Multiple standalone single-tenanted apps (duplicating the app-DB bundle for each tenant).
Single app with multiple databases (database-per-tenant).
Single app and database both supporting multiple tenants.

If these three is not enough, there are hybrid combinations based on options #3 & #1 or #3 & #2.

Which one to choose?

Let’s park the hybrid options for a moment.

Usual criteria for assessing the tenancy model would include a few points for reliability of the system:

Scalability:
- Number of tenants.
- Storage & workload per tenant.
Tenant isolation:
- Shared data between tenants.
- Impact of one tenant’s workload on others.
- Schema management (need of tenant-specific customisations).
Risk management:
- Risk of data exposure between tenants.
- Disaster recovery (restoring a single tenant vs all).
Operational & Maintenance complexity:
- Release management (rolling out updates, etc).
- Monitoring and managing performance.

Of course, the list would go on and on, but it weighs against cost-effectiveness of the application (operational, maintenance costs, etc.). Once $$$ comes into the picture, the stakeholders are suddenly willing to compromise on many aspects and reconsider the risks.

Of course, option #1 with fully isolated tenants is more secure but others are likely to become cheaper in the long run with option #3 being the most affordable out of the three (often, judging just by the operational costs, as DBs tend to be expensive). The multi-tenant database option also opens the gates to hybrid options in the future if you need a higher level of isolation for some tenants.

Multi-tenant database in NoSQL

If multi-tenant database is your choice then see how it can be arranged in NoSQL in general and RavenDB in particular based on our YABT project that got multi-tenancy in version 2. Feel free to jump straight to the code on GitHub if impatient.

Our basic requirements were simple:

Each user belongs to a single tenant.
CRUD operations on tenanted entities can be performed only by users of the same tenant.

The first steps of is simple – add TenantId property to appropriate collections and indexes. We add an interface for enforcing the new constraint:

public interface ITenantedEntity
{
	string TenantId { get; }
}

For simplicity, we keep a list of tenants in the same database with all the TenantIds referring to Project collection:

Abstracting away tenancy enforcements

Probably, we could have wrapped up here if only manipulating with TenantId properties each time when you need to query or update wasn’t such a vicious practice. Gotta add a new requirement:

As prevention of human error, checking and setting the tenant in CRUD operations happens automatically at the DB session level.

Freeing devs from worrying about missing or incorrect tenant link when writing queries or implementing CRUD operations would be an outstanding achievement. And finding an elegant solution requires deep understanding of the guts of a particular database server you are dealing with.

Finding a way in RavenDB

The first thing you may cast an eye on would be some promising events in the Document Store of the RavenDB Client API:

It’s a good start and I tried them out (see the code and tests on GitHub). Most of them didn’t bring much joy, but they might come in handy in some special cases. So, how did it go?

Preventing loading a record with a wrong tenant can be done pretty easy:

store.OnAfterConversionToEntity += (_, args) => {
   if (args.Entity is ITenantedEntity entity && entity.TenantId != currentTenantId)
      throw new InvalidTenantException();
}

and tenant validation on deletion would look like:

store.OnBeforeDelete += (sender, args) => {
   if (args.Entity is ITenantedEntity entity && entity.TenantId != currentTenantId)
      throw new InvalidTenantException();
   if (args.DocumentId != null && args.Entity == null)
      throw new NotSupportedException("Deletion by ID not supported");
}

You see a feature? The only way to interrupt a Load or Delete operation is by throwing an exception. It’s understandable. But a pitfall is that on throwing an exception the session object gets into a precarious state (explanation) and shouldn’t be touched again (better to discard it immediately).

Another pitfall is that deleting documents by ID becomes prohibited without any changes in the contract (methods). That would require a gentlemen agreement between devs on avoiding it (otherwise one would learn about the restriction in the runtime only).

Moving on. Adding the current TenantId on storing a record looks straightforward:

store.OnBeforeStore += (_, args) => {
   var entityType = args.Entity.GetType(); 
   if (!typeof(ITenantedEntity).IsAssignableFrom(entityType)) 
      return;
   // Use reflection here as we consciously avoid a setter on 'TenantId' 
   // Just another barrier to prevent setting the tenant manually in the code
   entityType.GetProperty(nameof(ITenantedEntity.TenantId))?
             .SetValue(args.Entity, currentTenantId);
}

And here we’re hitting a snag – adding a tenant constraint in queries becomes a struggle:

store.OnBeforeQuery += (_, args) => {
   var entityType = args.QueryCustomization.GetType().GetGenericArguments()[0];
   if (!typeof(ITenantedEntity).IsAssignableFrom(entityType))
      return;				
      
   // Add the "AND" to the the WHERE clause 
   // (the method has a check under the hood to prevent adding "AND" if the "WHERE" is empty)
   type.GetDeclaredMethod(nameof(IDocumentQuery<object>.AndAlso)).Invoke(customization, null);
   // Add "TenantId = 'Bla'" into the WHERE clause
   type.GetDeclaredMethod(
            nameof(IDocumentQuery<object>.WhereEquals),
            new[] { typeof(string), typeof(object), typeof(bool) })
       .Invoke(
            customization,
            new object[]
            {
               nameof(ITenantedEntity.TenantId),
               currentTenantId.ToLower(),
               true,
            });
}

First of all, the solution above is hacky (and even more so once you add proper type validation). Secondly, it misses the mark. It works on fetching full entities (e.g. session.Query<BacklogItemTask>()) but fails if the query fetches a subset of fields, e.g. session.Query<BacklogItemTask>().Select(t => t.Id). In the later case, the query type would be IAsyncDocumentQuery<string>, so the entityType in the event handler above would be resolved to string and break the ITenantedEntity check.

Conclusion:

The above approach with the Document Store events will work nicely if the database contains exclusively tenanted collections and you’re going to always add a tenant condition to the query.

Whether a DB should contain a mix of tenanted and non-tenanted collections is a separate discussion, but to make our solution more generic we back out from the events and try another path.

Decorator for the session

Most of the .NET applications use a high-level session object to send requests to the DB server. In RavenDB it’s IAsyncDocumentSession (see What is a session?). It’s not massive, has a bit over a dozen methods and a couple of properties.

It also has one key feature – all methods are generic with the entity object as the parameter (for queries it can be the indexed object). It’s exactly what was missing in OnBeforeQuery event to reliably detect whether a tenant filter is needed!

Applying the decorator pattern over the session we would:

incorporate our tenant-related manipulations;
get a new cleaner interface with the only methods that support multi-tenancy (omitting deleting by ID, Include(), etc.).

Of course, if the interface changes in new RavenDB versions it may affect our decorator. Though, it didn’t change often in the past, and was extended twice over last 3 years. Looks like a safe bet.

Let’s call the new interface IAsyncTenantedDocumentSession. Initially it gets 10 methods for CRUD operations and 1 property (Advanced that propagates IAsyncDocumentSession.Advanced), roughly looking like examples below.

Tenant checks for loading a record or deleting one wold be like:

public async Task<T?> LoadAsync<T>(string id, CancellationToken token = default)
{
   var entity = await DbSession.LoadAsync<T>(id, token);

   if (entity == null || !typeof(T).GetInterfaces().Contains(typeof(ITenantedEntity)) || (entity as ITenantedEntity)?.TenantId == _currentTenantId)
      return entity;
   
   throw new ArgumentException("Attempt to access a record of another tenant");
}

Setting the current tenant on saving:

public Task StoreAsync<T>(T entity, CancellationToken token = default) where T: notnull
{
   if (typeof(T).GetInterfaces().Contains(typeof(ITenantedEntity)))
   {
      // Set current tenant via reflection as we consciously avoid a setter on 'TenantId'
      var property = typeof(T).GetProperty(nameof(ITenantedEntity.TenantId));
      if (property == null)
         throw new ArgumentException("Can't resolve tenanted property");
      property.SetValue(entity, _currentTenantId);
   }
   // Propagate call to the original `StoreAsync()` method
   return session.StoreAsync(entity, token);
}

Filtering by tenant on querying:

public IRavenQueryable<T> Query<T>(string? indexName = null, string? collectionName = null, bool isMapReduce = false)
{
   var query = DbSession.Query<T>(indexName, collectionName, isMapReduce);

   if (!typeof(T).GetInterfaces().Contains(typeof(ITenantedEntity)))
      return query;

   // Add an extra WHERE condition on the current tenant
   return query.Where(e => (e as ITenantedEntity)!.TenantId == _currentTenantId);
}

OK. That works and eliminates the need of adding the current tenant each time when writing requests against the DB.

Set Based Operations on Documents

Once we got here, we may also bring over some handy properties and methods including an already existing logic for managing deferred patch requests (aka Set Based operations) used for updating fields in rich links. Till now it was separated to its own classes.

Conveniently enough, to make that change we extend our interface by just one method – AddDeferredPatchQuery(). Execution of the patches gets added to the already existing SaveChangesAsync(). It removes a couple of files from the project and streamlines saving 👍.

The end result of our session object is here - IAsyncTenantedDocumentSession interface and its implementation in AsyncTenantedDocumentSession.

Shooting yourself in the foot

While IAsyncTenantedDocumentSession prevents devs from running CRUD operations for non-current tenants, it’s still admittedly not bulletproof.

Firstly, the set based operations mentioned above still allows passing a raw RQL to the database (see YABT example) that may contain an incorrect tenant reference.

Mitigation strategy would depend on your project. The easiest solution is to prevent use of TenantID keyword in the passed RQL by checking the queued up query (in AddDeferredPatchQuery()). E.g.

public void AddDeferredPatchQuery(IndexQuery patchQuery)
{
   var match = Regex.Match(patchQuery.Query, @$"\b{nameof(ITenantedEntity.TenantId)}\b", RegexOptions.IgnoreCase);
   if (match.Success)
      throw new ArgumentException("Attempt to access a tenant in RQL");
   _deferredPatchQueries.Enqueue(patchQuery);
}

That constraint would suffice for YABT.

Another weak point is exposed IAsyncDocumentSession.Advanced property that gives away too much of low-level operations. If it’s a concern then hide this property in the decorator and provide targeting methods to perform operations where Advanced would still be involved under the hood.

In our case the session object got two methods that check record by Id and TenantId:

ExistsAsync<T>() (replaced session.Advanced.ExistsAsync())
Patch<TEntity, TProp>() that sets field value for a record without loading (replaced session.Advanced.Patch())

And I might got carried away but I also extended parameters of some ‘traditional’ methods to avoid referencing powerful session.Advanced ever. E.g. the saving method now can also clear the cache (comes in handy during testing) and returns true if there actually were records to save:

Task<bool> SaveChangesAsync(bool clearCache, CancellationToken token = default)

Housekeeping rules

And the last bit. To keep the back-end project architecture neat, all the DB-specific piping (session, connections logic, etc.) is contained in a dedicated project called Database that’s exposed to the domain layer (project Domain) only.

At the same time, tenant resolution clearly belongs to a higher level. So we need a small helper to register our session that gets called in Domain project:

static IServiceCollection AddAndConfigureDatabaseTenantedSession(
   this IServiceCollection services, 
   Func<IServiceProvider, Func<string>> currentTenantResolverFunc)
{
   return services.AddScoped<IAsyncTenantedDocumentSession>(x =>
      new AsyncTenantedDocumentSession(x.GetRequiredService<IDocumentStore>(), currentTenantResolverFunc(x))
   );
}

At this stage we eliminate registration of IAsyncDocumentSession in the application code by replacing with the newly created tenant-safe IAsyncTenantedDocumentSession.

Here we go. YABT with multi-tenancy support has been released (source). Thoughts?

Let me know what you think in the comments below, on Twitter, LinkedIn or Reddit.

Share the post