Hidden side of document IDs in RavenDB

30 November 2020

This series is dedicated to building enterprise applications leveraging the .NET + RavenDB bundle. Check out the source code on GitHub and the application live at yabt.ravendb.net.

At first glance, you don’t need to pay special attention to document IDs in RavenDB. The default autogenerated semantic IDs (e.g. users/1-A) are good enough: robust, concise, human-readable and customisable. Of course, there are other options, including GUIDs if you want to go fancy, but YABT (“Yet Another Bug Tracker”) sticks to the defaults, and here are some hitches you may come across.

1. Passing ID in the URL

Consider the traditional URL format for updating/deleting/viewing an entity. For a User the format would look like /api/users/{id}, where the {id} must be a unique identifier.

What would you pass as the {id}?

Passing the document ID ‘as is’ would be suboptimal. For the ID users/1-A, the URL /api/users/users/1-A not only looks ugly, it will also derail the routing if passed unencoded. The encoded form, /api/users/users%2F1-A, though functional, looks rather puzzling and doesn’t bring much joy either.
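For reference, this is the escaping .NET applies to a default ID. A minimal sketch built on Uri.EscapeDataString() and Uri.UnescapeDataString() from the base class library; the UrlIds class is a hypothetical name:

```csharp
using System;

public static class UrlIds
{
    // Escapes the '/' separator (users/1-A -> users%2F1-A) so the ID fits into a single URL segment
    public static string ToUrlSegment(string documentId) => Uri.EscapeDataString(documentId);

    // Reverses the escaping when the segment comes back in a request
    public static string FromUrlSegment(string segment) => Uri.UnescapeDataString(segment);
}
```

UrlIds.ToUrlSegment("users/1-A") yields users%2F1-A, the segment from the encoded URL above, and FromUrlSegment() restores the original ID.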

1.1. Masking the ID

Oren Eini recommends not exposing the ID directly but masking it via encryption, so the URL would look like /api/users/bPSPEZii22y5JwUibkQgUuXR3VHBDCbUhC343HBTnd1XMDFZMuok. In his blog post on the topic, he provides the code for encrypting the ID with AES and then encoding the result in the Bitcoin (Base58) format.
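A minimal sketch of that idea, written from the description above rather than copied from Oren’s post: AES encryption with a random IV, followed by Base58 encoding with the Bitcoin alphabet. The IdMasking class and its method names are hypothetical, and key management is deliberately left out (the key is simply passed in):

```csharp
using System;
using System.Linq;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

public static class IdMasking
{
    // Bitcoin's Base58 alphabet: no 0, O, I or l, so look-alike characters can't be confused
    const string Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    // Encrypts the document ID with AES (CBC mode by default) and a fresh random IV
    public static string Mask(string documentId, byte[] key)
    {
        using var aes = Aes.Create();
        aes.Key = key;                      // 16, 24 or 32 bytes; load it from configuration in a real app
        aes.GenerateIV();
        using var encryptor = aes.CreateEncryptor();
        var plain = Encoding.UTF8.GetBytes(documentId);
        var cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);
        // Prepend the IV so Unmask() can decrypt the value later
        return Base58Encode(aes.IV.Concat(cipher).ToArray());
    }

    public static string Unmask(string masked, byte[] key)
    {
        var bytes = Base58Decode(masked);
        using var aes = Aes.Create();
        aes.Key = key;
        aes.IV = bytes.Take(16).ToArray();  // the AES block (and IV) size is 16 bytes
        using var decryptor = aes.CreateDecryptor();
        var plain = decryptor.TransformFinalBlock(bytes, 16, bytes.Length - 16);
        return Encoding.UTF8.GetString(plain);
    }

    public static string Base58Encode(byte[] data)
    {
        // Treat the bytes as one big-endian unsigned number and convert it to base 58
        var value = new BigInteger(data, isUnsigned: true, isBigEndian: true);
        var sb = new StringBuilder();
        while (value > 0)
        {
            sb.Insert(0, Alphabet[(int)(value % 58)]);
            value /= 58;
        }
        // Leading zero bytes vanish in the number, so each one becomes a leading '1'
        var leadingZeros = data.TakeWhile(b => b == 0).Count();
        return new string('1', leadingZeros) + sb;
    }

    public static byte[] Base58Decode(string text)
    {
        BigInteger value = BigInteger.Zero;
        foreach (var c in text)
        {
            var digit = Alphabet.IndexOf(c);
            if (digit < 0)
                throw new FormatException($"'{c}' is not a Base58 character");
            value = value * 58 + digit;
        }
        var leadingZeros = text.TakeWhile(ch => ch == '1').Count();
        var body = value.IsZero ? Array.Empty<byte>() : value.ToByteArray(isUnsigned: true, isBigEndian: true);
        return new byte[leadingZeros].Concat(body).ToArray();
    }
}
```

Because of the random IV, the same ID masks to a different string on every call, and Unmask() turns any of them back into users/1-A. If you want stable URLs, you need a deterministic scheme instead, at the cost of revealing when two URLs point to the same record.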

The main benefit is disguising how fast a collection grows, which sequential IDs would reveal (e.g. how many orders were created between events A and B). However, for enterprise applications it means sacrificing the user experience: the format is not human-readable and is harder to handle (e.g. it’s easy to miss a character or two when selecting it for copy-paste).

This method is a bit faster than GUIDs due to a lower impact on the B-tree index (though, as Oren says, “[impact] isn’t going to be a major one before you get to 100 million records”), but there may be a better approach.

1.2. Dropping the prefix

Another option is to decompose the ID by taking out the prefix (e.g. use 1-A for users/1-A), which gives a more conventional URL like /api/users/1-A. Note that the A (a node tag) is an integral part of the ID and must be kept.

It works when the prefix can be resolved from the context, which I’d expect to be the case for most enterprise apps. The apps need context to correctly present data on the front-end, to validate incoming parameters, to apply appropriate business logic, etc. Usually, the context of each request is well-known, and it persists when the request hits the DB.

Once we know what kind of entity the short ID is for, the conversion is trivial. Here are two helper methods that mimic the RavenDB logic:

// From `users/1-A` to `1-A`
static string? GetShortId(this string? fullId) => fullId?.Split('/').Last();

// From `1-A` to `users/1-A`
static string GetFullId<T>(this IAsyncDocumentSession session, string shortId) where T : IEntity
{
    // Pluralise the collection name (e.g. 'User' becomes 'Users', 'Person' becomes 'People')
    var pluralisedName = DocumentConventions.DefaultGetCollectionName(typeof(T));
    // Fix the letter case - converts 'Users' to 'users', 'BacklogItems' to 'backlogItems'
    var prefix = session.Advanced.DocumentStore.Conventions.TransformTypeCollectionNameToDocumentIdPrefix(pluralisedName);

    return $"{prefix}/{shortId}";
}
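To unit-test code that relies on these helpers without a document store, the round trip can be sketched with the base class library alone. This is a hypothetical simplification: the User and BacklogItem marker types stand in for the real entities, and the prefix rule just appends an s and lower-cases the first letter, so unlike RavenDB’s pluraliser it gets irregular plurals such as Person wrong:

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical marker types standing in for the YABT entities
public class User { }
public class BacklogItem { }

public static class ShortIdHelpers
{
    // Same logic as GetShortId() above: keep whatever follows the last '/'
    public static string? GetShortId(this string? fullId) => fullId?.Split('/').Last();

    // Hypothetical stand-in for the RavenDB conventions: append "s" and lower-case the first letter
    // (irregular plurals are NOT handled: 'Person' would become 'persons')
    public static string GetFullId<T>(string shortId)
    {
        var collection = typeof(T).Name + "s";
        var prefix = char.ToLowerInvariant(collection[0]) + collection.Substring(1);
        return $"{prefix}/{shortId}";
    }
}
```

ShortIdHelpers.GetFullId<BacklogItem>("1-A") gives backlogItems/1-A. GetShortId() is also idempotent, so calling it on an already short ID does no harm.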

This approach is adopted in the YABT.

2. Exposing nested references

It gets more interesting when we need to process an entity containing nested references.

Take a sample Backlog item record from the YABT database:

{
	"Status": "Active",
	"Title": "Malfunction at the Springfield Nuclear Power Plant",
	"Assignee": {
		"Id": "users/1-A",
		"Name": "N. Flanders",
		"FullName": "Ned Flanders"
	}
}

The Assignee has a reference to a corresponding record in the Users collection. When we expose this backlog item (e.g. via API or on a web front-end), the reference must be consumable with minimum transformation to form a URL for navigating to the user.

Ideally, the recipient should simply append Assignee.Id to the base URI and get a URL for the user’s page (like https://yabt.dev/users/1-A) without having to think about complicated ID rules.
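A quick illustration of why the stored form matters to the consumer, using the yabt.dev address from the example above as a hypothetical base URI: resolving the reference ID against it gives the expected page only for the short ID.

```csharp
using System;

public static class UserLinks
{
    // Hypothetical base address of the users' pages; the trailing '/' matters for relative resolution
    static readonly Uri UsersBase = new Uri("https://yabt.dev/users/");

    // Resolves the reference ID as a relative URI against the base address
    public static Uri ForAssignee(string assigneeId) => new Uri(UsersBase, assigneeId);
}
```

UserLinks.ForAssignee("1-A") resolves to https://yabt.dev/users/1-A, whereas the full ID users/1-A produces https://yabt.dev/users/users/1-A.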

There are two options.

2.1. Store processed IDs in the DB records

The most direct approach is to store the reference ID in the form you present to the consumers, e.g.

"Assignee": {
    "Id": "1-A",
    "Name": "N. Flanders",
    "FullName": "Ned Flanders"
}

It’s viable, and the main advantage is that no post-processing is needed when such references are passed on to the consumer. As in the previous example, the domain logic can resolve the full record ID from the context (only a User can be the assignee).

Another advantage is a bit far-fetched. What if your collection name changes somewhere down the track? It’s not an unimaginable scenario while the ubiquitous language is evolving (in spite of your rigorous efforts to get it right at the start). To reflect the change, devs need to rename the collection (e.g. from Users to Clients)… and all the full references with prefixes (from users/1-A to clients/1-A). Storing partial references at least eliminates the last task.

Nothing is perfect, and the downsides are:

You can’t easily use Include() to avoid extra round trips when fetching the referenced records from the DB, as Include() requires the full document ID in the reference:

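// Loads the backlog item and, in the same round trip, the user it refers to (needs the full users/1-A ID)
var ticket = session
                .Include<BacklogItem>(x => x.Assignee.Id)
                .Load("backlogItems/1-A");
// Served from the session cache, no extra request to the server
var assignee = session.Load<User>(ticket.Assignee.Id);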

Some obscurity when looking at the record in RavenDB Studio: it’s not obvious which collection the reference comes from, and the Studio won’t show a list of related documents for quick navigation.

2.2. Process references before exposing

To keep your ducks in a row at the DB level, we can store full reference IDs and process them before exposing them to the consumer. This way, we avoid the downsides described above.

At a minimum, we can call GetShortId() (described above) on every property of the returned DTO that requires ID processing, but that is tedious and prone to human error. So we need helper methods.

Let’s apply a constraint on all the classes with the ID property:

interface IEntity
{
    string Id { get; }
}

To make it more generic, we avoid a public setter for the ID property (RavenDB generates it for entities, so our code should only read it). The IEntity interface is implemented by all the entities and references:

public class User: IEntity
{
    public string Id { get; private set; }
    ...
}
public class UserReference: IEntity
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string FullName { get; set; }
}

Hence, a more generic implementation requires reflection to set a property value without a public setter:

static T RemoveEntityPrefixFromId<T>(this T target) where T: IEntity
{
    var newRefId = target?.Id?.GetShortId();    // The new ID value without the entity prefix

    if (newRefId == null)
        return target;
    var type = target!.GetType();

    var idProp = type.GetProperty(nameof(IEntity.Id));
    if (idProp == null)
        throw new NotImplementedException($"No '{nameof(IEntity.Id)}' property of '{type.Name}' type");

    idProp.SetValue(target, newRefId);

    return target;
}
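Whether that reflection call succeeds depends on how Id is declared. A quick check with hypothetical class names: PropertyInfo.SetValue() happily calls a private setter, but a get-only auto-property has no setter at all, so the call throws an ArgumentException. Hence the entities need at least a private setter.

```csharp
#nullable enable
using System;
using System.Reflection;

// Hypothetical classes that differ only in how the Id property is declared
public class WithPrivateSetter { public string Id { get; private set; } = "users/1-A"; }
public class GetOnly { public string Id { get; } = "users/1-A"; }

public static class SetterCheck
{
    // Tries to overwrite Id the same way RemoveEntityPrefixFromId() does
    public static bool TrySetId(object target, string newId)
    {
        PropertyInfo? idProp = target.GetType().GetProperty("Id");
        if (idProp == null)
            return false;
        try
        {
            idProp.SetValue(target, newId);  // reflection can call a private setter
            return true;
        }
        catch (ArgumentException)
        {
            return false;  // a get-only auto-property has no setter to call
        }
    }
}
```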

To sanitise the Id property on a DTO representing a Backlog Item:

backlogItem.Assignee.RemoveEntityPrefixFromId();

To make the solution a bit better and sanitise multiple properties at once, we add two more helpers:

static void RemoveEntityPrefixFromIds<T, TReference>(this T target, params Expression<Func<T, TReference>>[] referenceMemberLambdas) where TReference : IEntity
{
    foreach (var referenceMember in referenceMemberLambdas)
        target.RemoveEntityPrefixFromIds(referenceMember);
}

static void RemoveEntityPrefixFromIds<T, TReference>(this T target, Expression<Func<T, TReference>> referenceMemberLambda) where TReference : IEntity
{
    if (   !(referenceMemberLambda.Body is MemberExpression referenceMemberSelectorExpression)
        || !(referenceMemberSelectorExpression.Member is PropertyInfo referenceProperty))
        return;

    // Read the current reference
    var referenceFunc = referenceMemberLambda.Compile();
    var reference = referenceFunc(target);

    if (reference == null)
        return;

    // Update the reference
    reference.RemoveEntityPrefixFromId();
}

So before returning a DTO, we sanitise all the references by calling

backlogItem.RemoveEntityPrefixFromIds(b => b.Assignee, b => b.AnotherReference);

And that’s the main downside: devs need to diligently call the method on every returned DTO. The perfection at the DB level turns out to be a bit of a hassle at the domain services level.

Of course, it can be taken one step further – looping through all the properties of the DTO via recursion, but we’ll stop here.
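For the curious, one possible shape of that recursive walk. It is a minimal sketch under several assumptions: the DTO classes are hypothetical, every IEntity found in the graph (including the root DTO) gets its ID shortened, collections are traversed item by item, and ReferenceEqualityComparer requires .NET 5 or later.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface IEntity { string Id { get; } }

// Hypothetical DTOs for the usage example
public class UserReference : IEntity
{
    public UserReference(string id, string name) { Id = id; Name = name; }
    public string Id { get; private set; }
    public string Name { get; set; }
}

public class BacklogItemDto : IEntity
{
    public BacklogItemDto(string id) { Id = id; }
    public string Id { get; private set; }
    public UserReference? Assignee { get; set; }
    public List<UserReference> Watchers { get; } = new List<UserReference>();
}

public static class RecursiveIdSanitiser
{
    // Strips the collection prefix from every IEntity reachable from the root object
    public static void RemoveEntityPrefixFromAllIds(object? root) =>
        Walk(root, new HashSet<object>(ReferenceEqualityComparer.Instance));

    static void Walk(object? node, HashSet<object> visited)
    {
        // Skip nulls, strings and value types; 'visited' guards against cycles
        if (node == null || node is string || node.GetType().IsValueType || !visited.Add(node))
            return;

        // Collections: sanitise each item (dictionary entries are value types and get skipped)
        if (node is IEnumerable items)
        {
            foreach (var item in items)
                Walk(item, visited);
            return;
        }

        // An entity or a reference: shorten its ID through the (possibly private) setter
        if (node is IEntity entity && entity.Id != null)
            node.GetType().GetProperty(nameof(IEntity.Id))?.SetValue(node, entity.Id.Split('/').Last());

        // Then descend into every public, non-indexed property except Id
        foreach (var prop in node.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
            if (prop.Name != nameof(IEntity.Id) && prop.GetIndexParameters().Length == 0)
                Walk(prop.GetValue(node), visited);
    }
}
```

A single RecursiveIdSanitiser.RemoveEntityPrefixFromAllIds(dto) call replaces the per-property calls. The price is reflection on every response and less control: any ID in the graph gets shortened, whether you meant it or not.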

It’s for you to decide which approach is better for your project. YABT uses the latter to provide a better RavenDB experience. Check out more helpers used in YABT.

3. Customising the ID

To cover all bases, let’s look at alternatives to the default document IDs.

3.1. GUID

From the dev’s perspective, the simplest solution would be to configure RavenDB to generate GUID document IDs. Such IDs don’t have a prefix, so there are no problems with passing them around in the URL (it may look like /api/users/b794686e-7bbf-42fd-a1fe-e4a94025735a). This way, we avoid the issues described at the beginning, and it’s easy to use: just set the ID to Guid.NewGuid().ToString() or to string.Empty:

var user = new User
{
    Id = string.Empty // database will create a GUID value for it
};

The downsides are:

it’s considered suboptimal for performance due to its randomness;
it obscures the collection the reference comes from;
it’s too verbose for a neat UX.

3.2. Customise the ID convention: identity part separator, collection name

Another way of avoiding those issues is to alter the default ID convention, though it only works for a small number of entity types, as short prefixes quickly start to collide.

Configure two parameters:

Set IdentityPartsSeparator to something neutral and URL-friendly (e.g. -);
Set TransformTypeCollectionNameToDocumentIdPrefix to shorten the collection prefix to its first 1-2 letters (e.g. TransformTypeCollectionNameToDocumentIdPrefix = name => name.Substring(0, 1).ToLowerInvariant();).

Now, instead of users/1-A, you get u-1-A, which can be used in the URL as is (e.g. /api/v1/users/u-1-A). The main downside: the uniqueness of the ID prefix is now on your shoulders.
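Since the convention boils down to a function from collection name to prefix, the uniqueness of the prefixes can be checked in a unit test or at startup. A minimal sketch with a hypothetical PrefixCheck helper; ShortPrefix() mirrors the one-letter convention above, and the list of collection names is something you would supply yourself:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixCheck
{
    // The shortened prefix from the convention above: first letter, lower-cased
    public static string ShortPrefix(string collectionName) =>
        collectionName.Substring(0, 1).ToLowerInvariant();

    // Groups collection names by their ID prefix and returns every prefix claimed by more than one collection
    public static IReadOnlyDictionary<string, string[]> FindCollisions(
        IEnumerable<string> collectionNames, Func<string, string> toPrefix) =>
        collectionNames
            .GroupBy(toPrefix)
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.ToArray());
}
```

With Users, BacklogItems and Bugs, the check reports b as claimed by both BacklogItems and Bugs, while u stays unique.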

3.3. Artificial ID

This one is not a proper solution, just a way to make IDs more expressive. By customising the semantic IDs, you can take it to another level and produce the ID from the name (a so-called artificial ID), so the user’s record would look like

{
	"Id": "users/userFlandersNerd",
	"Name": "N. Flanders",
	"FullName": "Ned Flanders"
}

and then dropping the collection prefix in the reference still keeps it expressive, which addresses one of the problems indicated in 2.1 (it’s clear the reference points to a user):

"Assignee": {
    "Id": "userFlandersNerd",
    "Name": "N. Flanders"
}

However, it only looks slightly better and doesn’t solve the other concerns raised in 2.1 (Include() still needs the full ID). It also creates new problems:

enforcing uniqueness of the ID is now on your shoulders;
the name may change over time, and you will face a dilemma of either accepting the out-of-sync ID or updating the ID along with all the references.
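To make the uniqueness concern concrete, a minimal sketch of generating artificial IDs from names. Both the naming rule (user + last name + first name) and the ArtificialIds helper are hypothetical, and the in-memory set of taken IDs is only an illustration: it is racy when two people with the same name are saved at once, so the database has to remain the final arbiter.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ArtificialIds
{
    // Hypothetical naming rule: "user" + last name + first name
    public static string FromName(string firstName, string lastName) =>
        "user" + Capitalise(lastName) + Capitalise(firstName);

    // Appends 2, 3, ... until the candidate is not taken yet
    public static string MakeUnique(string candidate, ISet<string> existingIds)
    {
        var id = candidate;
        for (var suffix = 2; existingIds.Contains(id); suffix++)
            id = candidate + suffix.ToString(CultureInfo.InvariantCulture);
        return id;
    }

    static string Capitalise(string word)
    {
        // Keep letters and digits only, so spaces or apostrophes don't end up in the URL
        var clean = new string(word.Where(char.IsLetterOrDigit).ToArray());
        return clean.Length == 0 ? "" : char.ToUpperInvariant(clean[0]) + clean.Substring(1);
    }
}
```

MakeUnique() resolves a clash by appending 2, 3 and so on, which keeps the IDs unique but slowly erodes the expressiveness that motivated them in the first place.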

The general recommendation would be to carefully weigh all pros and cons before embracing artificial document IDs.

That’s it. There are ample options, but let’s be reasonable, apply features wisely and avoid unnecessary complexity.

Check out the full source code in our repository on GitHub (github.com/ravendb/samples-yabt), and let me know what you think in the comments below, on Twitter or on LinkedIn.
