Alex KlausDeveloper  |  Architect  |  Leader
Power of Dynamic fields for indexing dictionaries and collections in RavenDB
10 December 2020

Dynamic fields in RavenDB

This series is dedicated to building enterprise applications leveraging the .NET + RavenDB bundle. Check out the source code on GitHub and the application live at yabt.ravendb.net.

It may sound unexpected, but indexing custom fields and structures dynamically added to some documents can be challenging in NoSQL. Add on the top a strongly-typed language to deal with it in the DB and things start looking even less optimistic… but not in RavenDB.

In this post the Yet Another Bug Tracker (YABT) is getting custom fields and structures that can be added by the end-user. The custom fields is a widely-adopted concept in bug-trackers to manage additional fields on the ticket per project, team, or ticket type. It exists in Jira, Azure DevOps, Bugzilla and others.

Let’s follow the trend and add to the Backlog Item (aka ”ticket”) two related features:

  1. A container for custom fields – additional properties of various types.
  2. A list of all modifications to keep track of who/when modified the ticket.

Here is a mock-up of the Backlog Item screen:

Backlog Item

  • Custom fields are displayed on the right-hand side. Each of them is formatted according to its type.
  • The ”Created” and ”Last Updated” fields (below the custom fields) get resolved from the history of the ticket’s modifications (with the rest of the history is available under the ”View All”). The name is a link for navigation to the user’s profile page.

1. NoSQL solution

At first glance, the task is trivial. Flexibility of the NoSQL does not require defining the structure upfront, so any fields can be easily added ad hoc. Though, the developers still need to be aware of the structure for two reasons:

  • To build the presentation layer (layout and style of the fields).
  • To filter and search on dynamically added fields (especially in a strongly-typed language).

1.1. DB design overview

We described the Data model in the previous article. Here is the relevant part of the diagram including CustomField, BacklogItem and User aggregates:

NoSQL diagram

In JSON format a Backlog Item record would look like

{
	"Title": "Malfunction at the Springfield Nuclear Power Plant",
	"Description": "Some terrible details",
	"ModifiedBy": [
		{
			"Timestamp": "2020-01-01T00:00:00",
			"Summary": "Created",
			"ActionedBy": {
				"Id": "users/1-A",
				"Name": "Homer Simpson"
			}
		},
		{
			"Timestamp": "2020-01-02T00:00:00",
			"Summary": "Modified description",
			"ActionedBy": {
				"Id": "users/2-A",
				"Name": "Waylon Smithers"
			}
		}
	],
	"CustomFields": {
		"CustomFields/1-A": "Mr. Burns",
		"CustomFields/2-A": 10000000000
	}
}

Where sample records of custom fields would be:

({
	"Id": "CustomFields/1-A",
	"Name": "Affected Persona",
	"Type": "Text"
},
{
	"Id": "CustomFields/2-A",
	"Name": "Potential Loss",
	"Type": "Currency"
})

We are going to focus on two dynamic BacklogItem properties:

  • ModifiedBy – change history, a collection of items with a predefined structure;
  • CustomFields – container for custom fields, a more complex structure represented by a dictionary, where each value has variable structure (BTW, any JSON structure can be presented as a dictionary).

Taking it one level up from JSON to C# we get RavenDB models for persisting in the database:

// The Backlog Item aggregate persisted in the DB
public class BacklogItem
{
    public string Id    { get; private set; }
    public string Title { get; set; }

    // List of all users who/when modified the ticket.
    public IList<ChangedByUserReference> ModifiedBy { get; } = new List<ChangedByUserReference>();

    // Resolve Who/when created & updated the ticket, no need to persist it in the DB
    [JsonIgnore]
    public ChangedByUserReference Created     => ModifiedBy?.OrderBy(m => m.Timestamp).FirstOrDefault();
    [JsonIgnore]
    public ChangedByUserReference LastUpdated => ModifiedBy?.OrderBy(m => m.Timestamp).LastOrDefault();

    // Custom properties of various data types. Stored as { custom field ID, value }
    public IDictionary<string, object> CustomFields { get; set; }
}

public class ChangedByUserReference
{
    // Timestamp of the change
    public DateTime Timestamp { get; set; }
    // Brief summary of the change
    public string Summary { get; set; }
    // The user who made the change
    public UserReference ActionedBy { get; set; }
}

public class UserReference
{
    string Id	{ get; set; }
    string Name	{ get; set; }
}

and a separate collection of Custom Fields would be

// The Custom Field entity persisted in the DB
public class CustomField
{
    public string Id { get; set; }
    public string Name { get; set; }
    // Type of the custom field determines how to process the associated value
    public CustomFieldType FieldType { get; set; }

    public enum CustomFieldType { Text, Date, Currency }
}

1.2. Design justification

Design of a de-normalised database always depends on the main use-cases of your application. Therefore, the structures provided above ought to be justified.

References to other entities

While both ModifiedBy and CustomFields refer to other entities, they do it differently.

Instead of just referencing user IDs, ActionedBy property has a bit richer structure – UserReference class. It’s done to keep often used user’s properties handy and avoid excessive JOINs with the User collection for presenting auxiliary data requested along with the backlog items. For example, a list of backlog items will show names of users who created/edited them:

Backlog Items list

But convenience of receiving user names along with backlog items comes with some maintenance responsibility – keeping user names in references in sync with corresponding records in the Users collection. It is a trade-off that will be discussed in a separate article.

Alternatively, we can go with a traditional approach of using the ID of the referred record and resolving other properties in runtime when querying. The CustomFields property is a good fit for that as custom fields are unlikely to be queried along with multiple backlog items (as shown on the mock-up above), so resolving them via a separate request or a JOIN would not put too much stress on the DB.

ModifiedBy: IList vs IDictionary

ModifiedBy could have been presented in C# as a dictionary with a timestamp as the key (e.g. IDictionary<DateTime, ChangedByUserReference>). If you are confident that 2+ events will never occur at the exact same time, then go for it. Note, that the user ID could not be used as the key, as one user may make multiple changes and the list should reflect the whole history. Overall, I would rather have ModifiedBy as a list.

CustomFields on the other side is the perfect candidate for a dictionary if we decide not to use one custom field twice in the ticket.

OK, it was a diversion. I hope, getting values from the Custom Fields along with the created/updated is straightforward (see YABT source code for examples). So, we resolved the first issue and can build the presentation layer for a known data structure. But what about filtering on those fields?

2. Filtering on sub-attributes… and sub- sub- attributes

There are two interesting cases requiring filtering the Backlog items on ModifiedBy and CustomFields properties:

  1. Getting ”My recent tickets”, tickets that were recently edited by the current user.
  2. Filtering on certain values of various custom fields. E.g. for text fields where the ”Affected Persona” contains word ”Burns” or the ”Potential Loss” is more than $1,000.

2.1. Filter for ”My recent tickets

To filter a collection we need an index and the Dynamic fields feature comes in handy. It creates key-value pairs in the index terms where the keys are resolved runtime when updating the index.

A generic example of Dynamic fields in index would look like

Map = tickets =>
    from ticket in tickets
    select new
    {
        _ = ticket.ModifiedBy.Select(x => CreateField(x.ActionedBy.Id, x.Timestamp))
    };

It will create index terms for the above example reflecting changes by each user:

  • 'users/1-A': { 2020-01-01T00:00:00 }
  • 'users/2-A': { 2020-01-02T00:00:00 }

Looks good, but not good enough as

  1. One user can modify one ticket multiple times, but to get the timestamp for the very last modification we need to group by ActionedBy.Id.
  2. Filtering on the generated terms from a strongly-typed language requires a generic structure for using in the queries.

Battling the first problem is easy but figuring out a solution for the second one would require deep knowledge of underlining structures.

If you need details, the discussion is here. The gist – forming the keys for CreateField() in a special format will allow querying on the terms in C# as it was a dictionary. Bear with me.

For index created with syntax CreateField("Bla_" + k.Key, k.Value) and k.Key='Key' you would query

  • in RQL it looks quite obvious:
from index 'Y' where Bla_Key = 4
  • in C# it would be a dictionary:
s.Query<X,Y>().Where(p => p.Bla["Key"].Equals(4))

RavenDB below v5 had two constraints on the Dynamic field key:

  • the key cannot start with a digit, e.g. 1-A (issue #15234);
  • the key cannot contain the slash (/) symbol (issue #15235).

Fortunately, both have been fixed in RavenDB v5.0.

If you are on an older version, then to work around those constraints, we set the key as

CreateField("Bla_" + k.Key!.Replace("/",""), k.Value)

Then for k.Key == 'users/1-A' you would query s.Query<X, Y>().Where(p => p.Bla["users1-A"].Equals(4)).

Here is a full example:

public class BacklogItems_ForList : AbstractIndexCreationTask<BacklogItem>
{
    public class Result
    {
        public IDictionary<string, DateTime> ModifiedByUser { get; set; }
    }
    public BacklogItems_ForList()
    {
        Map = tickets =>
            from ticket in tickets
            select new
            {
                _ = ticket.ModifiedBy.GroupBy(m => m.ActionedBy.Id)		// Grouping by user
                                     .Select(x => CreateField($"{nameof(Result.ModifiedByUser)}_{x.Key!.Replace("/","").ToLower()}",
                                                              x.Max(o => o.Timestamp)
                                                             )
                                            )
            };
    }
}

Now we can get all the tickets modified by a user ID and sort in descending order by the timestamp of the last change by the current user via

var userKey = userId.Replace("/", "").ToLower();  // For 'users/1-A' get 'users1-A'
s.Query<Result, BacklogItems_ForList>()
 .Where(t => t.ModifiedByUser[userKey] > DateTime.MinValue)
 .OrderByDescending(t => t.ModifiedByUser[userKey])

Note: here we take string concatenation for the key outside of the query, it’d fail otherwise.

2.2. Filter by custom fields

We have already covered all the gotchas so it should be a smooth ride for filtering on custom fields. A simple case looks pretty much the same as for ModifiedBy:

__ = ticket.CustomFields.Select(x => CreateField($"{nameof(Result.CustomFields)}_{x.Key.Replace("/","").ToLower()}", x.Value))

But here it all boils down to the supported data types of the custom fields. And we can knock ourselves out:

  • can search in text fields;
  • check for equality in numeric fields;
  • filter on sub-attributes for complex structures (e.g. on ID for UserReference when referencing to other users).

So the index would require resolving FieldType of the custom field and running many if/else conditions for each type:

public class BacklogItems_ForList : AbstractIndexCreationTask<BacklogItem>
{
    public class Result
    {
        public IDictionary<string, string> CustomFields { get; set; }
    }
    public BacklogItems_ForList()
    {
        Map = tickets =>
            from ticket in tickets
            select new
            {
                __ = from x in ticket.CustomFields
                        let fieldType = LoadDocument<CustomField.CustomField>(x.Key).FieldType
                        let key = $"{nameof(Result.CustomFields)}_{x.Key.Replace("/", "").ToLower()}"
                        select
                            (fieldType == CustomFieldType.Text)
                                // search in text Custom Fields
                                ? CreateField(key, x.Value, false, true)
                                : (fieldType == CustomFieldType.UserReference)
                                    // Exact match of User ID
                                    ? CreateField(key, x.Value.Id)
                                    // Other Custom Fields (e.g. numbers, dates) can use a '≥' comparison
                                    : CreateField(key, x.Value)
            };
    }
}

And here how we can query against that index:

  1. When the Custom Field in question has type Text
    "CustomFields": {
        "CustomFields/1-A": "Mr. Burns"
    }
    we can search text:
    var fieldKey = customFieldId.Replace("/", "").ToLower();  // For 'CustomFields/1-A' get 'customfields1-A'
    s.Query<Result, BacklogItems_ForList>()
     .Search(t => t.CustomFields[fieldKey], "Burns")
  2. When the Custom Field in question is a Number:
    "CustomFields": {
        "CustomFields/2-A": 10000000000
    }
    we can use the ’>’ operator:
    var fieldKey = customFieldId.Replace("/", "").ToLower();  // For 'CustomFields/2-A' get 'customfields2-A'
    s.Query<Result, BacklogItems_ForList>()
     .Where(t => t.CustomFields[fieldKey] > 1_000_000)
    A similar approach would be used for querying on user IDs.

3. RavenDB Studio Tools

Dynamic fields lack of transparency. To have a look under the hood (e.g. see the Index Terms) use the RavenDB Studio.

If you open the Index Terms for an index when running one of the YABT tests covering the scenarios described above (e.g. this one in the YABT repo), you would see terms like:

RavenDB Studio: Index Terms

That screenshot shows values for Custom Fields and user’s modifications you can query on.

That’s it. Happy filtering.

Check out the full source code at our repository on GitHub - github.com/ravendb/samples-yabt and let me know what you think in the comments below, on Twitter or LinkedIn.