Posts tagged 'schema design'

MongoDB Newsfeed Schema Design for Entexis

In this blog post, I'll describe how we designed the MongoDB schema for the Facebook-like newsfeed in Entexis, a modern recruiting application that helps organizations optimize their recruiting efforts, provide online application forms, and so on.

MongoDB is a schema-less document database. Nonetheless, schema design is as important as ever, if not more so. Why?

Mostly because schema design in NoSQL is generally determined not by the data you want to store, but by the operations you want to perform. This is a fundamentally different approach. What I describe here works well for us, but it will almost certainly not work well for, say, Twitter, where the follower count is probably distributed exponentially rather than evenly.

[Screenshot: Entexis newsfeed]

Newsfeed Basics

Of course you know news feeds from Facebook, LinkedIn and many other software-as-a-service applications. The goal of a news feed is to provide the user with relevant information, usually roughly sorted by time, if not delivered in real time.

The key to a good news feed is relevancy scoring (i.e., what does the user really care about?), which is often strongly, but not exclusively, based on recency and on the relationship between reader and author. Recency does not only refer to the action that triggered the news item itself; it might also be the time the last comment was posted, or the time somebody interacted with the item in any other way.
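To make the idea of recency concrete, here is a minimal sketch of such a score. It is not how Entexis computes relevancy: the field names (createdAt, comments, postedAt, authorId, closeTo), the decay formula and the weights are all assumptions made up for illustration.

```javascript
// Hypothetical sketch: every field name and weight here is an assumption.
const HOUR = 60 * 60 * 1000;

// Recency counts the newest interaction, not only the original action.
function lastActivity(item) {
  const times = [item.createdAt, ...item.comments.map((c) => c.postedAt)];
  return Math.max(...times);
}

function relevance(item, reader, now) {
  const ageHours = (now - lastActivity(item)) / HOUR;
  const recency = 1 / (1 + ageHours); // 1 for brand-new activity, decays towards 0
  const relationship = reader.closeTo.includes(item.authorId) ? 1 : 0.5;
  return recency * relationship;
}

const now = 10 * HOUR;
const reader = { closeTo: ["alice"] };
const items = [
  // Old item by a distant author, but with a fresh comment.
  { id: "A", authorId: "bob", createdAt: 0, comments: [{ postedAt: 9 * HOUR }] },
  // Newer item by a close author, without any interaction.
  { id: "B", authorId: "alice", createdAt: 5 * HOUR, comments: [] },
];

const ranked = items
  .slice()
  .sort((a, b) => relevance(b, reader, now) - relevance(a, reader, now))
  .map((item) => item.id);

console.log(ranked); // [ 'A', 'B' ]
```

Item A ranks first even though it is older and its author is less closely related to the reader, because its most recent comment is only an hour old.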

Unique Array Keys in MongoDB

This is just a tiny observation, but I think it's worth pointing out: a document in MongoDB can never violate 'its own' unique constraint. Let's look at an example:

> db.Test.getIndexes();
[
        // (snipped the _id index)
        {
                "v" : 1,
                "key" : {
                        "Keys" : 1
                },
                "unique" : true,
                "ns" : "test.Test",
                "name" : "Keys_1"
        }
]

As you can see, I have a unique index on "Keys" in my "Test" collection. Now let's look at a simple insert:

> db.Test.insert({"Keys" : [1, 12]});

Fine, this worked. Let's add another one:

> db.Test.insert({"Keys" : [6, 8, 12]});
E11000 duplicate key error index: test.Test.$Keys_1  dup key: { : 12.0 }

Well, that was expected: the value '12' violates the unique constraint. How about this one:

> db.Test.insert({"Keys" : [2, 2]});
> db.Test.find()
{ "_id" : ObjectId("4edd4af9d9d4c41a519c9d33"), "Keys" : [ 1, 12 ] }
{ "_id" : ObjectId("4edd4b4ad9d4c41a519c9d38"), "Keys" : [ 2, 2 ] }

Yes, it worked! One might expect the insert to fail, because '2' is not 'unique'. However, a document can't violate its own constraint, so to speak. I don't think this is too much of a surprise: if you searched for a document using

> db.Test.find({"Keys" : 2})
{ "_id" : ObjectId("4edd4b4ad9d4c41a519c9d38"), "Keys" : [ 2, 2 ] }

you'd expect to find exactly this document, and it is only matched once. In this sense, the behavior is fully consistent, but it's probably not what everybody expects.
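One way to picture this is a toy model of a unique index over array values, written in plain JavaScript. It is not MongoDB's implementation. It only assumes that each document contributes the set of its distinct array values as index keys, which is enough to reproduce the three inserts above. The function name createUniqueMultikeyIndex is made up for this sketch.

```javascript
// Toy model of a unique index over array values; NOT MongoDB's actual code.
// Assumption: a document contributes each distinct array value once as a key.
function createUniqueMultikeyIndex(field) {
  const owners = new Map(); // index key -> id of the document that owns it
  let nextId = 1;

  return {
    insert(doc) {
      const value = doc[field];
      // Deduplicate within the document: [2, 2] yields the single key 2.
      const keys = new Set(Array.isArray(value) ? value : [value]);

      // Check all keys first, so a rejected insert leaves no partial entries.
      for (const key of keys) {
        if (owners.has(key)) {
          throw new Error("duplicate key error (toy model), dup key: " + key);
        }
      }

      const id = nextId++;
      for (const key of keys) owners.set(key, id);
      return id;
    },
  };
}

const index = createUniqueMultikeyIndex("Keys");
const results = [[1, 12], [6, 8, 12], [2, 2]].map((keys) => {
  try {
    index.insert({ Keys: keys });
    return "ok";
  } catch (err) {
    return "rejected";
  }
});

console.log(results); // [ 'ok', 'rejected', 'ok' ]
```

In this model, the second insert is rejected because another document already owns the key 12, while [2, 2] collapses to a single key that nobody owns yet.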


Hi! My name is Christoph Menge. I'm addicted to coffee, cookies and code; I'm the CTO of e-Invoicing and billing company Pactas. I occasionally blog about tech stuff here (surprise).

