MongoDB Newsfeed Schema Design for Entexis

In this blog post, I'll describe how we designed the MongoDB schema for the Facebook-like newsfeed in Entexis, a modern recruiting application that helps organizations optimize their recruiting efforts, supply online application forms, etc.

MongoDB is a schema-less document database. Nonetheless, schema design is as important as ever, if not even more important. Why so?

Mostly because schema design in NoSQL is generally determined not by the data you want to store, but by the operations you want to perform. This is a fundamentally different approach. What I describe here works well for us, but it will almost certainly not work well for, e.g., Twitter, where the follower count is probably distributed exponentially rather than evenly.

Entexis Newsfeed Screenshot

Newsfeed Basics

Of course you know news feeds from Facebook, LinkedIn and many other software-as-a-service applications. The news feed's goal is to provide the user with relevant information that is usually roughly sorted by time, if not delivered in real time.

The key to a good news feed is relevancy scoring (i.e., what does the user really care about?), which is often strongly, but not exclusively, based on recency and the relationship between reader and author. Recency does not only refer to the action that triggered the news item itself; it might also be the time the last comment was posted, or the time somebody interacted with the news item in any other way.
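
To make the idea of recency concrete, here is a minimal sketch of a score that combines the two factors named above. It is only an illustration, not the scoring Entexis uses: the field names (`createdAt`, `comments`, `postedAt`), the half-life constant and the `relationshipWeight` parameter are all assumptions made up for this example.

```javascript
// Hypothetical relevancy score: exponential recency decay times a relationship weight.
// All names and constants here are assumptions, not part of any real schema.
const HOUR_MS = 60 * 60 * 1000;
const HALF_LIFE_HOURS = 24; // assumed: the score halves for every day of inactivity

// Recency is based on the latest activity, not only on the creation time.
function lastActivity(item) {
  return Math.max(item.createdAt, ...item.comments.map((c) => c.postedAt));
}

function relevancy(item, relationshipWeight, now) {
  const ageHours = (now - lastActivity(item)) / HOUR_MS;
  const recency = Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
  return recency * relationshipWeight;
}

// Usage: an old item with a fresh comment versus a newer item nobody touched.
const now = 100 * HOUR_MS;
const oldButActive = { createdAt: 0, comments: [{ postedAt: 99 * HOUR_MS }] };
const newerButQuiet = { createdAt: 90 * HOUR_MS, comments: [] };

console.log(relevancy(oldButActive, 1, now) > relevancy(newerButQuiet, 1, now));
```

With equal relationship weights, the older item ranks higher because its last comment is more recent than the creation of the newer item.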

Unique Array Keys in MongoDB

This is just a tiny observation, but I think it's worth pointing out: documents in MongoDB can never violate 'their own' unique value constraint. Let's look at an example:

> db.Test.getIndexes();
[
        // (snipped the _id index)
        {
                "v" : 1,
                "key" : {
                        "Keys" : 1
                },
                "unique" : true,
                "ns" : "test.Test",
                "name" : "Keys_1"
        }
]

As you can see, I have a unique key "Keys" in my "Test" collection. Now let's look at a simple insert:

> db.Test.insert({"Keys" : [1, 12]});

Fine, this worked. Let's add another one:

> db.Test.insert({"Keys" : [6, 8, 12]});
E11000 duplicate key error index: test.Test.$Keys_1  dup key: { : 12.0 }

Well, that was expected: the value '12' violates the unique constraint. How about this one:

> db.Test.insert({"Keys" : [2, 2]});
> db.Test.find()
{ "_id" : ObjectId("4edd4af9d9d4c41a519c9d33"), "Keys" : [ 1, 12 ] }
{ "_id" : ObjectId("4edd4b4ad9d4c41a519c9d38"), "Keys" : [ 2, 2 ] }

Yes, it worked! One might expect the insert to fail, because '2' is not 'unique'. However, a document can't violate a unique constraint against itself, so to speak. I don't think this is too much of a surprise: If you searched for an object using

> db.Test.find({"Keys" : 2})
{ "_id" : ObjectId("4edd4b4ad9d4c41a519c9d38"), "Keys" : [ 2, 2 ] }

you'd expect to find exactly this document, and it is only matched once. In this sense, the behavior is fully consistent, but it's probably not what everybody expects.
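
The behavior is easier to see in a toy model. The sketch below is plain JavaScript, not MongoDB's actual implementation, and the class `UniqueMultikeyIndex` is a hypothetical name invented for this example. It assumes that a document contributes each distinct array value to the index only once, which matches what the shell session above shows.

```javascript
// Toy model of a unique index on an array field.
// NOT MongoDB's implementation; class and method names are hypothetical.
class UniqueMultikeyIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map(); // index key -> _id of the owning document
  }

  insert(doc) {
    // Assumption: duplicate values inside one document collapse to one key,
    // so [2, 2] yields a single index entry.
    const keys = [...new Set([].concat(doc[this.field]))];

    // Check every key before writing any, so a rejected insert leaves no entries behind.
    for (const key of keys) {
      if (this.entries.has(key)) {
        throw new Error(`duplicate key error (toy model), dup key: ${key}`);
      }
    }
    for (const key of keys) {
      this.entries.set(key, doc._id);
    }
  }

  // Returns the _id of the matching document, at most one per value.
  find(value) {
    return this.entries.has(value) ? [this.entries.get(value)] : [];
  }
}

const index = new UniqueMultikeyIndex("Keys");
index.insert({ _id: "a", Keys: [1, 12] });

let rejected = false;
try {
  index.insert({ _id: "b", Keys: [6, 8, 12] }); // 12 already belongs to "a"
} catch (err) {
  rejected = true;
}

index.insert({ _id: "c", Keys: [2, 2] }); // accepted: one key, one owner
console.log(rejected, index.find(2), index.find(6));
```

In this model, uniqueness is enforced between documents, and the repeated value inside a single document never gets the chance to collide with itself.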

Don't not use MongoDB!

I have never really worried about the credibility of sources on the internet. Yesterday's and today's events related to MongoDB got me thinking, however. Something evil roamed the interwebs.

The Dark Side of the Force

> Luke: Is the dark side stronger?
> Yoda: No, no, no. Quicker, easier, more seductive.

To sum it up quickly: somebody posted a link to an article titled "Don't use MongoDB" on pastebin yesterday. The whole thing went viral pretty quickly and has probably had a couple of thousand readers by now. The author, who wants to remain anonymous for political reasons (of course), claims that MongoDB has suffered major breakdowns in a very large production system. It's a warning to the reader that MongoDB is a product of the dark side of the force, seducing developers to commit to an unstable technology that is quicker, easier and more seductive, but dangerous in the long run.

Today, the same user admitted that the story was a hoax, which does not prevent the clan of conspiracy theorists from claiming the withdrawal is a hoax, but more on that later.

Appealing to Neophobia

The story gained enough attention that Eliot Horowitz posted an official 10gen statement, in which he sensitively addresses all issues in detail. He admits that there are still some rough edges to MongoDB, but he also clearly says that most allegations, especially the claim that MongoDB is randomly losing data, are false, and that no such issues have ever been reported.

Hello, World

Yes, it's a classic. Test: That's my old blog. End Test.


Hi! My name is Christoph Menge. I'm addicted to coffee, cookies and code; I'm the CTO of e-Invoicing and billing company Pactas. I occasionally blog about tech stuff here (surprise).

