Tuesday, October 8, 2013

MongoDB aggregation use cases?

I posted this question on the MongoDB University M101P discussion forum: https://education.mongodb.com/courses/10gen/M101P/2013_September/discussion/forum/i4x-10gen-M101P-course-2013_September/threads/5254490dabcee8ba1e0039d2

I'm wondering about the use case for the aggregation framework. Suppose I have one application server and one MongoDB server. As I previously learned in the SQL (MySQL/Informix) world:

we should put computation in the application rather than the database where possible

E.g. if we can compute an average in application logic, we should not ask the database to compute it.

MongoDB's aggregation and MapReduce frameworks can run natively on a MongoDB cluster, which is an obvious advantage over implementing a distributed framework yourself in a distributed environment.

But if there is only one MongoDB node, what should I do, or what is the best practice? The only advantage I can imagine is: "CPU and memory are consumed on the database server instead of the application server. And with luck, if all the documents are already in the working set, the processing may need little extra memory."

about an hour ago

You are right. We normally tell you to move as much as you can to the application layer so you relieve stress on the database layer.

However, aggregation calculations are normally performed on a large set of data (quite often a whole collection) so it can actually be more detrimental to transfer all that data over the wire to perform the calculations on the app server rather than simply allowing mongodb to do it. The transfer latency alone would normally make it slower unless you have specific servers which are optimised to perform these tasks.
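To make the transfer cost concrete, here is a minimal sketch of the two approaches for an average. It assumes `collection` behaves like a pymongo 3+ `Collection` (where `aggregate()` returns a cursor), and the `score` field is a made-up name that every document is assumed to carry.

```python
# Sketch only: `collection` is assumed to behave like a pymongo 3+ Collection
# (it must offer find() and aggregate()); the "score" field is hypothetical.

def avg_in_app(collection, field="score"):
    """Pull every document over the wire and average in the application."""
    total = 0.0
    count = 0
    # The projection trims each document, but every document still travels.
    for doc in collection.find({}, {field: 1, "_id": 0}):
        total += doc[field]
        count += 1
    return total / count if count else None


def avg_in_db(collection, field="score"):
    """Let the server compute the average; one small document comes back."""
    pipeline = [{"$group": {"_id": None, "avg": {"$avg": "$" + field}}}]
    results = list(collection.aggregate(pipeline))
    return results[0]["avg"] if results else None
```

Both functions return the same number; the difference is that the first moves one document per row across the network while the second moves a single result document.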

You would also need some kind of aggregation framework set up on your app server to process the large amount of incoming data. That means sourcing such a framework from a software vendor or writing your own.

So there are some serious disadvantages to wanting to move aggregation functions away from mongodb.

What some users do is set up mongod secondary nodes dedicated to serving aggregation queries, which relieves the stress on the primary.
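One way to route reads toward secondaries is through the connection string. The sketch below only builds such a string; the host names and the replica set name are placeholders, and `reporting_uri` is a hypothetical helper, not part of any driver. `replicaSet` and `readPreference` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode


def reporting_uri(hosts, replica_set):
    """Build a connection string that prefers secondaries for reads.

    `hosts` and `replica_set` are placeholders for your own deployment.
    """
    options = urlencode({
        "replicaSet": replica_set,
        # Read from a secondary when one is available, else fall back to the primary.
        "readPreference": "secondaryPreferred",
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options
```

The resulting string can be passed to `pymongo.MongoClient(...)`. Keep in mind that reads from a secondary may lag behind the primary, so this suits reporting-style queries that tolerate slightly stale data.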

The other thing to think about is optimising your aggregation queries with indexes. If your aggregation query can use a covered index and all your indexes fit in memory, running the aggregation on mongodb will be a lot quicker than looping over a cursor on the app server.
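As a rough illustration, an index can only help a pipeline whose filtering happens at the start, so a `$match` stage usually goes first. The sketch below assumes a pymongo-like `collection` and hypothetical `class_id` and `score` fields; whether the server actually answers the query from the index alone depends on the MongoDB version and should be checked with explain output.

```python
# Sketch only: "class_id" and "score" are hypothetical field names.

def ensure_score_index(collection):
    """Create a compound index holding both the filter and the averaged field."""
    return collection.create_index([("class_id", 1), ("score", 1)])


def class_average_pipeline(class_id):
    """Filter first so the index can narrow the input, then average."""
    return [
        {"$match": {"class_id": class_id}},
        {"$group": {"_id": "$class_id", "avg": {"$avg": "$score"}}},
    ]


def class_average(collection, class_id):
    """Run the pipeline on the server and return the average, or None."""
    results = list(collection.aggregate(class_average_pipeline(class_id)))
    return results[0]["avg"] if results else None
```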

So unfortunately there is no right or wrong answer here. You would need to assess the load on your database server and see whether it is actually faster and worth the time and money moving those functions to the app server.

