Monday, December 23, 2013

MongoDB GridFS performance test

Test code and test result: https://github.com/feifangit/MongoDB-GridFS-test

Purpose

MongoDB GridFS comes with some natural advantages such as scalability (sharding) and HA (replica sets). But since it splits files into chunk documents stored in the database, there is no doubt a performance cost.
I tried 3 different deployments (different MongoDB drivers) to read files from GridFS, and compared the results against a classic Nginx configuration.
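Since the chunked storage is where the overhead comes from, it helps to picture what GridFS does: every file is split into fixed-size chunk documents that must be fetched and reassembled on each read. A simplified sketch of that round trip (the 255 KB chunk size is commonly pymongo's default, but treat it as illustrative):

```python
def split_chunks(data: bytes, chunk_size: int = 255 * 1024):
    """Split a payload the way GridFS does: an ordered list of
    fixed-size chunks, where only the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def join_chunks(chunks):
    """Reassemble the chunks, as every GridFS read must."""
    return b"".join(chunks)

payload = b"x" * 300_000                 # ~293 KB sample payload
chunks = split_chunks(payload)
assert len(chunks) == 2                  # one full chunk plus a remainder
assert join_chunks(chunks) == payload
```

Reassembling chunks on every download is the extra work the tests below try to measure.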

Configurations

1, Nginx

location /files/ {
  alias /home/ubuntu/;
} 
open_file_cache was kept off during the test.

2, Nginx_GridFS

It's an Nginx plugin built on the MongoDB C driver: https://github.com/mdirolf/nginx-gridfs

Compile code & install

I made a quick install script in this repo; run it with sudo. After Nginx is ready, modify the configuration file under /usr/local/nginx/conf/nginx.conf (if you didn't change the path).

Configuration

location /gridfs/ {
   gridfs test1 type=string field=filename;
}
Use /usr/local/nginx/sbin/nginx to start Nginx, and pass the parameter -s reload if you change the configuration file again.

3, Python

library version

  • Flask 0.10.1
  • Gevent 1.0.0
  • Gunicorn 0.18.0
  • pymongo 2.6.3

run application

cd flaskapp/
sudo chmod +x runflask.sh
bash runflask.sh
Script runflask.sh starts Gunicorn in gevent worker mode. The Gunicorn configuration file is here
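For reference, the request path of such an app can be sketched with nothing but the standard library (a dict stands in for GridFS here; the actual test app uses Flask and pymongo's gridfs module, see the repo):

```python
# Minimal WSGI sketch of the GridFS-serving request path.
# A dict stands in for GridFS; a real app would call something like
# fs.get_last_version(filename=name) on a gridfs.GridFS instance.
FILES = {"1KB.bin": b"x" * 1024}

def app(environ, start_response):
    name = environ.get("PATH_INFO", "/").lstrip("/")
    body = FILES.get(name)
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point of the sketch: every download passes through the Python interpreter, byte by byte, which is what the benchmark below punishes.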

4, Node.js

library version

  • Node.js 0.10.4
  • Express 3.4.7
  • mongodb(driver) 1.3.23

run application

cd nodejsapp/
sudo chmod +x runnodejs.sh
bash runnodejs.sh

Test

Test items:

  1. file served by Nginx directly
  2. file served by Nginx_gridFS + GridFS
  3. file served by Flask + pymongo + gevent + GridFS
  4. file served by Node.js + GridFS

Files for downloading:

Run the script insert_file_gridfs.py on the MongoDB server to insert files of different sizes into database test1 (pymongo is required):
  • 1KB
  • 100KB
  • 1MB
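The insert script itself is not reproduced in the post; a sketch of what it might look like (database name test1 from the configuration above; the filenames and random payloads are my assumptions):

```python
import os

# File sizes used in the test.
SIZES = {"1KB": 1024, "100KB": 100 * 1024, "1MB": 1024 * 1024}

def make_payload(size: int) -> bytes:
    """Random bytes, so results aren't skewed by compressible content."""
    return os.urandom(size)

def insert_test_files(mongo_uri: str = "mongodb://localhost:27017") -> None:
    # Imported here so the rest of the sketch runs without pymongo installed.
    import gridfs
    from pymongo import MongoClient

    db = MongoClient(mongo_uri)["test1"]       # database name from the post
    fs = gridfs.GridFS(db)
    for name, size in SIZES.items():
        fs.put(make_payload(size), filename=name)
```

fs.put stores the payload as fs.files/fs.chunks documents; the filename field is what the nginx-gridfs `field=filename` configuration above looks up.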

Test Environment

2 servers:
  • MongoDB+Application/Nginx
  • tester(Apache ab/JMeter)

Concurrency

100 concurrent requests, total 500 requests.
ab -c 100 -n 500 ...

Result

Throughput

Time per request (download)
File size    Nginx+Hard drive    Nginx+GridFS plugin    Python(pymongo+gevent)    Node.js
1KB          0.174               1.124                  1.982                     1.679
100KB        1.014               1.572                  3.103                     3.708
1MB          9.582               9.567                  15.973                    18.317
You can get Apache ab report in folder: testresult
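Expressed relative to plain Nginx, the numbers in the table above show how the GridFS overhead shrinks as file size grows; a quick way to compute the ratios:

```python
# "Time per request" figures copied from the table above.
results = {
    "nginx": {"1KB": 0.174, "100KB": 1.014, "1MB": 9.582},
    "nginx_gridfs": {"1KB": 1.124, "100KB": 1.572, "1MB": 9.567},
    "pymongo+gevent": {"1KB": 1.982, "100KB": 3.103, "1MB": 15.973},
    "node.js": {"1KB": 1.679, "100KB": 3.708, "1MB": 18.317},
}

def slowdown(setup: str, size: str) -> float:
    """How many times slower a setup is than plain Nginx for a file size."""
    return results[setup][size] / results["nginx"][size]

# nginx_gridfs: roughly 6.5x slower for 1KB files, yet on par for 1MB files.
```

For 1MB files the per-request fixed cost is amortized, so the driver overhead almost disappears for nginx_gridfs while the script-language stacks stay well behind.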

Server load

The server load was monitored with the command: vmstat 2
Charts from vmstat were captured for each setup: Nginx, nginx_gridfs, gevent+pymongo, and Node.js.

Conclusion

  • Files served by Nginx directly:
  • No doubt it's the most efficient option, in both performance and server load.
  • It supports caching; in the real world, the open_file_cache directive should be configured well for better performance.
  • And it must be mentioned: it's the only setup that supports pause and resume during download (HTTP range support).
  • For the other 3 test items, files are stored in MongoDB but served by different drivers.
  • Serving static files from an application is really not an appropriate choice: it drains too much CPU, and the performance is not good.
  • nginx_gridfs (MongoDB C driver): download requests are processed at the Nginx level, which sits in front of the web applications in most deployments, so the web application can focus on dynamic content instead of static content.
  • nginx_gridfs got the best performance compared with the applications written in scripting languages. The performance difference between Nginx and nginx_gridfs shrinks as file size increases, but you cannot turn a blind eye to the server load.
  • pymongo and the Node.js driver: it's a draw. Serving static files this way should be avoided in production applications.
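On the range-support point: an application serving GridFS could add pause/resume itself by honoring the Range header and seeking within the GridFS file object. A minimal, hypothetical parser for the single-range case (multi-range requests are not handled):

```python
def parse_range(header: str, length: int):
    """Parse a simple 'bytes=start-end' Range header.

    Returns (start, end_inclusive), or None if the header is absent,
    malformed, or unsatisfiable for a file of the given length.
    """
    if not header or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):]
    start_s, _, end_s = spec.partition("-")
    try:
        if start_s == "":                  # suffix form: last N bytes
            n = int(end_s)
            return (max(length - n, 0), length - 1)
        start = int(start_s)
        end = int(end_s) if end_s else length - 1
    except ValueError:
        return None
    if start > end or start >= length:
        return None
    return (start, min(end, length - 1))
```

The handler would then seek to `start` in the GridFS file, stream `end - start + 1` bytes, and answer with a 206 status; none of the three GridFS setups tested here does that out of the box.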

Advantages of GridFS

  • Putting files in the database makes static content management much easier: there is no need to maintain consistency between files on disk and their metadata in the database.
  • The scalability and HA advantages come with MongoDB.

Drawbacks of GridFS

  • Poor performance compared with serving files straight from the filesystem.

When should I use MongoDB GridFS

There are few use cases I can imagine, especially in a performance-sensitive system. But I may try it in some prototype projects.
Here is the answer from the official MongoDB website; hope this helps: http://docs.mongodb.org/manual/faq/developers/#faq-developers-when-to-use-gridfs

Monday, December 16, 2013

gevent + pymongo doesn't process replica set host names defined in /etc/hosts

I have a Flask application connecting to a MongoDB replica set via the hostnames "node1"~"node3".

I can run it properly in the dev environment with:
python webservice.py
But in the production environment, it fails to connect to MongoDB when started with this command:
gunicorn -k gevent webservice:app -b 0.0.0.0:8100
There's no such issue if I change the worker class to sync or eventlet.

Back in the gevent 0.13.x days, I met similar issues caused by gevent's DNS mechanism. At that tough time, gevent did not process /etc/hosts or /etc/resolv.conf at all, so you could not reach 10.1.1.47 by the hostname "node1" defined in your /etc/hosts file. :(

About this gevent + pymongo issue, here's the report, with several workarounds mentioned in the description.


After reading the 1.0 release notes carefully, I think it may be a pitfall rather than a bug. The ares resolver is not merely a better choice; in some cases you must use the "better choice".


Fix
For supervisord, add this configuration:
environment=GEVENT_RESOLVER=ares
For the command line:
env GEVENT_RESOLVER=ares yourcmd...
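The same fix can also be applied from inside a Python entry point, as long as it runs before gevent is imported anywhere in the process (gevent reads the variable at import time):

```python
import os

# Must run before `import gevent` anywhere in the process;
# equivalent to `env GEVENT_RESOLVER=ares yourcmd`.
os.environ.setdefault("GEVENT_RESOLVER", "ares")

# From here on, `import gevent` picks the c-ares resolver, which
# honors /etc/hosts and /etc/resolv.conf.
```

Putting this at the very top of the Gunicorn config or the WSGI module is a pragmatic alternative when you cannot control the supervisor environment.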