Tuesday, April 8, 2014

patch for openssl heartbleed bug on Ubuntu

Install update


apt-get update
apt-get install openssl libssl1.0.0

Check libssl version

root@hydrausdev:~# dpkg -l|grep libssl
ii libssl-dev 1.0.1-4ubuntu5.12 SSL development libraries, header files and documentation
ii libssl-doc 1.0.1-4ubuntu5.11 SSL development documentation documentation
ii libssl1.0.0 1.0.1-4ubuntu5.12 SSL shared libraries
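The check above can also be scripted. The following is a minimal sketch, not part of the original post: the helper name parse_dpkg_list is hypothetical, and the sample text is simply the listing shown above. It assumes the whitespace-separated `dpkg -l` layout in which installed packages start with the status flag "ii".

```python
def parse_dpkg_list(output):
    """Map package name -> installed version from `dpkg -l` style lines."""
    versions = {}
    for line in output.splitlines():
        fields = line.split()
        # Installed packages start with the status flag "ii".
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


# Sample input: the listing from this post, not live system output.
sample = """\
ii libssl-dev 1.0.1-4ubuntu5.12 SSL development libraries, header files and documentation
ii libssl-doc 1.0.1-4ubuntu5.11 SSL development documentation documentation
ii libssl1.0.0 1.0.1-4ubuntu5.12 SSL shared libraries
"""

versions = parse_dpkg_list(sample)
print(versions["libssl1.0.0"])
```

On a real machine the text would come from running dpkg -l, and package names may carry an architecture suffix that this sketch does not strip.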

Restart Nginx/Apache

Restart every service linked against libssl so that it loads the patched library.

Verify

Use the online tool: http://filippo.io/Heartbleed/

Friday, March 28, 2014

Install MySQL library for python on MacOS

Both clang and gcc in the latest Xcode command line tools fail to compile MySQL-python.
The error says "unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]".

Here are the problematic gcc and clang versions:
ffmb13:~ fanfei$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.1.0
Thread model: posix

ffmb13:~ fanfei$ clang -v
Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.1.0
Thread model: posix

To work around the issue, set the environment variable ARCHFLAGS before compiling.
brew install mysql
su
export ARCHFLAGS="-Wno-error=unused-command-line-argument-hard-error-in-future"
pip install MySQL-python

Wednesday, January 8, 2014

form label missing in Django 1.6?

You may have used a monkey patch to add attrs during form rendering.

e.g. adding the attribute "class" to all form labels, as this article suggests:
http://ask.make-money-article.com/que/2028404
# BoundField lives in django/forms/forms.py in Django 1.5 and 1.6.
from django.forms.forms import BoundField

def add_control_label(f):
    def control_label_tag(self, contents=None, attrs=None):
        if attrs is None:
            attrs = {}
        attrs['class'] = 'control-label'
        return f(self, contents, attrs)
    return control_label_tag

BoundField.label_tag = add_control_label(BoundField.label_tag)
If you used this "solution" in your application, you may find that the field names (control labels) are missing in the admin site since Django 1.6: the forms render without labels.



This is because the signature of the .label_tag method you hooked has changed! It now accepts an additional parameter, label_suffix.

Django 1.5
https://github.com/django/django/blob/1.5.5/django/forms/forms.py#L498
def label_tag(self, contents=None, attrs=None):
Django 1.6
https://github.com/django/django/blob/1.6/django/forms/forms.py#L515
def label_tag(self, contents=None, attrs=None, label_suffix=None):
So you have to either add label_suffix to control_label_tag or accept variable arguments, as in the following snippet.
# BoundField lives in django/forms/forms.py in Django 1.5 and 1.6.
from django.forms.forms import BoundField

def add_control_label(f):
    # Forward any extra arguments (such as label_suffix) to the original method.
    def control_label_tag(self, contents=None, attrs=None, *args, **kwargs):
        if attrs is None:
            attrs = {}
        attrs['class'] = 'control-label'
        return f(self, contents, attrs, *args, **kwargs)
    return control_label_tag

BoundField.label_tag = add_control_label(BoundField.label_tag)
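To see the difference between the two wrappers without installing Django, here is a small self-contained sketch. FakeField is a hypothetical stand-in that only mimics the 1.6-style signature of label_tag; it is not Django code, and add_control_label_fixed_args is just a renamed copy of the original wrapper.

```python
def add_control_label_fixed_args(f):
    # Original wrapper: only knows about contents and attrs.
    def control_label_tag(self, contents=None, attrs=None):
        if attrs is None:
            attrs = {}
        attrs['class'] = 'control-label'
        return f(self, contents, attrs)
    return control_label_tag


def add_control_label(f):
    # Tolerant wrapper: forwards any extra arguments untouched.
    def control_label_tag(self, contents=None, attrs=None, *args, **kwargs):
        if attrs is None:
            attrs = {}
        attrs['class'] = 'control-label'
        return f(self, contents, attrs, *args, **kwargs)
    return control_label_tag


class FakeField(object):
    """Hypothetical stand-in with a Django 1.6-style label_tag signature."""
    def label_tag(self, contents=None, attrs=None, label_suffix=None):
        return (contents, attrs, label_suffix)


original = FakeField.label_tag

# Old wrapper: a caller that passes label_suffix gets a TypeError.
FakeField.label_tag = add_control_label_fixed_args(original)
try:
    FakeField().label_tag('Name', label_suffix=':')
    old_wrapper_failed = False
except TypeError:
    old_wrapper_failed = True

# New wrapper: the extra keyword reaches the wrapped method.
FakeField.label_tag = add_control_label(original)
result = FakeField().label_tag('Name', label_suffix=':')
print(old_wrapper_failed, result)
```

Forwarding *args and **kwargs keeps the patch working if the wrapped method gains further parameters, at the cost of hiding the real signature from readers of the wrapper.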

Monday, December 23, 2013

MongoDB GridFS performance test

Test code and test result: https://github.com/feifangit/MongoDB-GridFS-test

Purpose

MongoDB GridFS comes with some natural advantages such as scalability (sharding) and HA (replica set). But because it stores each file as a series of chunks in database documents, a performance loss is to be expected.
I tried 3 different deployments (different MongoDB drivers) to read from GridFS, and compared the results to a classic Nginx configuration.
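As a rough illustration of the chunking, the sketch below counts how many chunks a file of a given size would occupy. The chunk size of 255 KB is an assumption made for this example (the real default depends on the driver version), and chunks_needed is a hypothetical helper, not a pymongo function.

```python
import math

# Assumption: 255 KB per chunk; check your driver's actual default.
CHUNK_SIZE = 255 * 1024


def chunks_needed(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to hold file_size bytes."""
    return int(math.ceil(file_size / float(chunk_size)))


# File sizes used later in this test.
sizes = {"1KB": 1024, "100KB": 100 * 1024, "1MB": 1024 * 1024}
counts = {name: chunks_needed(size) for name, size in sizes.items()}
print(counts)
```

Under this assumption the two small files fit in a single chunk each, so any extra cost for them comes from the database round trip rather than from reassembling chunks.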

Credits

Configurations

1, Nginx

location /files/ {
  alias /home/ubuntu/;
} 
open_file_cache was kept off during the test.

2, Nginx_GridFS

It's an Nginx plugin based on the MongoDB C driver: https://github.com/mdirolf/nginx-gridfs

Compile code & install

I made a quick install script in this repo; run it with sudo. After Nginx is ready, modify the configuration file at /usr/local/nginx/conf/nginx.conf (if you didn't change the path).

Configuration

location /gridfs/{
   gridfs test1 type=string field=filename;
}
Use /usr/local/nginx/sbin/nginx to start Nginx, and pass the parameter -s reload whenever you change the configuration file again.

3, Python

library version

  • Flask 0.10.1
  • Gevent 1.0.0
  • Gunicorn 0.18.0
  • pymongo 2.6.3

run application

cd flaskapp/
sudo chmod +x runflask.sh
bash runflask.sh
The script runflask.sh starts gunicorn with the gevent worker class. Gunicorn configuration file here

4, Node.js

library version

  • Node.js 0.10.4
  • Express 3.4.7
  • mongodb(driver) 1.3.23

run application

cd nodejsapp/
sudo chmod +x runnodejs.sh
bash runnodejs.sh

Test

Test items:

  1. file served by Nginx directly
  2. file served by Nginx_gridFS + GridFS
  3. file served by Flask + pymongo + gevent + GridFS
  4. file served by Node.js + GridFS

Files for downloading:

Run the script insert_file_gridfs.py on the MongoDB server to insert files of different sizes into the database test1 (pymongo is required):
  • 1KB
  • 100KB
  • 1MB

Test Environment

2 servers:
  • MongoDB+Application/Nginx
  • tester(Apache ab/JMeter)
hardware:

Concurrency

100 concurrent requests, total 500 requests.
ab -c 100 -n 500 ...

Result

Throughput

Time per request (download)
File size   Nginx+Hard drive   Nginx+GridFS plugin   Python(pymongo+gevent)   Node.js
1KB         0.174              1.124                 1.982                    1.679
100KB       1.014              1.572                 3.103                    3.708
1MB         9.582              9.567                 15.973                   18.317
The Apache ab reports are in the folder: testresult
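To make the table easier to compare, the sketch below divides each GridFS-backed result by the plain Nginx result for the same file size. The numbers are copied from the table; the dictionary keys and the slowdown helper are names chosen for this example only.

```python
# Time per request, copied from the table (units as reported by ab).
results = {
    "1KB": {"nginx": 0.174, "nginx_gridfs": 1.124, "python": 1.982, "nodejs": 1.679},
    "100KB": {"nginx": 1.014, "nginx_gridfs": 1.572, "python": 3.103, "nodejs": 3.708},
    "1MB": {"nginx": 9.582, "nginx_gridfs": 9.567, "python": 15.973, "nodejs": 18.317},
}


def slowdown(row, baseline="nginx"):
    """Ratio of each setup's time to the baseline's time for one file size."""
    return {
        name: round(value / row[baseline], 2)
        for name, value in row.items()
        if name != baseline
    }


ratios = {size: slowdown(row) for size, row in results.items()}
for size in ("1KB", "100KB", "1MB"):
    print(size, ratios[size])
```

The ratios shrink as the file size grows: for the 1MB file nginx_gridfs is roughly on par with plain Nginx, while for the 1KB file every GridFS setup is several times slower.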

Server load

The server load was monitored with the command: vmstat 2
[Screenshots of vmstat output for Nginx, Nginx_gridfs, gevent+pymongo, and Node.js]

Conclusion

  • Files served by Nginx directly:
      • Without doubt the most efficient option, in both performance and server load.
      • Supports caching. In the real world, the directive open_file_cache should be configured well for better performance.
      • It is also the only option tested that supports pausing and resuming a download (HTTP range support).
  • For the other 3 test items, files are stored in MongoDB but served through different drivers:
      • Serving static files from the application is really not an appropriate choice. It drains too much CPU and the performance is poor.
      • nginx_gridfs (MongoDB C driver): download requests are processed at the Nginx level, which sits in front of the web application in most deployments. The web application can focus on dynamic content instead of static content.
      • nginx_gridfs performed best compared with the applications written in scripting languages.
      • The performance difference between Nginx and nginx_gridfs shrinks as the file size increases, but you cannot turn a blind eye to the server load.
      • pymongo and the Node.js driver: it's a draw. Serving static files from the application should be avoided in production.

Advantages of GridFS

  • Putting files in the database makes static content management much easier. We no longer need to maintain consistency between the files and their metadata in the database.
  • Scalability and HA advantages come with MongoDB

Drawbacks of GridFS

  • bad performance

When should I use MongoDB GridFS

I can imagine only a few use cases, especially in a performance-sensitive system, but I may try it in some prototype projects.
Here is the answer from the official MongoDB website; I hope it helps: http://docs.mongodb.org/manual/faq/developers/#faq-developers-when-to-use-gridfs

Tuesday, December 17, 2013

Restart Heroku apps in python code

In some rare cases, we need to restart Heroku apps automatically in our periodic tasks.
e.g. when sending a large number of HTTP requests to a target protected by a firewall with an IP-based policy, your (N+1)th request from the same IP may fail.

Solution 1: Use Heroku platform API directly
It is a set of REST interfaces, so you can either use curl or implement the HTTP requests yourself.
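A minimal sketch of Solution 1 using only the Python 3 standard library follows. It assumes the "restart all dynos" endpoint of Platform API version 3 (DELETE /apps/{app}/dynos) and Bearer-token authentication with your API key; please check both against the current Heroku documentation. The helper names are hypothetical and the two constants are placeholders.

```python
from urllib.request import Request, urlopen

HEROKU_TOKEN = "your-api-key"  # placeholder, replace with your API key
APP_NAME = "your-app-name"     # placeholder, replace with your app name


def build_restart_request(app_name, token):
    """Build (but do not send) a 'restart all dynos' request."""
    return Request(
        "https://api.heroku.com/apps/%s/dynos" % app_name,
        method="DELETE",
        headers={
            # Assumption: Platform API v3 media type and Bearer auth.
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": "Bearer %s" % token,
        },
    )


def restart_app(app_name, token):
    """Send the request and return the HTTP status code."""
    response = urlopen(build_restart_request(app_name, token))
    return response.getcode()


# Only build the request here; nothing is sent over the network.
request = build_restart_request(APP_NAME, HEROKU_TOKEN)
print(request.get_method(), request.full_url)
```

Calling restart_app(APP_NAME, HEROKU_TOKEN) with real values would actually restart the app, so it is left out of the snippet on purpose.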

Solution 2: Use existing python library
https://github.com/heroku/heroku.py
pip install heroku

1, Get your Heroku API key from: https://dashboard.heroku.com/account

2, Identify which Heroku app you want to restart
List all your Heroku apps by running
heroku apps

3, Sample code: put your API key in HEROKU_TOKEN and your Heroku app name in APP_NAME


You can put this code in a cron task, or even in the Heroku app itself.

Monday, December 16, 2013

gevent + pymongo doesn't process replica set host names defined in /etc/hosts

I have a Flask application connecting to a MongoDB replica set via the hostnames "node1" to "node3".

It runs properly in the dev environment with
python webservice.py
But in the production environment, it fails to connect to MongoDB when started with this command:
gunicorn -k gevent webservice:app -b 0.0.0.0:8100
There's no such issue if I change the worker class to sync or eventlet.

In the gevent 0.13.x period, I ran into similar issues caused by gevent's DNS mechanism. Back then, gevent did not process /etc/hosts and /etc/resolv.conf at all. That's why you could not reach 10.1.1.47 by the hostname "node1" defined in your /etc/hosts file. :(

For this gevent + pymongo issue, here's the report, with several workarounds mentioned in its description


After reading the 1.0 release notes carefully, I think it may be a pitfall rather than a bug. The ares resolver is not just a better choice; in some cases you must use it.


Fix
for supervisord, add configuration:
environment=GEVENT_RESOLVER=ares
for command line:
env GEVENT_RESOLVER=ares yourcmd...
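If the server is started from a Python script instead, the same variable can be passed through the child process environment. This is a minimal sketch; env_with_ares and launch are hypothetical helper names, and only the variable GEVENT_RESOLVER=ares comes from the fix above.

```python
import os
import subprocess


def env_with_ares(base_env=None):
    """Return a copy of the environment with GEVENT_RESOLVER=ares."""
    env = dict(os.environ if base_env is None else base_env)
    env["GEVENT_RESOLVER"] = "ares"
    return env


def launch(cmd):
    """Start cmd (a list of arguments) with the ares resolver selected."""
    return subprocess.Popen(cmd, env=env_with_ares())


# Demonstrate on a tiny hand-made environment; nothing is launched here.
env = env_with_ares({"PATH": "/usr/bin"})
print(env["GEVENT_RESOLVER"])
```

For example, launch(["gunicorn", "-k", "gevent", "webservice:app", "-b", "0.0.0.0:8100"]) would start the production command from this post with the variable set.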