Thursday, March 28, 2013

Nginx upload module vs Flask

Env

  • Ubuntu 12.04 32-bit
  • Nginx 1.1.19 (HTTP server)
  • Flask 0.9 (lightweight Python framework)
  • Gevent 1.0rc2 (coroutine I/O library)
  • Gunicorn 0.17.2 (WSGI server)

Nginx settings

Nginx listens on port 8666, serves upload requests at the URL /upload via the upload module, and also works as a reverse proxy to Flask, which serves requests on port 8222.
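The original Nginx config isn't shown in the post; a minimal sketch of the two endpoints might look like the following (the upload-module directives are real, but the storage path and the /done handler are this example's assumptions):

```nginx
server {
    listen 8666;

    # Handled directly by the nginx upload module (written in C)
    location /upload {
        upload_store /tmp/uploads;   # where nginx writes the uploaded file
        upload_pass  /done;          # internal location invoked after storing
    }

    location /done {
        return 200;
    }

    # Everything else is proxied to the Flask app
    location / {
        proxy_pass http://127.0.0.1:8222;
    }
}
```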

Server code

The server application runs with 5 worker processes under Gunicorn + Gevent:
gunicorn -k gevent -b 0.0.0.0:8222 -w 5 t:app
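The Flask application itself (the `t` module passed to Gunicorn) isn't shown in the post; a minimal sketch, assuming the client sends a multipart form with a field named `file`, could be:

```python
# t.py -- minimal sketch of the upload handler (hypothetical: the
# original handler code is not shown in the post)
from flask import Flask, request

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload():
    # assumes a multipart form with a field named "file"
    f = request.files['file']
    f.save('/tmp/' + f.filename)   # write the upload to disk
    return 'ok'
```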


Test code

Start N threads, each uploading the same file to the URL concurrently, and measure the total elapsed time.
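The original test script is not shown in the post; a sketch of the idea (the URL, payload, and helper names here are assumptions) is to start N threads, each POSTing the same payload, and time the whole run:

```python
# Load-test sketch: n concurrent uploads, timed end to end.
import threading
import time
import urllib.request

def upload(url, payload):
    """POST one payload to the upload endpoint and read the reply."""
    req = urllib.request.Request(
        url, data=payload,
        headers={'Content-Type': 'application/octet-stream'})
    with urllib.request.urlopen(req) as resp:
        resp.read()

def run_bench(url, payload, n):
    """Run n concurrent uploads; return the elapsed wall-clock time."""
    threads = [threading.Thread(target=upload, args=(url, payload))
               for _ in range(n)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

# Example: time 50 concurrent uploads of a ~3.3 MB payload
#   data = b'x' * (3300 * 1024)
#   print('%.3f s' % run_bench('http://localhost:8666/upload', data, 50))
```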




Result


Performance test result:
Unit: seconds; file size: 3.3 MB

concurrency    Flask 0.9 + gevent 1.0rc2 + gunicorn 0.17.2 + Nginx (5 workers)    Nginx + Nginx upload module
50             6.029                                                              2.328
100            12.788                                                             4.995
200            28.828                                                             10.813


As expected, the Nginx upload module, which is written in C, handles file uploads roughly two and a half times faster than the pure-Python stack at every concurrency level.


Wednesday, March 6, 2013

Add disk for datanode in CDH4

Step 1: add the disk, create a partition, and mount it

  • connect HDD to machine
  • ls /dev/[sh]d*
    The one not ending with a number should be the new HDD, e.g. /dev/sdb
  • create a partition on the new disk (here, one partition spanning the entire disk)
    fdisk /dev/sdb
    => command n   add a new partition
    => select primary
    => accept the default partition number 1
    => accept the default start cylinder
    => accept the default end cylinder
    => command w   write the partition table to disk and exit (fdisk quits on its own after w)
  • list partitions on the new disk
    fdisk -l /dev/sdb
    you should see /dev/sdb1
  • format the partition
    mkfs -t ext4 /dev/sdb1
  • get the UUID of disk
    blkid
    copy the UUID, assume it's a-b-c-d-e
  • mount the partition, assume we're mounting on /data2
    create the mount point: mkdir /data2
    add this line to the file </etc/fstab> (note it takes all six fields, including the dump and fsck flags):
    UUID=a-b-c-d-e /data2 ext4 defaults 0 0
  • mount /data2 (no reboot needed; fstab will also mount it automatically at boot)

Step 2: change the ownership of the folder

This step is very important! The DataNode runs as the hdfs user (group hadoop), so it must be able to write to the new directory:
chgrp hadoop /data2
chown hdfs /data2

Step 3: add this folder to datanode setting by CDH Manager
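For reference, outside of CDH Manager the equivalent hand-edited setting lives in hdfs-site.xml; a sketch, assuming the existing data directory is /data1/dfs/dn and the new one is /data2/dfs/dn (both paths are examples, not from the post):

```xml
<!-- hdfs-site.xml: append the new mount to the DataNode's data directories -->
<!-- (CDH Manager writes the equivalent of this for you) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/dfs/dn,/data2/dfs/dn</value>
</property>
```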



Step 4: restart HDFS

Check the new capacity on the NameNode's status page: http://xxxx:50070/dfshealth.jsp