Friday, June 22, 2012

HBase + Thrift performance test 1

Thrift transport is not thread safe!
At first, I used a single global Thrift connection in my test app, and had 10 concurrent threads send data to HBase through that one connection.

That produced a flood of confusing exceptions:
- Exceptions printed on the server side:
"java.lang.OutOfMemoryError: Java heap space"
NullPointerException
...

- Exceptions printed on the client side:
"... broken pipe..."

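For reference, this is roughly the pattern that triggers those errors. It's only a sketch, not the actual test code: the module and host names are assumptions based on the standard Python bindings generated from Hbase.thrift. The point is that several threads write to one buffered transport at the same time, so request and response bytes get interleaved on the wire.

# Anti-pattern sketch: one shared Thrift connection used by many threads, no locking.
import threading

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase  # generated from Hbase.thrift (module name may differ)

transport = TTransport.TBufferedTransport(TSocket.TSocket('thrift-host', 9090))
client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

def worker():
    # All threads hammer the same client/transport concurrently; the interleaved
    # bytes show up as broken pipes on the client and as garbage frames on the
    # Thrift server (which can end in the OutOfMemoryError seen above).
    client.getTableNames()

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
transport.close()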

Performance test 
Data size:
  • write 60K records to the database
  • each record is about 1024 bytes => roughly 60 MB of data in total
  • each record has 3 columns "f1:1", "f1:2", "f1:3" in a single column family "f1"; each column value is generated as "value_%s_endv" % ("x" * (1024 / 3))
  • the row key is formatted as "RK_%s_%s" % (random.random(), time.time())
  • one record: (screenshot of a single row omitted)
Write mode
  • 10 concurrent threads => each thread is in charge of writing 6K records (6 MB)
  • flush to the database every 300 records via mutateRows (see the sketch below)
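Below is a minimal sketch of what one writer thread does: build rows in the format described above and flush them to Thrift in batches of 300 through mutateRows. The module, host, and helper names are assumptions, and the exact mutateRows signature depends on the HBase version (newer bindings take a third "attributes" argument); the linked test scripts further down contain the code actually used.

# Sketch of one writer thread: batch 300 rows per mutateRows call.
import random
import time

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase                       # generated Thrift1 bindings
from hbase.ttypes import BatchMutation, Mutation

BATCH_SIZE = 300
# ~341 "x" characters per column value, 3 columns => roughly 1 KB per row
VALUE = "value_%s_endv" % ("x" * (1024 // 3))

def open_client(host='thrift-host', port=9090):
    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    return client, transport

def write_records(client, table='testdb1', total=6000):
    batch = []
    for _ in range(total):
        row_key = "RK_%s_%s" % (random.random(), time.time())
        mutations = [Mutation(column="f1:%d" % c, value=VALUE) for c in (1, 2, 3)]
        batch.append(BatchMutation(row=row_key, mutations=mutations))
        if len(batch) == BATCH_SIZE:
            client.mutateRows(table, batch)   # one RPC per 300 rows
            batch = []
    if batch:
        client.mutateRows(table, batch)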



Hardware:
4 boxes in cluster:
  1. NameNode, Secondary NameNode, HBase Master, Zookeeper Server
  2. DataNode, Region server, Thrift
  3. DataNode, Region server
  4. DataNode, Region server

They are all Ubuntu 12.04 x64 servers with Intel Core 2 Quad CPUs @ 2.66 GHz. Boxes #1, #3, and #4 have 8 GB of memory; box #2 has 16 GB because the Thrift server runs on it.


HBase Configuration:
Most settings were left at their defaults after installation with Cloudera CDH4 Manager. The only two changes:
HBase Master Java heap size: 1073741824 bytes (1 GB) -> 2147483648 bytes (2 GB)
HBase client write buffer (hbase.client.write.buffer): 2097152 bytes (2 MB) -> 8388608 bytes (8 MB)

Create the test table "testdb1" with column family "f1":
> create 'testdb1','f1'
0 row(s) in 1.6290 seconds


[Test 1]
Each thread owns its own private connection to Thrift, so there are 10 connections in total in this test.
test code: https://github.com/feifangit/hbase-thrift-performance-test/blob/master/connectioninthread.py

result: 6.9139 seconds to write 60,000 records (60 MB)
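A condensed sketch of this approach, reusing the hypothetical open_client/write_records helpers from the write-mode sketch above (the linked script is the authoritative version):

import threading
import time

def thread_main():
    # Each thread opens and owns its own Thrift connection.
    client, transport = open_client()
    try:
        write_records(client, total=6000)     # 6K records per thread
    finally:
        transport.close()

threads = [threading.Thread(target=thread_main) for _ in range(10)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print("wrote 60,000 records in %.4f seconds" % (time.time() - start))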



[Test 2]
A single global connection is shared by all threads; each thread must acquire a global reentrant lock before it writes.
test code: https://github.com/feifangit/hbase-thrift-performance-test/blob/master/sharedconnection.py

result: 16.345 seconds to write 60,000 records (60 MB)
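A condensed sketch of the shared-connection variant: one connection for all threads, and every mutateRows call wrapped in a global reentrant lock (again reusing the hypothetical open_client and VALUE names from the write-mode sketch):

import random
import threading
import time

from hbase.ttypes import BatchMutation, Mutation   # as in the write-mode sketch

lock = threading.RLock()                  # the global reentrant lock
client, transport = open_client()         # one connection shared by all 10 threads

def thread_main():
    batch = []
    for _ in range(6000):
        row_key = "RK_%s_%s" % (random.random(), time.time())
        mutations = [Mutation(column="f1:%d" % c, value=VALUE) for c in (1, 2, 3)]
        batch.append(BatchMutation(row=row_key, mutations=mutations))
        if len(batch) == 300:
            with lock:                    # only one thread may touch the transport at a time
                client.mutateRows('testdb1', batch)
            batch = []
    if batch:
        with lock:
            client.mutateRows('testdb1', batch)

threads = [threading.Thread(target=thread_main) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
transport.close()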


Summary
Unsurprisingly, 10 connections (Test 1) are much faster than a single connection (Test 2). Thrift itself is not the bottleneck in this test; more Thrift connections give you better throughput.

Next week, I'll put a Tornado web application in front of the Thrift interface to collect mass data, and I'll try to squeeze out the best performance I can.



[Additional] 
I also ran a load test with 6 million records. It took 896 seconds (about 15 minutes), i.e. roughly 6,700 records per second, or about 0.15 ms per record on average. Impressive performance!


Here's the server status (the region server where Thrift is located); the charts for CPU time, I/O, and memory usage are omitted here.
