At first, my test app used a single global Thrift connection, and 10 concurrent threads sent data to HBase through it.
The result was a flood of confusing exceptions!
- Server side printed exceptions:
"java.lang.OutOfMemoryError: Java heap space"
NullPointer...
...
- Client side printed exceptions:
"... broken pipe..."
Performance test
Data size:
- write 60K records to the database
- each record is about 1024 bytes => about 60MB of data in total
- each record has 3 columns, "f1:1", "f1:2" and "f1:3", in 1 column family "f1". Each column value is formatted as "value_%s_endv" % "x"*(1024/3)
- the row key is formatted as "RK_%s_%s" % (random.random(), time.time())
- one record :
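A minimal sketch of how one such record could be built in Python. The helper names (`make_row_key`, `make_value`, `make_record`) are made up for illustration and are not taken from the test scripts. It also assumes the padding is repeated first and then wrapped once, i.e. `"x" * (1024 // 3)` is evaluated before the `%` formatting, which is what keeps a record close to 1024 bytes.

```python
import random
import time

COLUMNS = ("f1:1", "f1:2", "f1:3")  # three columns in column family "f1"
RECORD_BYTES = 1024                  # approximate target size of one record


def make_row_key():
    # Row key format from the post: "RK_<random float>_<timestamp>"
    return "RK_%s_%s" % (random.random(), time.time())


def make_value():
    # Assumption: repeat the padding first, then wrap it once, so the
    # value is "value_" + 341 "x" characters + "_endv".
    return "value_%s_endv" % ("x" * (RECORD_BYTES // 3))


def make_record():
    # One record: a row key plus a {column name: value} mapping.
    return make_row_key(), {column: make_value() for column in COLUMNS}


row_key, columns = make_record()
payload_bytes = sum(len(value) for value in columns.values())
print(row_key, payload_bytes)
```

With the fixed prefix and suffix included, the three values add up to slightly more than 1024 bytes, which matches the "about 1024 bytes" estimate.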
Write mode:
- 10 concurrent threads => each thread is in charge of writing 6K records (6MB)
- each thread sends a batch to the database every 300 records (mutateRows)
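The write mode above can be sketched as follows. This is only an illustration of the split into threads and batches: `flush` is a hypothetical callable standing in for one mutateRows call, and plain integers stand in for records, so nothing here talks to HBase.

```python
import threading

TOTAL_RECORDS = 60000
THREADS = 10
BATCH_SIZE = 300


def batches(records, size):
    # Yield consecutive slices of at most `size` records.
    for start in range(0, len(records), size):
        yield records[start:start + size]


def worker(records, flush, size=BATCH_SIZE):
    # `flush` is a hypothetical stand-in for a single mutateRows call.
    for batch in batches(records, size):
        flush(batch)


def run(flush):
    per_thread = TOTAL_RECORDS // THREADS  # 6000 records per thread
    threads = []
    for index in range(THREADS):
        records = list(range(index * per_thread, (index + 1) * per_thread))
        threads.append(threading.Thread(target=worker, args=(records, flush)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


flushed = []                    # size of every batch that was "written"
flushed_lock = threading.Lock()


def record_flush(batch):
    with flushed_lock:
        flushed.append(len(batch))


run(record_flush)
print(len(flushed), sum(flushed))
```

Each thread ends up issuing 20 batched writes instead of 6000 single-row writes, which keeps the number of round trips to the Thrift server small.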
Hardware:
4 boxes in the cluster:
- #1: NameNode, Secondary NameNode, HBase Master, Zookeeper Server
- #2: DataNode, Region server, Thrift
- #3: DataNode, Region server
- #4: DataNode, Region server
They're all Ubuntu 12.04 x64 servers with an Intel Core2 Quad @ 2.66GHz. #1, #3 and #4 have 8G of memory; #2 has 16G because Thrift runs on it.
HBase Configuration:
Most settings were left at their defaults after installation with Cloudera CDH4 Manager. Only two were changed:
- HBase Master's Java Heap Size in bytes: 1073741824 -> 2147483648 (1 GB -> 2 GB)
- HBase Client Write Buffer in bytes: 2097152 -> 8388608 (2 MB -> 8 MB)
Create the test table "testdb1" with column family "f1" in the HBase shell:
> create 'testdb1','f1'
0 row(s) in 1.6290 seconds
[Test 1]
Each thread owns a private connection to Thrift, so there are 10 connections in total in this test.
test code: https://github.com/feifangit/hbase-thrift-performance-test/blob/master/connectioninthread.py
result: 6.9139 seconds -> 60 000 records (60MB)
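The shape of Test 1 can be sketched like this. It is not the linked script: `StubConnection` and its `mutate_rows` method are hypothetical stand-ins for a real Thrift transport and HBase client, used so the example runs without a cluster. The point is only that each thread opens, uses and closes its own connection.

```python
import threading


class StubConnection:
    """Hypothetical stand-in for a Thrift transport plus HBase client."""

    def __init__(self):
        self.batches_sent = 0

    def mutate_rows(self, table, batch):
        # A real client would send the batch over its own socket here.
        self.batches_sent += 1

    def close(self):
        pass


def write_with_private_connection(table, thread_batches, opened):
    conn = StubConnection()  # opened inside the thread, never shared
    opened.append(conn)
    try:
        for batch in thread_batches:
            conn.mutate_rows(table, batch)
    finally:
        conn.close()


opened = []
# 10 threads, each with 20 batches of 300 placeholder rows
work = [[["row"] * 300 for _ in range(20)] for _ in range(10)]
threads = [
    threading.Thread(
        target=write_with_private_connection,
        args=("testdb1", thread_batches, opened),
    )
    for thread_batches in work
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(opened), [conn.batches_sent for conn in opened])
```

Because no connection is shared, no locking is needed around the writes.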
[Test 2]
All threads share one global connection; each thread must acquire a global reentrant lock before writing.
test code: https://github.com/feifangit/hbase-thrift-performance-test/blob/master/sharedconnection.py
result: 16.345 seconds -> 60 000 records(60MB)
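A rough sketch of the Test 2 pattern, again with a hypothetical `StubConnection` instead of a real Thrift client. A Thrift client is generally not safe to use from several threads at once without synchronization, which is a plausible cause of the exceptions described at the top of this post, so every call on the shared connection is wrapped in a reentrant lock. The stub counts calls that overlap, which should stay at zero while the lock is held.

```python
import threading


class StubConnection:
    """Hypothetical stand-in for the single shared Thrift client."""

    def __init__(self):
        self.in_call = False
        self.overlaps = 0
        self.batches_sent = 0

    def mutate_rows(self, table, batch):
        # Count calls that start while another call is still in progress.
        if self.in_call:
            self.overlaps += 1
        self.in_call = True
        self.batches_sent += 1
        self.in_call = False


shared_conn = StubConnection()
conn_lock = threading.RLock()  # reentrant lock guarding the connection


def write_with_shared_connection(table, thread_batches):
    for batch in thread_batches:
        with conn_lock:  # only one thread may use the connection at a time
            shared_conn.mutate_rows(table, batch)


# 10 threads, each with 20 batches of 300 placeholder rows
work = [[["row"] * 300 for _ in range(20)] for _ in range(10)]
threads = [
    threading.Thread(
        target=write_with_shared_connection,
        args=("testdb1", thread_batches),
    )
    for thread_batches in work
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(shared_conn.batches_sent, shared_conn.overlaps)
```

The lock makes the writes safe but also serializes them, so the 10 threads effectively take turns on one connection.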
Summary
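Uh... Of course, 10 connections (Test 1) are much faster than a single connection (Test 2): 6.9 seconds versus 16.3 seconds, roughly 2.4 times faster. Thrift itself is not the bottleneck in this test; the writes being serialized on the one shared connection is the more likely limit. In this setup, more Thrift connections gave better performance.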
Next week, I'll add a Tornado web application in front of the Thrift interface to collect mass data.
I'll try to reach the best performance I can.
[Additional]
I also ran a load test with 6 million records. It took 896 seconds (about 15 minutes), so the average time to store a record is about 0.15 ms. Impressive performance!!
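For reference, the per-record time and throughput of each run follow directly from the elapsed time and the record count. The small script below only redoes that arithmetic with the figures reported in this post; the function names are made up for the example.

```python
# (elapsed seconds, records written) as reported in this post
RUNS = {
    "Test 1 (10 connections)": (6.9139, 60000),
    "Test 2 (1 shared connection)": (16.345, 60000),
    "Load test (6 million records)": (896.0, 6000000),
}


def per_record_ms(seconds, records):
    # Average wall-clock time per record, in milliseconds.
    return seconds * 1000.0 / records


def records_per_second(seconds, records):
    # Average throughput over the whole run.
    return records / seconds


for name, (seconds, records) in RUNS.items():
    print("%-30s %.3f ms/record  %.0f records/s" % (
        name,
        per_record_ms(seconds, records),
        records_per_second(seconds, records),
    ))
```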
Here's the server status (on the region server where Thrift is located):
[charts not preserved: CPU Time, IO, Memory]