Thursday, March 29, 2012

Write Data to HBase over thrift (Python)

[[Find Thrift interface file]]  

The Thrift interface file (Hbase.thrift) should be located under
$HBASE_HOME/src/main/resources/org/apache/hadoop/hbase/thrift
 

[[Install Thrift]]
I tested this on both Ubuntu 11.04 (32-bit) and CentOS 5 (64-bit).
+Download
  wget http://mirrors.axint.net/apache/thrift/0.8.0/thrift-0.8.0.tar.gz
  tar -xzvf thrift-0.8.0.tar.gz
+Compile & Install
  cd thrift-0.8.0
  ./configure
  make
  sudo make install

  // Try the "thrift" command; you should get its usage information.
+Install Thrift library for your language
  Thrift provides libraries for many languages.
  I'll use Python for this example, so install the Python library for Thrift first.
  cd thrift-0.8.0/lib/py
  sudo python setup.py install

  // Verify the installation by running "import thrift" in the Python interactive shell.

[[Generate HBase library "header files"]]
  thrift --gen py Hbase.thrift

  You'll get a folder named "gen-py" containing the generated Python modules (e.g. Hbase.py, ttypes.py, constants.py); those are the Python "header files".


[[Write a script]]
  Let's write a script to 1) create a table, 2) show table names, 3) insert some data, and 4) read it back.

  import sys
  sys.path.append('/root/Desktop/working/gen-py')

  from thrift.transport.TSocket import TSocket
  from thrift.transport.TTransport import TBufferedTransport
  from thrift.protocol import TBinaryProtocol
  from hbase import Hbase

  # Connect to the Thrift server
  transport = TBufferedTransport(TSocket('10.1.2.127', 9090))
  transport.open()
  protocol = TBinaryProtocol.TBinaryProtocol(transport)
  client = Hbase.Client(protocol)

  # Create a table with one column family "data:"
  col = Hbase.ColumnDescriptor(name='data:')
  client.createTable('test', [col])
  print client.getTableNames()

  # Insert one cell, then read the row back
  mutations = [Hbase.Mutation(column='data:1', value='value1')]
  client.mutateRow('test', 'row1', mutations)
  print client.getRow('test', 'row1')

  transport.close()


[[Test the script]]
  Make sure the Thrift server is running. (In this sample script, the Thrift server is running on the same machine.)
  If you cannot get the Thrift server running in a Cloudera-Manager-managed cluster, look at the tail of http://blog.thisisfeifan.com/2012/03/set-up-cdh3-cluster.html
  Run the script ("python t2.py") and you'll get this stdout result:
   ['test']
   [TRowResult(columns={'data:1': TCell(timestamp=1333062795476L, value='value1')}, row='row1')]
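As the output shows, getRow returns a list of TRowResult structs, each holding a dict that maps column names to TCell structs. Here's a minimal sketch of how to unpack that result; it uses namedtuple stand-ins for the Thrift-generated classes so it runs without a live cluster (the real structs from gen-py have the same field names):

```python
from collections import namedtuple

# Stand-ins mimicking the Thrift-generated TCell/TRowResult structs
# (for illustration only; the generated classes expose the same fields).
TCell = namedtuple('TCell', ['timestamp', 'value'])
TRowResult = namedtuple('TRowResult', ['columns', 'row'])

# Shape of the value returned by client.getRow('test', 'row1')
rows = [TRowResult(columns={'data:1': TCell(timestamp=1333062795476, value='value1')},
                   row='row1')]

# Pull each cell's value out of the result list
for r in rows:
    for col, cell in sorted(r.columns.items()):
        print('%s %s=%s' % (r.row, col, cell.value))  # prints: row1 data:1=value1
```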


[[Performance]]
  There's an article comparing the performance of the Thrift Python client against the HBase native Java API (via Jython):
http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html




[[Verify the stability when a region server goes down]]
  As we know, HBase is built on the HDFS file system, and HDFS keeps replicas across data nodes according to its coherency model.
  You can read more in "Hadoop: The Definitive Guide", Chapter 3 > Data Flow.
  The number of replicas is controlled by the "dfs.replication" setting in hdfs-site.xml. The default value is 3, which means every data block in HDFS is stored as 3 copies (the original plus 2 replicas).
  Let's make a test case to verify whether it works as we expect.
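For reference, a minimal hdfs-site.xml fragment setting the replication factor (the value shown is the default, so this property is usually omitted):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```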
 
  Steps:
  1. On region server "REGIONSRV3", create a table named "test" and write some data into it.
  2. Check the table status from the HBase master page: "http://HBASEMASTER:60010/table.jsp?name=test"
     It shows that the "Table regions" is located on "REGIONSRV3", and the table is enabled.
  3. Turn the region server "REGIONSRV3" down.
  4. We expect to still be able to query the table content from the cluster, because there are copies of the data on other live nodes.
     Run "scan 'test'" in the hbase shell; we can still see the result. That's what we expected :)


   hbase(main):025:0> scan 'test'
   ROW                COLUMN+CELL
   row1               column=data:1, timestamp=1332891318009, value=value1
   row2               column=data:2, timestamp=1333049644415, value=value2
   row3               column=data:3, timestamp=1333053002019, value=value3
   3 row(s) in 0.0890 seconds
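The timestamps in these cells are milliseconds since the Unix epoch. A quick standard-library way to make them human-readable (using the timestamp from the getRow output earlier as an example):

```python
import time

def hbase_ts_to_utc(ms):
    """Convert an HBase cell timestamp (ms since the epoch) to a UTC time string."""
    return time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(ms / 1000))

print(hbase_ts_to_utc(1333062795476))  # -> 2012-03-29 23:13:15
```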

  5. Check the table properties again at "http://HBASEMASTER:60010/table.jsp?name=test"
     The "Table regions" of the table has moved to "REGIONSRV4".
