Thursday, April 26, 2012

Gnome classic in Ubuntu 12.04

Today, April 26, Ubuntu 12.04 was finally released.

As previously mentioned, GNOME Classic had been included in the 12.04 Beta. Unfortunately, in today's release I did not see it after the installation finished.

And I cannot even find the package "gnome-session-fallback" in the "Software Center" :(

However, running the apt-get command in a terminal will install the classic package:
sudo apt-get install gnome-session-fallback

Restart and select "GNOME Classic" at the login screen.
Gosh... it's back

Add widgets to the top panel by right-clicking it while holding the ALT key.

Thursday, April 19, 2012

Run Map Reduce application from Eclipse


Run a Map/Reduce project in the Eclipse environment configured in the previous post.

1, create project
"File" > "New Project" > "Map/Reduce Project"
Assume the project name is "ttt".
The project structure should look like the following.

2, create package "com.hadoop.test" under "src" 

3, copy <PiEstimator.java> (from the Hadoop examples) into the package "com.hadoop.test" in the project

4, modify the package declaration of <PiEstimator.java> from
"package org.apache.hadoop.examples;"
to
"package com.hadoop.test;"


5, modify "Run Configuration" of this project
Since the Pi estimator application requires 2 arguments, add 2 arguments "10 10000" in the tab "Arguments"
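
For reference, here is a minimal sketch of a Tool-style driver showing how such run-configuration arguments reach your code (the class and variable names are illustrative, not taken from the PiEstimator source):

package com.hadoop.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative only: prints the two arguments set in the "Run Configuration".
public class ArgsDemo extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        int nMaps = Integer.parseInt(args[0]);    // first argument, e.g. "10"
        long nSamples = Long.parseLong(args[1]);  // second argument, e.g. "10000"
        System.out.println("Maps = " + nMaps + ", samples per map = " + nSamples);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ArgsDemo(), args));
    }
}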

6, run the application
You should get the following output in the "Console" tab.

Number of Maps  = 10
Samples per Map = 10000
12/04/19 15:24:10 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
...
...
Job Finished in 7.448 seconds
Estimated value of Pi is 3.14120000000000000000

7, Extra:
a) invoke the HDFS API to show the content of a file in HDFS

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

String uri = "hdfs://10.1.2.124/user/td.txt";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
InputStream in = null;
try {
    // copy the file's bytes straight to stdout
    in = fs.open(new Path(uri));
    IOUtils.copyBytes(in, System.out, 4096, false);
} finally {
    IOUtils.closeStream(in);
}
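
Along the same lines, a small sketch (assuming the same NameNode address; the directory "/user" is just an example) that lists a directory with the FileSystem API:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String dir = "hdfs://10.1.2.124/user";   // example directory, adjust to your cluster
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(dir), conf);
// print every entry in the directory, marking sub-directories
for (FileStatus status : fs.listStatus(new Path(dir))) {
    System.out.println(status.getPath() + (status.isDir() ? "  <dir>" : ""));
}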


b) invoke the HBase API:
> Download the HBase package matching the runtime version.
e.g. http://archive.cloudera.com/cdh/3/hbase-0.90.4-cdh3u3.tar.gz
> Add the required libraries. In "Project Properties" > "Libraries" > "Add External JARs...", add at least HBASE_PKG/<hbase-0.90.4-cdh3u3.jar> and HBASE_PKG/lib/<zookeeper-3.3.4-cdh3u3.jar>.
Code snippet: scan the HBase table "test" and print the rows returned.


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "10.1.2.124");

String tablename = "test";
HTable table = new HTable(config, tablename);
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
try {
    // print every row returned by the full-table scan
    for (Result scannerResult : scanner) {
        System.out.println("scan:" + scannerResult);
    }
} finally {
    scanner.close();
}
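
For completeness, a minimal sketch of reading a single row with Get (same 0.90-era client API and quorum address assumed; the row key "row1" is only an example and may not exist in your table):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "10.1.2.124");

HTable table = new HTable(config, "test");
try {
    Get get = new Get(Bytes.toBytes("row1"));   // "row1" is only an example row key
    Result result = table.get(get);
    System.out.println("get: " + result);
} finally {
    table.close();
}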



Hadoop Eclipse plugin for CDH3 u3

There's a general Hadoop Eclipse plugin (http://wiki.apache.org/hadoop/EclipsePlugIn), but it's built against Apache Hadoop and is incompatible with a CDH3 cluster.

On a machine with CDH3 installed, some additional tools and libraries can be found under "/usr/lib/hadoop/contrib"; they were built from the source code under HADOOP_PKG/src/contrib. The Eclipse plugin source code is also there, but it was not compiled.

Build the CDH3 Eclipse plugin yourself
(on 32-bit Ubuntu Desktop)
1, Download Hadoop package from Cloudera (http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u3.tar.gz).

2, install the build tools:
a) Eclipse: I failed to compile with the Eclipse installed from the Ubuntu Software Center, so please download Eclipse from eclipse.org. Eclipse Indigo 3.7.2: http://mirrors.med.harvard.edu/eclipse//eclipse/downloads/drops/R-3.7.2-201202080800/eclipse-SDK-3.7.2-linux-gtk.tar.gz
b) apt-get install maven2


3, compile Hadoop
  cd HADOOP_PKG
  ant

4, compile Eclipse Plugin
We need to point to the Eclipse installation, because some jar files under ${eclipse.home}/plugins are used during compilation.
  cd HADOOP_PKG/src/contrib/eclipse-plugin
  ant -Declipse.home=XXXX -Dversion=0.20.2-cdh3u3 jar

If everything is OK, the final Eclipse plugin jar will be generated under HADOOP_PKG/build/contrib/eclipse-plugin/


Work with Eclipse
1, install the plugin
put the plugin jar under <Eclipse_ROOT>/plugins

2, basic configuration 
Fill in your Hadoop installation directory under "Window" > "Preferences" > "Hadoop Map/Reduce".
Eclipse will load some Hadoop libraries from there when you write a Map/Reduce project; a quick way to verify them is the mapper sketch below.
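
A trivial mapper sketch (the class name is illustrative) that should compile in a Map/Reduce project once the Hadoop jars are on the build path:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Compiles only if the Hadoop libraries were picked up; it just passes each record through.
public class SanityCheckMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(key, value);   // identity map: emit the input key/value unchanged
    }
}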

3, add the CDH3 cluster information
In the "Map/Reduce Locations" view, choose "New Hadoop location..." and fill in the cluster information.

4, a problem you will surely meet :(
After you add the M/R location, you can click "DFS Locations" in the "Project Explorer" to browse HDFS. Unfortunately, you will see an error dialog complaining about a connection error.

"An internal error occurred during: "Connecting to DFS vm".
org/apache/hadoop/thirdparty/guava/common/collect/LinkedListMultimap"


That's because the Eclipse plugin cannot find the guava jar.
Fix: merge the guava jar into the Eclipse plugin jar.
Find "guava-r09-jarjar.jar" under HADOOP_PKG/lib and copy its contents (the "org/" folder) to <plugin_jar>/classes/.
You should then see two folders, "eclipse" & "thirdparty", under <plugin_jar>/classes/org/apache/hadoop/.
Put the fixed plugin jar back under <Eclipse_ROOT>/plugins. You should now be able to browse HDFS from "DFS Locations".

* MAKE SURE NOT TO DESTROY THE JAR STRUCTURE WHILE MERGING. If you're using an archive tool, such as "File Roller" in Ubuntu, just drag the "org" folder onto /classes/ in the GUI.

5, problems you might meet
a) No "Map/Reduce Project" choice in the "New Project" wizard.
  The plugin jar works only with the Eclipse version it was compiled against. If you copy the plugin jar into a different Eclipse version, it might not work.


Tuesday, April 17, 2012

An Error in thrift python sample


root@ubuntu:/home/feifan/Desktop/thrift-0.8.0/tutorial/py# python PythonServer.py 
Starting the server...
Traceback (most recent call last):
  File "PythonServer.py", line 95, in <module>
    server.serve()
  File "/usr/local/lib/python2.7/dist-packages/thrift/server/TServer.py", line 74, in serve
    self.serverTransport.listen()
  File "/usr/local/lib/python2.7/dist-packages/thrift/transport/TSocket.py", line 136, in listen
    res0 = self._resolveAddr()
  File "/usr/local/lib/python2.7/dist-packages/thrift/transport/TSocket.py", line 31, in _resolveAddr
    return socket.getaddrinfo(self.host, self.port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_PASSIVE | socket.AI_ADDRCONFIG)
TypeError: getaddrinfo() argument 1 must be string or None

How to fix:
The constructor of class TServerSocket has apparently changed from a previous version (the first positional argument now seems to be the host rather than the port), so modify <PythonServer.py>:

#transport = TSocket.TServerSocket(9090)
transport = TSocket.TServerSocket('localhost', 9090)


Monday, April 2, 2012

Run Hive with HBase


[[Reference]]
Hive HBase Integration (https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration)


+Error you might meet
12/03/30 23:59:28 ERROR hive.log: Got exception: org.apache.hadoop.security.AccessControlException org.apache.hadoop.security.AccessControlException: Permission denied: user=feifan, access=WRITE, inode="/user/beeswax/warehouse":hue:hive:drwxrwxr-x
12/03/30 23:59:28 ERROR hive.log: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=feifan, access=WRITE, inode="/user/beeswax/warehouse":hue:hive:drwxrwxr-x

  > That's because of the permissions on the folder "/user/beeswax/warehouse".
  Run "sudo -u hdfs hadoop fs -chmod 777 /user/beeswax/warehouse"