Wednesday, March 28, 2012

Set up a CDH3 cluster

[[Node distribution]]

I'm building a 4-node cluster. Here's the plan:
Node A:
  Hadoop: namenode + jobtracker
  HBase: hbase master, zookeeper server (quorum peer)
  Other: Hue
Node B:
  Hadoop: datanode1 + tasktracker
  HBase: region server1
Node C:
  Hadoop: datanode2 + tasktracker
  HBase: region server2
Node D:
  Hadoop: datanode3 + tasktracker
  HBase: region server3

[[Manually set up the cluster on Ubuntu 11.10 x64 servers]]

Follow these steps strictly:  https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster
*configuration management:
  list all profiles
    sudo update-alternatives --display hadoop-0.20-conf
  add a new profile based on conf.empty (the trailing number is the priority; in automatic mode the profile with the highest priority is the active one)
    sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.my_cluster
    sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50
  remove a profile
    sudo update-alternatives --remove hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster
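
To confirm which profile is actually live, you can resolve the conf symlink to its final target. This is only a minimal sketch: active_conf is a hypothetical helper name, and readlink -f assumes GNU coreutils (the default on Ubuntu and CentOS).

```shell
# Print the directory that a conf symlink finally resolves to.
# Defaults to the CDH3 Hadoop conf link managed by update-alternatives.
active_conf() {
  readlink -f "${1:-/etc/hadoop-0.20/conf}"
}

# Force a specific profile regardless of priority (needs root):
#   sudo update-alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster
# Then check which directory is in use:
#   active_conf
```

If the printed path is not conf.my_cluster, the daemons are still reading another profile.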

*config DNS
edit /etc/hosts
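
As a rough illustration, the entries for the four nodes could be generated like this. The IP addresses, the node-a..node-d names and the example.com domain are all made up for the sketch; substitute your own.

```shell
# Hypothetical domain; replace with your own.
DOMAIN="example.com"

# Print one /etc/hosts line per "ip shortname" pair read from stdin.
hosts_entries() {
  while read -r ip name; do
    # Skip blank lines.
    [ -n "$ip" ] || continue
    printf '%s\t%s.%s\t%s\n' "$ip" "$name" "$DOMAIN" "$name"
  done
}

# Hypothetical addresses for nodes A..D.
cluster_nodes() {
  cat <<'EOF'
192.168.1.10 node-a
192.168.1.11 node-b
192.168.1.12 node-c
192.168.1.13 node-d
EOF
}

# Review the output, then append it on every node, e.g.:
#   cluster_nodes | hosts_entries | sudo tee -a /etc/hosts
cluster_nodes | hosts_entries
```

Every node should get the same set of lines, so that each host name resolves to the same address everywhere.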

*config hostname
  /etc/hostname (Ubuntu)
  /etc/sysconfig/network (CentOS)
  reboot

Problems you might meet:
  > Exception description containing "org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java"
    sol 1, make sure the dfs.name.dir and dfs.data.dir directories are owned by the hdfs user. (https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster#CDH3DeploymentonaCluster-ConfiguringLocalStorageDirectoriesforUsebyHDFSandMapReduce )
    sol 2, delete the folders configured as <core-site.xml>:hadoop.tmp.dir, <hdfs-site.xml>:dfs.data.dir and <hdfs-site.xml>:dfs.name.dir (this erases all HDFS data and metadata, so only do it on a fresh cluster and format the namenode again afterwards)
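
A quick way to check sol 1 is to compare the owner of each directory against the expected user. This is a sketch only: check_owner is a hypothetical helper, and the /data/1/dfs/... paths are placeholders for whatever your hdfs-site.xml actually sets.

```shell
# Report whether a directory is owned by the expected user.
# Usage: check_owner <dir> <user>
check_owner() {
  dir=$1
  want=$2
  if [ ! -d "$dir" ]; then
    echo "MISSING $dir"
    return 1
  fi
  # Third column of "ls -ld" is the owner name.
  owner=$(ls -ld "$dir" | awk '{print $3}')
  if [ "$owner" = "$want" ]; then
    echo "OK $dir"
  else
    echo "WRONG $dir (owner is $owner, expected $want)"
    return 1
  fi
}

# Placeholder paths: use your dfs.name.dir and dfs.data.dir values.
#   check_owner /data/1/dfs/nn hdfs
#   check_owner /data/1/dfs/dn hdfs
# Fix a wrong owner with (needs root):
#   sudo chown -R hdfs:hadoop /data/1/dfs/nn /data/1/dfs/dn
```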

  > Exception "java.net.BindException: Problem binding to xxx/ip:port", "ipc.RPC....."
    make sure you're starting the JobTracker on the machine you configured for it. The JobTracker acts as the "master", while the TaskTrackers work as "slaves".


  > Other strange problems:
    you might need to disable IPv6 on Ubuntu
    make sure /etc/hosts & /etc/hostname are correctly configured
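
One common way to disable IPv6 on Ubuntu is through sysctl. The sketch below only prints the settings so they can be reviewed first; ipv6_off_settings is a hypothetical helper name, and appending to /etc/sysctl.conf is one option among several.

```shell
# Print the sysctl settings that turn IPv6 off on all interfaces.
ipv6_off_settings() {
  cat <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
}

# Apply (needs root):
#   ipv6_off_settings | sudo tee -a /etc/sysctl.conf
#   sudo sysctl -p
# Verify (a value of 1 means IPv6 is disabled):
#   cat /proc/sys/net/ipv6/conf/all/disable_ipv6
ipv6_off_settings
```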



[[Deploy & manage the cluster with a tool]]

Once you have tried Cloudera Manager, you may regret having done everything manually. Still, setting up the cluster by hand is valuable experience: it helps you understand the system and dig into confusing issues later.

Requirement:  https://ccp.cloudera.com/display/FREE373/Requirements+for+Cloudera+Manager
  I used CentOS 5 x64 to build the cluster.

Steps:
  Follow: https://ccp.cloudera.com/display/FREE374/Cloudera+Manager+Free+Edition+Installation+Guide

If you're using VMs:
  1, prepare 2 VMs, 1 for the namenode, 1 for a datanode
  2, install Cloudera Manager on vm1, then install the components on vm2 through the Cloudera Manager web UI
  3, add vm1~vm5 to the DNS file (/etc/hosts) on both vm1 and vm2
  4, set up key-authenticated ssh login from vm1 to vm2
  5, clone vm2 to vm3, vm4, vm5...
  6, set the IP (/etc/sysconfig/network-scripts/ifcfg-eth0) and hostname (/etc/sysconfig/network) on vm3~vm5, and restart the network service (/sbin/service network restart)
  7, refresh the Cloudera Manager UI; you should see that all 5 VMs are ready.
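
For step 6, the two files on each clone might look roughly like the output of the sketch below. vm_net_files is a hypothetical helper, the addresses are made up, and a static IP with a /24 netmask is assumed; keep any other lines your original files already have.

```shell
# Print the two CentOS network files for one cloned VM.
# Usage: vm_net_files <hostname> <ip> [netmask]
vm_net_files() {
  host=$1
  ip=$2
  mask=${3:-255.255.255.0}
  cat <<EOF
# /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=$host

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=$ip
NETMASK=$mask
EOF
}

# Hypothetical address for the first clone.
vm_net_files vm3 192.168.1.23
```

If the cloned ifcfg-eth0 carries an HWADDR line copied from vm2, it likely has to be updated to the clone's new MAC address (or removed) before the interface comes up.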

Problems you might meet:
+ERROR 1: cannot recognize the OS version
  > you should use a Red Hat-compatible system or a SUSE system

+ERROR 2: installing components in the cluster
  error msg in the installation details: "scm agent could not be started, giving up"
  > the host name is not set properly
  follow the instructions here: https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster#CDH3DeploymentonaCluster-ConfigureNetworkNames
  (If the command host -v -t A `hostname` does not give the result the instructions describe, it doesn't matter; move on.)
  > error msg "remote package cloudera-manager-agent is not available, giving up" while installing the Cloudera Manager agent package
  the target node is not x64; there is no cloudera-manager-agent package for 32-bit systems

+ERROR 3:
  all components install on a node successfully, but the node does not appear in the hosts list.
  > turn off the firewall
  use /sbin/service iptables status to check the firewall, then stop it and keep it off after reboot if necessary:
  /sbin/service iptables stop
  chkconfig --level 35 iptables off

+ERROR 4:
  permission error "org.apache.hadoop.security.AccessControlException: Permission denied: user=xxx, access=WRITE, inode=" while running sample jobs in the HUE web UI.
  > there's a warning "This job commonly fails because /user/<your-user-name> is not writable." before you run the sample job "Pi Calculator".
  the default permission of the "/user/" folder in HDFS is "drwxr-xr-x User:hdfs, Group hadoop"
  So we have to change its permission.
  Now, check this link: https://ccp.cloudera.com/display/CDHDOC/Hue+Installation#HueInstallation-Authentication
  It says "every Hue user who wants to use the Shell application must have a Unix user account with the same name on the server that runs Hue."
  >>
  solution 1: add user "hdfs" through the "Admin console" in the HUE web UI, and run the job as user "hdfs".
  solution 2: add user "hdfs" through the "Admin console" in the HUE web UI. Change the permission of "/user/" to "everyone writable" (777).
  solution 3: in a terminal, run "sudo -u hdfs hadoop fs -chmod 777 /user/"
  After that, you can run the job as other users.
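
A narrower alternative to making "/user/" world-writable is to give each Hue user their own home directory in HDFS. The sketch below only prints the two commands for review; hdfs_home_cmds is a hypothetical helper and "alice" is a made-up user name.

```shell
# Print the commands that create an HDFS home directory for one user.
# Usage: hdfs_home_cmds <user>
hdfs_home_cmds() {
  user=$1
  echo "sudo -u hdfs hadoop fs -mkdir /user/$user"
  echo "sudo -u hdfs hadoop fs -chown $user /user/$user"
}

# Review the output, then run it on a node with the hadoop client:
#   hdfs_home_cmds alice | sh
hdfs_home_cmds alice
```

This keeps the default "drwxr-xr-x" on "/user/" while still letting the user write under /user/<your-user-name>, which is the path the warning complains about.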


[[Add more services on a Cloudera Manager-managed cluster]]

  A Cloudera Manager-managed cluster stores its configuration in a database, so you will not find any meaningful conf under /etc/hadoop or /etc/hbase. When you add a service manually, you first have to fill in the conf files under /etc/hadoop or /etc/hbase; otherwise the manually installed service cannot find the configuration the cluster is currently using.
Generate client configuration
  >Click the "setting" icon in the top right corner.
  Select "Export" and "Download Configuration Script"

Copy the 2 configuration folders to each node and follow the 2 readme text files in the folders
  >Hadoop: use the alternatives tool to point hadoop at the new configuration folder "hadoop-conf/"
  >Hbase & Zookeeper: set HBASE_CONF_DIR and ZOOKEEPER_CONF to "hbase_conf/" and export them

Start service (e.g. thrift server)
  run: hbase thrift start
  If you skipped the previous step, you will get the error "zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect"
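
Before starting thrift, it can help to check that the exported configuration is really where the environment points. A minimal sketch, assuming the copied folder contains hbase-site.xml; conf_ready is a hypothetical helper and the default path below is a placeholder for wherever you copied the folder.

```shell
# Succeed if a directory looks like a usable HBase client config.
conf_ready() {
  [ -f "$1/hbase-site.xml" ]
}

# Placeholder location of the folder copied from Cloudera Manager.
HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/hbase/hbase_conf}
export HBASE_CONF_DIR

if conf_ready "$HBASE_CONF_DIR"; then
  echo "config found in $HBASE_CONF_DIR; ready to run: hbase thrift start"
else
  echo "no hbase-site.xml in $HBASE_CONF_DIR; export the client configuration first"
fi
```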

Update (good news):
  In CDH4, you can use "Deploy Client Configuration..." in the service actions menu instead of copying configuration files to each cluster member.
e.g.
  after you deploy the client configurations,
  you will see the folder "conf.cloudera.hbase1/" under /etc/hbase, and /etc/hbase/conf will point to "/etc/hbase/conf.cloudera.hbase1/"
  you can start thrift with the command "hbase thrift start" (or install the package hbase-thrift)
