Dynamically Scaling Hadoop and HBase

Environment:
CentOS 7.2 64-bit
hadoop-2.6.0-cdh5.5.2
hbase-1.0.0-cdh5.5.2
jdk1.8.0_91
master  192.168.205.153
slave1  192.168.205.154
slave2  192.168.205.155
new node slave3  192.168.205.156

I. Adding a Hadoop node

● There are two ways to add a node. One is static: stop the Hadoop cluster, update the configuration, and restart the cluster — not covered again here.
● The other is dynamic: add the node without restarting the cluster.

1. Preparation

(1). Set slave3's hostname:

```
[root@localhost ~]# vi /etc/hosts
192.168.205.153 h153
192.168.205.154 h154
192.168.205.155 h155
192.168.205.156 h156
[root@localhost ~]# vi /etc/hostname
h156
[root@localhost ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=h156
[root@localhost ~]# reboot
```

(2). Create the hadoop user on slave3 and install jdk1.8.0_91 (for the detailed steps see my other article: http://blog.csdn.net/m0_37739193/article/details/71222673), then copy slave2's hadoop-2.6.0-cdh5.5.2 to the corresponding directory on slave3:

```
[hadoop@h155 ~]$ scp -r hadoop-2.6.0-cdh5.5.2/ h156:/home/hadoop/
```

Note: at first I copied hadoop-2.6.0-cdh5.5.2 from slave2 to slave3 as above, but when I later checked the DataNode Information page in the browser, slave2 and slave3 could not coexist — of the three nodes only two ever showed up. h154 was always there, while h155 and h156 kept displacing each other: if one appeared the other vanished. Very strange. Solution: copy the master node's hadoop-2.6.0-cdh5.5.2 directory to slave3 instead. (Most likely this is because the copied tree included slave2's DataNode data directory and with it the node's storage identity, so the NameNode treated h155 and h156 as the same DataNode.)

(3). Set up passwordless login from the namenode and resourcemanager nodes to slave3:

```
[hadoop@h153 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h156
```

(4). Add the following line to /etc/hosts on master, slave1, and slave2 so that all four machines have an identical /etc/hosts. Because this is a dynamic expansion, no restart is needed:

```
192.168.205.156 h156
```

2. Dynamically adding the node

(1). Edit etc/hadoop/slaves on master, slave1, slave2, and slave3, adding the new node slave3:

```
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/slaves
h154
h155
h156
```

(2). On the new node slave3, start the DataNode with ./sbin/hadoop-daemon.sh start datanode:

```
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/hadoop-hadoop-datanode-h156.out
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
2898 Jps
2854 DataNode
```

(3). On the new node slave3, start the NodeManager with ./sbin/yarn-daemon.sh start nodemanager:

```
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/yarn-hadoop-nodemanager-h156.out
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
2854 DataNode
3015 Jps
2952 NodeManager
```

(4). The new node slave3 is now running DataNode and NodeManager — the node has been added to the cluster dynamically. Verify in a browser via the DataNode Information page at h153:50070 and the Nodes of the cluster page at h153:8088.
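The h155/h156 conflict noted above can be diagnosed directly: each DataNode records its identity in a `VERSION` file under its data directory, and two nodes reporting the same `datanodeUuid` are treated by the NameNode as one node. A minimal sketch of the check, using fabricated paths and values purely for illustration (on a real node the file lives under the directory configured by `dfs.datanode.data.dir`):

```shell
#!/bin/sh
# Sketch: detect duplicate datanodeUuid values across copied DataNode data dirs.
# /tmp/dn-h155 and /tmp/dn-h156 stand in for the two nodes' data directories.
set -e

mkdir -p /tmp/dn-h155/current /tmp/dn-h156/current

# Simulate the situation: slave3 received a verbatim copy of slave2's data dir.
cat > /tmp/dn-h155/current/VERSION <<'EOF'
storageID=DS-1234
datanodeUuid=aaaa-bbbb-cccc
EOF
cp /tmp/dn-h155/current/VERSION /tmp/dn-h156/current/VERSION

uuid_a=$(grep '^datanodeUuid=' /tmp/dn-h155/current/VERSION | cut -d= -f2)
uuid_b=$(grep '^datanodeUuid=' /tmp/dn-h156/current/VERSION | cut -d= -f2)

if [ "$uuid_a" = "$uuid_b" ]; then
  echo "DUPLICATE datanodeUuid: $uuid_a"
else
  echo "datanodeUuids differ, nodes can coexist"
fi
```

If the UUIDs match, deleting the copied data directory on the new node (or copying from the master, which carries no DataNode data, as the author did) lets the DataNode generate a fresh identity on first start.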
II. Dynamically removing a node

1. On the master, enable dynamic node removal: create an excludes file under etc/hadoop/ listing the nodes to be decommissioned:

```
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes
h156
```

2. On the master, add the following to etc/hadoop/hdfs-site.xml:

```
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes</value>
</property>
```

3. On the master, add the following to mapred-site.xml:

```
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/mapred-site.xml
<property>
  <name>mapred.hosts.exclude</name>
  <value>/home/hadoop/hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes</value>
  <final>true</final>
</property>
```

4. After editing these configuration files on the master, run ./bin/hadoop dfsadmin -refreshNodes:

```
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -refreshNodes
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
17/09/16 02:26:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Refresh nodes successful
```
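Steps 1–3 above can be scripted. A sketch that generates the excludes file and the hdfs-site.xml property block — `HADOOP_CONF` and the `/tmp` path are illustrative assumptions, not anything Hadoop itself reads; on a real cluster the snippet goes inside `<configuration>` and is followed by `dfsadmin -refreshNodes`:

```shell
#!/bin/sh
# Sketch: generate the exclude file and the dfs.hosts.exclude property.
set -e
HADOOP_CONF=/tmp/hadoop-conf-demo
mkdir -p "$HADOOP_CONF"

# 1. The exclude file lists one host per line.
echo "h156" > "$HADOOP_CONF/excludes"

# 2. The property block that belongs inside <configuration> in hdfs-site.xml.
cat > "$HADOOP_CONF/hdfs-site-snippet.xml" <<EOF
<property>
  <name>dfs.hosts.exclude</name>
  <value>$HADOOP_CONF/excludes</value>
</property>
EOF

# 3. On a real cluster you would now run: ./bin/hdfs dfsadmin -refreshNodes
cat "$HADOOP_CONF/hdfs-site-snippet.xml"
```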
5. Use ./bin/hadoop dfsadmin -report (or the web UI) to watch slave3's state change: Normal → Decommission In Progress → Decommissioned.

```
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
17/09/16 02:29:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37492891648 (34.92 GB)
Present Capacity: 31898951680 (29.71 GB)
DFS Remaining: 31898517504 (29.71 GB)
DFS Used: 434176 (424 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.205.154:50010 (h154)
Hostname: h154
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 2797916160 (2.61 GB)
DFS Remaining: 15948312576 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:29:14 CST 2017

Name: 192.168.205.155:50010 (h155)
Hostname: h155
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 2796023808 (2.60 GB)
DFS Remaining: 15950204928 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:29:13 CST 2017

Name: 192.168.205.156:50010 (h156)
Hostname: h156
Decommission Status : Decommissioned
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2058260480 (1.92 GB)
DFS Remaining: 16688173056 (15.54 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.02%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Sep 16 02:29:14 CST 2017
```

6. On slave3, stop the DataNode and NodeManager processes with ./sbin/hadoop-daemon.sh stop datanode and ./sbin/yarn-daemon.sh stop nodemanager:

```
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/hadoop-daemon.sh stop datanode
stopping datanode
```
```
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
10148 Jps
```

7. Check the node states again with ./bin/hadoop dfsadmin -report or the web UI. Note: the result below did not appear immediately after step 6 finished — it took quite a long time.

```
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
17/09/16 02:52:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37492883456 (34.92 GB)
Present Capacity: 31898931200 (29.71 GB)
DFS Remaining: 31898456064 (29.71 GB)
DFS Used: 475136 (464 KB)
DFS Used%: 0.00%
Under replicated blocks: 17
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.205.154:50010 (h154)
Hostname: h154
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 237568 (232 KB)
Non DFS Used: 2797936640 (2.61 GB)
DFS Remaining: 15948267520 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:52:32 CST 2017

Name: 192.168.205.155:50010 (h155)
Hostname: h155
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 237568 (232 KB)
Non DFS Used: 2796015616 (2.60 GB)
DFS Remaining: 15950188544 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:52:34 CST 2017

Dead datanodes (1):

Name: 192.168.205.156:50010 (h156)
Hostname: h156
Decommission Status : Decommissioned
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Sat Sep 16 02:30:56 CST 2017
```
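Rather than eyeballing the full report each time, the per-node decommission state can be extracted with awk. A sketch run against a saved copy of the report output (the sample below is a trimmed copy with hypothetical file paths; on a real cluster you would pipe `hdfs dfsadmin -report` into it):

```shell
#!/bin/sh
# Sketch: extract "hostname -> Decommission Status" pairs from saved
# `dfsadmin -report` output.
set -e
cat > /tmp/dfs-report.txt <<'EOF'
Name: 192.168.205.154:50010 (h154)
Hostname: h154
Decommission Status : Normal
Name: 192.168.205.156:50010 (h156)
Hostname: h156
Decommission Status : Decommissioned
EOF

# Remember the most recent Hostname line; print it with the status that follows.
awk '/^Hostname:/ { host=$2 }
     /^Decommission Status/ { print host, $NF }' /tmp/dfs-report.txt \
  > /tmp/dfs-states.txt
cat /tmp/dfs-states.txt
```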
8. The balancer's default threshold is 10%, i.e. each node's storage utilization may differ from the cluster average by at most 10%. We can set it to 5% and start the balancer with sbin/start-balancer.sh -threshold 5, then wait for the cluster to rebalance itself:

```
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ sbin/start-balancer.sh -threshold 5
starting balancer, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/hadoop-hadoop-balancer-h153.out
```

III. Adding an HBase node (192.168.205.156)

1. The preparation is basically the same as for adding a Hadoop node.

2. Copy the master node's hbase-1.0.0-cdh5.5.2 to h156:

```
[hadoop@h153 ~]$ scp -r hbase-1.0.0-cdh5.5.2/ h156:/home/hadoop/
```

3. On every node, add h156 to regionservers under the HBase conf directory:

```
[hadoop@h153 ~]$ vi hbase-1.0.0-cdh5.5.2/conf/regionservers
h154
h155
h156
```

4. Start the regionserver on h156:

```
[hadoop@h156 hbase-1.0.0-cdh5.5.2]$ bin/hbase-daemon.sh start regionserver
[hadoop@h156 hbase-1.0.0-cdh5.5.2]$ jps
2804 HRegionServer
2939 Jps
```

5. On the newly started node h156, open the hbase shell and run balance_switch true.

IV. Removing an HBase HRegionServer (192.168.205.155)

1. Method one

(1). Before shutting the node down, disable load balancing. Start the hbase shell on the node to be stopped:

```
hbase(main):003:0> balance_switch status
2 servers, 0 dead, 1.0000 average load
true
0 row(s) in 0.0660 seconds
```

(Author's note: right after restarting the HBase cluster this showed true, but checking again it was immediately false — I assumed `balance_switch status` queried the current balance state, and the result was comical: true one moment, false the next. Testing shows this command actually forces the balance state to false no matter what it was, and returns the previous state — hence the alternation. So it is a useless and misleading command; don't use it casually. The command that queries the current state is `balancer_enabled`. Reference: https://blog.csdn.net/u011250186/article/details/134044423)

```
hbase(main):004:0> balance_switch false
true
0 row(s) in 0.0700 seconds
```

(This sets it to false; the returned true is the previous state.)

(2). On the RegionServer to be removed, run ./bin/hbase-daemon.sh stop regionserver:

```
[hadoop@h155 ~]$ jps
2707 NodeManager
2915 QuorumPeerMain
11012 Jps
2601 DataNode
3082 HRegionServer
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ ./bin/hbase-daemon.sh stop regionserver
stopping regionserver....
```

● The region server first closes all of its regions, then stops itself. (The author found this sentence unclear.)
● Once the ZooKeeper session times out, it expires.
● The master moves this machine's regions to other machines.

```
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ jps
11074 Jps
2707 NodeManager
2915 QuorumPeerMain
2601 DataNode
```

Note: the node will disappear from ZooKeeper. The Master notices that this RegionServer is down and reassigns its regions. When stopping a node, remember to disable the Load Balancer first (step 1 above; if it is already false there is nothing to do), because the Load Balancer may otherwise compete with the Master's recovery mechanism over the stopped RegionServer's regions.
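The order of operations in method one matters (disable the balancer, then stop the RegionServer). A dry-run sketch that only records the commands a real run would issue — the `run_cmd` logger is an illustrative stand-in, nothing here talks to an actual HBase cluster:

```shell
#!/bin/sh
# Dry-run sketch of method one: run_cmd just logs what a real run would
# execute, so the ordering can be inspected without a cluster.
set -e
LOG=/tmp/rs-stop-plan.txt
: > "$LOG"

run_cmd() { echo "$*" >> "$LOG"; }

# 1. Disable the load balancer from the HBase shell.
run_cmd "echo \"balance_switch false\" | hbase shell"
# 2. Then stop the RegionServer on the node being removed.
run_cmd "./bin/hbase-daemon.sh stop regionserver"

cat "$LOG"
```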
2. Method two

(1). With method one, taking regions offline causes some service unavailability, for a duration that depends on the ZooKeeper timeout. That approach is not ideal, so it is better to use graceful_stop:

```
[hadoop@h155 ~]$ jps
2707 NodeManager
2915 QuorumPeerMain
11012 Jps
2601 DataNode
3082 HRegionServer
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ ./bin/graceful_stop.sh h155
2017-09-16T03:53:08 Disabling load balancer
2017-09-16 03:53:13,443 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2017-09-16 03:53:15,642 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-16T03:53:18 Previous balancer state was false
2017-09-16T03:53:18 Unloading h155 region(s)
2017-09-16 03:53:23,336 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-16 03:53:25,345 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3679d92e connecting to ZooKeeper ensemble=h154:2181,h153:2181,h155:2181
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.5.2--1, built on 01/25/2016 17:46 GMT
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=h154
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_91
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk1.8.0_91/jre
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.class.path=(too many jars, omitted here)
2017-09-16 03:53:25,402 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hbase-1.0.0-cdh5.5.2/bin/../lib/native/Linux-amd64-64
2017-09-16 03:53:25,402 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-327.el7.x86_64
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=hadoop
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/hbase-1.0.0-cdh5.5.2
2017-09-16 03:53:25,409 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=h154:2181,h153:2181,h155:2181 sessionTimeout=90000 watcher=hconnection-0x3679d92e0x0, quorum=h154:2181,h153:2181,h155:2181, baseZNode=/hbase
2017-09-16 03:53:25,494 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Opening socket connection to server h153/192.168.205.153:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-16 03:53:25,503 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.205.154:40286, server: h153/192.168.205.153:2181
2017-09-16 03:53:25,521 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Session establishment complete on server h153/192.168.205.153:2181, sessionid = 0x15e867a6d3e0003, negotiated timeout = 40000
RuntimeError: Server h155:60020 not online
    stripServer at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:192
  unloadRegions at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:301
         (root) at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:484
2017-09-16T03:53:26 Unloaded h155 region(s)
2017-09-16T03:53:26 Stopping regionserver
stopping regionserver..
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ jps
11074 Jps
2707 NodeManager
2915 QuorumPeerMain
2601 DataNode
```

(2). Because graceful_stop disables the HBase balancer, open the hbase shell on another node to check HBase's state and re-enable it:

```
hbase(main):001:0> balance_switch true
false
```

(Author's question: after I set this to true, a moment later it became false again. If it flips back to false automatically anyway, why bother setting it at all?)

References:
http://blog.csdn.net/Mark_LQ/article/details/53393081
https://www.cnblogs.com/zlingh/p/3983984.html
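The confusion above ("true one moment, false the next") follows from `balance_switch` being a setter that returns the *previous* state, not a query. The observed behavior can be modeled with a toy shell function — purely illustrative, no HBase involved; the real shell parses its argument rather than storing it verbatim:

```shell
#!/bin/sh
# Toy model of balance_switch semantics: record the old state, set the new
# one, and print the OLD state back, mimicking what the HBase shell shows.
STATE_FILE=/tmp/balancer-state
echo "true" > "$STATE_FILE"     # balancer initially enabled

balance_switch() {
  prev=$(cat "$STATE_FILE")
  echo "$1" > "$STATE_FILE"     # any argument becomes the new state
  echo "$prev"                  # what gets printed back to you
}

out1=$(balance_switch false)    # prints the previous state, "true"
out2=$(balance_switch status)   # prints "false" - and silently changed the state again
echo "$out1 $out2" > /tmp/bs-outputs
cat "$STATE_FILE"               # in the toy model, the stored state is now "status"
```

This is why querying with `balance_switch status` kept toggling the author's cluster: every "query" was really a write. Use `balancer_enabled` (where available) to read the state without modifying it.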