Step-by-Step Tutorial: Building an Apache Storm 2.1.0 Cluster from Scratch on Ubuntu 20.04 (with Zookeeper Configuration and a WordCount Example)

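## 1. Environment Preparation and Base Configuration

Before building the Storm cluster, make sure every node has the same base environment. This tutorial assumes three Ubuntu 20.04 LTS virtual machines named `storm-nimbus`, `storm-supervisor1`, and `storm-supervisor2`, each with at least 2 CPU cores, 4 GB of memory, and 20 GB of storage.

### 1.1 System-Level Prerequisites

Run the following base setup on all nodes:

```bash
# Refresh the package index and upgrade installed packages
sudo apt update && sudo apt upgrade -y

# Install the required toolchain
sudo apt install -y openjdk-11-jdk python3 python3-pip net-tools
```

Verify that Java installed correctly:

```bash
java -version
# Expected output is similar to: openjdk version "11.0.12" 2021-07-20
```

Key configuration checklist:

- Make sure all nodes use the same time zone (check with `timedatectl`).
- Check that `/etc/hosts` contains an IP-to-hostname mapping for every node.
- Disable the firewall on each node, or open the required ports (see section 3.2).

### 1.2 User and Permission Planning

Creating a dedicated user for the Storm services is a best practice in production:

```bash
sudo useradd -m storm
sudo passwd storm            # set a password
sudo usermod -aG sudo storm  # grant sudo rights
```

Tip: run all subsequent steps as the `storm` user to avoid permission problems.

## 2. Zookeeper Cluster Deployment

Zookeeper is Storm's coordination service, so it must be deployed before Storm. We use a three-node cluster for high availability.

### 2.1 Installation and Configuration

Run on all nodes:

```bash
# Download and unpack Zookeeper (writing to /opt requires root)
wget https://downloads.apache.org/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz
sudo tar -xzf apache-zookeeper-3.7.0-bin.tar.gz -C /opt/
sudo ln -s /opt/apache-zookeeper-3.7.0-bin /opt/zookeeper
# Let the storm user write logs under the install directory
sudo chown -R storm:storm /opt/apache-zookeeper-3.7.0-bin
```

Apache moves older releases from `downloads.apache.org` to `https://archive.apache.org/dist/`. If the download returns 404, use the same path under the archive host.

Create the configuration file `/opt/zookeeper/conf/zoo.cfg`:

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=storm-nimbus:2888:3888
server.2=storm-supervisor1:2888:3888
server.3=storm-supervisor2:2888:3888
```

### 2.2 Starting and Verifying the Cluster

Create a unique `myid` file on each node. The number must match that node's `server.N` entry in `zoo.cfg`:

```bash
# On every node: create the directory referenced by dataDir
sudo mkdir -p /var/lib/zookeeper
sudo chown storm:storm /var/lib/zookeeper

# On the nimbus node
echo 1 | sudo tee /var/lib/zookeeper/myid
# On the supervisor1 node
echo 2 | sudo tee /var/lib/zookeeper/myid
# On the supervisor2 node
echo 3 | sudo tee /var/lib/zookeeper/myid
```

Start the service on every node and check its status:

```bash
/opt/zookeeper/bin/zkServer.sh start
/opt/zookeeper/bin/zkServer.sh status
# A healthy node reports "Mode: leader" or "Mode: follower"
```
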
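Section 2 chooses a three-node Zookeeper cluster for high availability. The arithmetic behind that choice can be sketched in plain Java: Zookeeper stays available only while a majority of the configured servers is running. The class `ZkEnsembleCheck` and its methods are hypothetical helpers written for this illustration, not part of Zookeeper; the sketch only assumes that `zoo.cfg` uses `key=value` lines as shown in section 2.1.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper (not part of Zookeeper): inspects a zoo.cfg-style server list. */
final class ZkEnsembleCheck {

    /** Collects the server.N=host:port:port entries, keyed by the numeric id N. */
    static Map<Integer, String> servers(String zooCfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfg)); // parses key=value lines
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Integer, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                int id = Integer.parseInt(key.substring("server.".length()));
                result.put(id, props.getProperty(key).trim());
            }
        }
        return result;
    }

    /** Smallest number of servers that forms a majority. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can be down while a majority is still up. */
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
                "clientPort=2181",
                "server.1=storm-nimbus:2888:3888",
                "server.2=storm-supervisor1:2888:3888",
                "server.3=storm-supervisor2:2888:3888");
        Map<Integer, String> ensemble = servers(cfg);
        System.out.println("servers: " + ensemble);
        System.out.println("quorum size: " + quorumSize(ensemble.size()));
        System.out.println("tolerable failures: " + tolerableFailures(ensemble.size()));
    }
}
```

For the three servers in this tutorial the quorum is 2, so the cluster keeps working with one node down. A fourth server would raise the quorum to 3 without tolerating any additional failure, which is why odd cluster sizes are the usual choice.
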
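## 3. Storm Cluster Core Configuration

### 3.1 Installing the Storm Binary Package

Run on all nodes:

```bash
wget https://downloads.apache.org/storm/apache-storm-2.1.0/apache-storm-2.1.0.tar.gz
sudo tar -xzf apache-storm-2.1.0.tar.gz -C /opt/
sudo ln -s /opt/apache-storm-2.1.0 /opt/storm
# Let the storm user write logs and local state
sudo mkdir -p /var/lib/storm
sudo chown -R storm:storm /opt/apache-storm-2.1.0 /var/lib/storm
```

If the release is no longer on `downloads.apache.org`, the same path is available under `https://archive.apache.org/dist/`.

### 3.2 Key Configuration File Explained

Edit `/opt/storm/conf/storm.yaml`:

```yaml
storm.zookeeper.servers:
  - "storm-nimbus"
  - "storm-supervisor1"
  - "storm-supervisor2"
nimbus.seeds: ["storm-nimbus"]
storm.local.dir: "/var/lib/storm"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
ui.port: 8080
```

Each entry in `supervisor.slots.ports` defines one worker slot, so this configuration allows four workers per Supervisor node.

Ports that must be open:

| Service | Port | Direction |
|---|---|---|
| Zookeeper | 2181 | Inbound on all nodes |
| Nimbus | 6627 | Inbound on all nodes |
| Supervisor | 6700-6703 (per `supervisor.slots.ports`) | Inbound on each Supervisor node |
| UI | 8080 | Inbound on the Nimbus node |

The Zookeeper servers also need ports 2888 and 3888 open to each other, as configured in `zoo.cfg`.

### 3.3 Starting and Daemonizing the Services

The `storm` commands run in the foreground, so start each one in the background.

On the Nimbus node:

```bash
nohup /opt/storm/bin/storm nimbus > /dev/null 2>&1 &
nohup /opt/storm/bin/storm ui > /dev/null 2>&1 &
```

On the Supervisor nodes:

```bash
nohup /opt/storm/bin/storm supervisor > /dev/null 2>&1 &
```

Verify the service status:

```bash
jps
# Expect QuorumPeerMain on every node, plus Nimbus and the UI process
# on the Nimbus node, or Supervisor on the Supervisor nodes
```

## 4. WordCount Example in Practice

### 4.1 Preparing the Example Topology

Create `WordCountTopology.java`:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// RandomSentenceSpout, SplitSentenceBolt and WordCountBolt are not shown in this
// tutorial; they must exist in your project (same package) for this to compile.
public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout that emits sentences, parallelism hint 5
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        // Splits sentences into words; tuples are distributed randomly
        builder.setBolt("split", new SplitSentenceBolt(), 8)
               .shuffleGrouping("spout");
        // Counts words; fieldsGrouping sends the same word to the same task
        builder.setBolt("count", new WordCountBolt(), 12)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true); // logs every emitted tuple; turn off in production

        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
```

### 4.2 Submitting and Monitoring

Package and submit the topology:

```bash
mvn clean package
/opt/storm/bin/storm jar target/wordcount-1.0.jar WordCountTopology
```

Check the running state in the UI (`http://storm-nimbus:8080`):

- Three components should be listed (1 spout + 2 bolts).
- Every component should have active executors.
- The throughput metrics should keep changing.

## 5. In-Depth Troubleshooting Guide

### 5.1 Common Errors and Solutions

Problem 1: Zookeeper connection failure

- Check the server addresses in `storm.yaml`.
- Verify the firewall settings.
- Check the Zookeeper logs in `/opt/zookeeper/logs/` (for example `zookeeper.out`; the exact file name depends on the Zookeeper version).

Problem 2: Supervisor cannot register

- Confirm that the Nimbus hostname resolves correctly.
- Check the `supervisor.slots.ports` configuration.
- Check the Supervisor log at `/opt/storm/logs/supervisor.log`.

### 5.2 Performance Tuning Suggestions

Key parameters to adjust:

```yaml
worker.heap.memory.mb: 2048
topology.max.spout.pending: 1000
topology.message.timeout.secs: 30
```

Metrics to watch:

- The ratio of acked to failed tuples on the spout
- The execute latency of each bolt
- The GC time of the workers

In practice, stop the topology with `storm kill` first, then adjust the parameters and resubmit. For resource-intensive applications, add entries to `supervisor.slots.ports` gradually and monitor the system load as you go.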
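
The topology in section 4.1 wires a split step and a count step together without showing what they compute. As a rough illustration only, the plain-Java sketch below models that logic with no Storm classes at all. `WordCountSketch` and its methods are hypothetical names; the tokenization rule (split on whitespace, lowercase) is an assumption, not something the tutorial specifies, and a real bolt would wrap this logic in Storm's bolt API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java model of the split and count steps; no Storm classes involved. */
final class WordCountSketch {

    /** What a split step would emit for one sentence: one lowercase word per item. */
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (!token.isEmpty()) { // a blank sentence yields one empty token
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Running totals, as one count task would keep them in memory. */
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one occurrence of the word and returns its updated total. */
    int count(String word) {
        return counts.merge(word, 1, Integer::sum);
    }

    /** Copy of the current totals. */
    Map<String, Integer> snapshot() {
        return new HashMap<>(counts);
    }

    public static void main(String[] args) {
        WordCountSketch counter = new WordCountSketch();
        String[] sentences = {"the cow jumped over the moon", "the moon is bright"};
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                counter.count(word);
            }
        }
        System.out.println(counter.snapshot());
    }
}
```

Keeping the totals in a per-instance map only gives correct results if every occurrence of a word reaches the same instance. That is the reason the topology connects the count step with `fieldsGrouping` on the `word` field rather than `shuffleGrouping`.
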