Two CentOS 7 virtual machines are installed on Windows 10, with the following IPs and hostnames:

192.168.13.132 qq1
192.168.13.133 qq2

Copy these IP/hostname pairs into the Windows 10 hosts file and into /etc/hosts on both virtual machines, then save.
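For reference, the Windows copy of the file lives at C:\Windows\System32\drivers\etc\hosts; the lines to add are the same on all three machines:

```
192.168.13.132 qq1
192.168.13.133 qq2
```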
Stop the firewall: systemctl stop firewalld.service (the unit is named firewalld, not firewalls). Keep it from starting at boot: systemctl disable firewalld.service
Run the following three commands:

```
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
```

Then open ~/.ssh/authorized_keys on both qq1 and qq2: append qq1's key to qq2's file (on a new line), and likewise append qq2's key to qq1's file. Test logging in from each machine; the first login asks for confirmation, so just type yes.
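The three commands above can be rehearsed safely in a scratch directory before touching ~/.ssh; this sketch (assuming OpenSSH's ssh-keygen is installed) also shows why the chmod matters: sshd ignores an authorized_keys file that is group- or world-writable.

```shell
tmp=$(mktemp -d)                                 # scratch dir, not ~/.ssh
ssh-keygen -t rsa -P '' -f "$tmp/id_rsa" -q      # non-interactive keypair
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"  # authorize our own key
chmod 0600 "$tmp/authorized_keys"                # owner read/write only
stat -c '%a' "$tmp/authorized_keys"              # prints: 600
rm -rf "$tmp"
```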
```
[qgn@qq1 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/home/qgn/.ssh'.
Your identification has been saved in /home/qgn/.ssh/id_rsa.
Your public key has been saved in /home/qgn/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:wbUhCoMkBsTLLa0gZh9YKr6G5MNHpd/DZ41ZFYb39l4 qgn@qq1
The key's randomart image is:
+---[RSA 2048]----+
|*+..o . o .      |
|.o.. o o o + +   |
|. B . o . o o    |
|+O + . . . o     |
|B + + S . . .    |
| + + .          E|
|= o . o = ..     |
|.* . . + = . .   |
|. o +            |
+----[SHA256]-----+
[qgn@qq1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[qgn@qq1 ~]$ chmod 0600 ~/.ssh/authorized_keys
[qgn@qq1 ~]$ ssh qgn@qq1
The authenticity of host 'qq1 (::1)' can't be established.
ECDSA key fingerprint is SHA256:nSck9x9q1Bg/tsu4Iavl7i7FbHxHyVp166Mdavv/p6k.
ECDSA key fingerprint is MD5:02:4e:03:21:e2:5b:22:0a:12:31:5c:ad:bf:13:89:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'qq1' (ECDSA) to the list of known hosts.
Last login: Sat May 24 17:54:32 2019 from 192.168.13.1
[qgn@192.168.13.1 ~]$
[qgn@qq2 ~]$ ssh qgn@qq1
Last login: Fri May 24 17:58:37 2019 from 192.168.13.1
[qgn@qq1 ~]$ ssh qgn@qq2
Last login: Fri May 24 09:35:54 2019 from 192.168.13.1
[qgn@qq2 ~]$
```

Check whether a JDK is already installed on the system: java -version. List any installed JDK packages: rpm -qa | grep java
For example: java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
Remove the installed JDK: rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64

Install JDK 8: download it from https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html, copy it to the virtual machine, and extract it into /usr/local/lib64:

tar zxf jdk-8u211-linux-x64.tar.gz -C /usr/local/lib64

Add the environment variables by appending these two lines to /etc/profile:

export JAVA_HOME=/usr/local/lib64/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin

Run source /etc/profile to apply them, then java -version to check that it worked.
```
[qgn@qq1 ~]$ source /etc/profile
[qgn@qq1 ~]$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
```

Hadoop official site: https://hadoop.apache.org/
Hadoop 2.9.2 download page: https://hadoop.apache.org/releases.html

Download the binary tarball (hadoop-2.9.2.tar.gz, not the -src one):

wget http://www.pirbot.com/mirrors/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz

Extract it: tar -zxvf hadoop-2.9.2.tar.gz -C /opt/
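Both archives in this guide are unpacked with tar's -C flag, which changes into the target directory before extracting instead of dumping files into the current one. A throwaway demonstration of the behavior:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/pkg" && echo hello > "$tmp/pkg/file.txt"
tar -C "$tmp" -czf "$tmp/pkg.tar.gz" pkg     # build a small test archive
mkdir "$tmp/dest"
tar -zxf "$tmp/pkg.tar.gz" -C "$tmp/dest"    # extract into dest, not the cwd
cat "$tmp/dest/pkg/file.txt"                 # prints: hello
rm -rf "$tmp"
```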
The configuration files live in hadoop/etc/hadoop.
Add JAVA_HOME to the environment scripts. Append this line to each of hadoop-env.sh, mapred-env.sh, and yarn-env.sh:

```
export JAVA_HOME=/usr/local/lib64/jdk1.8.0_211
```

core-site.xml configuration:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://qq1:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.9.2/data/tmp</value>
  </property>
</configuration>
```

Replace qq1 with your own hostname. Note: /tmp is a temporary directory that the system cleans on every reboot according to preset scripts, so files left there cannot be kept safe; point hadoop.tmp.dir at a custom path instead.

hdfs-site.xml configuration:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>qq1:50070</value>
    <description>
      The address and the base port where the dfs namenode web ui will listen on.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>qq2:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

This runs the SecondaryNameNode on qq2.

mapred-site.xml configuration:

```xml
<configuration>
  <!-- run MapReduce programs on the YARN platform -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

yarn-site.xml configuration:

```xml
<configuration>
  <!-- auxiliary service the NodeManagers run for the MapReduce shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- host of the ResourceManager; optional (it defaults to the local machine),
       but once set, starting the ResourceManager on any other machine fails -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>qq1</value>
  </property>
</configuration>
```

This runs the ResourceManager on qq1.

slaves configuration: this file lists the worker machines; just add the hostnames, one per line:

```
qq1
qq2
```

Add environment variables: create /etc/profile.d/hadoop.sh (e.g. vim /etc/profile.d/hadoop.sh) containing these three lines, then run source /etc/profile.d/hadoop.sh to apply them:

```
export HADOOP_HOME=/opt/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
```

Once configuration is finished, copy the whole configured hadoop-2.9.2 directory to qq2: run the command below on qq2 and enter qq1's root password when prompted.
```
[qgn@qq2 ~]$ sudo scp -r root@qq1:/opt/hadoop-2.9.2 /opt
```

Format the NameNode (run only on qq1): bin/hdfs namenode -format
Start DFS (run only on qq1): sbin/start-dfs.sh. Then check the running processes on each machine with jps. On qq1:
```
[qgn@qq1 hadoop-2.9.2]$ start-dfs.sh
Starting namenodes on [qq1]
qq1: starting namenode, logging to /opt/hadoop-2.9.2/logs/hadoop-qgn-namenode-qq1.out
qq2: starting datanode, logging to /opt/hadoop-2.9.2/logs/hadoop-qgn-datanode-qq2.out
qq1: starting datanode, logging to /opt/hadoop-2.9.2/logs/hadoop-qgn-datanode-qq1.out
Starting secondary namenodes [qq2]
qq2: starting secondarynamenode, logging to /opt/hadoop-2.9.2/logs/hadoop-qgn-secondarynamenode-qq2.out
[qgn@qq1 hadoop-2.9.2]$ jps
51443 DataNode
51190 NameNode
52031 Jps
[qgn@qq1 hadoop-2.9.2]$
```

On qq2:
```
[qgn@qq2 hadoop-2.9.2]$ jps
70307 SecondaryNameNode
70004 Jps
70171 DataNode
[qgn@qq2 hadoop-2.9.2]$
```

Both DataNodes are up, the NameNode is running on qq1, and the SecondaryNameNode is running on qq2.

Note: if a DataNode is missing or not all of them started, the likely cause is mismatched clusterIDs from formatting more than once. Compare the VERSION files under /opt/hadoop-2.9.2/data/tmp/dfs/name/current and /opt/hadoop-2.9.2/data/tmp/dfs/data/current:
```
[qgn@qq1 current]$ pwd
/opt/hadoop-2.9.2/data/tmp/dfs/name/current
[qgn@qq1 current]$ cat VERSION
#Fri May 24 11:14:54 CST 2019
namespaceID=1228588066
clusterID=CID-68666614-d793-4001-b132-8880428c6a7f
cTime=1558667694011
storageType=NAME_NODE
blockpoolID=BP-798175885-192.168.13.132-1558667694011
layoutVersion=-63
[qgn@qq1 current]$ ^C
[qgn@qq2 current]$ pwd
/opt/hadoop-2.9.2/data/tmp/dfs/data/current
[qgn@qq2 current]$ cat VERSION
#Fri May 24 12:47:06 CST 2019
storageID=DS-385b1d80-8502-4e5c-9f81-a8ee8958bb44
clusterID=CID-68666614-d793-4001-b132-8880428c6a7f
cTime=0
datanodeUuid=2f81366c-7de3-4cd1-8ce7-81844568ad07
storageType=DATA_NODE
layoutVersion=-57
[qgn@qq2 current]$
```

Either edit the clusterIDs so they match, or simply delete the data and name folders under hadoop-2.9.2 on each machine, re-format, and start again.

Entering qq1:50070 in the host's browser shows the file system and DataNode information.

Create directories:
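Rather than reading the two VERSION files by eye, the IDs can be compared in one command; a minimal sketch using the paths from this guide (run it on each machine — exactly one line of output means the IDs match):

```shell
# -h suppresses filenames; sort -u collapses identical IDs to one line
grep -h '^clusterID=' \
  /opt/hadoop-2.9.2/data/tmp/dfs/name/current/VERSION \
  /opt/hadoop-2.9.2/data/tmp/dfs/data/current/VERSION 2>/dev/null | sort -u
```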
```
[qgn@qq1 hadoop-2.9.2]$ hdfs dfs -mkdir /user
[qgn@qq1 hadoop-2.9.2]$ hdfs dfs -mkdir /user/qgn
```

Once created, the directories can be seen in the browser UI mentioned above.
Start YARN: on qq1 run start-yarn.sh, then check the processes:
```
[qgn@qq1 hadoop-2.9.2]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.9.2/logs/yarn-qgn-resourcemanager-qq1.out
qq2: starting nodemanager, logging to /opt/hadoop-2.9.2/logs/yarn-qgn-nodemanager-qq2.out
qq1: starting nodemanager, logging to /opt/hadoop-2.9.2/logs/yarn-qgn-nodemanager-qq1.out
[qgn@qq1 hadoop-2.9.2]$ jps
37091 Jps
33668 NodeManager
48892 NameNode
49084 DataNode
33518 ResourceManager
[qgn@qq1 hadoop-2.9.2]$
[qgn@qq2 hadoop-2.9.2]$ jps
70307 SecondaryNameNode
66662 Jps
70171 DataNode
61166 NodeManager
[qgn@qq2 hadoop-2.9.2]$
```

The ResourceManager is running on qq1. Entering qq1:8088 in a browser shows the YARN web UI.

Stop YARN: stop-yarn.sh
Stop HDFS: stop-dfs.sh
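If a daemon later refuses to start, a useful first sweep is checking that each config file still defines its key property. A minimal sketch (file names and properties as configured above; run it from hadoop/etc/hadoop):

```shell
# Each grep exits 0 only if the property name appears in the file
for pair in core-site.xml:fs.defaultFS \
            hdfs-site.xml:dfs.replication \
            mapred-site.xml:mapreduce.framework.name \
            yarn-site.xml:yarn.nodemanager.aux-services; do
  file=${pair%%:*}; prop=${pair#*:}
  grep -q "$prop" "$file" && echo "$file OK" || echo "$file missing $prop"
done
```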