1. Ubuntu 12.04 Basic Setup
1) Set up file sharing with Windows (via hgfs)
2) Set the root password
sudo passwd root
3) Enable root login
vim /etc/lightdm/lightdm.conf
Add one line at the end: greeter-show-manual-login=true
4) Create a test user
# adduser test
# passwd test
# mkdir -p /home/test
5) Check the current user
whoami
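For reference, on a stock Ubuntu 12.04 install /etc/lightdm/lightdm.conf already contains a [SeatDefaults] section, so after the edit the file looks roughly like this (the greeter-session/user-session values below are the Ubuntu defaults and may differ on your machine):

[SeatDefaults]
greeter-session=unity-greeter
user-session=ubuntu
greeter-show-manual-login=true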
2. Install Components
$ sudo apt-get install ssh
$ sudo apt-get install rsync
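To confirm both packages are in place before the HDFS steps later (start-dfs.sh uses ssh to launch the daemons), a quick sanity check:

$ service ssh status
$ rsync --version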
3. Download and extract the JDK (1.7) into /opt
4. Download and extract Hadoop (2.6.4) into /opt
5. Add the JDK and Hadoop paths to /root/.bashrc and the current user's ~/.bashrc
export JAVA_HOME=/opt/jdk1.7.0_79
export HADOOP_HOME=/opt/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# fixes "Could not resolve hostname library: Name or service not known"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
If HADOOP_OPTS is instead configured as:
HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
the following warning appears:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
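After editing .bashrc, it is worth reloading it and confirming that both tools resolve and that the native library really loads; hadoop checknative is a standard Hadoop 2.x subcommand:

$ source ~/.bashrc
$ java -version
$ hadoop version
$ hadoop checknative -a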
6. etc/hadoop/hadoop-env.sh (JAVA_HOME, HADOOP_PREFIX)
# set to the root of your Java installation
export JAVA_HOME=/opt/jdk1.7.0_79
# assuming your installation directory is /opt/hadoop-2.6.4
export HADOOP_PREFIX=/opt/hadoop-2.6.4
Note: all of the installation steps above are performed as the root user.
7. Example Tests
7.1 Standalone Mode
(No configuration is needed, and no dfs or mapreduce daemons are started.) Count how many matches of 'configuration' appear across these .xml files:
~$ mkdir input
~$ cp /opt/hadoop-2.6.4/etc/hadoop/*.xml input
~$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output 'configuration'
~$ cat output/*
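The examples jar bundles several other small jobs besides grep; for instance, a word count over the same input (the output2 directory name is arbitrary and must not already exist):

~$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output2
~$ cat output2/*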
To view debug messages (i.e., what the code prints via LOG.debug):
test@ubuntu:~/hadoop$ export HADOOP_ROOT_LOGGER=DEBUG,console
test@ubuntu:~/hadoop$ hadoop fs -text /test/data/origz/access.log.gz
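The logger can also be set for a single command rather than exported; to return to normal output afterwards, unset the variable or set it back to the default INFO,console:

test@ubuntu:~/hadoop$ HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
test@ubuntu:~/hadoop$ export HADOOP_ROOT_LOGGER=INFO,console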
7.2 Pseudo-Distributed Mode
7.2.1 Configure the Environment
1) etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/test/hadoop/tmp</value>
    </property>
</configuration>
2) etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
3) Passwordless SSH login
~$ ssh-keygen (press Enter through all the prompts)
~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
~$ ssh localhost (should log in without a password)
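If ssh localhost still asks for a password, the usual culprit is file permissions (sshd ignores an authorized_keys file that is too open); tightening them normally fixes it:

~$ chmod 700 ~/.ssh
~$ chmod 600 ~/.ssh/authorized_keys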
7.2.2 Run a MapReduce Job Locally
1) Format the filesystem
$ hdfs namenode -format
The generated dfs directory is located under /home/test/hadoop/tmp.
2) Start the NameNode and DataNode daemons
$ start-dfs.sh
Hadoop daemon logs go to $HADOOP_LOG_DIR, which defaults to $HADOOP_HOME/logs.
Grant the test user ownership of /opt/hadoop-2.6.4:
sudo chown -hR test /opt/hadoop-2.6.4
jps should now show the following JVM processes:
11340 SecondaryNameNode
9927 NameNode
10142 DataNode
3) Check the NameNode status through its web UI, by default at:
http://localhost:50070
http://192.168.4.91:50070
4) Create the HDFS directories needed to run MapReduce jobs
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/<username>
error: mkdir: Cannot create directory /user. Name node is in safe mode.
solution: hdfs dfsadmin -safemode leave (see the note after this list)
5) Copy the input files into the distributed filesystem
$ hdfs dfs -put /opt/hadoop-2.6.4/etc/hadoop input
6) Run the example program
$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output 'dfs[a-z.]+'
7) Check the output files
Copy the output from the distributed filesystem to the local filesystem and view it:
$ hdfs dfs -get output output
$ cat output/*
or view it directly:
$ hdfs dfs -cat output/*
8) Stop the NameNode and DataNode daemons
$ stop-dfs.sh
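A note on the safe mode error in step 4: the NameNode normally leaves safe mode on its own once enough blocks have been reported, so before forcing it out it is worth checking the current state and the overall HDFS health:

$ hdfs dfsadmin -safemode get
$ hdfs dfsadmin -report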
7.2.3 Run a MapReduce Job on YARN
1) Configure the environment
On top of the configuration from 7.2.1, add the following:
(1) etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
(2) etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
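The mapreduce_shuffle auxiliary service is what lets NodeManagers serve map output to the reducers. Once the YARN daemons are up (step 6 below), the registered nodes can be sanity-checked with:

$ yarn node -list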
2) Format the filesystem
$ hdfs namenode -format
The generated dfs directory is located under /home/test/hadoop/tmp.
3) Start the NameNode and DataNode daemons
$ start-dfs.sh
Hadoop daemon logs go to $HADOOP_LOG_DIR, which defaults to $HADOOP_HOME/logs.
Grant the test user ownership of /opt/hadoop-2.6.4:
sudo chown -hR test /opt/hadoop-2.6.4
jps should now show the following JVM processes:
11340 SecondaryNameNode
9927 NameNode
10142 DataNode
4) Check the NameNode status through its web UI, by default at:
http://localhost:50070
http://192.168.4.91:50070
5) Create the HDFS directories needed to run MapReduce jobs
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/<username>
6) Start the ResourceManager and NodeManager daemons
$ start-yarn.sh
(jps should now additionally show a ResourceManager and a NodeManager process)
7) Check the ResourceManager status through its web UI, by default at:
http://localhost:8088
http://192.168.4.91:8088
8) Copy the input files into the distributed filesystem
$ hdfs dfs -put /opt/hadoop-2.6.4/etc/hadoop input
9) Run the example program
$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output 'dfs[a-z.]+'
10) Check the output files
Copy the output from the distributed filesystem to the local filesystem and view it:
$ hdfs dfs -get output output
$ cat output/*
or view it directly:
$ hdfs dfs -cat output/*
11) Stop the YARN daemons
$ stop-yarn.sh
12) Stop the HDFS daemons
$ stop-dfs.sh
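For repeated runs, the whole YARN flow above can be driven from one small script; a minimal sketch assuming the paths used throughout this post (do not re-run the format step, since reformatting wipes HDFS):

#!/bin/bash
# bring up the HDFS and YARN daemons
start-dfs.sh
start-yarn.sh
# clear any leftovers from a previous run (the grep job fails if output already exists)
hdfs dfs -rm -r -f input output
# stage the input and run the grep example
hdfs dfs -put /opt/hadoop-2.6.4/etc/hadoop input
hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output 'dfs[a-z.]+'
hdfs dfs -cat output/*
# tear everything down
stop-yarn.sh
stop-dfs.sh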