Installing hadoop-2.7.3 + hive-2.1.1 + sqoop-1.99.7 on macOS


    Installing Hadoop

    Install the JDK

        vim ~/.bash_profile

        export JAVA_HOME="YOUR_JAVA_HOME"
        export PATH=$PATH:$JAVA_HOME/bin

    Once configured, run:

        java -version
        java version "1.8.0_121"
        Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
        Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
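    On macOS the JDK path does not have to be hard-coded: the built-in /usr/libexec/java_home helper resolves it. A minimal sketch (the -v 1.8 filter assumes a 1.8 JDK is installed):

        # Let macOS resolve the active 1.8 JDK's home directory
        export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)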

    Passwordless SSH login

        ssh-keygen -t rsa
        cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
        ssh localhost    # verify
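    If ssh localhost is refused, Remote Login is probably disabled on the Mac. It can be enabled under System Preferences > Sharing > Remote Login, or from the command line (a sketch, assuming admin rights):

        # Turn on the macOS SSH server (Remote Login)
        sudo systemsetup -setremotelogin on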

    Configure Hadoop

    Download Hadoop and unpack it to a directory of your choice (/opt here), then set the environment variables:

        vim ~/.bash_profile

        export HADOOP_HOME=/opt/hadoop-2.7.3
        export HADOOP_PREFIX=$HADOOP_HOME
        export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
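    To pick up the new variables in the current shell and confirm the binaries resolve:

        source ~/.bash_profile
        hadoop version    # should report Hadoop 2.7.3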

    Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh (the HADOOP_OPTS line below is a common workaround for the "Unable to load realm info from SCDynamicStore" Kerberos warning on macOS):

        export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home
        export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
        export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

    Edit $HADOOP_HOME/etc/hadoop/core-site.xml. Note that hadoop.tmp.dir below points at the original author's home directory; adjust it to your own:

        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
            <description>The name of the default file system.</description>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/Users/micmiu/tmp/hadoop</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>io.native.lib.available</name>
            <value>false</value>
            <description>Default is true: should native hadoop libraries, if present, be used.</description>
        </property>
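    It does no harm to create the tmp directory up front; a minimal sketch, assuming you pointed hadoop.tmp.dir at a directory under your own home:

        # Pre-create the directory behind hadoop.tmp.dir to avoid permission surprises
        mkdir -p ~/tmp/hadoop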

    Edit hdfs-site.xml:

        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <!-- 1 for a single node; on a cluster, set this to match the actual number of nodes -->
        </property>

    Edit yarn-site.xml:

        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>

    Edit mapred-site.xml (it must first be copied from its template):

        cp mapred-site.xml.template mapred-site.xml

        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
            <final>true</final>
        </property>

    Format the NameNode

        hdfs namenode -format    # 'hadoop namenode -format' still works but is deprecated in 2.x

    Start HDFS and YARN

        start-dfs.sh
        start-yarn.sh

    Check that the daemons are running:

        jps
        6917 DataNode
        6838 NameNode
        2810 Launcher
        7130 ResourceManager
        7019 SecondaryNameNode
        7772 Jps
        7215 NodeManager
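    Beyond jps, the web UIs are a quick health check; in Hadoop 2.x the NameNode UI defaults to port 50070 and the ResourceManager UI to port 8088:

        open http://localhost:50070    # HDFS NameNode UI
        open http://localhost:8088     # YARN ResourceManager UI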

    WordCount example

        hdfs dfs -mkdir -p /user/jjzhu/wordcount/in
        hdfs dfs -put xxxxx.txt /user/jjzhu/wordcount/in
        hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/jjzhu/wordcount/in /user/jjzhu/wordcount/out

    The run produces output like this:

        17/04/07 13:04:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
        17/04/07 13:04:10 INFO input.FileInputFormat: Total input paths to process : 1
        17/04/07 13:04:10 INFO mapreduce.JobSubmitter: number of splits:1
        17/04/07 13:04:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491532908338_0004
        17/04/07 13:04:11 INFO impl.YarnClientImpl: Submitted application application_1491532908338_0004
        17/04/07 13:04:11 INFO mapreduce.Job: The url to track the job: http://jjzhu:8088/proxy/application_1491532908338_0004/
        17/04/07 13:04:11 INFO mapreduce.Job: Running job: job_1491532908338_0004
        17/04/07 13:04:18 INFO mapreduce.Job: Job job_1491532908338_0004 running in uber mode : false
        17/04/07 13:04:18 INFO mapreduce.Job:  map 0% reduce 0%
        17/04/07 13:04:23 INFO mapreduce.Job:  map 100% reduce 0%
        17/04/07 13:04:29 INFO mapreduce.Job:  map 100% reduce 100%
        17/04/07 13:04:29 INFO mapreduce.Job: Job job_1491532908338_0004 completed successfully
        17/04/07 13:04:29 INFO mapreduce.Job: Counters: 49
            File System Counters
                FILE: Number of bytes read=1141
                FILE: Number of bytes written=239913
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=869
                HDFS: Number of bytes written=779
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
            Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=2859
                Total time spent by all reduces in occupied slots (ms)=2527
                Total time spent by all map tasks (ms)=2859
                Total time spent by all reduce tasks (ms)=2527
                Total vcore-milliseconds taken by all map tasks=2859
                Total vcore-milliseconds taken by all reduce tasks=2527
                Total megabyte-milliseconds taken by all map tasks=2927616
                Total megabyte-milliseconds taken by all reduce tasks=2587648
            Map-Reduce Framework
                Map input records=1
                Map output records=118
                Map output bytes=1219
                Map output materialized bytes=1141
                Input split bytes=122
                Combine input records=118
                Combine output records=89
                Reduce input groups=89
                Reduce shuffle bytes=1141
                Reduce input records=89
                Reduce output records=89
                Spilled Records=178
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=103
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=329252864
            Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
            File Input Format Counters
                Bytes Read=747
            File Output Format Counters
                Bytes Written=779

    Inspect the results:

        hdfs dfs -ls /user/jjzhu/wordcount/out
        -rw-r--r--   1 didi supergroup          0 2017-04-07 13:04 /user/jjzhu/wordcount/out/_SUCCESS
        -rw-r--r--   1 didi supergroup        779 2017-04-07 13:04 /user/jjzhu/wordcount/out/part-r-00000

        hdfs dfs -cat /user/jjzhu/wordcount/out/part-r-00000
        A          1
        Other      1
        Others     1
        Some       2
        There      1
        a          1
        access     2
        access);   1
        according  1
        adding     1
        allowing   1
        ......

    Stop Hadoop

        stop-dfs.sh
        stop-yarn.sh

    Installing Hive

    Download and unpack Hive, then set the environment variables:

        export HIVE_HOME=/opt/hive-2.1.1
        export PATH=$HIVE_HOME/bin:$PATH

    Configure Hive

        cd /opt/hive-2.1.1/conf
        cp hive-env.sh.template hive-env.sh
        cp hive-default.xml.template hive-site.xml
        vim hive-env.sh

        HADOOP_HOME=/opt/hadoop-2.7.3
        export HIVE_CONF_DIR=/opt/hive-2.1.1/conf
        export HIVE_AUX_JARS_PATH=/opt/hive-2.1.1/lib

    Download mysql-connector-xx.xx.xx.jar (the MySQL JDBC driver) into $HIVE_HOME/lib.

    Edit hive-site.xml: replace every occurrence of ${system:java.io.tmpdir} and ${system:user.name} with concrete paths, then configure the MySQL connection settings (note that & inside an XML value must be escaped as &amp;):

        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>123456</value>
        </property>
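    Before initializing the schema, it is worth confirming MySQL is reachable with the credentials configured above (root/123456 here; substitute your own):

        # Quick connectivity check against the local MySQL server
        mysql -uroot -p123456 -e "SELECT VERSION();"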

    Create the HDFS directories for Hive (these paths should match hive.metastore.warehouse.dir and the tmp/log settings in hive-site.xml):

        hdfs dfs -mkdir -p /usr/hive/warehouse
        hdfs dfs -mkdir -p /usr/hive/tmp
        hdfs dfs -mkdir -p /usr/hive/log
        hdfs dfs -chmod -R 777 /usr/hive

    Initialize the metastore database

        ./bin/schematool -initSchema -dbType mysql

    Afterwards, the metastore tables are visible in MySQL:

        mysql> show databases;
        +--------------------+
        | Database           |
        +--------------------+
        | information_schema |
        | hive               |
        | mysql              |
        | performance_schema |
        | sys                |
        +--------------------+
        mysql> use hive;
        Database changed
        mysql> show tables;
        +---------------------------+
        | Tables_in_hive            |
        +---------------------------+
        | AUX_TABLE                 |
        | BUCKETING_COLS            |
        | SORT_COLS                 |
        | TABLE_PARAMS              |
        | TAB_COL_STATS             |
        | TBLS                      |
        | TBL_COL_PRIVS             |
        | TBL_PRIVS                 |
        | TXNS                      |
        | TXN_COMPONENTS            |
        | TYPES                     |
        | TYPE_FIELDS               |
        | VERSION                   |
        | WRITE_SET                 |
        +---------------------------+
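    schematool can also confirm what it just created; the -info flag reports the schema version recorded in the metastore:

        ./bin/schematool -info -dbType mysql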

    Start Hive

        jjzhu:opt didi$ hive
        SLF4J: Class path contains multiple SLF4J bindings.
        SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
        SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
        Logging initialized using configuration in jar:file:/opt/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
        Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
        hive>
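    As a quick smoke test, Hive can run statements non-interactively with -e (the table name test here is just a throwaway example):

        hive -e "CREATE TABLE test (id INT); SHOW TABLES;"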

    Installing Sqoop

    Download and unpack Sqoop, then set the environment variables:

        export SQOOP_HOME=/opt/sqoop-1.99.7
        export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
        export PATH=$SQOOP_HOME/bin:$PATH

    Modify the Sqoop configuration

    The conf directory holds the two main configuration files, sqoop.properties and sqoop_bootstrap.properties; the changes go into sqoop.properties:

        org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop-2.7.3/etc/hadoop
        org.apache.sqoop.security.authentication.type=SIMPLE
        org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
        org.apache.sqoop.security.authentication.anonymous=true

    Verify that the configuration works:

        jjzhu:bin didi$ ./sqoop2-tool verify
        Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
        Sqoop home directory: /opt/sqoop-1.99.7
        Sqoop tool executor:
            Version: 1.99.7
            Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
            Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
        Running tool: class org.apache.sqoop.tools.tool.VerifyTool
        0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
        12   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
        SLF4J: Class path contains multiple SLF4J bindings.
        SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
        Verification was successful.
        Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
        jjzhu:bin didi$

    Start the server

        ./bin/sqoop2-server start
        jps
        9505 SqoopJettyServer
        ....
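    With the server running, the bundled client shell can connect to it; a minimal sketch, assuming the default REST port 12000 from sqoop.properties was not changed:

        sqoop2-shell
        sqoop:000> set server --host localhost --port 12000 --webapp sqoop
        sqoop:000> show version --all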