《Hadoop大数据分析与挖掘实战》—— Section 2.4: Hands-on Practice

    This section is excerpted from Chapter 2, Section 2.4 (Hands-on Practice) of the book 《Hadoop大数据分析与挖掘实战》, published by the Huazhang community (华章社区) and written by 张良均, 樊哲, 赵云龙, and 李成华. More chapters are available through the Yunqi community (云栖社区) "华章社区" official account.

    2.4 Hands-on Practice

    Follow the detailed configuration steps in Section 2.2. Once the deployment is complete, you can work through the experiments below.

    Practice 1: HDFS commands

    1) Create the directories:
    # hadoop fs -mkdir /user
    # hadoop fs -mkdir /user/root

    2) Check the directory permissions:
    # hadoop fs -ls -d /user/root
    drwxr-xr-x  - root supergroup     0 2015-05-29 17:29 /user/root

    3) Upload a file. Copy 02-上机实验/ds.txt to the client machine via Xftp, then run the following commands and compare your output against the results shown:
    # hadoop fs -put ds.txt ds.txt
    # hadoop fs -ls -R /user/root
    -rw-r--r--  3 root supergroup    9135 2015-05-29 19:07 /user/root/ds.txt

    4) View the file contents:
    # hadoop fs -cat /user/root/ds.txt
    17.759065824032646,0.6708203932499373
    20.787886563063058,0.7071067811865472
    17.944905786933322,0.5852349955359809
    ……

    5) Copy, move, and delete files (the trash message printed by -rm, and a Java API version of these shell operations, are discussed in the notes after this practice):
    # hadoop fs -cp /user/root/ds.txt /user/root/ds_backup.txt
    # hadoop fs -ls /user/root
    Found 2 items
    -rw-r--r--  3 root supergroup    9135 2015-05-29 19:07 /user/root/ds.txt
    -rw-r--r--  3 root supergroup    9135 2015-05-29 19:30 /user/root/ds_backup.txt
    # hadoop fs -mv /user/root/ds_backup.txt /user/root/ds_backup1.txt
    # hadoop fs -ls /user/root
    Found 2 items
    -rw-r--r--  3 root supergroup    9135 2015-05-29 19:07 /user/root/ds.txt
    -rw-r--r--  3 root supergroup    9135 2015-05-29 19:30 /user/root/ds_backup1.txt
    # hadoop fs -rm -r /user/root/ds_backup1.txt
    15/05/29 19:32:51 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /user/root/ds_backup1.txt
    # hadoop fs -ls /user/root
    Found 1 items
    -rw-r--r--  3 root supergroup    9135 2015-05-29 19:07 /user/root/ds.txt
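    A note on the fs.TrashPolicyDefault line printed by -rm above: "Deletion interval = 0 minutes" means the HDFS trash feature is disabled on this cluster, so deleted files are removed immediately rather than first being moved to a .Trash directory. If you want a recovery window, trash is controlled by the standard fs.trash.interval property in core-site.xml. The snippet below is only an illustrative setting; the 1440-minute value is our choice, not something from the book:

    <!-- core-site.xml: keep deleted files in .Trash for 1440 minutes (24 hours) -->
    <!-- illustrative value, not from the book -->
    <property>
      <name>fs.trash.interval</name>
      <value>1440</value>
    </property>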
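    The same operations can also be performed programmatically through Hadoop's org.apache.hadoop.fs.FileSystem API. The following is a minimal sketch for reference, not code from the book (the class name HdfsPracticeSketch is ours); it assumes it runs on the client machine with the cluster configuration (core-site.xml, hdfs-site.xml) on the classpath, and it mirrors steps 1) to 5) above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsPracticeSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();  // reads *-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);

            // 1) hadoop fs -mkdir: mkdirs creates parent directories, like mkdir -p
            fs.mkdirs(new Path("/user/root"));

            // 3) hadoop fs -put: copy a local file into HDFS
            fs.copyFromLocalFile(new Path("ds.txt"), new Path("/user/root/ds.txt"));

            // 2) hadoop fs -ls: list permissions, sizes, and paths
            for (FileStatus st : fs.listStatus(new Path("/user/root"))) {
                System.out.println(st.getPermission() + " " + st.getLen() + " " + st.getPath());
            }

            // 4) hadoop fs -cat: stream the file contents to stdout
            IOUtils.copyBytes(fs.open(new Path("/user/root/ds.txt")), System.out, 4096, false);

            // 5) -cp, -mv, -rm equivalents
            FileUtil.copy(fs, new Path("/user/root/ds.txt"),
                          fs, new Path("/user/root/ds_backup.txt"), false, conf);  // -cp
            fs.rename(new Path("/user/root/ds_backup.txt"),
                      new Path("/user/root/ds_backup1.txt"));                      // -mv
            fs.delete(new Path("/user/root/ds_backup1.txt"), false);               // -rm; the API
                                                            // deletes directly, bypassing trash
            fs.close();
        }
    }

    Compile it against hadoop-common (2.6.0 here) and launch it with the hadoop command so the cluster configuration and dependencies are picked up automatically.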
    Practice 2: A MapReduce job

    1) Copy 02-上机实验/ds.txt to the /opt directory of the client machine via Xftp, then upload it to HDFS:
    # hadoop fs -put /opt/ds.txt /user/root/ds.txt
    # hadoop fs -ls /user/root
    Found 1 items
    -rw-r--r--  3 root supergroup    9135 2015-05-29 19:49 /user/root/ds.txt

    2) Copy the MapReduce examples jar from the Hadoop installation directory to /opt:
    # cp /opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar /opt
    # ls /opt/hadoop-mapreduce*
    /opt/hadoop-mapreduce-examples-2.6.0.jar

    3) Run the word-count MapReduce job (a minimal Java version of this example is sketched after this practice):
    # hadoop jar /opt/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/root/ds.txt /user/root/ds_out
    15/05/29 20:23:00 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.222.131:8032
    15/05/29 20:23:02 INFO input.FileInputFormat: Total input paths to process : 1
    15/05/29 20:23:02 INFO mapreduce.JobSubmitter: number of splits:1
    15/05/29 20:23:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1432825607351_0127
    15/05/29 20:23:03 INFO impl.YarnClientImpl: Submitted application application_1432825607351_0127
    15/05/29 20:23:03 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1432825607351_0127/
    15/05/29 20:23:03 INFO mapreduce.Job: Running job: job_1432825607351_0127
    15/05/29 20:23:15 INFO mapreduce.Job: Job job_1432825607351_0127 running in uber mode : false
    15/05/29 20:23:15 INFO mapreduce.Job: map 0% reduce 0%
    15/05/29 20:23:31 INFO mapreduce.Job: map 100% reduce 0%
    15/05/29 20:23:40 INFO mapreduce.Job: map 100% reduce 100%
    15/05/29 20:23:40 INFO mapreduce.Job: Job job_1432825607351_0127 completed successfully
    15/05/29 20:23:40 INFO mapreduce.Job: Counters: 49
      File System Counters
        FILE: Number of bytes read=10341
        FILE: Number of bytes written=232633
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=9236
        HDFS: Number of bytes written=9375
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
      Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=12679
        Total time spent by all reduces in occupied slots (ms)=6972
        Total time spent by all map tasks (ms)=12679
        Total time spent by all reduce tasks (ms)=6972
        Total vcore-seconds taken by all map tasks=12679
        Total vcore-seconds taken by all reduce tasks=6972
        Total megabyte-seconds taken by all map tasks=12983296
        Total megabyte-seconds taken by all reduce tasks=7139328
      Map-Reduce Framework
        Map input records=240
        Map output records=240
        Map output bytes=9855
        Map output materialized bytes=10341
        Input split bytes=101
        Combine input records=240
        Combine output records=240
        Reduce input groups=240
        Reduce shuffle bytes=10341
        Reduce input records=240
        Reduce output records=240
        Spilled Records=480
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=398
        CPU time spent (ms)=5330
        Physical memory (bytes) snapshot=321277952
        Virtual memory (bytes) snapshot=2337296384
        Total committed heap usage (bytes)=195235840
      Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
      File Input Format Counters
        Bytes Read=9135
      File Output Format Counters
        Bytes Written=9375

    4) View the job's output:
    # hadoop fs -cat /user/root/ds_out/part-r-00000
    16.75481160342442,0.5590169943749481 1
    17.759065824032646,0.6708203932499373 1
    17.944905786933322,0.5852349955359809 1
    18.619213022043585,0.5024937810560444 1
    18.664436259885097,0.7433034373659246 1
    ……
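    The wordcount program invoked above is one of the examples bundled in hadoop-mapreduce-examples-2.6.0.jar. Its logic is the classic MapReduce word count: the mapper tokenizes each input line and emits (word, 1), and the reducer, also installed as the combiner, sums the counts per word. The sketch below is a minimal reimplementation for illustration, not the shipped source; the class names are ours:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountSketch {

        // Map: split each input line into tokens and emit (token, 1)
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce (and combine): sum the counts for each token
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountSketch.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation before the shuffle
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /user/root/ds.txt
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /user/root/ds_out
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    Note how the job counters above reflect the input data: each line of ds.txt is a single coordinate pair with no internal whitespace, so every line yields exactly one token (Map input records=240, Map output records=240), every token is unique (Reduce input groups=240), and every count in part-r-00000 is therefore 1.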