Verifying that MapReduce tasks run as multiple processes, each single-threaded

    xiaoxiao, 2022-07-07

    I. Verifying that MapReduce tasks are multi-process

    1. Implement MyMapper as shown below. A Reducer can be instrumented in the same way.

    package com.mapreduce;

    import java.io.IOException;
    import java.lang.management.ManagementFactory;
    import java.lang.management.RuntimeMXBean;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

        // Global counter: static, so it is shared by every Mapper instance in the same JVM
        private static int map_index = 0;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            System.out.println("map_index: " + (++map_index));

            // Process information; the name is expected to look like "pid@hostname"
            RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
            String name = runtime.getName();
            System.out.println("Current Process: " + name + "--" + name.substring(0, name.indexOf("@")));

            // Thread information
            System.out.println("Current Thread: " + Thread.currentThread().getId()
                    + "-" + Thread.currentThread().getName());

            // Identity of the current MyMapper instance
            System.out.println("Current Mapper: " + this.toString());

            context.write(new Text(""), new Text(""));
        }
    }
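    The process and thread lookups used in MyMapper can be tried outside Hadoop. The following is a minimal sketch, not part of the original job: the class name ProcessIdentity and the helper pidFromRuntimeName are hypothetical names introduced here. It assumes the JVM reports its runtime name as "pid@hostname", which is common but not guaranteed, so the helper returns -1 instead of throwing when no "@" is present (the substring call in MyMapper would throw in that case). On Java 9 and later, ProcessHandle.current().pid() gives the pid directly.

```java
import java.lang.management.ManagementFactory;

// Standalone sketch (no Hadoop required): identify the current JVM process and thread.
final class ProcessIdentity {

    // Parses the pid from a "pid@hostname" style runtime name.
    // Assumption: the JVM uses that format; returns -1 when it does not.
    static long pidFromRuntimeName(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            return -1L;
        }
        try {
            return Long.parseLong(runtimeName.substring(0, at));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        System.out.println("Runtime name: " + name);
        System.out.println("Parsed pid: " + pidFromRuntimeName(name));

        // Java 9+: the pid is available without parsing.
        System.out.println("ProcessHandle pid: " + ProcessHandle.current().pid());

        Thread current = Thread.currentThread();
        System.out.println("Current Thread: " + current.getId() + "-" + current.getName());
    }
}
```

    Run as a plain Java program, this prints the same kind of process and thread lines that the mapper writes to its task log.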

    2. Put two files in the map input directory so that two map tasks are launched. Each file contains 3 lines of data, 6 lines in total.

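    3. Hypothesis: if map tasks ran inside a single process, the 2 map tasks would be 2 threads of that process and would share its memory. The output would then show the same process name, different threads, different Mapper objects, and a global counter map_index that runs through 1, 2, 3, 4, 5, 6. The actual result is different: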

    (The class name in the log is com.etl.mapreduce.ClickStreamMapper rather than com.mapreduce.MyMapper; the log was presumably captured from a mapper in another project that contains the same logging code.)

    Output log of the first map task:

    map_index: 1
    Current Process: 6717@slave1--6717
    Current Thread: 1-main
    Current Mapper: com.etl.mapreduce.ClickStreamMapper@5f3b9c57
    map_index: 2
    Current Process: 6717@slave1--6717
    Current Thread: 1-main
    Current Mapper: com.etl.mapreduce.ClickStreamMapper@5f3b9c57
    map_index: 3
    Current Process: 6717@slave1--6717
    Current Thread: 1-main
    Current Mapper: com.etl.mapreduce.ClickStreamMapper@5f3b9c57

    Output log of the second map task:

    map_index: 1
    Current Process: 6728@slave1--6728
    Current Thread: 1-main
    Current Mapper: com.etl.mapreduce.ClickStreamMapper@1e044120
    map_index: 2
    Current Process: 6728@slave1--6728
    Current Thread: 1-main
    Current Mapper: com.etl.mapreduce.ClickStreamMapper@1e044120
    map_index: 3
    Current Process: 6728@slave1--6728
    Current Thread: 1-main
    Current Mapper: com.etl.mapreduce.ClickStreamMapper@1e044120

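    4. Facts: the first map task runs in process ID 6717 and the second in process ID 6728, so the two tasks clearly run in different processes. Within each task, every call to map() runs on the main thread, that is, in single-threaded mode. The two Mapper objects are also different instances (5f3b9c57 versus 1e044120). Finally, map_index counts 1, 2, 3 in both tasks and never reaches the 4, 5, 6 that the hypothesis predicted, because each process has its own copy of the static counter. Conclusion: MapReduce tasks run as multiple processes, each of them single-threaded.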
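    For contrast, the single-process hypothesis from step 3 can be sketched outside Hadoop. This is an illustration only, not how Hadoop schedules tasks: the class SharedCounterDemo and its methods are hypothetical names, and the two "tasks" are plain Java threads. An AtomicInteger replaces the plain static int so that the concurrent increments are race-free.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Standalone sketch of the hypothesis: two "map tasks" running as threads in ONE JVM.
final class SharedCounterDemo {

    // Static state is shared by all threads of the process.
    private static final AtomicInteger mapIndex = new AtomicInteger();

    // Simulates one map task that processes recordsPerTask input records.
    private static void runTask(int recordsPerTask) {
        for (int i = 0; i < recordsPerTask; i++) {
            int index = mapIndex.incrementAndGet();
            System.out.println("map_index: " + index
                    + " pid: " + ProcessHandle.current().pid()
                    + " thread: " + Thread.currentThread().getName());
        }
    }

    // Runs two tasks as threads, waits for both, and returns the final counter value.
    static int runTwoTasks(int recordsPerTask) {
        mapIndex.set(0);
        Thread first = new Thread(() -> runTask(recordsPerTask), "task-1");
        Thread second = new Thread(() -> runTask(recordsPerTask), "task-2");
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return mapIndex.get();
    }

    public static void main(String[] args) {
        // Two tasks with 3 records each share one counter, so it ends at 6.
        System.out.println("final map_index: " + runTwoTasks(3));
    }
}
```

    Here every line shows the same pid with two different thread names, and the counter ends at 6. The task logs above show the opposite pattern: two pids, one main thread per process, and a counter that restarts at 1 in each process.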

    II. Further reading

    1. MapReduce multi-process versus Spark multi-thread

    2. Problems with using multithreading in MapReduce

     
