MapReduce: on iterating an Iterable and adding its elements to a List

xiaoxiao2023-11-22 166

For example, suppose the Reducer's input is <1, (1,2,3,4,5)>.

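Inside reduce() we need to iterate over the Iterable twice. After the first pass the iterator has already reached the end, so the same iterator cannot be traversed a second time.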
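To see why a second for-each loop finds nothing, here is a minimal plain-Java sketch with no Hadoop dependency. It assumes, as Hadoop's ReduceContextImpl appears to do, that iterator() hands back the same cursor on every call; SinglePassIterable and SinglePassDemo are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-pass Iterable: iterator() returns the same cursor on every
// call, so a second for-each loop starts where the first one stopped (at the end).
class SinglePassIterable implements Iterable<Integer> {
    private final Iterator<Integer> cursor;

    SinglePassIterable(List<Integer> values) {
        this.cursor = new ArrayList<>(values).iterator();
    }

    @Override
    public Iterator<Integer> iterator() {
        return cursor; // no fresh iterator is created
    }
}

class SinglePassDemo {
    // Runs two for-each passes over the same Iterable and records what each saw.
    static List<List<Integer>> twoPasses(Iterable<Integer> vs) {
        List<Integer> first = new ArrayList<>();
        for (Integer v : vs) first.add(v);
        List<Integer> second = new ArrayList<>();
        for (Integer v : vs) second.add(v);
        List<List<Integer>> result = new ArrayList<>();
        result.add(first);
        result.add(second);
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> passes = twoPasses(new SinglePassIterable(List.of(1, 2, 3, 4, 5)));
        System.out.println("first pass:  " + passes.get(0)); // [1, 2, 3, 4, 5]
        System.out.println("second pass: " + passes.get(1)); // []
    }
}
```

The second pass is empty because both loops share one exhausted cursor, not because the data is gone.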

One approach is to traverse the Iterable with foreach, adding each value to a List, and then iterate over that list as the second pass.

The reduce() method:

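@Override
protected void reduce(Text key, final Iterable<Text> vs, Reducer<Text, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    List<Text> list = new ArrayList<Text>();
    // First pass: consume the iterator, caching each value in the list
    for (Text v : vs) {
        list.add(v); // caution: this adds a reference to the object Hadoop reuses
        System.out.println("First pass: " + v);
    }
    // Second pass: iterate over the cached list
    for (Text v : list) {
        System.out.println("Second pass: " + v);
    }
}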

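Result: the first loop prints each value as it arrives, but the second loop prints the same value (the last one read, 1 in this run) five times.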

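Tracing through the source: the Iterable is backed by ValueIterator, an inner class of ReduceContextImpl. It does not traverse the values from the tail (its remove() simply throws UnsupportedOperationException); in this run the values just happened to arrive as 5, 4, 3, 2, 1, since Hadoop does not guarantee the order of values within a key unless a secondary sort is set up. The method that fetches the next value is: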

protected class ValueIterator implements ReduceContext.ValueIterator<VALUEIN> {
  // ...
  public VALUEIN next() {
    // if this is the first record, we don't need to advance
    if (firstValue) {
      firstValue = false;
      return value;
    }
    // if this isn't the first record and the next key is different, they
    // can't advance it here.
    if (!nextKeyIsSame) {
      throw new NoSuchElementException("iterate past last value");
    }
    // otherwise, go to the next key/value pair
    try {
      nextKeyValue();
      return value;
    }
    // ... (catch blocks and the rest of the class omitted)

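The problem lies in nextKeyValue() (see also the post "MapReduce: on RecordReader's getCurrentKey() and getCurrentValue() returning the same key/value objects"). nextKeyValue() does not allocate a new Text for each value; it deserializes the next value into the existing value object. So every v the loop adds to the list is a reference to that one object, and once the last value has been read, that object holds 1.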

Consequently, the list contents after each add look like this:

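after adding the 1st value: [5]
after adding the 2nd value: [4,4]
after adding the 3rd value: [3,3,3]
after adding the 4th value: [2,2,2,2]
after adding the 5th value: [1,1,1,1,1]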
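The list states above can be reproduced without Hadoop by a small simulation. It assumes, as described for nextKeyValue(), that the iterator overwrites one shared object in place and returns it every time; MutableText, ReusingIterator and ReuseDemo are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable string holder.
class MutableText {
    private String s = "";
    void set(String s) { this.s = s; }
    @Override public String toString() { return s; }
}

// Hypothetical stand-in for ValueIterator: each next() writes the next value
// INTO THE SAME object and returns that object.
class ReusingIterator implements Iterator<MutableText> {
    private final String[] raw;
    private final MutableText value = new MutableText(); // the one shared instance
    private int pos = 0;

    ReusingIterator(String... raw) { this.raw = raw; }

    @Override public boolean hasNext() { return pos < raw.length; }

    @Override public MutableText next() {
        if (!hasNext()) throw new NoSuchElementException("iterate past last value");
        value.set(raw[pos++]); // overwrite in place, no new object
        return value;
    }
}

class ReuseDemo {
    // The buggy pattern: add each returned reference to a list, recording
    // how the whole list prints after every add.
    static List<String> snapshots(String... raw) {
        List<MutableText> list = new ArrayList<>();
        List<String> out = new ArrayList<>();
        Iterator<MutableText> it = new ReusingIterator(raw);
        while (it.hasNext()) {
            list.add(it.next());
            out.add(list.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Values in the order they arrived in the article's run.
        for (String s : snapshots("5", "4", "3", "2", "1")) {
            System.out.println(s); // [5], [4, 4], [3, 3, 3], [2, 2, 2, 2], [1, 1, 1, 1, 1]
        }
    }
}
```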

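Accordingly, the fix is either to store a copy of each Text object v, or to avoid Hadoop's Writable types altogether: declare the List as List<String> and call list.add(v.toString()).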

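// Option 1: cache a copy of each value
List<Text> list = new ArrayList<Text>();
for (Text v : vs) {
    list.add(new Text(v)); // new Text(Text) copies the current contents
    System.out.println("First pass: " + v);
}
for (Text v : list) {
    System.out.println("Second pass: " + v);
}

// OR Option 2 (use instead of Option 1): cache immutable Strings
List<String> list = new ArrayList<String>();
for (Text v : vs) {
    list.add(v.toString());
    System.out.println("First pass: " + v);
}
for (String v : list) {
    System.out.println("Second pass: " + v);
}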
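A minimal plain-Java sketch of the two fixes, with no Hadoop dependency. It assumes a hypothetical mutable Val type standing in for Text (with a copy constructor analogous to new Text(Text)) and a simulated input, reusedValues, that overwrites one shared Val for every value; Val, CopyFixDemo and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable value type standing in for Text.
class Val {
    private String s;
    Val(String s) { this.s = s; }
    Val(Val other) { this.s = other.s; } // copy constructor, like new Text(Text)
    void set(String s) { this.s = s; }
    @Override public String toString() { return s; }
}

class CopyFixDemo {
    // Simulated reduce() input: one shared Val is overwritten for every value.
    static Iterable<Val> reusedValues(String... raw) {
        Val shared = new Val("");
        return () -> new Iterator<Val>() {
            private int pos = 0;
            @Override public boolean hasNext() { return pos < raw.length; }
            @Override public Val next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw[pos++]);
                return shared;
            }
        };
    }

    // Fix 1: cache a copy of each value (the analogue of list.add(new Text(v))).
    static List<String> copyEach(Iterable<Val> vs) {
        List<Val> list = new ArrayList<>();
        for (Val v : vs) list.add(new Val(v));
        List<String> out = new ArrayList<>();
        for (Val v : list) out.add(v.toString()); // second pass over the cache
        return out;
    }

    // Fix 2: cache immutable Strings (the analogue of list.add(v.toString())).
    static List<String> toStrings(Iterable<Val> vs) {
        List<String> list = new ArrayList<>();
        for (Val v : vs) list.add(v.toString());
        return list;
    }

    public static void main(String[] args) {
        System.out.println(copyEach(reusedValues("5", "4", "3", "2", "1")));  // [5, 4, 3, 2, 1]
        System.out.println(toStrings(reusedValues("5", "4", "3", "2", "1"))); // [5, 4, 3, 2, 1]
    }
}
```

Both options keep the values distinct because each cached element no longer points at the object that the input keeps overwriting.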
