在 Hadoop MapReduce 作业中链接 Multi-Reducer

Chaining Multi-Reducers in a Hadoop MapReduce job(在 Hadoop MapReduce 作业中链接 Multi-Reducer)
本文介绍了在 Hadoop MapReduce 作业中链接 Multi-Reducer的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

限时送ChatGPT账号..

现在我有一个 4 阶段的 MapReduce 作业,如下所示:

Now I have a 4-phase MapReduce job as follows:

Input-> Map1 -> Reduce1 -> Reducer2 -> Reduce3 -> Reduce4 -> Output

我注意到 Hadoop 中有一个 ChainMapper 类,它可以将多个映射器链接成一个大映射器,并节省映射阶段之间的磁盘 I/O 成本.还有一个 ChainReducer 类,但它不是真正的Chain-Reducer".它只能支持以下工作:

I notice that there is ChainMapper class in Hadoop which can chain several mappers into one big mapper, and save the disk I/O cost between map phases. There is also a ChainReducer class, however it is not a real "Chain-Reducer". It can only support jobs like:

[Map+/ Reduce Map*]

我知道我可以为我的任务设置四个 MR 作业,并为最后三个作业使用默认映射器.但这将花费大量磁盘 I/O,因为 reducer 应该将结果写入磁盘以让以下映射器访问它.是否有任何其他 Hadoop 内置功能可以链接我的 reducer 以降低 I/O 成本?

I know I can set four MR jobs for my task, and use default mappers for the last three jobs. But that will cost a lot of disk I/O, since reducers should write the result into disk to let the following mapper access it. Is there any other Hadoop built-in feature to chain my reducers to lower the I/O cost?

我使用的是 Hadoop 1.0.4.

I am using Hadoop 1.0.4.

推荐答案

我不认为你可以将一个reducer的o/p直接交给另一个reducer.我会为此而努力的:

I dont think that you can have the o/p of a reducer being given to another reducer directly. I would have gone for this:

Input-> Map1 -> Reduce1 -> 
        Identity mapper -> Reducer2 -> 
                Identity mapper -> Reduce3 -> 
                         Identity mapper -> Reduce4 -> Output

在 Hadoop 2.X 系列中,在内部,您可以使用 ChainMapper 在 reducer 之前链接 mapper,在 reducer 之后使用 ChainReducer.

In Hadoop 2.X series, internally you can chain mappers before reducer with ChainMapper and chain Mappers after reducer with ChainReducer.

这篇关于在 Hadoop MapReduce 作业中链接 Multi-Reducer的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

本站部分内容来源互联网,如果有图片或者内容侵犯了您的权益,请联系我们,我们会在确认后第一时间进行删除!

相关文档推荐

How to send data to COM PORT using JAVA?(如何使用 JAVA 向 COM PORT 发送数据?)
How to make a report page direction to change to quot;rtlquot;?(如何使报表页面方向更改为“rtl?)
Use cyrillic .properties file in eclipse project(在 Eclipse 项目中使用西里尔文 .properties 文件)
Is there any way to detect an RTL language in Java?(有没有办法在 Java 中检测 RTL 语言?)
How to load resource bundle messages from DB in Java?(如何在 Java 中从 DB 加载资源包消息?)
How do I change the default locale settings in Java to make them consistent?(如何更改 Java 中的默认语言环境设置以使其保持一致?)