MapReduce results seem limited to 100?

2023-04-17 · Python development questions

Problem description

I'm playing around with Map Reduce in MongoDB and Python and I've run into a strange limitation. I'm just trying to count the number of "book" records. It works when there are fewer than 100 records, but once there are more than 100 the count resets for some reason.

Here is my MR code and some sample output:

var M = function () {
  // Emit one {count: 1} value per document, keyed by its book field.
  var book = this.book;
  emit(book, {count : 1});
}

var R = function (key, values) {
  var sum = 0;
  values.forEach(function(x) {
    sum += 1;
  });
  var result = {
    count : sum
  };
  return result;
}

MR output when the record count is 99:

{u'_id': u'superiors', u'value': {u'count': 99}}

MR output when the record count is 101:

{u'_id': u'superiors', u'value': {u'count': 2.0}}

Any ideas?

Recommended answer

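Your reduce function should sum the count values rather than add 1 for each value. MongoDB may call reduce more than once for the same key and pass the output of an earlier reduce back in as one of the values, so the function has to return a result that can itself be reduced again. With 101 records, the output above is consistent with a first pass reducing 100 values to {count: 100} and a second pass counting that partial result plus the one remaining value as 2 elements. Try this instead: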

var R = function (key, values) {
  var sum = 0;
  values.forEach(function(x) {
    sum += x.count;
  });
  var result = {
    count : sum 
  };
  return result;
}
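
To see why the original reducer breaks, here is a minimal sketch that mimics re-reduce outside MongoDB. The helper names simulateReduce and makeValues are hypothetical, and the batch size of 100 is an assumption inferred from the output in the question, not a documented constant. It only illustrates how a partial result fed back into reduce gets counted as a single element.

```javascript
// Hypothetical helper: reduces values in batches and feeds each partial
// result back into the next reduce call, as a re-reducing engine might.
function simulateReduce(reduceFn, key, values, batchSize) {
  if (batchSize < 2) {
    throw new Error("batchSize must be at least 2");
  }
  var pending = values.slice();
  while (pending.length > 1) {
    var batch = pending.splice(0, batchSize);
    // Put the partial result back so it is reduced with the remaining values.
    pending.unshift(reduceFn(key, batch));
  }
  return pending[0];
}

// Hypothetical helper: builds n values shaped like the map output {count: 1}.
function makeValues(n) {
  var values = [];
  for (var i = 0; i < n; i++) {
    values.push({ count: 1 });
  }
  return values;
}

// Reducer from the question: counts elements, ignoring their count field.
var buggyR = function (key, values) {
  var sum = 0;
  values.forEach(function (x) {
    sum += 1;
  });
  return { count: sum };
};

// Reducer from the answer: sums the count field of every value.
var fixedR = function (key, values) {
  var sum = 0;
  values.forEach(function (x) {
    sum += x.count;
  });
  return { count: sum };
};

// Assumed batch size of 100, chosen to match the behaviour in the question.
var buggy = simulateReduce(buggyR, "superiors", makeValues(101), 100);
var fixed = simulateReduce(fixedR, "superiors", makeValues(101), 100);
console.log(buggy.count); // 2
console.log(fixed.count); // 101
```

With 99 values both reducers return 99 in this sketch, because a single reduce call sees every value and no partial result is ever fed back in.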
