Query in a MongoDB Map Reduce Function
Question

I have streamed and saved about 250k tweets into MongoDB. As the code here shows, I am retrieving them based on a word, or keyword, present in the tweet.

// Legacy MongoDB Java driver (classes from com.mongodb)
Mongo mongo = new Mongo("localhost", 27017);
DB db = mongo.getDB("TwitterData");
DBCollection collection = db.getCollection("publicTweets");
// Return only the tweet text and leave out _id
BasicDBObject fields = new BasicDBObject().append("tweet", 1).append("_id", 0);
// Match tweets whose text contains "autobiography"
BasicDBObject query = new BasicDBObject("tweet", new BasicDBObject("$regex", "autobiography"));
DBCursor cur = collection.find(query, fields);
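
One detail of the query above is worth spelling out: a `$regex` value given without options is an unanchored, case-sensitive match, so a tweet starting with "Autobiography" would not be returned unless `"$options": "i"` is added. The sketch below illustrates that difference in plain Java with `java.util.regex`, on the assumption that a simple literal pattern behaves the same way there as in MongoDB's regex engine. The class name, method names, and the capitalized sample tweet are made up for illustration.

```java
import java.util.regex.Pattern;

public class RegexMatchSketch {
    // Unanchored and case-sensitive, comparable to {"$regex": "autobiography"}.
    static boolean matchesCaseSensitive(String tweet) {
        return Pattern.compile("autobiography").matcher(tweet).find();
    }

    // Case-insensitive, comparable to adding "$options": "i" to the query.
    static boolean matchesIgnoringCase(String tweet) {
        return Pattern.compile("autobiography", Pattern.CASE_INSENSITIVE)
                      .matcher(tweet).find();
    }

    public static void main(String[] args) {
        String lower = "I predict 2013 will be seeing an autobiography ;)";
        String capitalized = "Autobiography season is here"; // hypothetical tweet

        System.out.println(matchesCaseSensitive(lower));       // true
        System.out.println(matchesCaseSensitive(capitalized)); // false
        System.out.println(matchesIgnoringCase(capitalized));  // true
    }
}
```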

What I would like to do is use Map-Reduce to categorize each tweet by keyword and pass it to the reduce function, which would count the number of tweets under each category, much like the linked example. In that example, the author counts pages, since that is a simple number. I want to do something like:

"if (this.tweet.contains("kword1")) "+
"category = 'kword1 tweets'; " + 
"else if (this.tweet.contains("kword2")) " + 
"category = 'kword2 tweets'; 

and then use the reduce function to get the count, just like in the sample program.

I know that the syntax is incorrect, but that's pretty much what I would like to do. Is there any way of achieving it? Thanks!

PS: Oh, and I'm coding in Java. So the Java syntax would be highly appreciated. Thank you!

The output of the code posted is something like this:

{ "tweet" : "An autobiography is a book that reveals nothing bad about its writer except his memory."}
{ "tweet" : "I refuse to read anything that's not real the only thing I've read since biff books is Jordan's autobiography #lol"}
{ "tweet" : "well we've had the 2012 publication of Ashley's Good Books, I predict 2013 will be seeing an autobiography ;)"}

This, of course, is for all tweets containing the word "autobiography". What I'd like is to use this in the map function, categorize each match as an "autobiography tweet" (and likewise for other keywords), and then send it to the reduce function to count everything and return the number of tweets containing each word.

Something like:

{"_id" : "Autobiography Tweets" , "value" : { "publicTweets" : 3.0}}
{"_id" : "Biography Tweets" , "value" : { "publicTweets" : 15.0}}

Recommended answer

You might want to try the following:

    String map = "function() { " +
                 "    var regex1 = new RegExp('autobiography', 'i'); " +
                 "    var regex2 = new RegExp('book', 'i'); " +
                 "    if (regex1.test(this.tweet) ) " +
                 "         emit('Autobiography Tweet', 1); " +
                 "    else if (regex2.test(this.tweet) ) " +
                 "         emit('Book Tweet', 1); " +
                 "    else " +
                 "       emit('Uncategorized Tweet', 1); " +
                 "}";

    String reduce = "function(key, values) { " +
                    "    return Array.sum(values); " +
                    "}";

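    // Run the job inline: results come back directly, no output collection is written.
    MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce,
             null, MapReduceCommand.OutputType.INLINE, null);
    MapReduceOutput out = collection.mapReduce(cmd);

    try {
        // Each result holds the category in _id and its count in value.
        for (DBObject o : out.results()) {
            System.out.println(o.toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }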
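
To check the categorization logic without a running MongoDB instance, the same first-match-wins rule can be sketched in plain Java. This is only an approximation of what the map and reduce functions above do, not driver code: the class and method names are hypothetical, `java.util.regex` stands in for the JavaScript `RegExp`, and the input is the three sample tweets from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class TweetCategoryCount {
    // Same patterns as the map function: case-insensitive substring matches.
    private static final Pattern AUTOBIOGRAPHY =
            Pattern.compile("autobiography", Pattern.CASE_INSENSITIVE);
    private static final Pattern BOOK =
            Pattern.compile("book", Pattern.CASE_INSENSITIVE);

    // Mirrors the map step: the first branch that matches decides the category.
    static String categorize(String tweet) {
        if (AUTOBIOGRAPHY.matcher(tweet).find()) return "Autobiography Tweet";
        if (BOOK.matcher(tweet).find()) return "Book Tweet";
        return "Uncategorized Tweet";
    }

    // Mirrors emit(category, 1) followed by summing the values per key.
    static Map<String, Integer> countByCategory(List<String> tweets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tweet : tweets) {
            counts.merge(categorize(tweet), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
            "An autobiography is a book that reveals nothing bad about its writer except his memory.",
            "I refuse to read anything that's not real the only thing I've read since biff books is Jordan's autobiography #lol",
            "well we've had the 2012 publication of Ashley's Good Books, I predict 2013 will be seeing an autobiography ;)");
        System.out.println(countByCategory(tweets)); // {Autobiography Tweet=3}
    }
}
```

All three sample tweets also contain "book", yet none is counted as a "Book Tweet": because of the `else if` chain, each tweet lands in exactly one category, decided by the first pattern that matches. If a tweet should count toward every keyword it contains, the map function would need independent `if` statements, each with its own `emit`.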
