问题描述
这是一个看似简单的问题:给定一个生成升序整数序列的迭代器列表,编写一个只生成每个序列中出现的整数的简洁生成器.
Here is a seemingly simple problem: given a list of iterators that yield sequences of integers in ascending order, write a concise generator that yields only the integers that appear in every sequence.
昨晚阅读了几篇论文后,我决定用 Python 编写一个完全最小的全文索引器,在这里看到(虽然那个版本现在很老了).
After reading a few papers last night, I decided to hack up a completely minimal full text indexer in Python, as seen here (though that version is quite old now).
我的问题在于 search() 函数,它必须遍历每个发布列表并仅生成每个列表中出现的文档 ID.从上面的链接可以看出,我当前的非递归工作"尝试很糟糕.
My problem is with the search() function, which must iterate over each posting list and yield only the document IDs that appear on every list. As you can see from the link above, my current non-recursive 'working' attempt is terrible.
示例:
postings = [[1, 100, 142, 322, 12312],
[2, 100, 101, 322, 1221],
[100, 142, 322, 956, 1222]]
应该让步:
[100, 322]
至少有一个优雅的递归函数解决方案,但如果可能的话,我想避免这种情况.但是,涉及嵌套生成器表达式、itertools 滥用或任何其他类型的代码高尔夫的解决方案非常受欢迎.:-)
There is at least one elegant recursive function solution to this, but I'd like to avoid that if possible. However, a solution involving nested generator expressions, itertools abuse, or any other kind of code golf is more than welcome. :-)
应该可以将函数安排为只需要与最小列表中的项目一样多的步骤,而无需将整组整数吸入内存.将来,这些列表可能会从磁盘中读取,并且大于可用的 RAM.
It should be possible to arrange for the function to only require as many steps as there are items in the smallest list, and without sucking the entire set of integers into memory. In future, these lists may be read from disk, and larger than available RAM.
在过去的 30 分钟里,我在舌尖有了一个想法,但我无法将其完全融入代码中.请记住,这只是为了好玩!
For the past 30 minutes I've had an idea on the tip of my tongue, but I can't quite get it into code. Remember, this is just for fun!
推荐答案
import heapq, itertools
def intersect(*its):
for key, values in itertools.groupby(heapq.merge(*its)):
if len(list(values)) == len(its):
yield key
>>> list(intersect(*postings))
[100, 322]
这篇关于加入一组产生 Python 迭代器的有序整数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!


大气响应式网络建站服务公司织梦模板
高端大气html5设计公司网站源码
织梦dede网页模板下载素材销售下载站平台(带会员中心带筛选)
财税代理公司注册代理记账网站织梦模板(带手机端)
成人高考自考在职研究生教育机构网站源码(带手机端)
高端HTML5响应式企业集团通用类网站织梦模板(自适应手机端)