How to filter a set of (int, str) tuples to return only the tuples with the min value in the first element?
Problem description
Suppose I have a set of tuples representing URLs with "scores":
{(0.75, 'http://www.foo.com'), (0.33, 'http://www.bar.com'), (0.5, 'http://www.foo.com'), (0.66, 'http://www.bar.com')}
What is a concise way to filter out duplicate URLs, keeping only the lowest score for each URL? That is, from the example set above, I want to get the following set, where each URL appears only once, paired with its lowest score from the original set:
{(0.5, 'http://www.foo.com'),(0.33, 'http://www.bar.com')}
I came up with the following solution:
from collections import defaultdict
seen = defaultdict(lambda: 1)
for score, url in s:
    if score < seen[url]:
        seen[url] = score
filtered = {(v, k) for k, v in seen.items()}
... but I feel like there is probably some simpler and more efficient way to do this without using an intermediary dict to keep track of the minimum score per URL and then regenerating the set from it. What is the best way to filter a set of tuples by the min/max of the first element?
Recommended answer
You've already implemented the simplest approach I can think of. The only change I'd make would be to the loop: a slightly more concise version uses min().
from collections import defaultdict

seen = defaultdict(lambda: 1)  # use `lambda: float('inf')` if scores can be > 1
for score, url in s:
    seen[url] = min(seen[url], score)
{(v, k) for k, v in seen.items()}
# {(0.33, 'http://www.bar.com'), (0.5, 'http://www.foo.com')}
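For reuse, the same idea can be wrapped in a small function. This is only a minimal sketch: the name `lowest_score_per_url` is hypothetical (it does not appear in the question), and it uses a plain dict with `float('inf')` as the fallback so it does not assume scores stay at or below 1.

```python
def lowest_score_per_url(pairs):
    """Return a set of (score, url) tuples keeping the lowest score per URL."""
    best = {}
    for score, url in pairs:
        # float('inf') is the fallback, so any real score replaces it.
        best[url] = min(best.get(url, float('inf')), score)
    # Swap back to (score, url) to match the shape of the input set.
    return {(score, url) for url, score in best.items()}


s = {
    (0.75, 'http://www.foo.com'),
    (0.33, 'http://www.bar.com'),
    (0.5, 'http://www.foo.com'),
    (0.66, 'http://www.bar.com'),
}
filtered = lowest_score_per_url(s)
```

Like the loop above, this makes a single pass over the input and needs no sorting.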
If you really want a shorter solution, here is a one-liner, although as I said it isn't the simplest approach. Most of the work is swapping the URL and the score so that the URL can serve as the key when dropping duplicates. It relies on sorting first: with reverse=True the lowest score for each URL comes last, and dict() keeps the last value it sees for a key. That sorting requirement is why I like this solution less than the one above.
{(v, k) for k, v in dict(sorted(((v, k) for k, v in s), reverse=True)).items()}
# {(0.33, 'http://www.bar.com'), (0.5, 'http://www.foo.com')}
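To make the one-liner easier to follow, here it is unpacked into separate steps. This is just an illustrative sketch; the intermediate names (`swapped`, `ordered`, `lowest`) are made up for the example, and `s` is the example set from the question.

```python
s = {
    (0.75, 'http://www.foo.com'),
    (0.33, 'http://www.bar.com'),
    (0.5, 'http://www.foo.com'),
    (0.66, 'http://www.bar.com'),
}

# Step 1: swap each tuple to (url, score) so the URL can act as a dict key.
swapped = [(url, score) for score, url in s]

# Step 2: sort descending; within one URL, higher scores come first.
ordered = sorted(swapped, reverse=True)

# Step 3: dict() keeps the last value seen for each key, i.e. the lowest score.
lowest = dict(ordered)

# Step 4: swap back to (score, url) and rebuild the set.
filtered = {(score, url) for url, score in lowest.items()}
```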
This solution becomes much shorter if s already stores (url, score) pairs, like this:
s2 = {(v,k) for k, v in s}
s2
# {('http://www.bar.com', 0.33), ('http://www.bar.com', 0.66), ...}
Then all you need to do is:
list(dict(sorted(s2, reverse=True)).items())
# [('http://www.foo.com', 0.5), ('http://www.bar.com', 0.33)]
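The question also mentions filtering by max. As a rough sketch under the same assumptions (a set `s2` of (url, score) pairs, written out in full here for the example), dropping reverse=True makes the sort ascending, so the last entry for each URL, which is the one dict() keeps, is the highest score:

```python
s2 = {
    ('http://www.foo.com', 0.75),
    ('http://www.bar.com', 0.33),
    ('http://www.foo.com', 0.5),
    ('http://www.bar.com', 0.66),
}

# Ascending sort: within one URL, the highest score comes last and wins.
highest = dict(sorted(s2))
result = list(highest.items())
```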