Algorithm that searches for related items based on common tags
Problem description
Let's take StackOverflow questions as an example. Each of them has multiple tags assigned. How can I build an algorithm that finds related questions based on how many tags they have in common, sorted by the number of common tags?
For now I can't think of anything better than selecting all questions that have at least one common tag into an array, looping through them to assign the number of common tags to each item, and then sorting the array.
Is there a cleverer way of doing it? The perfect solution would be a single SQL query.
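For reference, the brute-force approach described in the question can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `related_questions`, the `tags_by_question` mapping, and the sample data are all hypothetical and not part of the original question.

```python
def related_questions(target_id, tags_by_question):
    """Return (question_id, common_tag_count) pairs, most shared tags first."""
    target_tags = tags_by_question[target_id]
    candidates = []
    for qid, tags in tags_by_question.items():
        if qid == target_id:
            continue
        common = len(target_tags & tags)  # set intersection = shared tags
        if common:  # keep only questions sharing at least one tag
            candidates.append((qid, common))
    # Sort by shared-tag count descending, then by id for a stable order
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: question id -> set of tags
sample = {
    1: {"sql", "mysql", "algorithm"},
    2: {"sql", "algorithm"},
    3: {"python"},
    4: {"mysql", "sql", "algorithm", "tags"},
    5: {"algorithm"},
}
result = related_questions(1, sample)
```

The sketch scans every question once per lookup, which is the cost the question is hoping to avoid by pushing the work into the database.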
Recommended answer
This could be as bad as O(n^2), but it works:
-- One row per (question, tag) pair.
create table QuestionTags (questionid int, tag int);
-- Self-join on tag; q1.questionid < q2.questionid counts each pair only once.
select q1.questionid, q2.questionid, count(*) as commontags
from QuestionTags q1
join QuestionTags q2
  on q1.tag = q2.tag and q1.questionid < q2.questionid
group by q1.questionid, q2.questionid
order by commontags desc;
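The query above ranks every pair of questions. If only the questions related to one given question are needed, a variant restricted to that question might look like the sketch below, run here against an in-memory SQLite database through Python's standard `sqlite3` module. The helper name `related`, the sample rows, and the extra tie-break on `questionid` are assumptions made for illustration, and the sketch assumes each (questionid, tag) pair appears at most once in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE QuestionTags (questionid INT, tag INT)")
# Hypothetical sample rows: (questionid, tag)
conn.executemany(
    "INSERT INTO QuestionTags VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30),
     (2, 10), (2, 20),
     (3, 40),
     (4, 10), (4, 20), (4, 30),
     (5, 30)],
)


def related(conn, question_id):
    """Return (questionid, commontags) rows for questions sharing tags."""
    cursor = conn.execute(
        """
        SELECT q2.questionid, COUNT(*) AS commontags
        FROM QuestionTags q1
        JOIN QuestionTags q2
          ON q1.tag = q2.tag AND q1.questionid <> q2.questionid
        WHERE q1.questionid = ?
        GROUP BY q2.questionid
        ORDER BY commontags DESC, q2.questionid
        """,
        (question_id,),
    )
    return cursor.fetchall()


related_to_1 = related(conn, 1)
```

Restricting `q1` to a single question keeps the join proportional to the number of rows sharing a tag with that question, rather than to all pairs. An index on `tag` would likely help on a large table, though that depends on the database and the data.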