Best way to test for existing string against a large list of comparables(针对大量可比对象测试现有字符串的最佳方法)
问题描述
Suppose you have a list of acronym's that define a value (ex. AB1,DE2,CC3) and you need to check a string value (ex. "Happy:DE2|234") to see if an acronym is found in the string. For a short list of acronym's I would usually create a simple RegEx that used a separator (ex. (AB1|DE2|CC3) ) and just look for a match.
But how would I tackle this if there are over 30 acronym's to match against? Would it make sense to use the same technique (ugly) or is there a more effecient and elegant way to accomplish this task?
Keep in mind the example acronym list and example string is not the actual data format that I am working with, rather just a way to express my challenge.
BTW, I read a SO related question but didn't think it applied to what I was trying to accomplish.
EDIT: I forgot to include my need to capture the matched value, hence the choice to use Regular Expressions...
Personally I don't think 30 is particularly large for a regex so I wouldn't be too quick to rule it out. You can create the regex with a single line of code:
var acronyms = new[] { "AB", "BC", "CD", "ZZAB" };
var regex = new Regex(string.Join("|", acronyms), RegexOptions.Compiled);
for (var match = regex.Match("ZZZABCDZZZ"); match.Success; match = match.NextMatch())
Console.WriteLine(match.Value);
// returns AB and CD
So the code is relatively elegant and maintainable. If you know the upper bound for the number of acronyms I would to some testing, who knows what kind of optimizations there are already built into the regex engine. You'll also be able to benefit for free from future regex engine optimizations. Unless you have reason to believe performance will be an issue keep it simple.
On the other hand regex may have other limitations e.g. by default if you have acronyms AB, BC and CD then it'll only return two of these as a match in "ABCD". So its good at telling you there is an acronym but you need to be careful about catching multiple matches.
When performance became an issue for me (> 10,000 items) I put the 'acronyms' in a HashSet and then searched each substring of the text (from min acronym length to max acronym length). This was ok for me because the source text was very short. I'd not heard of it before, but at first look the Aho-Corasick algorithm, referred to in the question you reference, seems like a better general solution to this problem.
这篇关于针对大量可比对象测试现有字符串的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:针对大量可比对象测试现有字符串的最佳方法
基础教程推荐
- JSON.NET 中基于属性的类型解析 2022-01-01
- 是否可以在 asp classic 和 asp.net 之间共享会话状态 2022-01-01
- 全局 ASAX - 获取服务器名称 2022-01-01
- 首先创建代码,多对多,关联表中的附加字段 2022-01-01
- 经典 Asp 中的 ResolveUrl/Url.Content 等效项 2022-01-01
- 错误“此流不支持搜索操作"在 C# 中 2022-01-01
- 在 VS2010 中的 Post Build 事件中将 bin 文件复制到物 2022-01-01
- 从 VS 2017 .NET Core 项目的发布目录中排除文件 2022-01-01
- 将事件 TextChanged 分配给表单中的所有文本框 2022-01-01
- 如何动态获取文本框中datagridview列的总和 2022-01-01
