Remove Duplicate Lines From Text File?(从文本文件中删除重复行?)
问题描述
给定一个文本行的输入文件,我希望识别和删除重复的行.请展示一个简单的 C# 片段来完成此操作.
Given an input file of text lines, I want duplicate lines to be identified and removed. Please show a simple snippet of C# that accomplishes this.
推荐答案
应该这样做(并且会复制大文件).
This should do (and will copy with large files).
注意它只删除重复的连续行,即
Note that it only removes duplicate consecutive lines, i.e.
a
b
b
c
b
d
最终会变成
a
b
c
b
d
如果您不想在任何地方重复,则需要保留一组您已经看过的行.
If you want no duplicates anywhere, you'll need to keep a set of lines you've already seen.
using System;
using System.IO;
class DeDuper
{
static void Main(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: DeDuper <input file> <output file>");
return;
}
using (TextReader reader = File.OpenText(args[0]))
using (TextWriter writer = File.CreateText(args[1]))
{
string currentLine;
string lastLine = null;
while ((currentLine = reader.ReadLine()) != null)
{
if (currentLine != lastLine)
{
writer.WriteLine(currentLine);
lastLine = currentLine;
}
}
}
}
}
请注意,这假定为 Encoding.UTF8,并且您要使用文件.不过,它很容易概括为一种方法:
Note that this assumes Encoding.UTF8, and that you want to use files. It's easy to generalize as a method though:
static void CopyLinesRemovingConsecutiveDupes
(TextReader reader, TextWriter writer)
{
string currentLine;
string lastLine = null;
while ((currentLine = reader.ReadLine()) != null)
{
if (currentLine != lastLine)
{
writer.WriteLine(currentLine);
lastLine = currentLine;
}
}
}
(请注意,这不会关闭任何东西 - 调用者应该这样做.)
(Note that that doesn't close anything - the caller should do that.)
以下版本将删除所有个重复项,而不仅仅是连续的:
Here's a version that will remove all duplicates, rather than just consecutive ones:
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
{
string currentLine;
HashSet<string> previousLines = new HashSet<string>();
while ((currentLine = reader.ReadLine()) != null)
{
// Add returns true if it was actually added,
// false if it was already there
if (previousLines.Add(currentLine))
{
writer.WriteLine(currentLine);
}
}
}
这篇关于从文本文件中删除重复行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:从文本文件中删除重复行?
基础教程推荐
- 错误“此流不支持搜索操作"在 C# 中 2022-01-01
- 从 VS 2017 .NET Core 项目的发布目录中排除文件 2022-01-01
- 在 VS2010 中的 Post Build 事件中将 bin 文件复制到物 2022-01-01
- 首先创建代码,多对多,关联表中的附加字段 2022-01-01
- 全局 ASAX - 获取服务器名称 2022-01-01
- 是否可以在 asp classic 和 asp.net 之间共享会话状态 2022-01-01
- 如何动态获取文本框中datagridview列的总和 2022-01-01
- 将事件 TextChanged 分配给表单中的所有文本框 2022-01-01
- JSON.NET 中基于属性的类型解析 2022-01-01
- 经典 Asp 中的 ResolveUrl/Url.Content 等效项 2022-01-01
