How to make my split work only on one real line and be capable of skipping quoted parts of a string?
Question
So we have a simple split:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}
or boost split. And we have a simple main like:
int main() {
    const vector<string> words = split("close no \"\n matter\" how\n far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
How can we make it output something like:
close
no
"\n matter"
how
end symbol found.
We want to introduce into split both structures that shall be kept unsplit and characters that shall end the parsing process. How can we do such a thing?
Recommended answer
The following code:
#include <cassert>

// Returns an iterator to the first symbol that matches the text starting at i,
// or symbols.end() if none matches.
vector<string>::const_iterator matchSymbol(const string & s, string::const_iterator i, const vector<string> & symbols)
{
    vector<string>::const_iterator testSymbol;
    for (testSymbol=symbols.begin();testSymbol!=symbols.end();++testSymbol) {
        if (!testSymbol->empty()) {
            if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {
                return testSymbol;
            }
        }
    }
    assert(testSymbol==symbols.end());
    return testSymbol;
}

vector<string> split(const string& s, const vector<string> & delims, const vector<string> & terms, const bool keep_empty = true)
{
    vector<string> result;
    if (delims.empty()) {
        result.push_back(s);
        return result;
    }
    bool checkForDelim=true;  // false while inside a matched pair of quotes
    string temp;
    string::const_iterator i=s.begin();
    while (i!=s.end()) {
        vector<string>::const_iterator testTerm=terms.end();
        vector<string>::const_iterator testDelim=delims.end();
        if (checkForDelim) {
            testTerm=matchSymbol(s,i,terms);
            testDelim=matchSymbol(s,i,delims);
        }
        if (testTerm!=terms.end()) {
            // terminator found: stop parsing
            i=s.end();
        } else if (testDelim!=delims.end()) {
            if (!temp.empty() || keep_empty) {
                result.push_back(temp);
                temp.clear();
            }
            // skip over the delimiter
            string::const_iterator j=testDelim->begin();
            while (i!=s.end() && j!=testDelim->end()) {
                ++i;
                ++j;
            }
        } else if ('"'==*i) {
            if (checkForDelim) {
                // look ahead for a closing quote
                string::const_iterator j=i;
                do {
                    ++j;
                } while (j!=s.end() && '"'!=*j);
                checkForDelim=(j==s.end());
                // only a matched quote starts a new token; parenthesized so that
                // keep_empty does not split on an unmatched quote
                if (!checkForDelim && (!temp.empty() || keep_empty)) {
                    result.push_back(temp);
                    temp.clear();
                }
                temp.push_back('"');
                ++i;
            } else {
                //matched end quote
                checkForDelim=true;
                temp.push_back('"');
                ++i;
                result.push_back(temp);
                temp.clear();
            }
        } else if ('\n'==*i) {
            // newline not consumed as a delimiter (e.g. inside quotes):
            // emit the two characters '\' and 'n'
            temp+="\\n";
            ++i;
        } else {
            temp.push_back(*i);
            ++i;
        }
    }
    if (!temp.empty() || keep_empty) {
        result.push_back(temp);
    }
    return result;
}
int runTest()
{
    vector<string> delims;
    delims.push_back(" ");
    delims.push_back("\t");
    delims.push_back("\n");
    delims.push_back("split_here");
    vector<string> terms;
    terms.push_back(">");
    terms.push_back("end_here");
    const vector<string> words = split("close no \"\n end_here matter\" how\n far testsplit_heretest\"another split_here test\"with some\"mo>re", delims, terms, false);
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
    return 0;
}
produces:
close
no
"\n end_here matter"
how
far
test
test
"another split_here test"
with
some"mo
Based on the examples you gave, you seemed to want newlines to count as delimiters when they appear outside of quotes and to be represented by the literal \n when inside of quotes, so that's what this does. It also adds the ability to have multiple delimiters, such as split_here, which I used in the test.
I wasn't sure whether you want unmatched quotes to be split the way matched quotes are, since the example you gave has the unmatched quote separated by spaces. This code treats an unmatched quote like any other character, but it should be easy to modify if this is not the behavior you want.
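The look-ahead that tells a matched quote from an unmatched one can be sketched on its own. This is only an illustration under stated assumptions: findClosingQuote is a hypothetical helper name that does not appear in the answer's code, and it uses std::string::find instead of the iterator loop above. A caller could use the result to decide whether to start a quoted token or to treat the quote as an ordinary character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the original answer.
// Precondition: s[openPos] is a double quote.
// Returns the index of the next double quote after openPos,
// or std::string::npos if the quote at openPos is unmatched.
std::string::size_type findClosingQuote(const std::string& s, std::string::size_type openPos)
{
    assert(openPos < s.size() && s[openPos] == '"');
    return s.find('"', openPos + 1);
}
```

To split on unmatched quotes as well, the npos case could push the pending token instead of appending the quote to it.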
The line:
if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {
will work on most, if not all, implementations of the STL, but it is not guaranteed to work, because it reads testSymbol->size() characters starting at &(*i), which can run past the end of s. It can be replaced with the safer, but slower, version:
if (*testSymbol==s.substr(i-s.begin(),testSymbol->size())) {
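Another option, offered as a sketch and not as the answer's own method, is the std::string::compare overload that takes a position and a length in s. It clamps the length to the characters that remain, so it stays in bounds without building the temporary string that substr creates. The helper name matchesAt is an assumption made for illustration; inside matchSymbol it would be called with pos equal to i - s.begin().

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the original answer.
// Precondition: pos <= s.size().
// Returns true if symbol is non-empty and occurs in s starting at index pos.
bool matchesAt(const std::string& s, std::string::size_type pos, const std::string& symbol)
{
    assert(pos <= s.size());
    // compare(pos, len, str) clamps len to the characters remaining in s,
    // so it never reads past the end of s.
    return !symbol.empty() && s.compare(pos, symbol.size(), symbol) == 0;
}
```

A symbol that is longer than the remaining text simply fails to match, which is the case where the pointer-based compare could read out of bounds.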