How to make my split work only on one real line and be capable of skipping quoted parts of a string?
Problem description
So we have a simple split:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}
or Boost's split. And we have a simple main like:
int main() {
    const vector<string> words = split("close no \"\nmatter\" how\nfar", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
How can I make it output something like:
close
no
"\nmatter"
how
end symbol found.
We want to add two things to split: structures that should be kept unsplit, and characters that should end the parsing process. How can this be done?
Recommended answer
The following code:
#include <cassert>

// Returns an iterator to the first non-empty symbol that matches s at position i,
// or symbols.end() if none matches.
vector<string>::const_iterator matchSymbol(const string& s, string::const_iterator i, const vector<string>& symbols)
{
    vector<string>::const_iterator testSymbol;
    for (testSymbol = symbols.begin(); testSymbol != symbols.end(); ++testSymbol) {
        if (!testSymbol->empty()) {
            if (0 == testSymbol->compare(0, testSymbol->size(), &(*i), testSymbol->size())) {
                return testSymbol;
            }
        }
    }
    assert(testSymbol == symbols.end());
    return testSymbol;
}
vector<string> split(const string& s, const vector<string>& delims, const vector<string>& terms, const bool keep_empty = true)
{
    vector<string> result;
    if (delims.empty()) {
        result.push_back(s);
        return result;
    }
    bool checkForDelim = true;  // false while inside a matched pair of quotes
    string temp;
    string::const_iterator i = s.begin();
    while (i != s.end()) {
        vector<string>::const_iterator testTerm = terms.end();
        vector<string>::const_iterator testDelim = delims.end();
        if (checkForDelim) {
            testTerm = matchSymbol(s, i, terms);
            testDelim = matchSymbol(s, i, delims);
        }
        if (testTerm != terms.end()) {
            // terminator found: stop parsing
            i = s.end();
        } else if (testDelim != delims.end()) {
            if (!temp.empty() || keep_empty) {
                result.push_back(temp);
                temp.clear();
            }
            // skip over the delimiter
            string::const_iterator j = testDelim->begin();
            while (i != s.end() && j != testDelim->end()) {
                ++i;
                ++j;
            }
        } else if ('"' == *i) {
            if (checkForDelim) {
                // look ahead for a closing quote
                string::const_iterator j = i;
                do {
                    ++j;
                } while (j != s.end() && '"' != *j);
                checkForDelim = (j == s.end());
                // flush the pending token only when the quote is matched
                if (!checkForDelim && (!temp.empty() || keep_empty)) {
                    result.push_back(temp);
                    temp.clear();
                }
                temp.push_back('"');
                ++i;
            } else {
                // matched end quote
                checkForDelim = true;
                temp.push_back('"');
                ++i;
                result.push_back(temp);
                temp.clear();
            }
        } else if ('\n' == *i) {
            // newline inside quotes: store it as the two characters \ and n
            temp += "\\n";
            ++i;
        } else {
            temp.push_back(*i);
            ++i;
        }
    }
    if (!temp.empty() || keep_empty) {
        result.push_back(temp);
    }
    return result;
}
int runTest()
{
    vector<string> delims;
    delims.push_back(" ");
    delims.push_back("\t");  // tab (assumed)
    delims.push_back("\n");
    delims.push_back("split_here");
    vector<string> terms;
    terms.push_back(">");
    terms.push_back("end_here");
    const vector<string> words = split("close no \"\nend_here matter\" how\nfar testsplit_heretest\"another split_here test\"with some\"mo>re", delims, terms, false);
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
    return 0;
}
produces:
close
no
"\nend_here matter"
how
far
test
test
"another split_here test"
with
some"mo
Based on the examples you gave, you seemed to want newlines to count as delimiters when they appear outside of quotes and to be represented by the literal \n when inside of quotes, so that's what this does. It also adds the ability to have multiple delimiters, such as split_here, which I used in the test.
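The newline handling inside quotes can be looked at in isolation. The following is a minimal sketch of that one step, assuming the goal is only to turn each newline character into the two characters \ and n; escapeNewlines is a hypothetical helper and does not appear in the code above.

```cpp
#include <string>

// Hypothetical helper (not part of the answer's code): returns a copy of text
// in which every newline character is replaced by a backslash followed by 'n'.
std::string escapeNewlines(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == '\n') {
            out += "\\n";  // two characters: '\' and 'n'
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

In the split function the same substitution happens character by character while the quoted token is being accumulated, so no separate pass is needed there.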
I wasn't sure whether you want unmatched quotes to be split the way matched quotes are, since the example you gave has the unmatched quote separated by spaces. This code treats an unmatched quote like any other character, but it should be easy to modify if this is not the behavior you want.
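If an unmatched quote should instead become a token of its own, the lookahead for the closing quote could be factored out roughly as follows. This is only a sketch under that assumption; findClosingQuote and quotedTokenAt are hypothetical helpers, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: given the index of a double quote in s, returns the
// index of the next double quote, or std::string::npos if there is none.
std::size_t findClosingQuote(const std::string& s, std::size_t quotePos) {
    assert(quotePos < s.size() && s[quotePos] == '"');
    return s.find('"', quotePos + 1);
}

// Hypothetical helper: returns the quoted token starting at quotePos,
// including both quotes, or just the lone quote when it is unmatched.
std::string quotedTokenAt(const std::string& s, std::size_t quotePos) {
    const std::size_t close = findClosingQuote(s, quotePos);
    if (close == std::string::npos) {
        return "\"";  // unmatched: emit the quote as a token of its own
    }
    return s.substr(quotePos, close - quotePos + 1);
}
```

With something like this, the quote branch of split could push the returned token directly and advance by its length, instead of toggling checkForDelim.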
The line:
if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {
will work on most, if not all, implementations of the STL, but it is not guaranteed to work, since it reads through a raw pointer into s and assumes that at least testSymbol->size() characters remain from i onward. It can be replaced with the safer, but slower, version:
if (*testSymbol==s.substr(i-s.begin(),testSymbol->size())) {
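A third option, sketched here as an assumption rather than as part of the original answer, is the std::string::compare overload that takes a position, a length, and a string. It clamps the length to the characters remaining in s, so it stays in bounds without building a temporary substring. The helper name matchSymbolAt and its index-based interface are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first non-empty symbol that
// occurs in s at offset pos, or symbols.size() if no symbol matches there.
std::size_t matchSymbolAt(const std::string& s, std::size_t pos,
                          const std::vector<std::string>& symbols) {
    assert(pos <= s.size());  // compare() throws std::out_of_range otherwise
    for (std::size_t k = 0; k < symbols.size(); ++k) {
        const std::string& sym = symbols[k];
        // compare(pos, len, str) clamps len to what is left of s, so a symbol
        // longer than the remaining input simply fails to match.
        if (!sym.empty() && s.compare(pos, sym.size(), sym) == 0) {
            return k;
        }
    }
    return symbols.size();
}
```

Called with i - s.begin() as the offset, it fills the same role as matchSymbol but reports an index rather than an iterator.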
