Any good solutions for code points and code units in C++ strings?
Question
In Java, a String has these methods:
length()/charAt(), codePointCount()/codePointAt()
C++11 has std::string a = u8"很烫烫的一锅汤";
but a.size() is the length of the char array, so it counts UTF-8 code units and cannot be used to index Unicode characters.
Are there any solutions for Unicode in C++ strings?
Recommended answer
I generally convert the UTF-8 string to a wide UTF-32/UCS-2 string before doing character operations. C++ does give us functions to do that, but they are not very user-friendly, so I have written some nicer conversion functions here:
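#include <codecvt>   // std::codecvt_utf8
#include <iostream>
#include <locale>    // std::wstring_convert
#include <stdexcept>
#include <string>

// These convert to and from whatever the system wide character encoding
// is for the platform (UTF-32 on Linux, UCS-2 on Windows)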
std::string ws_to_utf8(std::wstring const& s)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> cnv;
    std::string utf8 = cnv.to_bytes(s);
    if(cnv.converted() < s.size())
        throw std::runtime_error("incomplete conversion");
    return utf8;
}
std::wstring utf8_to_ws(std::string const& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> cnv;
    std::wstring s = cnv.from_bytes(utf8);
    if(cnv.converted() < utf8.size())
        throw std::runtime_error("incomplete conversion");
    return s;
}
int main()
{
    std::string s = u8"很烫烫的一锅汤";
    auto w = utf8_to_ws(s); // convert to wide (UTF-32/UCS-2)
    // now we can use code-point indexes on the wide string
    std::cout << s << " is " << w.size() << " characters long" << '\n';
}
Output:
很烫烫的一锅汤 is 7 characters long
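To make the code unit versus code point distinction from the question concrete, here is a minimal sketch that counts code points directly on the UTF-8 bytes, without converting first. The helper name utf8_code_point_count is hypothetical (it is not part of the answer or of the standard library), and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the answer): counts the code points in a
// string that is assumed to hold valid UTF-8. Every code point starts with
// exactly one byte that is NOT a continuation byte (continuation bytes have
// the bit pattern 10xxxxxx), so counting those bytes counts code points.
std::size_t utf8_code_point_count(std::string const& utf8)
{
    std::size_t count = 0;
    for(unsigned char byte : utf8)
        if((byte & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For a string of three-byte characters such as the one in the question, size() reports three code units per character while this helper reports one code point per character. It does no validation, so malformed input gives a meaningless count rather than an error.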
If you want to convert to and from UTF-32 regardless of platform, you can use the following (not so well tested) conversion routines:
std::string utf32_to_utf8(std::u32string const& utf32)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cnv;
    std::string utf8 = cnv.to_bytes(utf32);
    if(cnv.converted() < utf32.size())
        throw std::runtime_error("incomplete conversion");
    return utf8;
}
std::u32string utf8_to_utf32(std::string const& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cnv;
    std::u32string utf32 = cnv.from_bytes(utf8);
    if(cnv.converted() < utf8.size())
        throw std::runtime_error("incomplete conversion");
    return utf32;
}
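As a rough illustration of how the UTF-32 routine can stand in for Java's codePointCount() and codePointAt(), the sketch below wraps it in two helpers. The names code_point_count and code_point_at are hypothetical, and unlike Java's codePointAt(), whose index is in code units, the index here counts code points. utf8_to_utf32 is repeated from the answer only so that the block compiles on its own.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <stdexcept>
#include <string>

// Repeated from the answer above so that this sketch is self-contained.
std::u32string utf8_to_utf32(std::string const& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cnv;
    std::u32string utf32 = cnv.from_bytes(utf8);
    if(cnv.converted() < utf8.size())
        throw std::runtime_error("incomplete conversion");
    return utf32;
}

// Hypothetical helper: number of code points in a UTF-8 string.
std::size_t code_point_count(std::string const& utf8)
{
    return utf8_to_utf32(utf8).size();
}

// Hypothetical helper: the code point at a given code point index.
// std::u32string::at() throws std::out_of_range for a bad index.
char32_t code_point_at(std::string const& utf8, std::size_t index)
{
    return utf8_to_utf32(utf8).at(index);
}
```

Each call converts the whole string, so for repeated lookups it is cheaper to convert once and index the resulting std::u32string directly.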
Note: std::wstring_convert is deprecated as of C++17.
However, I still prefer to use it over a third-party library because it is portable, it avoids external dependencies, and it won't be removed until a replacement is provided. In any case, it will be easy to replace the implementations of these functions without having to change all the code that uses them.
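To illustrate that last point, here is one possible replacement body for utf8_to_utf32 that keeps the same signature but does not use std::wstring_convert. It is a hand-written sketch, not the answer's code and not a standard facility. The error messages are made up for illustration, and it rejects malformed input (bad lead or continuation bytes, truncated sequences, overlong encodings, surrogates, values above U+10FFFF) by throwing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Sketch of a drop-in replacement for the answer's utf8_to_utf32 that
// decodes UTF-8 by hand instead of using the deprecated wstring_convert.
std::u32string utf8_to_utf32(std::string const& utf8)
{
    std::u32string utf32;
    std::size_t i = 0;
    while(i < utf8.size())
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;      // code point being assembled
        std::size_t len;  // total number of bytes in this sequence

        // The lead byte determines the sequence length and the first bits.
        if(lead < 0x80)                { cp = lead;        len = 1; }
        else if((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if(utf8.size() - i < len)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for(std::size_t k = 1; k < len; ++k)
        {
            unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Reject overlong encodings, surrogates, and values past U+10FFFF.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if(cp < min_for_len[len] || cp > 0x10FFFF
           || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-8 code point");

        utf32.push_back(cp);
        i += len;
    }
    return utf32;
}
```

Because the name, parameter, and return type match the original, code that already calls utf8_to_utf32 would not need to change. Only the exception messages differ.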