Problem description
I'm trying to count the number of words in a variable written in a non-Latin language (Bulgarian), but str_word_count() does not seem to count non-Latin words. The PHP file is encoded as UTF-8.
$str = "текст на кирилица";
echo 'Number of words: '.str_word_count($str);
//this returns 0
Recommended answer
You can do it with a regular expression:
$str = "текст на кирилица";
// Split on runs of whitespace; note the backslash in \s
echo 'Number of words: '.count(preg_split('/\s+/', $str));
Here the word delimiter is defined as one or more whitespace characters. If anything else should be treated as a word delimiter, you'll need to add it to the regex.
Also note that the /u modifier isn't required here, because the regex itself contains no UTF-8 characters (only the string does). If you want some UTF-8 characters to act as delimiters, you'll need to add that modifier.
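As a rough illustration of the point about UTF-8 delimiters, here is a minimal sketch that treats whitespace, some ASCII punctuation, and a few multi-byte characters (Bulgarian-style quotation marks and the ellipsis) as delimiters. The function name countWordsByDelimiters and the chosen delimiter set are assumptions made for this example, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): counts words by
// splitting on whitespace, ASCII punctuation, and a few multi-byte characters.
function countWordsByDelimiters(string $text): int
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8, so the
    // multi-byte characters in the class match as whole characters, not bytes.
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing
    // delimiters.
    $parts = preg_split('/[\s,.;:!?„“…]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (for example, invalid UTF-8).
    return $parts === false ? 0 : count($parts);
}

echo 'Number of words: ' . countWordsByDelimiters('текст „на кирилица“… още думи!'), "\n";
```

The PREG_SPLIT_NO_EMPTY flag matters here: without it, the trailing "!" would leave an empty string at the end of the result and inflate the count by one.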
Update:
If you want only Cyrillic letters to count as parts of words, you can use:
$str = "текст
на 12453
кирилица";
echo 'Number of words: '.count(preg_split('/[^А-Яа-яЁё]+/u', $str));
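If the text may mix scripts, one possible variation is to count runs of letters instead of splitting on non-letters. The sketch below uses the Unicode property \p{L} with preg_match_all(); this is a swapped-in technique rather than the answer's own method, and the function name countLetterWords is hypothetical.

```php
<?php
// Hypothetical helper: counts runs of Unicode letters in any script.
function countLetterWords(string $text): int
{
    // \p{L} matches any Unicode letter; /u enables UTF-8 handling.
    // preg_match_all() returns the number of full matches, or false on error.
    $count = preg_match_all('/\p{L}+/u', $text);

    return $count === false ? 0 : $count;
}

echo 'Number of words: ' . countLetterWords("текст\nна 12453\nкирилица and latin"), "\n";
```

Because only letters form a word here, digits such as 12453 are ignored, and a token containing an apostrophe or hyphen is counted as two words. Whether that is acceptable depends on how you define a word.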