Java: how to check if a character belongs to a specific Unicode block?

Question

I need to identify what natural language my input belongs to. The goal is to distinguish between Arabic and English words in a mixed input, where the input is Unicode and is extracted from XML text nodes. I have noticed the class Character.UnicodeBlock. Is it related to my problem? How can I get it to work?

The Character.UnicodeBlock approach works for Arabic, but apparently not for English (or other European languages), because the BASIC_LATIN block contains symbols and non-printable characters as well as letters. So I now use String.matches() with the regex "[A-Za-z]+" instead. I can live with it, but perhaps someone can suggest a nicer or faster way.
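A minimal sketch of the check described above (illustrative, not the asker's actual code), assuming whole words are passed in; the class and method names (LatinWordCheck, isEnglishWordRegex, isEnglishWordLoop, allBasicLatin) are hypothetical. It shows why a BASIC_LATIN test alone accepts strings such as "hello!" or "123", and runs the same "[A-Za-z]+" regex through a Pattern compiled once, since String.matches(regex) compiles its pattern on every call. The loop version is an assumed regex-free alternative; whether it is measurably faster depends on the workload.

```java
import java.util.regex.Pattern;

public class LatinWordCheck {
    // Compiled once and reused; String.matches(regex) recompiles the pattern on every call.
    private static final Pattern ENGLISH_WORD = Pattern.compile("[A-Za-z]+");

    // Regex-based check, equivalent to word.matches("[A-Za-z]+").
    static boolean isEnglishWordRegex(String word) {
        return ENGLISH_WORD.matcher(word).matches();
    }

    // Hypothetical regex-free alternative: a plain loop over the ASCII letter ranges.
    static boolean isEnglishWordLoop(String word) {
        if (word.isEmpty()) {
            return false; // "[A-Za-z]+" requires at least one letter
        }
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean asciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!asciiLetter) {
                return false;
            }
        }
        return true;
    }

    // BASIC_LATIN alone is too broad: digits, punctuation and control characters are in it too.
    static boolean allBasicLatin(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (Character.UnicodeBlock.of(word.charAt(i)) != Character.UnicodeBlock.BASIC_LATIN) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "hello!", "123", "مرحبا"};
        for (String s : samples) {
            System.out.printf("%s: basicLatin=%b regex=%b loop=%b%n",
                    s, allBasicLatin(s), isEnglishWordRegex(s), isEnglishWordLoop(s));
        }
    }
}
```

For "hello!" and "123", allBasicLatin returns true while both letter checks return false, which is the over-matching described in the question.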

Answer

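Yes. Character.UnicodeBlock.of(char) returns the Unicode block that a character belongs to, so you can compare its result with a constant such as Character.UnicodeBlock.ARABIC.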
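A sketch of how Character.UnicodeBlock.of could be applied to the mixed Arabic/English input from the question, assuming whitespace-separated words and Java 8 or later (for String.codePoints). The names ScriptClassifier, Lang, classify, isArabicBlock and isBasicLatinLetter are hypothetical, and the set of Arabic blocks accepted is an assumption. It uses the of(int codePoint) overload so characters outside the Basic Multilingual Plane are handled, and pairs the BASIC_LATIN test with Character.isLetter to address the symbol problem raised in the question.

```java
public class ScriptClassifier {

    enum Lang { ARABIC, ENGLISH, OTHER }

    // True if the code point lies in one of the Arabic-related Unicode blocks (assumed set).
    static boolean isArabicBlock(int cp) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
        return block == Character.UnicodeBlock.ARABIC
                || block == Character.UnicodeBlock.ARABIC_SUPPLEMENT
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
    }

    // BASIC_LATIN also contains digits and punctuation, so require a letter as well.
    static boolean isBasicLatinLetter(int cp) {
        return Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.BASIC_LATIN
                && Character.isLetter(cp);
    }

    // Classifies a word only if every code point passes the same check.
    static Lang classify(String word) {
        if (word.isEmpty()) {
            return Lang.OTHER; // allMatch on an empty stream would return true
        }
        if (word.codePoints().allMatch(ScriptClassifier::isArabicBlock)) {
            return Lang.ARABIC;
        }
        if (word.codePoints().allMatch(ScriptClassifier::isBasicLatinLetter)) {
            return Lang.ENGLISH;
        }
        return Lang.OTHER;
    }

    public static void main(String[] args) {
        String text = "Hello مرحبا world 42";
        for (String word : text.split("\\s+")) {
            System.out.println(word + " -> " + classify(word));
        }
    }
}
```

The Arabic check is block-based only, so Arabic punctuation and Arabic-Indic digits in those blocks also count as Arabic; vowel marks (which are not letters) are accepted for the same reason.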