Isn't the size of a character in Java 2 bytes?

Question

I used RandomAccessFile to read a byte from a text file.

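import java.io.IOException;
import java.io.RandomAccessFile;

public static void readFile(RandomAccessFile fr) throws IOException {
    byte[] cbuff = new byte[1];
    // Read a single byte from the current file position.
    fr.read(cbuff, 0, 1);
    // Decode that byte using the platform's default charset.
    System.out.println(new String(cbuff));
}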

Why am I seeing one full character being read by this?

Recommended answer

A char represents a character in Java (*). It is 2 bytes large (or 16 bits).

That doesn't necessarily mean that every representation of a character is 2 bytes long. In fact many character encodings only reserve 1 byte for every character (or use 1 byte for the most common characters).
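
As a rough illustration of that point, the sketch below encodes the same text with three charsets and reports how many bytes each one needs. The class and method names (EncodedSizes, encodedLength) are made up for this example. UTF-16BE is used rather than plain UTF-16 so that no byte-order mark is added to the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Hypothetical helper: number of bytes needed to encode the text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String a = "A";           // U+0041, inside the ASCII range
        String eAcute = "\u00e9"; // U+00E9, outside the ASCII range

        System.out.println(encodedLength(a, StandardCharsets.ISO_8859_1));      // 1
        System.out.println(encodedLength(a, StandardCharsets.UTF_8));           // 1
        System.out.println(encodedLength(a, StandardCharsets.UTF_16BE));        // 2

        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_16BE));   // 2
    }
}
```

In memory each of these strings holds exactly one char; only the encoded byte form varies in size.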

When you call the String(byte[]) constructor you ask Java to convert the byte[] to a String using the platform's default charset. Since the platform default charset is usually a 1-byte encoding such as ISO-8859-1 or a variable-length encoding such as UTF-8, it can easily convert that 1 byte to a single character.
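
A minimal sketch of what probably happens in the question's code, assuming the file starts with an ASCII letter: the single byte 0x41 is a complete character in both ISO-8859-1 and UTF-8. The charsets are passed explicitly here so the result does not depend on the machine running it. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class SingleByteDecode {
    // Decodes exactly one byte with ISO-8859-1, a fixed 1-byte encoding.
    static String decodeLatin1(byte b) {
        return new String(new byte[] { b }, StandardCharsets.ISO_8859_1);
    }

    // Decodes exactly one byte with UTF-8, a variable-length encoding.
    static String decodeUtf8(byte b) {
        return new String(new byte[] { b }, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte b = 0x41; // the byte that encodes 'A' in both charsets
        System.out.println(decodeLatin1(b)); // prints "A"
        System.out.println(decodeUtf8(b));   // prints "A"
    }
}
```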

If you run that code on a platform that uses UTF-16 (or UTF-32 or UCS-2 or UCS-4 or ...) as the platform default encoding, then you will not get a valid result (you'll get a String containing the Unicode Replacement Character instead).
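
The UTF-16 case can be simulated without changing the platform default by naming the charset explicitly. This is a sketch under that assumption, and the class name is invented for the example. A lone byte is not a complete UTF-16 code unit, so the String constructor substitutes the replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;

public class LoneByteUtf16 {
    // Decodes the bytes as big-endian UTF-16; malformed input is replaced, not rejected.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // One byte is only half of a UTF-16 code unit.
        String broken = decodeUtf16(new byte[] { 0x41 });
        System.out.println(broken.length());                       // 1
        System.out.println(Integer.toHexString(broken.charAt(0))); // fffd

        // Two bytes form a full code unit: 0x0041 is 'A'.
        String ok = decodeUtf16(new byte[] { 0x00, 0x41 });
        System.out.println(ok);                                    // A
    }
}
```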

That's one of the reasons why you should not depend on the platform default encoding: when converting between byte[] and char[]/String or between InputStream and Reader or between OutputStream and Writer, you should always specify which encoding you want to use. If you don't, then your code will be platform-dependent.
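
One way to follow that advice is sketched below, with UTF-8 chosen arbitrarily as the agreed encoding and a ByteArrayInputStream standing in for a real file. The class and method names are hypothetical. Both conversions name the charset, so the behavior is the same on every platform.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // byte[] -> String with an explicit charset instead of the platform default.
    static String decode(byte[] utf8Data) {
        return new String(utf8Data, StandardCharsets.UTF_8);
    }

    // InputStream -> Reader with an explicit charset; returns the first char read, or -1.
    static int firstChar(byte[] utf8Data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Data), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // String -> byte[] with an explicit charset: U+00E9 becomes 0xC3 0xA9.
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length);                          // 2
        System.out.println(decode(data).equals("\u00e9"));        // true
        System.out.println(Integer.toHexString(firstChar(data))); // e9
    }
}
```

Here the Reader consumes two bytes to produce one char, which is the reverse of the situation in the question.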

(*) that's not entirely true: a char represents a UTF-16 code unit. Either one or two UTF-16 code units represent a Unicode code point. A Unicode code point usually represents a character, but sometimes multiple Unicode code points are used to make up a single character. But the approximation above is close enough to discuss the topic at hand.
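
The distinction in the footnote can be observed directly. In this sketch (class and method names are made up), U+1F600 lies outside the Basic Multilingual Plane and therefore needs two char values, while "e" followed by the combining acute accent U+0301 is two code points that are normally rendered as one character.

```java
public class CodeUnits {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // One code point outside the BMP, stored as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(codeUnits(emoji));  // 2
        System.out.println(codePoints(emoji)); // 1

        // Two code points that combine into a single visible character.
        String combined = "e\u0301";
        System.out.println(codeUnits(combined));  // 2
        System.out.println(codePoints(combined)); // 2
    }
}
```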
