How to get the length of Japanese characters in Javascript?

2023-05-14, Front-end development questions

Problem description

I have a Classic ASP page with the SHIFT_JIS charset. The meta tag in the page's head section looks like this:

<meta http-equiv="Content-Type" content="text/html; charset=shift_jis">

My page has a text box (txtName) that should allow at most 200 characters. I have a Javascript function that validates the length, which is called from the onclick event of my Submit button.

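function validateForm() { // hypothetical name; called from the Submit button's onclick
  // .length counts UTF-16 code units, not SHIFT_JIS bytes
  if (document.frmPage.txtName.value.length > 200) {
    alert("You have exceeded the maximum length of 200.");
    return false;
  }
  return true;
}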

The problem is that Javascript does not get the correct length of Japanese characters encoded in SHIFT_JIS. For example, the character 测 has a SHIFT_JIS length of 8 characters, but Javascript is only recognizing it as one character, probably because of the Unicode encoding that Javascript uses by default. Some characters, like ケ, take 2 or 3 characters in SHIFT_JIS.

If I depend only on the length provided by Javascript, long Japanese input will pass the page validation and the page will try to save it to the database, which then fails because the DB column has a maximum length of 200.

The browser I'm using is Internet Explorer. Is there a way to get the SHIFT_JIS length of Japanese characters using Javascript? Is it possible to convert from Unicode to SHIFT_JIS using Javascript? How?

Thanks for your help!

Recommended answer

For example, the character 测 has a SHIFT_JIS length of 8 characters, but Javascript is only recognizing it as one character, probably because of the Unicode encoding

Let's be clear: 测 (U+6D4B, Han character 'measure, estimate, conjecture') is a single character. When you encode it in a particular encoding such as Shift-JIS, it may very well become multiple bytes.

In general, JavaScript doesn't make encoding tables available, so you can't find out how many bytes a character will take up. If you really need to, you have to carry around enough data to work it out yourself. For example, if you assume that the input contains only characters that are valid in Shift-JIS, this function works out how many bytes are needed by keeping a list of all the characters that are a single byte and assuming every other character takes two bytes:

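function getShiftJISByteLength(s) {
    // One byte each: ASCII (U+0000-U+007F) and half-width katakana (U+FF61-U+FF9F).
    // Every other character is replaced by two placeholder characters, i.e. counted as two bytes.
    return s.replace(/[^\x00-\x7F\uFF61-\uFF9F]/g, 'xx').length;
}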
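The counting rule described above (listed single-byte characters count once, everything else twice) can also be written as an explicit loop and wired into a byte-based version of the question's 200 limit. This is a sketch under the answer's assumption that the input holds only characters valid in Shift-JIS; the names `shiftJISByteLength` and `exceedsByteLimit` are hypothetical, and treating ASCII plus half-width katakana (U+FF61 to U+FF9F) as the only single-byte characters is likewise an assumption.

```javascript
// Sketch only: assumes every character in s is valid in Shift-JIS.
// ASCII (U+0000-U+007F) and half-width katakana (U+FF61-U+FF9F) are counted
// as 1 byte; every other UTF-16 code unit is counted as 2 bytes.
function shiftJISByteLength(s) {
  var bytes = 0;
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    var singleByte = c <= 0x7f || (c >= 0xff61 && c <= 0xff9f);
    bytes += singleByte ? 1 : 2;
  }
  return bytes;
}

// Hypothetical byte-based replacement for the question's length check.
function exceedsByteLimit(value, maxBytes) {
  return shiftJISByteLength(value) > maxBytes;
}

console.log(shiftJISByteLength("abc"));    // 3
console.log(shiftJISByteLength("\u30B1")); // 2 (full-width katakana ケ)
console.log(shiftJISByteLength("\uFF79")); // 1 (half-width katakana ｹ)
console.log(exceedsByteLimit("\u30B1\u30B1", 3)); // true (4 bytes)
```

Only `var`, `for` and `charCodeAt` are used inside the counting function, so it should also run in older Internet Explorer versions.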

However, there are no 8-byte sequences in Shift-JIS, and the character 测 is not available in Shift-JIS at all. (It's a Chinese character not used in Japan.)

Why you might be thinking it constitutes an 8-byte sequence is this: when a browser can't submit a character in a form because it does not exist in the target charset, it replaces it with an HTML character reference, in this case &#27979;, which is exactly eight characters long. This is a lossy mangling: you can't tell whether the user literally typed 测 or &#27979;. And if you are displaying the submitted content &#27979; as 测, then you are forgetting to HTML-encode your output, which probably means your application is highly vulnerable to cross-site scripting.
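To make the 8-byte figure concrete: 测 is U+6D4B, and 0x6D4B is 27979 in decimal, so the decimal character reference is the eight-character string `&#27979;`. The sketch below uses hypothetical helpers (not a browser API) to build that reference and to apply the kind of minimal HTML escaping that makes a page display what was actually submitted instead of rendering it as 测.

```javascript
// Builds the decimal numeric character reference that stands in for a
// character missing from the form's charset (illustrative helper).
function toNumericCharRef(ch) {
  return "&#" + ch.codePointAt(0) + ";";
}

// Minimal HTML escaping for echoing user input back into a page.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

var submitted = toNumericCharRef("\u6D4B");
console.log(submitted);             // &#27979;
console.log(submitted.length);      // 8
console.log(escapeHtml(submitted)); // &amp;#27979;
```

In a Classic ASP page the escaping would normally happen on the server, for example with `Server.HTMLEncode`, rather than in client-side script.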

The only sensible answer is to use UTF-8 instead of Shift-JIS. UTF-8 can happily encode 测, or any other character, without having to resort to broken HTML character references. If you need to store content limited by encoded byte length in your database, there is a sneaky hack you can use to get the number of UTF-8 bytes in a string:

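function getUTF8ByteLength(s) {
    // encodeURIComponent writes each UTF-8 byte as %XX (unreserved ASCII is left as-is);
    // unescape turns each %XX back into one character, so .length equals the UTF-8 byte count.
    // Throws URIError if s contains an unpaired surrogate.
    return unescape(encodeURIComponent(s)).length;
}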
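In current browsers and in Node.js the same count is available without the `unescape` trick: `TextEncoder` always encodes to UTF-8, so the length of its output is the UTF-8 byte count. Internet Explorer, the browser in the question, does not provide `TextEncoder`, so this sketch (function name illustrative) falls back to the hack when it is missing.

```javascript
// UTF-8 byte length: prefer the standard TextEncoder, fall back to the
// encodeURIComponent/unescape hack where TextEncoder is unavailable (e.g. IE).
function utf8ByteLength(s) {
  if (typeof TextEncoder !== "undefined") {
    // TextEncoder always produces UTF-8; a lone surrogate becomes U+FFFD (3 bytes).
    return new TextEncoder().encode(s).length;
  }
  // Fallback: one character per UTF-8 byte; throws URIError on a lone surrogate.
  return unescape(encodeURIComponent(s)).length;
}

console.log(utf8ByteLength("abc"));          // 3
console.log(utf8ByteLength("\u6D4B"));       // 3 (测)
console.log(utf8ByteLength("\u30B1"));       // 3 (ケ)
console.log(utf8ByteLength("\uD842\uDFB7")); // 4 (U+20BB7, outside the BMP)
```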

That said, it would probably be better to store native Unicode strings in the database, so that the length limit refers to actual characters and not bytes in some encoding.
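Even "actual characters" needs care on the JavaScript side: `String.prototype.length` counts UTF-16 code units, so a kanji outside the Basic Multilingual Plane, such as 𠮷 (U+20BB7), has a length of 2. A sketch with a hypothetical helper that counts code points instead; code points are one reasonable reading of "characters", and which unit a particular database's character limit actually counts is an assumption to check against that database's documentation.

```javascript
// Counts Unicode code points: Array.from iterates a string by code point,
// so a surrogate pair counts as a single element.
function codePointLength(s) {
  return Array.from(s).length;
}

var rare = "\uD842\uDFB7"; // U+20BB7, stored as a surrogate pair
console.log(rare.length);               // 2 (UTF-16 code units)
console.log(codePointLength(rare));     // 1
console.log(codePointLength("\u6D4B")); // 1
```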
