
      How to get the number of pages in a Word document on Linux?


                Problem description

                I saw the question "PHP - Get number of pages in a Word document". I also need to determine the page count of a given Word file (doc/docx). I tried to investigate phplivedocx/ZF (@hobodave linked to those in the answers to the original post), but I got completely lost there. I can't use any external web service either (such as DOC2PDF sites, then counting the pages in the PDF version).

                Simply put: is there any PHP code (using ZF or anything else in PHP, excluding COM objects or other executables such as 'AbiWord'; I'm on a shared Linux server without exec or similar functions) that finds the page count of a Word file?

                The Word versions to be supported are Microsoft Word 2003 and 2007.

                Recommended answer

                Getting the number of pages for docx files is very easy:

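                function get_num_pages_docx($filename)
                {
                    $zip = new ZipArchive();

                    // ZipArchive::open() returns true on success or an error code on failure
                    if($zip->open($filename) === true)
                    {
                        // docProps/app.xml holds the extended properties, including the page count
                        if(($index = $zip->locateName('docProps/app.xml')) !== false)
                        {
                            $data = $zip->getFromIndex($index);
                            $zip->close();

                            $xml = new SimpleXMLElement($data);
                            // Cast so the caller gets an integer rather than a SimpleXMLElement
                            return (int) $xml->Pages;
                        }

                        $zip->close();
                    }

                    return false;
                }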
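
The Pages value in docProps/app.xml is whatever the producing application stored when the file was last saved, so it may be missing (for example in files written by tools other than Word) or out of date. As a minimal sketch, assuming the XML has already been read from the archive, a hypothetical helper `pages_from_app_xml()` (not part of the original answer) can separate "no page count recorded" from a real number:

```php
<?php
// Hypothetical helper: takes the contents of docProps/app.xml as a string
// and returns the stored page count, or null when it cannot be determined.
function pages_from_app_xml(string $appXml): ?int
{
    // Collect XML parse errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($appXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Malformed XML, or a file whose writer never stored a <Pages> element.
    if ($xml === false || !isset($xml->Pages)) {
        return null;
    }

    return (int) $xml->Pages;
}

// Usage with a trimmed-down sample of what Word writes:
$sample = '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">'
        . '<Application>Microsoft Office Word</Application><Pages>3</Pages></Properties>';
var_dump(pages_from_app_xml($sample)); // int(3)
```

Returning null rather than 0 lets the caller decide whether to fall back to another method when the count was never recorded.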
                

                For the 97-2003 format it's certainly challenging, but by no means impossible. The number of pages is stored in the SummaryInformation section of the document, but the OLE format of the files makes it a pain to find. The structure is defined extremely thoroughly (though badly, imo) here and more simply here. I looked at this for an hour today but didn't get very far (it's not a level of abstraction I'm used to), so I output the hex to better understand the structure:
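
Before dumping the whole file, it can help to read the fixed 512-byte header that every OLE compound file starts with. The sketch below assumes the header layout from Microsoft's published compound file specification (signature at offset 0, sector shift at 0x1E, first directory sector at 0x30, mini stream cutoff at 0x38); `read_cfb_header()` is a hypothetical name, and the function only locates the first directory sector without following the sector chain.

```php
<?php
// Hypothetical helper: parse the few header fields needed to find the directory.
// $bytes must hold at least the first 512 bytes of the .doc file.
function read_cfb_header(string $bytes): ?array
{
    // Every compound file begins with this fixed 8-byte signature.
    if (strlen($bytes) < 512 || substr($bytes, 0, 8) !== "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return null;
    }

    // All multi-byte fields are little-endian ('v' = 16 bit, 'V' = 32 bit).
    $shift    = unpack('vsector', substr($bytes, 0x1E, 2))['sector'];
    $firstDir = unpack('Vsector', substr($bytes, 0x30, 4))['sector'];
    $cutoff   = unpack('Vsize', substr($bytes, 0x38, 4))['size'];

    // Only 512-byte (shift 9) and 4096-byte (shift 12) sectors are defined.
    if ($shift !== 9 && $shift !== 12) {
        return null;
    }
    $sectorSize = 1 << $shift;

    return [
        'sectorSize'       => $sectorSize,
        'firstDirSector'   => $firstDir,
        // Sector 0 starts right after the header, hence the +1.
        'dirOffset'        => ($firstDir + 1) * $sectorSize,
        'miniStreamCutoff' => $cutoff,
    ];
}
```

The returned `dirOffset` is where the first block of 128-byte directory entries starts; streams smaller than `miniStreamCutoff` are stored in the separate mini stream, which this sketch does not handle.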

                function get_num_pages_doc($filename)
                {
                    // Open in binary mode; this is a diagnostic hex dump, not a page counter yet
                    $handle = fopen($filename, 'rb');
                    if($handle === false)
                    {
                        return false;
                    }
                    $data = fread($handle, filesize($filename));
                    fclose($handle);

                    echo '<div style="font-family: courier new;">';

                        $hex = bin2hex($data);
                        $hex_array = str_split($hex, 4);   // two bytes per group
                        $i = 0;
                        $line = 0;
                        $collection = '';
                        foreach($hex_array as $string)
                        {
                            $collection .= hex_ascii($string);
                            $i++;

                            if($i == 1)
                            {
                                // Offset label: row number plus a trailing 0 (16 bytes per row)
                                echo '<b>'.sprintf('%05X', $line).'0:</b> ';
                            }

                            echo strtoupper($string).' ';

                            if($i == 8)
                            {
                                echo ' '.$collection.' <br />'."\n";
                                $collection = '';
                                $i = 0;

                                $line += 1;
                            }
                        }

                    echo '</div>';

                    exit();
                }
                
                function hex_ascii($string, $html_safe = true)
                {
                    $return = '';
                
                    $conv = array($string);
                    if(strlen($string) > 2)
                    {
                        $conv = str_split($string, 2);
                    }
                
                    foreach($conv as $string)
                    {
                        $num = hexdec($string);
                
                        $ascii = '.';
                        if($num > 32)
                        {   
                            $ascii = unichr($num);
                        }
                
                        if($html_safe)
                        {
                            // Escape <, > and & so raw file bytes cannot break the HTML output
                            $return .= htmlspecialchars($ascii);
                        }
                        else
                        {
                            $return .= $ascii;
                        }
                    }
                
                    return $return;
                }
                
                function unichr($intval)
                {
                    return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
                }
                

                This outputs a hex dump in which you can find sections such as:

                007000: 0500 5300 7500 6D00 6D00 6100 7200 7900 ..S.u.m.m.a.r.y.
                007010: 4900 6E00 6600 6F00 7200 6D00 6100 7400 I.n.f.o.r.m.a.t.
                007020: 6900 6F00 6E00 0000 0000 0000 0000 0000 i.o.n...........
                007030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 
                

                That lets you see the referencing information, such as:

                007040: 2800 0201 FFFF FFFF FFFF FFFF FFFF FFFF (...
                007050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
                007060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
                007070: 0000 0000 2500 0000 0010 0000 0000 0000 ....%...........
                

                From that you can determine the properties described:

                _ab = ("SummaryInformation") 
                _cb = 0028
                _mse = 02 (STGTY_STREAM) 
                _bflags = 01 (DE_BLACK) 
                _sidLeftSib = FFFF FFFF 
                _sidRightSib = FFFF FFFF (none) 
                _sidChild = FFFF FFFF (n/a for STGTY_STREAM) 
                _clsid = 0000 0000 0000 0000 0000 0000 0000 0000 (n/a) 
                _dwUserFlags = 0000 0000 (n/a) 
                _time[0] = CreateTime = 0000 0000 0000 0000 (n/a) 
                _time[1] = ModifyTime = 0000 0000 0000 0000 (n/a)
                _startSect = 0000 0025 
                _ulSize = 0000 1000 
                _dptPropType = 0000 (n/a)
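
The same fields can be decoded programmatically instead of read off the dump. This is a minimal sketch, assuming the 128-byte directory entry layout from Microsoft's compound file specification (64-byte UTF-16LE name, then length, type, colour, sibling and child IDs, CLSID, state bits, two timestamps, start sector, size) and assuming the mbstring extension is available; `parse_cfb_dir_entry()` and its key names are hypothetical. All multi-byte fields are little-endian, so the bytes `2500 0000` in the dump decode to 0x25.

```php
<?php
// Hypothetical helper: decode one 128-byte compound-file directory entry.
function parse_cfb_dir_entry(string $entry): array
{
    if (strlen($entry) !== 128) {
        throw new InvalidArgumentException('A directory entry is exactly 128 bytes.');
    }

    // Fixed fields follow the 64-byte name buffer ('v' = u16 LE, 'C' = u8, 'V' = u32 LE).
    $fields = unpack(
        'vnameLen/Ctype/Ccolor/VleftSib/VrightSib/Vchild/a16clsid/'
        . 'VstateBits/a8created/a8modified/VstartSector/Vsize',
        substr($entry, 64)
    );

    // nameLen counts bytes and includes the two-byte terminating null.
    $nameBytes = substr($entry, 0, max(0, $fields['nameLen'] - 2));
    $fields['name'] = mb_convert_encoding($nameBytes, 'UTF-8', 'UTF-16LE');

    return $fields;
}

// Usage with the 128 bytes shown in the dump at 007000-00707F:
$hex = '0500530075006D006D00610072007900'
     . '49006E0066006F0072006D0061007400'
     . '69006F006E0000000000000000000000'
     . str_repeat('00', 16)
     . '28000201FFFFFFFFFFFFFFFFFFFFFFFF'
     . str_repeat('00', 32)
     . '00000000250000000010000000000000';
$e = parse_cfb_dir_entry(hex2bin($hex));
printf("type=%d startSector=0x%X size=0x%X\n", $e['type'], $e['startSector'], $e['size']);
// prints "type=2 startSector=0x25 size=0x1000"
```

The decoded name begins with the control character 0x05, which is why the dump shows a dot before "SummaryInformation".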
                

                That will let you find the relevant section of the file, unpack it and get the page count. Of course this is the hard bit that I just don't have time for, but it should set you in the right direction.
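
Once the bytes of the SummaryInformation stream have been pulled out of the container, reading the page count is a short walk through the property set. The sketch below is not the original author's code. It assumes the property set layout from Microsoft's OLE property set specification (28-byte stream header, a 16-byte format ID plus 4-byte offset, then a table of ID/offset pairs) and that the page count is property ID 14 stored as a 32-bit integer (type 3). `summary_info_page_count()` is a hypothetical name, and extracting the stream from the file is not shown.

```php
<?php
// Hypothetical helper: $stream is the raw SummaryInformation stream contents.
function summary_info_page_count(string $stream): ?int
{
    if (strlen($stream) < 48) {
        return null;
    }

    // Stream header: byte-order mark 0xFFFE, version, system ID, CLSID, set count.
    $head = unpack('vbyteOrder/vversion/VsystemId/a16clsid/VnumSets', $stream);
    if ($head['byteOrder'] !== 0xFFFE || $head['numSets'] < 1) {
        return null;
    }

    // After the first 16-byte format ID (offset 28) comes the property set offset.
    $setOffset = unpack('Voffset', substr($stream, 44, 4))['offset'];
    $setHead = substr($stream, $setOffset, 8);
    if (strlen($setHead) < 8) {
        return null;
    }
    $set = unpack('Vsize/Vcount', $setHead);

    // Walk the ID/offset table; offsets are relative to the property set start.
    for ($i = 0; $i < $set['count']; $i++) {
        $pair = substr($stream, $setOffset + 8 + $i * 8, 8);
        if (strlen($pair) < 8) {
            return null;
        }
        $prop = unpack('Vid/Voffset', $pair);
        if ($prop['id'] !== 14) {        // 14 = page count
            continue;
        }
        $raw = substr($stream, $setOffset + $prop['offset'], 8);
        if (strlen($raw) < 8) {
            return null;
        }
        $value = unpack('vtype/vpad/Vvalue', $raw);
        return $value['type'] === 3 ? $value['value'] : null;   // 3 = 32-bit integer
    }

    return null;
}

// Usage with a hand-built stream holding a word count (ID 15) and a page count (ID 14):
$stream = pack('vvVa16V', 0xFFFE, 0, 0, '', 1)      // stream header, 28 bytes
        . pack('a16V', '', 48)                      // format ID + property set offset
        . pack('VV', 40, 2)                         // property set size and count
        . pack('VVVV', 15, 24, 14, 32)              // ID/offset table
        . pack('vvV', 3, 0, 1234)                   // word count value
        . pack('vvV', 3, 0, 7);                     // page count value
var_dump(summary_info_page_count($stream)); // int(7)
```

As with the docx case, this is the count the application stored at the last save, not a freshly computed layout.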

                A million dollars doesn't come easy!
