        How to extract text from Word files (.doc, .docx, .xlsx, .pptx) in PHP



                  Problem description

                  Sometimes we need to extract the text from Word documents so that we can later search for strings in files uploaded by users, for example when searching CVs and résumés. The common problem is how to open and read a user-uploaded Word document and get its text. There are some helpful links, but none of them solves the whole problem. We need to extract the text at upload time and save it in the database, so that we can then search easily within the database.

                  Recommended answer

                  Here is a simple class that does the job for .doc/.docx files (see "PHP docx reader: Convert MS Word Docx files to text"). It also handles .xlsx and .pptx files.

                      class DocxConversion
                      {
                          private $filename;

                          public function __construct($filePath)
                          {
                              $this->filename = $filePath;
                          }

                          // Legacy .doc: crude heuristic that keeps only chunks of plain ASCII text.
                          private function read_doc()
                          {
                              $fileHandle = fopen($this->filename, "rb");
                              if ($fileHandle === false) {
                                  return false;
                              }
                              $data = stream_get_contents($fileHandle);
                              fclose($fileHandle);

                              $outtext = "";
                              // Split on carriage returns (0x0D).
                              foreach (explode(chr(0x0D), (string) $data) as $thisline) {
                                  // Skip empty chunks and chunks that contain NUL bytes (binary data).
                                  if ($thisline === "" || strpos($thisline, chr(0x00)) !== false) {
                                      continue;
                                  }
                                  $outtext .= $thisline . " ";
                              }
                              // Drop everything except letters, digits, whitespace and basic punctuation.
                              return preg_replace('/[^a-zA-Z0-9\s,.\-@\/_()]/', "", $outtext);
                          }

                          // .docx: the body text is stored in word/document.xml inside the zip container.
                          private function read_docx()
                          {
                              $zip = new ZipArchive;
                              if ($zip->open($this->filename) !== true) {
                                  return false;
                              }
                              $content = $zip->getFromName("word/document.xml");
                              $zip->close();
                              if ($content === false) {
                                  return "";
                              }
                              // Separate table cells with a space and paragraphs with a line break.
                              $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
                              $content = str_replace('</w:r></w:p>', "\r\n", $content);
                              return strip_tags($content);
                          }

                          // Parse an XML part and return its text with all tags removed.
                          private function xml_to_text($xml)
                          {
                              if ($xml === false || $xml === "") {
                                  return "";
                              }
                              $dom = new DOMDocument();
                              // LIBXML_NOENT is deliberately not used: expanding entities in
                              // user-uploaded files exposes the parser to XXE attacks.
                              if (!$dom->loadXML($xml, LIBXML_NOERROR | LIBXML_NOWARNING)) {
                                  return "";
                              }
                              return strip_tags($dom->saveXML());
                          }

                          /************************ excel sheet ************************/
                          private function xlsx_to_text()
                          {
                              $zip = new ZipArchive;
                              if ($zip->open($this->filename) !== true) {
                                  return "";
                              }
                              // Shared cell strings are stored in xl/sharedStrings.xml.
                              $xml = $zip->getFromName("xl/sharedStrings.xml");
                              $zip->close();
                              return $this->xml_to_text($xml);
                          }

                          /************************ power point files ************************/
                          private function pptx_to_text()
                          {
                              $zip = new ZipArchive;
                              if ($zip->open($this->filename) !== true) {
                                  return "";
                              }
                              $output_text = "";
                              // Loop through ppt/slides/slide1.xml, slide2.xml, ... until one is missing.
                              for ($n = 1; ($xml = $zip->getFromName("ppt/slides/slide" . $n . ".xml")) !== false; $n++) {
                                  $output_text .= $this->xml_to_text($xml);
                              }
                              $zip->close();
                              return $output_text;
                          }

                          public function convertToText()
                          {
                              if (!isset($this->filename) || !file_exists($this->filename)) {
                                  return "File Not exists";
                              }

                              // Compare the extension case-insensitively so "CV.DOCX" also works.
                              $file_ext = strtolower(pathinfo($this->filename, PATHINFO_EXTENSION));
                              switch ($file_ext) {
                                  case "doc":
                                      return $this->read_doc();
                                  case "docx":
                                      return $this->read_docx();
                                  case "xlsx":
                                      return $this->xlsx_to_text();
                                  case "pptx":
                                      return $this->pptx_to_text();
                                  default:
                                      return "Invalid File Type";
                              }
                          }
                      }
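
The `read_docx` method above gets its text by stripping tags from `word/document.xml`. As a rough alternative sketch (not part of the original answer), the same XML could be walked with `DOMDocument`, collecting the `w:t` text runs of each `w:p` paragraph. The function name `documentXmlToText` and the constant `WORD_NS` are made up for this illustration, and the namespace URI assumes a standard (transitional) .docx file.

```php
<?php
// Hypothetical helper, not part of the class above.
// Namespace of WordprocessingML elements such as w:p and w:t (assumed transitional OOXML).
const WORD_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Convert the contents of word/document.xml to plain text,
 * one line per paragraph. Returns '' if the XML cannot be parsed.
 */
function documentXmlToText(string $xml): string
{
    if ($xml === '') {
        return '';
    }
    $dom = new DOMDocument();
    // No LIBXML_NOENT here: entity expansion on uploaded files is unsafe.
    if (!$dom->loadXML($xml, LIBXML_NOERROR | LIBXML_NOWARNING | LIBXML_NONET)) {
        return '';
    }

    $paragraphs = [];
    foreach ($dom->getElementsByTagNameNS(WORD_NS, 'p') as $p) {
        $text = '';
        // Concatenate every text run (w:t) that belongs to this paragraph.
        foreach ($p->getElementsByTagNameNS(WORD_NS, 't') as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}
```

The input would be the string returned by `ZipArchive::getFromName("word/document.xml")`. Paragraphs nested inside other paragraphs (for example in text boxes) would be counted more than once by this sketch.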
                  

                  According to Wikipedia's "Document file format" article, .doc files are binary blobs, so they can be read with fopen. .docx files, by contrast, are just zip containers holding XML files, so they can be read with zip_open (deprecated as of PHP 8.0) or with the ZipArchive class.
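
Because the two families differ at the container level, an upload can be checked by its first bytes instead of trusting the file extension. The following is a minimal sketch under that assumption; the function name `detectOfficeContainer` and its return values are invented for this example. It only tells a zip container from an OLE compound file (the binary container used by legacy .doc files), not whether the zip is actually a .docx, .xlsx or .pptx.

```php
<?php
/**
 * Hypothetical helper: classify a file by its leading "magic" bytes.
 * Returns 'zip' (.docx/.xlsx/.pptx containers), 'ole' (legacy .doc) or 'unknown'.
 */
function detectOfficeContainer(string $path): string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return 'unknown';
    }
    $magic = fread($handle, 8);
    fclose($handle);
    if ($magic === false) {
        return 'unknown';
    }

    // Zip local file header signature: "PK" 0x03 0x04.
    if (strncmp($magic, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    // OLE compound file signature: D0 CF 11 E0 A1 B1 1A E1.
    if ($magic === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'ole';
    }
    return 'unknown';
}
```

A caller could use the result to reject a file whose extension and container type disagree before handing it to `DocxConversion`.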

                  Usage of the class above:

                  $docObj = new DocxConversion("test.doc");
                  //$docObj = new DocxConversion("test.docx");
                  //$docObj = new DocxConversion("test.xlsx");
                  //$docObj = new DocxConversion("test.pptx");
                  echo $docText= $docObj->convertToText();
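
The question also asks for the text to be saved at upload time and searched in the database. The original answer does not show that step, so the following is only a sketch of one way it might look, assuming PDO with an SQLite database. The table `documents`, its columns, and the three function names are hypothetical.

```php
<?php
// Hypothetical schema and helpers; table, column and function names are illustrative only.

/** Create the table that holds one row of extracted text per uploaded file (SQLite syntax). */
function createDocumentsTable(PDO $pdo): void
{
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            filename TEXT NOT NULL,
            body TEXT NOT NULL
        )'
    );
}

/** Store the text extracted from one uploaded document. */
function saveDocumentText(PDO $pdo, string $filename, string $text): void
{
    $stmt = $pdo->prepare('INSERT INTO documents (filename, body) VALUES (?, ?)');
    $stmt->execute([$filename, $text]);
}

/** Return the filenames of all documents whose text contains $needle. */
function searchDocuments(PDO $pdo, string $needle): array
{
    // Escape LIKE wildcards so that user input is matched literally.
    $escaped = str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $needle);
    $stmt = $pdo->prepare(
        "SELECT filename FROM documents WHERE body LIKE ? ESCAPE '!' ORDER BY id"
    );
    $stmt->execute(['%' . $escaped . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

In an upload handler, the string returned by `$docObj->convertToText()` would be passed to `saveDocumentText` once it has been checked against the failure values (`false`, "File Not exists", "Invalid File Type"). For large collections a full-text index is likely a better fit than `LIKE`.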
                  
