Problem description
I have an HTML table like this:
<table ... >
<tbody ... >
<tr ... >
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
</tr>
<tr ... >
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
string...
</td>
<td ...>
</td>
<td ...>
string...
</td>
</tr>
..............
</tbody>
</table>
This is a data table, and I need to extract all of its data.
The table will have many rows (<tr></tr>), and each row has a fixed number of columns (<td></td>), currently 5.
Keep in mind that every table, tr, and td tag may carry attributes (shown as "..." above).
I hope someone can help me write a regex for the preg_match_all function that returns the data like this:
array(
0 => array(
0=> 'some data0',
1=> 'some data1',
2=> 'some data2',
3=> 'some data3',
4=> 'some data4',
),
1 => array(
0=> 'some data0',
1=> 'some data1',
2=> 'some data2',
3=> 'some data3',
4=> 'some data4',
),
2 => array(
0=> 'some data0',
1=> 'some data1',
2=> 'some data2',
3=> 'some data3',
4=> 'some data4',
),
..........
)
Here is a sample table for testing. Hopefully you can help me!
<table border="1" >
<tbody style="" >
<tr style="" >
<td style="color:blue;">
data0
</td>
<td style="font-size:15px;">
data1
</td>
<td style="font-size:15px;">
data2
</td>
<td style="color:blue;">
data3
</td>
<td style="color:blue;">
data4
</td>
</tr>
<tr style="" >
<td style="color:blue;">
data00
</td>
<td style="font-size:15px;">
data11
</td>
<td style="font-size:15px;">
data22
</td>
<td style="color:blue;">
data33
</td>
<td style="color:blue;">
data44
</td>
</tr>
<tr style="color:black" >
<td style="color:blue;">
data000
</td>
<td style="font-size:15px;">
data111
</td>
<td style="font-size:15px;">
data222
</td>
<td style="color:blue;">
data333
</td>
<td style="color:blue;">
data444
</td>
</tr>
</tbody>
</table>
Recommended answer
You absolutely do NOT want to parse HTML with regex.
For one, there are far too many variations; more importantly, regex copes poorly with the hierarchical nature of HTML. It is better to use an XML parser or, better yet, an HTML-specific parser.
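To illustrate the point about hierarchy, here is a minimal sketch. The input string is a hypothetical example (not the table from the question) in which a cell contains a nested table, and the pattern is one common attempt at matching cell contents, not a pattern taken from the question.

```php
<?php
// Hypothetical input: a cell that itself contains a nested table.
$html = '<td><table><tr><td>inner</td></tr></table>outer</td>';

// A typical lazy pattern for "everything inside a <td>".
preg_match_all('#<td[^>]*>(.*?)</td>#s', $html, $matches);

// The lazy group stops at the first </td>, which closes the inner cell,
// so the outer cell is cut short and the inner cell is never matched on its own.
print_r($matches[1]);
```

The only captured string is `<table><tr><td>inner`: the pattern has no notion of which closing tag belongs to which opening tag.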
Whenever I need to scrape HTML, I tend to use the Simple HTML DOM Parser library, which parses an HTML document into a traversable PHP object that you can query with jQuery-like selectors.
<?php
require 'simplehtmldom/simple_html_dom.php';
$sHtml = <<<EOS
<table border="1" >
<tbody style="" >
<tr style="" >
<td style="color:blue;">
data0
</td>
<td style="font-size:15px;">
data1
</td>
<td style="font-size:15px;">
data2
</td>
<td style="color:blue;">
data3
</td>
<td style="color:blue;">
data4
</td>
</tr>
<tr style="" >
<td style="color:blue;">
data00
</td>
<td style="font-size:15px;">
data11
</td>
<td style="font-size:15px;">
data22
</td>
<td style="color:blue;">
data33
</td>
<td style="color:blue;">
data44
</td>
</tr>
<tr style="color:black" >
<td style="color:blue;">
data000
</td>
<td style="font-size:15px;">
data111
</td>
<td style="font-size:15px;">
data222
</td>
<td style="color:blue;">
data333
</td>
<td style="color:blue;">
data444
</td>
</tr>
</tbody>
</table>
EOS;
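// Parse the HTML string into a traversable object.
$oHTML = str_get_html($sHtml);
// Select every row inside any table.
$oTRs = $oHTML->find('table tr');
$aData = array();
foreach ($oTRs as $oTR) {
    $aRow = array();
    // Collect the trimmed text of each cell in this row.
    $oTDs = $oTR->find('td');
    foreach ($oTDs as $oTD) {
        $aRow[] = trim($oTD->plaintext);
    }
    $aData[] = $aRow;
}
var_dump($aData);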
?>
And the output:
array
0 =>
array
0 => string 'data0' (length=5)
1 => string 'data1' (length=5)
2 => string 'data2' (length=5)
3 => string 'data3' (length=5)
4 => string 'data4' (length=5)
1 =>
array
0 => string 'data00' (length=6)
1 => string 'data11' (length=6)
2 => string 'data22' (length=6)
3 => string 'data33' (length=6)
4 => string 'data44' (length=6)
2 =>
array
0 => string 'data000' (length=7)
1 => string 'data111' (length=7)
2 => string 'data222' (length=7)
3 => string 'data333' (length=7)
4 => string 'data444' (length=7)
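If adding a third-party library is not an option, the same idea can probably be expressed with PHP's bundled DOM extension. The following is only a sketch under that assumption: the function name `tableToRows` and the shortened sample table are made up for illustration, and it assumes the table contains no nested tables (nested cells would otherwise be collected into the outer row as well).

```php
<?php
// Hypothetical sample input: a shortened version of the table from the question.
$sHtml = <<<EOS
<table border="1" >
<tbody style="" >
<tr style="" >
<td style="color:blue;">
data0
</td>
<td style="font-size:15px;">
data1
</td>
</tr>
<tr style="color:black" >
<td style="color:blue;">
data000
</td>
<td style="font-size:15px;">
data111
</td>
</tr>
</tbody>
</table>
EOS;

// Returns one array per <tr>, each holding the trimmed text of its <td> cells.
function tableToRows(string $sHtml): array
{
    $oDoc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $bPrevious = libxml_use_internal_errors(true);
    $oDoc->loadHTML($sHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($bPrevious);

    $aData = array();
    foreach ($oDoc->getElementsByTagName('tr') as $oTR) {
        $aRow = array();
        foreach ($oTR->getElementsByTagName('td') as $oTD) {
            $aRow[] = trim($oTD->textContent);
        }
        $aData[] = $aRow;
    }
    return $aData;
}

$aData = tableToRows($sHtml);
print_r($aData);
```

For this sample the result has two rows of two cells each. Attributes on the tags do not matter here because the parser, not a pattern, decides where each element starts and ends.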