R - Scraping an HTML table with rvest when there are missing <tr> tags
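Problem description
I am trying to scrape an HTML table from a website using rvest. The only problem is that the table I want to scrape has no <tr> tags except on the first row. It looks like this: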
<tr>
<td>6/21/2015 9:38 PM</td>
<td>5311 Lake Park</td>
<td>UCPD</td>
<td>African American</td>
<td>Male</td>
<td>Subject was causing a disturbance in the area.</td>
<td>Name checked; no further action</td>
<td>No</td>
</tr>
<td>6/21/2015 10:37 PM</td>
<td>5200 S Blackstone</td>
<td>UCPD</td>
<td>African American</td>
<td>Male</td>
<td>Subject was observed fighting in the McDonald's parking lot</td>
<td>Warned; released</td>
<td>No</td>
</tr>
And so on. As a result, the following code only gets the first row into my data frame:
library(rvest)
mydata <- html_session("https://incidentreports.uchicago.edu/incidentReportArchive.php?startDate=06/01/2015&endDate=06/21/2015") %>%
html_node("table") %>%
html_table(header = TRUE, fill=TRUE)
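How can I change this so that html_table understands that these rows are rows, even though they have no opening <tr> tag? Or is there a better way to approach this?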
Recommended answer
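library(rvest)
# Parse the page once and reuse the parsed document
url_parse <- read_html("https://incidentreports.uchicago.edu/incidentReportArchive.php?startDate=06/01/2015&endDate=06/21/2015")
# Column names come from the <th> header cells
col_name <- url_parse %>%
  html_nodes("th") %>%
  html_text()
# Collect every <td> cell in document order, regardless of the missing <tr> tags
mydata <- url_parse %>%
  html_nodes("td") %>%
  html_text()
# Refill the flat vector of cells row by row; this table has 7 columns
finaldata <- data.frame(matrix(mydata, ncol=7, byrow=TRUE))
names(finaldata) <- col_name
finaldata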
Incident Location Reported Occurred
1 Theft 1115 E. 58th St. (Walker Bike Rack) 6/1/15 12:18 PM 5/31/15 to 6/1/15 8:00 PM to 12:00 PM
2 Information 5835 S. Kimbark 6/1/15 3:57 PM 6/1/15 3:55 PM
3 Information 1025 E. 58th St. (Swift) 6/2/15 2:18 AM 6/2/15 2:18 AM
4 Non-Criminal Damage to Property 850 E. 63rd St. (Car Wash) 6/2/15 8:48 AM 6/2/15 8:00 AM
5 Criminal Damage to Property 5631 S. Cottage Grove (Parking Structure) 6/2/15 7:32 PM 6/2/15 6:45 PM to 7:30 PM
Comments / Nature of Fire Disposition
1 Bicycle secured to bike rack taken by unknown person Open
2 Unknown person used staff member's personal information to file a fraudulent claim with U.S. Social Security Admin. / CPD case CPD
3 Three unaffiliated individuals reported tampering with bicycles in bike rack / Subjects were given trespass warnings and sent on their way Closed
4 Rear wiper blade assembly damaged on UC owned vehicle during car wash Closed
5 Unknown person(s) spray painted graffiti on north concrete wall of the structure Open
UCPDI#
1 E00344
2 E00345
3 E00346
4 E00347
5 E00348
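The same cell-collecting idea can be tried offline on a small inline table. This is a minimal sketch and not part of the original answer: the HTML string, its three column names, and the variable names `html`, `page`, and `cells` are hypothetical stand-ins for the real page. It also derives the column count from the header cells instead of hard-coding it, and it assumes that every row has exactly one cell per header and that the page contains no other <th> or <td> elements.

```r
library(rvest)

# Hypothetical miniature of the malformed markup:
# only the first data row has an opening <tr> tag.
html <- '<table>
<thead><tr><th>Date</th><th>Location</th><th>Disposition</th></tr></thead>
<tbody>
<tr><td>6/21/2015 9:38 PM</td><td>5311 Lake Park</td><td>Name checked</td></tr>
<td>6/21/2015 10:37 PM</td><td>5200 S Blackstone</td><td>Warned; released</td></tr>
</tbody></table>'

page <- read_html(html)

# Header cells give both the column names and the column count
col_name <- page %>% html_nodes("th") %>% html_text()

# All data cells in document order, independent of any <tr> structure
cells <- page %>% html_nodes("td") %>% html_text()

# Stop early if the cells cannot be split evenly into rows
stopifnot(length(cells) %% length(col_name) == 0)

finaldata <- data.frame(
  matrix(cells, ncol = length(col_name), byrow = TRUE),
  stringsAsFactors = FALSE
)
names(finaldata) <- col_name
finaldata
```

Using `length(col_name)` avoids silently misaligned rows if the site later adds or drops a column, and the `stopifnot()` check fails loudly when the cell count is not a multiple of the header count.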
