What headers am I missing to scrape the NBA Stats data?

Question

A couple of days ago in Power BI, I was able to create a web query that allowed me to extract the JSON data from NBA Player Stats without using any headers. As of today, I have noticed that the query no longer works; I am getting the following error message:

DataSource.Error: The underlying connection was closed. An unexpected error occurred on a receive.
Details: https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=&Weight=

On a related note, I used to be able to pull the JSON data from NBA Team Stats using https://stats.nba.com/ as a Referer header, but now it's giving me the same error message as shown above. To try and get around these errors, I have tried entering the following headers:

Host: stats.nba.com
Connection: keep-alive
Accept: application/json
x-nba-stats-token: true
User-Agent: Chrome/79.0.3945.130
x-nba-stats-origin: stats
Referer: https://stats.nba.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

When I do submit the query with the above headers, it comes back with the following error message:

Unable to connect

We encountered an error while trying to connect.

Details: "The 'Host' header must be modified using the appropriate property or method.
Parameter name: name"

I have run out of ideas as to how I'm able to properly run the query. I'm really new to web-scraping and HTML -- I've been trying to teach myself. Any help is greatly appreciated.

Recommended answer

All headers for the GET request:

Host: stats.nba.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json, text/plain, */*
x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==
DNT: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US;q=0.9,en;q=0.7

URL:

https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=

Required headers:

Accept: application/json, text/plain, */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
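
As a rough illustration of the required headers listed above, here is a minimal Python sketch that attaches them to a GET request using only the standard library. The names `TEAM_STATS_URL`, `REQUIRED_HEADERS`, and `build_request` are hypothetical and exist only for this example. The sketch only builds the request; whether the server accepts it is not guaranteed. Note that no Host header is set by hand: HTTP clients normally derive it from the URL, which is probably why the manual Host entry was rejected in the question.

```python
from urllib.request import Request

# URL copied verbatim from the answer above.
TEAM_STATS_URL = (
    "https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo="
    "&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location="
    "&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N"
    "&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N"
    "&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season"
    "&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision="
)

# Header values copied from the "Required headers" list above.
REQUIRED_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
    ),
    "x-nba-stats-origin": "stats",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Referer": "https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1",
}


def build_request(url: str) -> Request:
    """Return a GET request carrying the required headers (hypothetical helper)."""
    # Host is intentionally omitted; the client fills it in from the URL.
    return Request(url, headers=REQUIRED_HEADERS, method="GET")


if __name__ == "__main__":
    request = build_request(TEAM_STATS_URL)
    # Show what would be sent, without contacting the server.
    print(request.get_method(), request.host)
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

To actually send it, something like `urllib.request.urlopen(build_request(TEAM_STATS_URL), timeout=10)` should work. Accept-Encoding is left out on purpose, because `urllib` does not decompress gzip or br responses automatically.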

Not sure whether these are required:

x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==

Possible problems:

  1. You were detected as a bot and blocked

The X-NewRelic-ID header is a token (possibly with a timeout). It is probably assigned based on several parameters, such as IP address and User-Agent, among others.
You can get a fresh X-NewRelic-ID from the HTML response to a GET request to https://stats.nba.com/. Here is the part of the HTML that contains the xpid token: <script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQECWF5UChAHUlNTBwgBVw==",licenseKey:"09f0cb5c68",applicationID:"76210961"};
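
One way to pull the xpid value out of that HTML is a small regular expression. The Python sketch below assumes the page still embeds the token in the `xpid:"..."` form quoted above; the names `XPID_PATTERN` and `extract_xpid` are hypothetical, and the sample string is the fragment from this answer rather than a live response.

```python
import re
from typing import Optional

# Matches xpid:"<token>" as it appears in the loader_config fragment above.
XPID_PATTERN = re.compile(r'xpid\s*:\s*"([^"]+)"')


def extract_xpid(html: str) -> Optional[str]:
    """Return the xpid token found in the HTML, or None if it is absent."""
    match = XPID_PATTERN.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Fragment quoted in the answer, used here instead of a live page.
    sample = (
        '<script type="text/javascript">(window.NREUM||(NREUM={})).loader_config='
        '{xpid:"VQECWF5UChAHUlNTBwgBVw==",licenseKey:"09f0cb5c68",'
        'applicationID:"76210961"};'
    )
    print(extract_xpid(sample))  # prints VQECWF5UChAHUlNTBwgBVw==
```

The extracted value could then be sent as the X-NewRelic-ID header on the stats request, although, as noted above, it is not certain that this header is required.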
