Using BeautifulSoup to find next siblings until a certain sibling

2023-08-30 | Python development questions


Problem description

The webpage is something like this:

<h2>section1</h2>
<p>article</p>
<p>article</p>
<p>article</p>

<h2>section2</h2>
<p>article</p>
<p>article</p>
<p>article</p>

How can I find each section with the articles within it? That is, after finding an h2, how do I get its next siblings up to the next h2?

If the webpage were structured like this (which is normally the case):

<div>
<h2>section1</h2>
<p>article</p>
<p>article</p>
<p>article</p>
</div>

<div>
<h2>section2</h2>
<p>article</p>
<p>article</p>
<p>article</p>
</div>

I could write code like this:

for section in soup.findAll('div'):
    ...  # other per-section processing
    for post in section.findAll('p'):
        ...  # process each article

But what should I do with the first webpage if I want to get the same result?
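One way to get literally the same result is to restructure the first page into the second: wrap each h2 and the siblings that follow it (up to the next h2) in a new div, then run the div-based loop above unchanged. This is an editor's sketch, not part of the original answer; it assumes BeautifulSoup 4 (bs4) with the built-in html.parser, and wrap_sections and HTML are hypothetical names used for illustration.

```python
from bs4 import BeautifulSoup, Tag

# Illustrative flat markup (hypothetical sample, shaped like the first page).
HTML = "<h2>section1</h2><p>article1</p><p>article2</p><h2>section2</h2><p>article3</p>"


def wrap_sections(soup):
    """Wrap every <h2> and its following siblings (until the next <h2>) in a <div>."""
    for heading in soup.find_all("h2"):
        # Collect the siblings first: moving nodes while iterating over
        # .next_siblings would change the sequence being iterated.
        members = []
        for node in heading.next_siblings:
            if isinstance(node, Tag) and node.name == "h2":
                break
            members.append(node)
        # wrap() replaces the heading with the new <div> and puts the heading inside it.
        div = heading.wrap(soup.new_tag("div"))
        for node in members:
            div.append(node)  # append() moves the node out of its old position
    return soup


soup = wrap_sections(BeautifulSoup(HTML, "html.parser"))

# The div-based loop from the question now works on the flat page.
for section in soup.find_all("div"):
    print(section.h2.get_text(), [p.get_text() for p in section.find_all("p")])
```

This mutates the parse tree in place, so parse a fresh copy if the original structure is still needed afterwards.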

Solution

I think you can do something like this:

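from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup(html, "html.parser")  # html: the page source shown above

for section in soup.find_all('h2'):
    nextNode = section
    while True:
        nextNode = nextNode.next_sibling
        # Skip whitespace-only text nodes (e.g. the newlines between tags)
        if isinstance(nextNode, NavigableString) and not nextNode.strip():
            continue
        try:
            tag_name = nextNode.name
        except AttributeError:
            # nextNode is None once the last sibling has been passed
            tag_name = ""
        if tag_name == "p":
            print(nextNode.string)
        else:
            # Reached the next h2 (or anything else): this section is done
            print("*****")
            break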

Given:

<h2>section1</h2>
<p>article1</p>
<p>article2</p>
<p>article3</p>

<h2>section2</h2>
<p>article4</p>
<p>article5</p>
<p>article6</p>

Output:

article1
article2
article3
*****
article4
article5
article6
*****
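Another option, not from the original answer, is to let BeautifulSoup collect the siblings: find_next_siblings(True) returns only the later sibling tags (True matches every tag but no text strings, so the newlines between tags are skipped), and itertools.takewhile stops at the first tag that is not a p. A minimal sketch on the sample markup above, assuming bs4 with html.parser; HTML and section_articles are illustrative names.

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# The sample markup from the answer (illustrative copy).
HTML = """<h2>section1</h2><p>article1</p><p>article2</p><p>article3</p>
<h2>section2</h2><p>article4</p><p>article5</p><p>article6</p>"""

soup = BeautifulSoup(HTML, "html.parser")

section_articles = {}
for section in soup.find_all("h2"):
    # All later sibling tags of this <h2>, in document order (no text nodes).
    siblings = section.find_next_siblings(True)
    # Keep <p> tags until the first non-<p> tag (normally the next <h2>).
    paragraphs = takewhile(lambda tag: tag.name == "p", siblings)
    section_articles[section.get_text()] = [p.get_text() for p in paragraphs]

for title, articles in section_articles.items():
    print(title, articles)
```

find_next_siblings builds the full list of later siblings for every heading, so on a very long page a lazy loop over section.next_siblings that stops at the next h2 does less work.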
