Python - BeautifulSoup: apply to every text file in a folder and produce a new text file

2022-01-01

Problem description

I am using the following Python BeautifulSoup code to remove HTML elements from a text file:

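from bs4 import BeautifulSoup

# Read the file and parse it with Python's built-in HTML parser.
with open("textFileWithHtml.txt") as markup:
    soup = BeautifulSoup(markup.read(), "html.parser")

# get_text() returns a str, so write it as UTF-8 text rather than encoded bytes.
with open("strip_textFileWithHtml.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())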

My question is: how can I apply this code to every text file in a folder (directory) and, for each one, produce a new text file with the HTML elements removed, without having to call the function separately for every file?

Recommended answer

I would leave that work to the OS: replace the hardcoded input file with file names taken from the command line (the sys.argv list), then invoke the script inside a shell loop or with a wildcard (glob) pattern that matches many files, like:

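import os
import sys

from bs4 import BeautifulSoup

# Each command-line argument is the path of one file to process.
for fi in sys.argv[1:]:
    with open(fi) as markup:
        soup = BeautifulSoup(markup.read(), "html.parser")

    # Prefix only the file name, so paths that include a directory still work.
    out = os.path.join(os.path.dirname(fi), "strip_" + os.path.basename(fi))
    with open(out, "w", encoding="utf-8") as f:
        f.write(soup.get_text())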
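
If the loop should live in Python rather than in the shell, the same idea can be sketched with pathlib. This is only a minimal sketch, not part of the original answer: the names html_to_text and strip_folder, the "*.txt" pattern, and UTF-8 input files are all assumptions made for illustration.

```python
from pathlib import Path


def html_to_text(markup):
    """Strip HTML tags with BeautifulSoup and return the plain text."""
    # Imported here so strip_folder can also be used with another converter.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, "html.parser").get_text()


def strip_folder(folder, to_text=html_to_text, pattern="*.txt", prefix="strip_"):
    """Write a stripped copy of each matching file in `folder`; return the new paths.

    Hypothetical helper: assumes the input files are UTF-8 text.
    """
    # Collect the inputs first and skip earlier outputs, so files written
    # by this function are never processed again.
    sources = sorted(
        p for p in Path(folder).glob(pattern)
        if p.is_file() and not p.name.startswith(prefix)
    )
    written = []
    for src in sources:
        dst = src.with_name(prefix + src.name)
        dst.write_text(to_text(src.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(dst)
    return written
```

Calling strip_folder("some_folder") would write a strip_-prefixed copy next to every .txt file in that folder. When the shell can expand the file list, the argv-based script above stays the simpler option.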

Then run it like this: