List files in a folder as a stream to begin processing immediately
Problem description
I have a folder with 1 million files in it.
I would like to begin processing the files immediately, while the folder is still being listed, in Python or another scripting language.
The usual functions (os.listdir in Python, for example) are blocking: my program has to wait for the whole listing to finish, which can take a long time.
What's the best way to list huge folders?
Recommended answer
If convenient, change your directory structure; if not, you can use ctypes to call opendir and readdir.
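If changing the directory structure is an option, one common approach is to spread the files over subdirectories keyed by a hash of the file name, so that no single directory grows huge. The following is only a sketch under that assumption: the helper names shard_path and move_into_shards, the SHA-256 hash, and the two-level layout are illustrative choices, not part of the original answer.

```python
import hashlib
import os
import shutil


def shard_path(root, filename, levels=2, width=2):
    """Return a sharded destination such as root/ab/cd/filename.

    The subdirectory names are taken from the hex digest of the file
    name, so the same name always maps to the same directory.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def move_into_shards(src_path, root):
    """Move one file into its shard directory, creating it if needed."""
    dest = shard_path(root, os.path.basename(src_path))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src_path, dest)
    return dest
```

With two levels of two hex characters there are 65536 leaf directories, so 1 million files average roughly 15 files per directory.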
Here is a copy of that ctypes code; all I did was indent it properly, add the try/finally block, and fix a bug. You might have to debug it, particularly the struct layout.
Note that this code is not portable. You would need to use different functions on Windows, and I think the structs vary from Unix to Unix.
#!/usr/bin/env python3
"""
An equivalent of os.listdir, but as a generator, using ctypes
"""
import os
from ctypes import (CDLL, POINTER, Structure, c_byte, c_char, c_char_p,
                    c_int, c_long, c_ushort, get_errno)
from ctypes.util import find_library


class c_dir(Structure):
    """Opaque type for directory streams, corresponds to DIR"""


c_dir_p = POINTER(c_dir)


class c_dirent(Structure):
    """Directory entry"""
    # FIXME not sure these are exactly the correct types! The field offsets
    # match struct dirent on 64-bit Linux (glibc); other platforms differ.
    _fields_ = (
        ('d_ino', c_long),         # inode number
        ('d_off', c_long),         # offset to the next dirent
        ('d_reclen', c_ushort),    # length of this record
        ('d_type', c_byte),        # type of file; not supported by all file system types
        ('d_name', c_char * 4096)  # filename (NUL-terminated)
    )


c_dirent_p = POINTER(c_dirent)

# use_errno=True lets get_errno() report why opendir failed
c_lib = CDLL(find_library("c"), use_errno=True)
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

# readdir_r is deprecated in glibc; plain readdir is fine as long as each
# directory stream is only used from one thread
readdir = c_lib.readdir
readdir.argtypes = [c_dir_p]
readdir.restype = c_dirent_p

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int


def listdir(path):
    """
    A generator to return the names of files in the directory passed in
    """
    dir_p = opendir(os.fsencode(path))  # the C API expects bytes
    if not dir_p:
        err = get_errno()
        raise OSError(err, os.strerror(err), path)
    try:
        while True:
            p = readdir(dir_p)
            if not p:  # NULL pointer: end of directory
                break
            name = p.contents.d_name
            if name not in (b".", b".."):
                yield os.fsdecode(name)
    finally:
        closedir(dir_p)


if __name__ == "__main__":
    for name in listdir("."):
        print(name)
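On Python 3.5 and later, the standard library's os.scandir returns a lazy iterator over directory entries, so the same streaming behaviour is available without ctypes, and it also works on Windows. A minimal sketch, assuming Python 3.6+ (needed for using scandir in a with statement); the function name iter_names is illustrative and not from the original answer.

```python
import os


def iter_names(path):
    """Yield the entry names in path one at a time, without building a list.

    os.scandir never returns "." or "..", so no filtering is needed.
    """
    with os.scandir(path) as entries:  # closes the directory handle on exit
        for entry in entries:
            yield entry.name
```

Used as `for name in iter_names("."): print(name)`, processing can start as soon as the first entry has been read.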