Select x random files in subdirectories
Problem description
I need to randomly pick exactly 10 files (images) from a dataset, but the dataset is hierarchically structured.
So for each subdirectory that contains images, I need to keep just 10 of them, chosen at random. Is there an easy way to do that, or should I do it manually?
import os
import random

def getListOfFiles(dirName, count=10):
    # Split the entries of the given directory into files and subdirectories
    files, subdirs = [], []
    for entry in os.listdir(dirName):
        # Create full path
        fullPath = os.path.join(dirName, entry)
        if os.path.isdir(fullPath):
            subdirs.append(fullPath)
        else:
            files.append(fullPath)
    # Keep at most `count` randomly chosen files from this directory
    # (random.sample needs a list of paths, not a single path string)
    allFiles = random.sample(files, min(count, len(files)))
    # Recurse into each subdirectory and add its sample as well
    for subdir in subdirs:
        allFiles = allFiles + getListOfFiles(subdir, count)
    return allFiles

dirName = 'C:/Users/bla/bla'
# Get up to 10 random files from every directory in the tree at the given path
listOfFiles = getListOfFiles(dirName)
# mode='x' raises FileExistsError if elements.txt already exists
with open("elements.txt", mode='x') as f:
    for elem in listOfFiles:
        f.write(elem + '\n')
Recommended answer
A good way to sample from a directory listing of unknown size is reservoir sampling. With this approach, you don't have to list all the files in the directory up front; you read the entries one by one and sample as you go. It even works when you have to sample a fixed number of files across multiple directories.
It is best to use generator-based directory scanning code, which yields one file at a time, so you don't use a lot of memory up front to hold all the file names.
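A generator-based scan could look roughly like the sketch below. It is only an illustration of the idea: the name `iter_files` is made up here, and the choice to skip symbolic links to directories is an assumption, not something the answer specifies.

```python
import os
from typing import Iterator


def iter_files(root: str) -> Iterator[str]:
    """Yield the path of every regular file below root, one at a time."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recurse lazily; no list of file names is ever built
                yield from iter_files(entry.path)
            elif entry.is_file():
                yield entry.path
```

Because `iter_files` is a generator, a sampler can consume it with a plain `for` loop and only ever hold the current path plus its own reservoir in memory.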
Reservoir sampling over a single directory goes along these lines (NB: untested code!)
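import numpy as np
import os

def ResSampleFiles(dirname, N):
    """pick N files from directory"""
    sampled_files = list()
    k = 0  # number of files seen so far
    for item in os.scandir(dirname):
        if item.is_dir():
            continue
        full_path = os.path.join(dirname, item.name)
        if k < N:
            # the first N files fill the reservoir
            sampled_files.append(full_path)
        else:
            # randint's upper bound is exclusive, so idx is in [0, k];
            # the new file replaces a slot with probability N / (k + 1)
            idx = np.random.randint(0, k+1)
            if (idx < N):
                sampled_files[idx] = full_path
        k += 1
    return sampled_files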
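To get back to the original question (10 random files from every subdirectory), the same sampling step can be applied once per directory. The following is a minimal sketch using only the standard library. The names `reservoir_sample` and `sample_per_directory` are hypothetical, and it assumes every regular file counts as an image, since the question gives no extension filter. Note that `os.walk` builds the file-name list of each directory internally, so this version trades some of the memory advantage for brevity.

```python
import os
import random
from typing import Dict, Iterable, List


def reservoir_sample(items: Iterable[str], n: int) -> List[str]:
    """Return up to n items chosen uniformly at random from an iterable."""
    reservoir: List[str] = []
    for k, item in enumerate(items):
        if k < n:
            # The first n items fill the reservoir
            reservoir.append(item)
        else:
            # random.randint is inclusive on both ends: idx is in [0, k]
            idx = random.randint(0, k)
            if idx < n:
                reservoir[idx] = item
    return reservoir


def sample_per_directory(root: str, n: int = 10) -> Dict[str, List[str]]:
    """Map each directory under root that contains files to a sample of up to n of them."""
    samples: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if filenames:
            paths = (os.path.join(dirpath, name) for name in filenames)
            samples[dirpath] = reservoir_sample(paths, n)
    return samples
```

Calling `sample_per_directory(dirName)` with the question's `dirName` would return one list per directory. Directories holding fewer than 10 files simply contribute all of their files.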