Get the count of matching words in a pandas string column using a predefined list
Question
I have a DataFrame that contains index and text columns.
For example:
index | text
1 | "I have a pen, but I lost it today."
2 | "I have pineapple and pen, but I lost it today."
Now I have a long list, and I want to match each word in text against the list.
Suppose:
long_list = ['pen', 'pineapple']
I want to create a FunctionTransformer that matches the words in long_list against each word of the column value and, if there are matches, returns the count:
index | text | count
1 | "I have a pen, but I lost it today." | 1
2 | "I have pineapple and pen, but I lost it today." | 2
This is what I tried:
def count_words(df):
    long_list = ['pen', 'pineapple']
    count = 0
    for c in df['tweet_text']:
        if c in long_list:
            count = count + 1
    df['count'] = count
    return df

count_word = FunctionTransformer(count_words, validate=False)
An example of how I develop my other FunctionTransformer objects:
def convert_twitter_datetime(df):
    df['hour'] = pd.to_datetime(df['created_at'], format='%a %b %d %H:%M:%S +0000 %Y').dt.strftime('%H').astype(int)
    return df

convert_datetime = FunctionTransformer(convert_twitter_datetime, validate=False)
Recommended answer
Inspired by @Quang Hoang's answer:
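import pandas as pd
from sklearn.preprocessing import FunctionTransformer

y = ['pen', 'pineapple']

def count_strings(X, y):
    # Build an alternation such as 'pen|pineapple' and count its matches per row.
    pattern = '|'.join(y)
    return X['text'].str.count(pattern)

# kw_args forwards the word list to count_strings on every transform call.
string_transformer = FunctionTransformer(count_strings, kw_args={'y': y})
df['count'] = string_transformer.fit_transform(X=df)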
Result
text count
1 "I have a pen, but I lost it today." 1
2 "I have pineapple and pen, but I lost it today." 2
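One caveat: str.count with a plain alternation counts substring occurrences, so 'pen' would also be counted inside words such as 'open' or 'pencil'. If whole-word matching is what is wanted, a possible variation is sketched below. The names count_whole_words, word_transformer, demo and the column parameter are made up for this illustration and are not part of the original answer.

```python
import re

import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def count_whole_words(X, words, column="text"):
    # Escape each word so regex metacharacters are matched literally,
    # and wrap the alternation in \b so only whole words are counted.
    pattern = r"\b(?:{})\b".format("|".join(re.escape(w) for w in words))
    return X[column].str.count(pattern)


# Hypothetical sample data, chosen to contain 'pen' inside longer words.
demo = pd.DataFrame({"text": ["I have a pen and an open pencil.",
                              "I have pineapple and pen."]})

word_transformer = FunctionTransformer(
    count_whole_words, kw_args={"words": ["pen", "pineapple"]}
)
whole_word_counts = word_transformer.fit_transform(demo)

# For comparison: the plain alternation also counts 'open' and 'pencil'.
substring_counts = demo["text"].str.count("pen|pineapple")
```

Whether substring or whole-word counting is the right choice depends on the word list, so treat this as one option rather than a correction.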
For df2 below:
#df2
text
1 "I have a pen, but I lost it today. pen pen"
2 "I have pineapple and pen, but I lost it today."
we get
string_transformer.transform(X=df2)
#result
1 3
2 2
Name: text, dtype: int64
This shows that we converted the function into an sklearn-style object. To abstract this even further, we can pass the column name as a keyword argument to count_strings.
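A minimal sketch of that last idea, assuming the column name is added to count_strings as a third parameter and forwarded through kw_args. The parameter name column, the frame tweets and the variable tweet_counter are hypothetical and chosen only for this example.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def count_strings(X, y, column):
    # Build an alternation such as 'pen|pineapple' and count its
    # occurrences in every row of the chosen column.
    pattern = "|".join(y)
    return X[column].str.count(pattern)


# Hypothetical frame whose text lives in a column named 'tweet_text'.
tweets = pd.DataFrame({
    "tweet_text": [
        "I have a pen, but I lost it today.",
        "I have pineapple and pen, but I lost it today.",
    ]
})

# Both the word list and the column name are passed as keyword arguments.
tweet_counter = FunctionTransformer(
    count_strings,
    kw_args={"y": ["pen", "pineapple"], "column": "tweet_text"},
)
tweets["count"] = tweet_counter.fit_transform(tweets)
```

With this shape the same function can be reused for any text column by changing only the kw_args dictionary.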