Python regular expression to split string and get all words is not working
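Problem description
I am trying to use a regular expression in Python to split a string and get all the matching words.
RE: \w+(.?\w+)*
It needs to capture only characters in the class [a-zA-Z0-9_].
Here is an example.
However, when I try to match and get everything from the string, it does not return the correct result.
Code snippet: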
>>> import re
>>> from pprint import pprint
>>> pattern = r"\w+(.?\w+)*"
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same.
... Oh wait, it also need to filter out the symbols like !@#$%^&*()-+=[]{}.,;:'"`| (`.`)/
... 
... I guess that's it."""
>>> pprint(re.findall(r"\w+(.?\w+)*", string))
[' etc', ' well', ' same', ' wait', ' like', ' it']
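It returns only some of the words, but it should actually return all the words, digits, and underscores [as in the linked example].
Python version: Python 3.6.2 (default, Jul 17 2017, 16:44:45)
Thanks.
Recommended answer
You need to use a non-capturing group, because re.findall returns the contents of the group instead of the whole match when the pattern contains a capturing group, and you need to escape the dot so that it matches only a literal period rather than any character: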
>>> import re
>>> from pprint import pprint
>>> pattern = r"\w+(?:\.?\w+)*"
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same.
... Oh wait, it also need to filter out the symbols like !@#$%^&*()-+=[]{}.,;:'"`| (`.`)/
... 
... I guess that's it."""
>>> pprint(re.findall(pattern, string, re.A))
['this', 'is', 'some', 'test', 'string', 'and', 'there', 'are', 'some', 'digits', 'as', 'well', 'that', 'need', 'to', 'be', 'captured', 'as', 'well', 'like', '1234567890', 'and', '321', 'etc', 'But', 'it', 'should', 'also', 'select', '_', 'as', 'well', 'I', 'm', 'pretty', 'sure', 'that', 'that', 'RE', 'does', 'exactly', 'the', 'same', 'Oh', 'wait', 'it', 'also', 'need', 'to', 'filter', 'out', 'the', 'symbols', 'like', 'I', 'guess', 'that', 's', 'it']
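The effect of the group type on re.findall can be seen on a much shorter input. The following is a minimal sketch; the sample text and the variable names are illustrative assumptions, not part of the original post.

```python
import re

# Illustrative sample text (not from the original question).
text = "alpha beta. gamma delta"

# Capturing group with an unescaped dot, as in the question:
# findall returns only the last text captured by the group for each match.
capturing = re.findall(r"\w+(.?\w+)*", text)

# Non-capturing group with an escaped dot, as in the answer:
# findall returns each whole match.
non_capturing = re.findall(r"\w+(?:\.?\w+)*", text)

print(capturing)      # [' beta', ' delta']
print(non_capturing)  # ['alpha', 'beta', 'gamma', 'delta']
```

In the first call the unescaped dot also matches the spaces, so whole runs of words become a single match, and only the last repetition of the group is reported.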
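In addition, to match only ASCII letters, digits, and _, you must pass the re.A flag, because in Python 3 \w also matches Unicode word characters by default.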
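As a rough illustration of what re.A changes, the sketch below runs the answer's pattern with and without the flag. The sample text is an assumption chosen to contain one non-ASCII letter.

```python
import re

# Illustrative sample text; "\u00e9" is the letter e with an acute accent.
text = "caf\u00e9 123 _x"
pattern = r"\w+(?:\.?\w+)*"

# Default for str patterns in Python 3: \w matches Unicode word characters.
unicode_words = re.findall(pattern, text)

# With re.A (re.ASCII): \w matches only [a-zA-Z0-9_].
ascii_words = re.findall(pattern, text, re.A)

print(unicode_words)  # ['caf\u00e9', '123', '_x'] shown with the accented letter
print(ascii_words)    # ['caf', '123', '_x']
```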
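Escaping the dot in the answer's pattern matters independently of the group type. The sketch below compares the two variants on an assumed sample text; both use a non-capturing group so that only the dot differs.

```python
import re

# Illustrative sample text (not from the original question).
text = "pi is 3.14 see example.com now"

# Escaped dot: "\.?" allows at most one literal period between word runs,
# so tokens such as "3.14" stay together while spaces still separate words.
escaped = re.findall(r"\w+(?:\.?\w+)*", text)

# Unescaped dot: ".?" matches any single character (except a newline),
# including a space, so consecutive words are glued into one match.
unescaped = re.findall(r"\w+(?:.?\w+)*", text)

print(escaped)    # ['pi', 'is', '3.14', 'see', 'example.com', 'now']
print(unescaped)  # ['pi is 3.14 see example.com now']
```

Note that with the escaped dot, a period directly between two word characters is kept inside the token. If every period should split tokens, a plain r"\w+" would be the simpler choice.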