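In Python, both `datetime.datetime` and the `time` module provide a `strptime` function that parses a time string into the corresponding object according to a format string. Very often all we need is to parse strings in the "%Y-%m-%d %H:%M:%S" format, but `strptime` also does locale-dependent processing, which adds unnecessary complexity and can become a performance bottleneck in some situations.
This issue was raised long ago on the Python mailing list, where parsing with a regular expression was suggested instead.
So I ran a quick test: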
    #!/usr/bin/env python
    import cProfile
    import datetime
    import os
    import re
    import timeit

    def strptime():
        # Baseline: parse every line with datetime.strptime.
        with open('time.txt', 'r') as f:
            for line in f:
                line = line.rstrip(os.linesep)
                dt = datetime.datetime.strptime(line, "%Y-%m-%d %H:%M:%S")

    def reg():
        # Alternative: pull out the six fields with a regex and build the datetime directly.
        rep = re.compile(r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})')
        with open('time.txt', 'r') as f:
            for line in f:
                line = line.rstrip(os.linesep)
                m = rep.match(line)
                dt = datetime.datetime(int(m.group(1)), int(m.group(2)), int(m.group(3)),
                                       int(m.group(4)), int(m.group(5)), int(m.group(6)))

    if __name__ == '__main__':
        t1 = timeit.Timer("reg()", "from __main__ import reg")
        t2 = timeit.Timer("strptime()", "from __main__ import strptime")
        # Run each variant three times under the profiler.
        cProfile.run("t1.timeit(3)")
        print("")  # blank line between the two reports; valid on Python 2 and 3
        cProfile.run("t2.timeit(3)")
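The benchmark reads `time.txt`, whose contents are not shown. As a minimal sketch, assuming the file simply holds one timestamp per line, something like the following could generate a comparable input. The function name `write_sample_file`, the line count, and the date range are all hypothetical choices, not taken from the original test.

```python
import datetime
import random

def write_sample_file(path="time.txt", count=3000, seed=0):
    """Write `count` random timestamps, one per line, as %Y-%m-%d %H:%M:%S."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same file
    start = datetime.datetime(2011, 1, 1)  # arbitrary starting point (assumption)
    with open(path, "w") as f:
        for _ in range(count):
            # Random offset within one year, whole seconds only.
            offset = datetime.timedelta(seconds=rng.randrange(365 * 24 * 3600))
            f.write((start + offset).strftime("%Y-%m-%d %H:%M:%S") + "\n")
```

Calling `write_sample_file()` once before running the benchmark should be enough; absolute timings will of course depend on the number of lines and on the machine.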
On my development machine I got the following results (regex version first, then `strptime`):
72741 function calls (72698 primitive calls) in 1.119 CPU seconds
219657 function calls (219510 primitive calls) in 3.176 CPU seconds
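The regex version needs roughly a third of the time (1.119 s versus 3.176 s) and about a third of the function calls, so the improvement is substantial.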
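For use outside a benchmark, the regex approach probably wants some error handling, since `m` is `None` when a line does not match. The following is a sketch under my own assumptions rather than part of the original test: the helper name `parse_timestamp` is hypothetical, and the trailing `$` in the pattern is an addition so that extra characters after the seconds are rejected.

```python
import datetime
import re

# Same field layout as the benchmark pattern, compiled once at import time.
# The trailing "$" is an addition (assumption) to reject trailing garbage.
_TIMESTAMP_RE = re.compile(r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})$')

def parse_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' without going through strptime."""
    m = _TIMESTAMP_RE.match(text)
    if m is None:
        raise ValueError("not a YYYY-MM-DD HH:MM:SS timestamp: {!r}".format(text))
    # datetime() itself raises ValueError for out-of-range fields such as month 13.
    return datetime.datetime(*map(int, m.groups()))
```

Raising `ValueError` on bad input keeps the helper close to how `strptime` behaves, so it can be swapped in without changing the caller's error handling.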
2011-10-17 EDIT: A reader pointed out in the comments another way to speed up `strptime`, by bypassing its locale language check:
    import _strptime
    _strptime._getlang = lambda: (None, None)
Test result:
138657 function calls (138510 primitive calls) in 2.161 CPU seconds
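That is about one third faster than the unpatched `strptime` (2.161 s versus 3.176 s), but it still takes roughly twice as long as the regex version (1.119 s). Posted here for reference.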