Watch out for the performance of strptime in Python

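In Python, both the datetime.datetime class and the time module provide strptime, which parses a time string into the corresponding object according to a format. Very often all we need is to parse strings in the "%Y-%m-%d %H:%M:%S" format, but strptime also handles locale-dependent behavior. That adds complexity we do not need, and in some situations it becomes a performance bottleneck.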
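
For reference, here is a minimal sketch of the two strptime entry points mentioned above. The sample timestamp is made up for illustration.

```python
import datetime
import time

text = "2011-10-17 08:30:00"   # sample value, chosen for illustration
fmt = "%Y-%m-%d %H:%M:%S"

# datetime.datetime.strptime returns a datetime object.
dt = datetime.datetime.strptime(text, fmt)

# time.strptime returns a time.struct_time for the same input.
st = time.strptime(text, fmt)

print(dt)
print(st.tm_year, st.tm_mon, st.tm_mday)
```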

This issue was raised on the Python mailing list long ago, and the thread suggests parsing with a regular expression instead.

So I ran a quick test:

#!/usr/bin/env python
import datetime
import os
import re
import timeit, cProfile

def strptime():
    # Baseline: parse every line of time.txt with datetime.strptime.
    with open('time.txt', 'r') as f:
        for line in f:
            line = line.rstrip(os.linesep)
            dt = datetime.datetime.strptime(line, "%Y-%m-%d %H:%M:%S")


def reg():
    # Compile the fixed-format pattern once, outside the loop.
    rep = re.compile(r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})')
    with open('time.txt', 'r') as f:
        for line in f:
            line = line.rstrip(os.linesep)
            m = rep.match(line)
            # Groups 1-6 are year, month, day, hour, minute, second.
            dt = datetime.datetime(int(m.group(1)),
                                   int(m.group(2)),
                                   int(m.group(3)),
                                   int(m.group(4)),
                                   int(m.group(5)),
                                   int(m.group(6)))

if __name__ == '__main__':

    t1 = timeit.Timer("reg()","from __main__ import reg")
    t2 = timeit.Timer("strptime()", "from __main__ import strptime")

    # Profile three runs of each version; print("") works on Python 2 and 3.
    cProfile.run("t1.timeit(3)")
    print("")
    cProfile.run("t2.timeit(3)")
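
The script reads time.txt, which is not shown in this post. As a rough sketch, a file with one timestamp per line could be generated as below. The helper name write_sample_file, the start date, the spacing between timestamps, and the line count are all assumptions for illustration, not the data used for the numbers in this post.

```python
import datetime


def write_sample_file(path="time.txt", count=10000):
    """Write `count` timestamps, one per line, in %Y-%m-%d %H:%M:%S format.

    The start date, the 61-second step and the default count are arbitrary.
    """
    start = datetime.datetime(2011, 1, 1, 0, 0, 0)
    with open(path, "w") as f:
        for i in range(count):
            stamp = start + datetime.timedelta(seconds=i * 61)
            f.write(stamp.strftime("%Y-%m-%d %H:%M:%S") + "\n")


if __name__ == "__main__":
    write_sample_file()
```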

On my development machine I got the following results (regex version first, then strptime):

         72741 function calls (72698 primitive calls) in 1.119 CPU seconds

         219657 function calls (219510 primitive calls) in 3.176 CPU seconds

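The improvement is substantial: the regex version took 1.119 s against 3.176 s for strptime, roughly 2.8 times faster, with about a third of the function calls.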
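
Before swapping one for the other, it is worth checking that both approaches agree. The following is a small sketch of such a check. The helper name parse_fixed and the sample strings are made up for illustration, and unlike reg() it raises ValueError on input that does not match.

```python
import datetime
import re

# Same pattern as in reg(), compiled once at import time.
_PATTERN = re.compile(r'(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})')


def parse_fixed(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' without going through strptime."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError("unexpected timestamp format: %r" % (text,))
    # The six groups are year, month, day, hour, minute, second.
    return datetime.datetime(*[int(g) for g in m.groups()])


samples = ["2011-10-17 00:00:00", "1999-12-31 23:59:59", "2012-02-29 12:34:56"]
for s in samples:
    expected = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    assert parse_fixed(s) == expected
```

Note that match() only anchors at the start of the string, so trailing characters after the seconds are ignored here, whereas strptime would reject them.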

2011-10-17 EDIT: A reader suggested another approach in the comments, which bypasses the language check and speeds up strptime:

import _strptime
_strptime._getlang = lambda: (None, None)

Test result:

         138657 function calls (138510 primitive calls) in 2.161 CPU seconds

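That is about one third faster than the unpatched strptime (2.161 s against 3.176 s), but still roughly twice as slow as the regex version (1.119 s). Posted here for your reference.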