A question about debugging a Python program's memory usage with heapy

```
(Pdb) h0 = hp.heap()
(Pdb) h0
Partition of a set of 4254865 objects. Total size = 700955416 bytes.
 Index   Count   %      Size   % Cumulative   % Kind (class / dict of class)
     0  646764  15 427504800  61  427504800  61 dict (no owner)
     1 3073581  72 210210328  30  637715128  91 unicode
     2  120775   3  17186360   2  654901488  93 list
     3   71772   2  10795432   2  665696920  95 str
     4   37269   1  10594328   2  676291248  96 _sre.SRE_Pattern
     5  124398   3  10406784   1  686698032  98 tuple
     6     721   0   2114200   0  688812232  98 dict of module
     7   83485   2   2003640   0  690815872  99 int
     8   11706   0   1498368   0  692314240  99 types.CodeType
     9   11157   0   1338840   0  693653080  99 function
<653 more rows. Type e.g. '_.more' to view.>
```

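As you can see, dicts take up most of the memory (61% of the total size). But knowing that alone is of course not enough: I also need to know "which dicts" are large, for example which module created them, how they are referenced, and so on.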
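For the "how is it referenced" part, heapy's own answer is `.byvia`. To spot-check a single suspicious dict from plain Python 3, a rough stdlib-only sketch is `gc.get_referrers`, formatting each referring dict key or list index the way byvia prints it. `referrer_paths` is a hypothetical helper, not a heapy API; it only sees GC-tracked containers, and it reports one level of referrers, not the full path from a module.

```python
import gc

def referrer_paths(obj):
    """List how `obj` is directly referenced, e.g. "['t0']" for a dict key or
    "[3]" for a list/tuple slot. Only GC-tracked containers are visible to
    gc.get_referrers(), and other referrer types (frames, cells) are skipped."""
    paths = []
    for ref in gc.get_referrers(obj):
        if isinstance(ref, dict):
            # a dict (including a module's globals) holding obj as a value
            paths += ["[%r]" % (k,) for k, v in ref.items() if v is obj]
        elif isinstance(ref, (list, tuple)):
            paths += ["[%d]" % i for i, v in enumerate(ref) if v is obj]
    return paths
```

For example, `referrer_paths(t0)` run at module level should include `"['t0']"` from the module's globals dict.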

The difficulty is that in this scenario there are a huge number of nested dicts. Here is some code that simulates the situation:

```python
import random

def rand():
    return random.random()

def rand_int(i):
    return random.randint(0, i)

def rdt(max_depth, max_width):
    """Build a random nested dict: leaf dicts map float -> float,
    inner dicts map float -> (smaller) nested dict."""
    r = {}
    if max_depth <= 1:
        for i in range(rand_int(max_width)):
            r[rand()] = rand()
    else:
        for i in range(rand_int(max_width)):
            r[rand()] = rdt(rand_int(max_depth) - 1, max_width)
    return r

t0 = rdt(9, 9)
t1 = rdt(7, 14)
t2 = rdt(5, 19)
t3 = rdt(3, 24)
```

Now: of the four t's I created directly, which one is the largest?

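By "large" I of course don't mean that the single PyObject itself takes up a lot of space; the only meaningful measure is the total of all the dicts it references, directly or indirectly.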
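To make that definition concrete, here is a minimal sketch (Python, standard library only) of the dict-only total described above: `sys.getsizeof` of a dict plus that of every dict nested under it, each counted once so shared sub-dicts or cycles are not double counted. `deep_dict_size` is a hypothetical helper, not a heapy or Pympler API, and it deliberately ignores the floats stored in the dicts.

```python
import sys

def deep_dict_size(root):
    """Sum sys.getsizeof() over `root` and every dict reachable from it through
    dict values. Only dict objects are counted; each is counted once."""
    seen = set()
    total = 0
    stack = [root]            # explicit stack: deep nesting won't hit the recursion limit
    while stack:
        d = stack.pop()
        if id(d) in seen:     # already counted (shared sub-dict or cycle)
            continue
        seen.add(id(d))
        total += sys.getsizeof(d)
        # Dicts are unhashable, so nested dicts can only appear as values.
        stack.extend(v for v in d.values() if isinstance(v, dict))
    return total
```

With the t's from the code above in scope, `max(["t0", "t1", "t2", "t3"], key=lambda n: deep_dict_size(globals()[n]))` names the largest one.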

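So an answer like this won't do. It does find the root dicts, but the Size it reports for each is only that one dict's own size, not the size of everything nested under it: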

```
>>> (h0[0] - h0[0].referents).byvia
Partition of a set of 5 objects. Total size = 2168 bytes.
 Index  Count   %     Size   % Cumulative   % Referred Via:
     0      1  20     1048  48       1048  48 "['t4']"
     1      1  20      280  13       1328  61 "['t0']"
     2      1  20      280  13       1608  74 "['t1']"
     3      1  20      280  13       1888  87 "['t2']"
     4      1  20      280  13       2168 100 "['t3']"
```

Any experts out there, I'd appreciate your advice.

3 replies · 2017-08-03 15:46:26 +08:00

v166ex

2017-08-02 16:43:43 +08:00

This one is beyond me, I can only watch from the sidelines!

NoAnLove

2017-08-02 23:03:32 +08:00

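I've never run into this kind of problem before, but here's one idea: use Pympler. Create a tracker.SummaryTracker and take a diff (e.g. with its print_diff()) before and after each variable is created; the change tells you how much memory that variable uses.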
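The same before/after idea can be sketched without Pympler using the standard library's tracemalloc (Python 3.4+). Note this is a swapped-in tool, not SummaryTracker: it reports bytes allocated by Python that are still live, not per-type object counts. `build` and the sizes are hypothetical stand-ins for the `rdt(...)` calls.

```python
import random
import tracemalloc

def build(n):
    # hypothetical stand-in for one rdt(...) call: n outer entries, each a 1-entry dict
    return {random.random(): {random.random(): random.random()} for _ in range(n)}

tracemalloc.start()
results = {}
for name, n in [("small", 100), ("large", 10000)]:
    before = tracemalloc.get_traced_memory()[0]   # bytes currently traced
    obj = build(n)
    after = tracemalloc.get_traced_memory()[0]
    # keep obj alive so its memory is still allocated while we compare
    results[name] = (obj, after - before)
tracemalloc.stop()

for name, (_, delta) in results.items():
    print(name, delta, "bytes still allocated after building")
```

Like the SummaryTracker approach, this only works when you control the moment each object is created; it cannot attribute memory retroactively in an already-running process.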

Of course, the more direct way is asizeof:

```
>>> from pympler import asizeof
>>> asizeof.asizeof(t0)
4880
```

nthhdy

2017-08-03 15:46:26 +08:00

Thanks a lot for recommending Pympler!
I searched for "python memory profile and debug"; why didn't this tool come up? All I found were line profiler, heapy and the like.

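This is one way to do it, a pretty clumsy one, heh. It's very slow: after more than ten minutes it had only gotten through about a tenth: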

```python
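from guppy import hpy            # heapy ships in the guppy package (guppy3 on Python 3)
from pympler.asizeof import asizeof

hp = hpy()
h0 = hp.heap()

ds = h0[0]   # row 0 of the partition: "dict (no owner)"

# Get the "root" dict set: dicts not referred to by any other dict in ds.
# ds.referents is everything the dicts in ds point to, so subtracting it
# leaves the roots (ds - ds.referrers would leave the leaf dicts instead).
root_ds = (ds - ds.referents).byid

# Look at the recursive size of each "root dict"
sizes = []
for i in range(len(root_ds)):
    # root_ds[i] is a one-object heapy set; .theone extracts the dict itself
    info = i, asizeof(root_ds[i].theone)
    sizes.append(info)

# Check sizes: largest roots first
sizes.sort(key=lambda info: info[1], reverse=True)
print(sizes[:10])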
```
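A lighter-weight alternative, sketched under the assumption that the dicts form trees like the `rdt` output: find the roots yourself and compute every dict's nested size once, bottom-up, with a memo, using only `sys.getsizeof`. It counts dict objects only (not the floats inside them, unlike asizeof), so its numbers will differ from asizeof's. `root_dict_sizes` is a hypothetical helper; how you collect the input dicts (e.g. out of heapy) is left open. Nested dicts missing from the input are still counted as long as they hang off a root.

```python
import sys

def root_dict_sizes(dicts):
    """Return [(deep_size, root), ...] sorted largest first.

    Roots are dicts in `dicts` that are not a value of any other dict in `dicts`.
    deep_size = sys.getsizeof(root) + deep sizes of its nested dicts, computed once
    per dict and memoized. A dict shared by several parents is counted under each
    parent; cycles are cut at the first dict seen twice."""
    dicts = list(dicts)
    nested = {id(v) for d in dicts for v in d.values() if isinstance(v, dict)}
    roots = [d for d in dicts if id(d) not in nested]

    size = {}        # id(dict) -> deep size
    started = set()  # ids already expanded (guards against cycles)
    for root in roots:
        stack = [(root, False)]
        while stack:
            d, children_done = stack.pop()
            if children_done:
                # post-order: every child has been sized (or is part of a cycle -> 0)
                size[id(d)] = sys.getsizeof(d) + sum(
                    size.get(id(v), 0) for v in d.values() if isinstance(v, dict))
            elif id(d) not in started:
                started.add(id(d))
                stack.append((d, True))
                stack.extend((v, False) for v in d.values() if isinstance(v, dict))
    return sorted(((size[id(r)], r) for r in roots), key=lambda p: p[0], reverse=True)
```

Each dict is visited a constant number of times, so the whole pass is linear in the number of dicts rather than one full traversal per root.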