
1 ferstar Aug 10, 2017

I happen to have a similar dataset on hand; the only difference from the OP is that each row is a single integer in [100, 150]. I counted it like this:

```python
from collections import Counter

import pandas as pd

size = 2 ** 10
counter = Counter()
# Read the CSV in chunks so the whole file never sits in memory at once.
for chunk in pd.read_csv('file.csv', header=None, chunksize=size):
    counter.update([i[0] for i in chunk.values])
print(counter)
```

The output looks roughly like:

```
Counter({100: 41, 101: 40, 102: 40, ... 150: 35})
```
2 caomaocao Aug 10, 2017

Use Counter(), or apply the MapReduce idea~
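The MapReduce idea mentioned above can be sketched in plain Python: count each chunk independently (the map step, which could run in parallel), then merge the per-chunk counts (the reduce step). The `chunks` data here is a made-up stand-in for pieces of the big file.

```python
from collections import Counter
from functools import reduce

# Hypothetical chunks of the large file, pre-split for illustration.
chunks = [
    [100, 101, 100, 102],
    [101, 150, 100],
]

# Map: count each chunk on its own (parallelizable).
mapped = [Counter(chunk) for chunk in chunks]

# Reduce: merge the per-chunk Counters into one total.
total = reduce(lambda a, b: a + b, mapped, Counter())
print(total)  # Counter({100: 3, 101: 2, 102: 1, 150: 1})
```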
3 chuanqirenwu Aug 10, 2017

dask does it in one line: dd.groupby().count(). Same API as pandas, but it extends "fits in memory" to "fits on disk".
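Since dask deliberately mirrors the pandas API, the same call chain can be illustrated with pandas alone; this is a sketch on a small in-memory stand-in for the CSV (one unnamed integer column), not the dask version itself.

```python
import pandas as pd

# Small stand-in for the large CSV: a single unnamed integer column.
df = pd.DataFrame({0: [100, 101, 100, 150, 101, 100]})

# Group by the column's values and count occurrences of each.
# With dask, the same chain on a dask DataFrame runs chunk by chunk,
# so the data only needs to fit on disk, not in memory.
counts = df.groupby(0)[0].count()
print(counts.to_dict())  # {100: 3, 101: 2, 150: 1}
```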
4 zhusimaji Aug 10, 2017 via iPhone

Counter is worth a try; if you have a distributed setting, MapReduce is the first choice.
5 zhusimaji Aug 10, 2017 via iPhone

*distributed environment (correcting the typo in #4)