Processing pcap files with Python - V2EX

    wwttc  Sep 14, 2014  21000 views
    I downloaded a .pcap file from CAIDA and want to read its contents.

    1. Using scapy
    >>> from scapy.all import *
    >>> pkts = rdpcap("equinix-chicago.dirA.20140320-125911.UTC.anon.pcap")
    After that, the following warnings keep appearing:
    WARNING: bad ihl (0). Assuming ihl=5
    WARNING: bad ihl (0). Assuming ihl=5
    ...


    2. Reading with dpkt
    >>> import dpkt
    >>> f = file("equinix-chicago.dirA.20140320-125911.UTC.anon.pcap")
    >>> pcap = dpkt.pcap.Reader(f)
    >>> for ts, buf in pcap:
    ...     eth = dpkt.ethernet.Ethernet(buf)
    ...     print eth
    The output is all garbled:
    E(uy@94?Q"]X??P?6d??q?4Pl??
    E?G?7?2??:JV>[L{a?_?
    E<@:Q??ac??$??PGT?#????
    /9?
    EGb?@|'^-?&C?LG^CY???K?~D?ub?????D??;?h?
    D?J???U?

    How can I read a pcap file correctly?
    13 replies    2018-03-05 22:11:42 +08:00
    #1  izoabr  Sep 14, 2014
    type(eth)

    dir(eth)
    #2  pimin  Sep 14, 2014
    Captured packet contents are not all text, so of course printing them like this gives you garbage.
    Print them as hex with %x.
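    #3  em70  Sep 14, 2014
    You first need to look up how the protocols in this format are laid out, then read the binary data and parse it. Python is relatively weak in this area; this is where C shows its advantage.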
    #4  wwttc (OP)  Sep 14, 2014
    @izoabr
    <class 'dpkt.ethernet.Ethernet'>

    ['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__hash__', '__hdr__', '__hdr_defaults__', '__hdr_fields__', '__hdr_fmt__', '__hdr_len__', '__init__', '__len__', '__metaclass__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_typesw', '_unpack_data', 'data', 'dst', 'get_type', 'pack', 'pack_hdr', 'set_type', 'src', 'type', 'unpack']
    #5  izoabr  Sep 14, 2014
    I'd actually say Python packages are very handy here: dpkt parses the packet directly, and eth.data is right there.
    #6  wwttc (OP)  Sep 14, 2014
    @pimin
    >>> for ts, buf in pcap:
    ...     eth = dpkt.ethernet.Ethernet(buf)
    ...     print '%x' % eth
    ...
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
    TypeError: %x format: a number is required, not Ethernet
    #7  wwttc (OP)  Sep 14, 2014
    @izoabr
    print eth.data also produces nothing but garbled output.
    #8  allenforrest  Sep 15, 2014
    @wwttc eth.data holds binary data, so printing it directly is meaningless. Try converting it with binascii.b2a_hex first and then printing the result.
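
As a minimal sketch of the suggestion above (Python 3, standard library only): the `payload` bytes here are made up for illustration and stand in for `eth.data`; they are not taken from the capture in this thread.

```python
import binascii

# Hypothetical bytes standing in for eth.data (not from the real capture).
payload = b"\x45\x00\x00\x28\xab\xcd"

# b2a_hex returns bytes in Python 3, so decode to get a printable str.
hex_text = binascii.b2a_hex(payload).decode("ascii")
print(hex_text)

# Group the digits into byte pairs so offsets are easier to count by eye.
spaced = " ".join(hex_text[i:i + 2] for i in range(0, len(hex_text), 2))
print(spaced)
```

A hex dump like this only makes the bytes printable; turning them into meaningful fields still requires knowing which header each offset belongs to.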
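    #9  s51431980  Sep 15, 2014
    pcap is a binary file with a fixed format: the first 24 bytes are the file header, followed by one packet after another. Each packet starts with a 16-byte packet header, followed by the packet data, which runs from the IP header and TCP header through to the application-layer data (preceded by a link-layer header such as Ethernet when the capture includes one). Search for "pcap file format" and you will find articles about it.

    I once wrote a Ruby script that computes traffic statistics per time interval, for reference only: https://gist.github.com/t09def/ee79369fe4593d7491ac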
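
The layout described in this reply can be sketched with the standard `struct` module. This is a rough illustration rather than a full parser: it assumes the classic pcap format with microsecond timestamps (magic number 0xa1b2c3d4), the function name `read_pcap` is made up for this example, and the demo bytes are built in memory instead of being read from the CAIDA file.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def read_pcap(stream):
    """Yield (link_type, timestamp, packet_bytes) for each record."""
    header = stream.read(24)  # 24-byte file header
    if len(header) < 24:
        raise ValueError("truncated pcap file header")
    # The magic number reveals the byte order the writer used.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # Remaining 20 bytes: version major/minor, timezone offset,
    # timestamp accuracy, snapshot length, link type.
    link_type = struct.unpack(endian + "HHiIII", header[4:])[5]
    while True:
        record = stream.read(16)  # 16-byte per-packet header
        if len(record) < 16:
            break
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record)
        yield link_type, ts_sec + ts_usec / 1e6, stream.read(incl_len)


# Hypothetical one-packet capture built in memory for the demo.
# Link type 1 means Ethernet; 101 means raw IP (no link-layer header).
demo = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 101)
demo += struct.pack("<IIII", 1395320351, 500000, 4, 4) + b"\x45\x00\x00\x28"
records = list(read_pcap(io.BytesIO(demo)))
print(records)
```

The link type is worth printing before choosing a parser. If a capture reports 101 (raw IP) instead of 1 (Ethernet), each record begins directly with the IP header, and decoding it as an Ethernet frame would misread the bytes. The garbled lines in the question starting with `E` (0x45, the usual first byte of an IPv4 header) would be consistent with that, but whether it applies to the CAIDA file is an assumption to verify, not something this thread confirms.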
    #10  wwttc (OP)  Sep 15, 2014
    @allenforrest After converting, it just prints a pile of hexadecimal digits; I still can't read the contents.
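    #11  kapoyegou  Sep 8, 2015
    This thread is about a year old, but I'll leave a note for those who come later. I just ran into the same situation. It seems to be simply down to how the package prints things: bytes that can be shown as characters are shown that way, and bytes that can't are not printed as hex either, so the output comes out garbled.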
    #12  deepurple  Dec 26, 2015
    def mac_addr(mac_string):
        """Print out MAC address given a string

        Args:
            mac_string: the string representation of a MAC address
        Returns:
            printable MAC address
        """
        return ':'.join('%02x' % ord(b) for b in mac_string)


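    dpkt.ethernet.Ethernet(f) produces an Ethernet object, which can't simply be printed. You need to pull out the individual fields, and for things like MAC addresses and IP addresses you also need to convert them into a printable format.

    For details, see: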
    https://dpkt.readthedocs.org/en/latest/_modules/examples/print_packets.html#print_packets
    http://stackoverflow.com/questions/21015447/how-to-parse-the-header-files-of-the-pcap-file
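
For readers on Python 3, the same conversion could look roughly like the sketch below. It uses only the standard library; `ip_addr` is a helper name invented for this example, and the sample bytes are made up rather than read from a capture.

```python
import socket


def mac_addr(mac_bytes):
    """Format 6 raw bytes as a colon-separated MAC address."""
    # Iterating over bytes in Python 3 yields ints, so ord() is not needed.
    return ":".join("%02x" % b for b in mac_bytes)


def ip_addr(ip_bytes):
    """Format 4 raw bytes as a dotted-quad IPv4 address."""
    return socket.inet_ntoa(ip_bytes)


# Hypothetical raw field values, as a parser might hand them back.
mac_text = mac_addr(b"\x00\x1a\x2b\x3c\x4d\x5e")
ip_text = ip_addr(b"\xc0\xa8\x01\x0a")
print(mac_text)
print(ip_text)
```

With a parsed frame, the raw values would come from fields such as `eth.src`, `eth.dst`, and the `src`/`dst` of the IP layer, which appear in the `dir(eth)` output earlier in the thread.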
    #13  codeyou  Mar 5, 2018
    I sniff packets directly with scapy's sniff and convert the returned packets to strings with str, but the printed result is garbled. Does anyone know how to fix this?