How can I use Beautiful Soup to scrape a novel that is split across pages and merge the pages into one file?

This topic was created 3890 days ago; the information mentioned may have changed since then.

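My plan is to find the "next page" (下一页) link, request that address, and loop until there is no next page, writing each title and its text out to a text file. But find_all cannot find the "next page" string. Any help?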
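The loop described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the `<h1>` title, the `<div id="content">` body, the link label in `NEXT_LABEL`, and the helper names `parse_page` and `crawl` are all hypothetical and need adapting to the real site. One common reason a `find_all` search on the link text comes back empty is that an exact string match fails when the text has surrounding whitespace or the link has nested tags, so the sketch tests for the label as a substring of each link's text instead.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

NEXT_LABEL = "下一页"  # assumed link text; change it to match the real site


def parse_page(html, base_url):
    """Return (title, body_text, next_url or None) for one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed layout: chapter title in <h1>, chapter text in <div id="content">.
    title_tag = soup.find("h1")
    content_tag = soup.find("div", id="content")
    title = title_tag.get_text(strip=True) if title_tag else ""
    body = content_tag.get_text("\n", strip=True) if content_tag else ""
    next_url = None
    # Substring test on the link text tolerates whitespace and nested tags.
    for a in soup.find_all("a", href=True):
        if NEXT_LABEL in a.get_text():
            next_url = urljoin(base_url, a["href"])  # resolve relative links
            break
    return title, body, next_url


def crawl(start_url, fetch, max_pages=1000):
    """Follow next-page links from start_url; fetch(url) must return HTML text."""
    chunks, seen, url = [], set(), start_url
    # Stop when there is no next link, a link loops back, or the cap is hit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        title, body, url = parse_page(fetch(url), url)
        chunks.append(f"{title}\n\n{body}")
    return "\n\n".join(chunks)
```

With the requests library, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`, and the string returned by `crawl` can then be written to a text file. Passing `fetch` in as a parameter keeps the parsing logic testable without touching the network.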

7 replies 2015-09-16 08:12:57 +08:00

shoaly

Sep 3, 2015

One suggestion: drop Beautiful Soup and switch to pyquery, a jQuery for Python that handles all kinds of DOM processing.

byfsdhr

Sep 3, 2015

Whoa.

byfsdhr

Sep 3, 2015

@shoaly What do you think of scrapy?

29488503878

Sep 3, 2015

For scraping, the scrapy framework is the better choice, and parsing HTML with XPath takes very little effort.

shoaly

Sep 4, 2015

@byfsdhr scrapy handles the crawling itself, but personally I'm not too fond of the way XPath locates elements. I prefer pyquery, because I know jQuery so well, haha.

bbking

Sep 4, 2015

Take a look at how the page is generated. The page URL may simply be page=1,2..., and if the pagination is generated by JS, then handle it the JS way.
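If the site really does number its pages in the URL, there is no need to look for a next-page link at all. A rough sketch, assuming the parameter is literally called `page`, numbering starts at 1, and a missing page can be detected by `fetch` returning nothing (the helper names `page_url` and `fetch_all` are made up for illustration):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, param="page"):
    """Return base_url with its `param` query value set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_all(base_url, fetch, max_pages=1000):
    """Collect pages 1, 2, ... until fetch(url) returns None or an empty string."""
    pages = []
    for n in range(1, max_pages + 1):
        html = fetch(page_url(base_url, n))
        if not html:  # assumed signal that we ran past the last page
            break
        pages.append(html)
    return pages
```

How a real site signals "no such page" varies (a 404, a redirect, or an empty content block), so the stop condition inside `fetch` is the part most likely to need adjusting.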

beviniy

Sep 16, 2015

I've always used the XPath support in the lxml module for parsing; it works well.
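For comparison, the same single-page extraction with lxml and XPath might look like the sketch below. The page layout (`<h1>` title, `<div id="content">` body, a link whose text contains 下一页) is an assumption, not something taken from the site in question, and `parse_with_xpath` is a hypothetical helper name.

```python
from urllib.parse import urljoin

from lxml import html as lxml_html


def parse_with_xpath(page_html, base_url):
    """Return (title, body_text, next_url or None) using XPath queries."""
    tree = lxml_html.fromstring(page_html)
    # string(//h1) yields the text of the first <h1>, or "" if there is none.
    title = tree.xpath("string(//h1)").strip()
    # Collect every non-blank text node under the assumed content container.
    texts = tree.xpath('//div[@id="content"]//text()')
    body = "\n".join(t.strip() for t in texts if t.strip())
    # contains(., ...) matches on the link's full text, nested tags included.
    hrefs = tree.xpath('//a[contains(., "下一页")]/@href')
    next_url = urljoin(base_url, hrefs[0]) if hrefs else None
    return title, body, next_url
```

The returned `next_url` can drive the same kind of loop as before: keep requesting and parsing until it comes back as `None`.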