Scrapy, XPath parsing help - V2EX

This topic was created 3646 days ago; the information in it may have since developed or changed.

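I want to scrape a web page: every a under a certain div. The first a is a heading, though, and its structure differs from the rest, which causes the following problem.
The result I expect is:
{"省": ["a","b","c"], "市": ["d","e","f"], "区": ["g","h","i"]}
but what I actually get is:
{"省": ["a","b","c"], "市": ["d","e","f"], "区": ["地区","g","h"]}
(The keys 省, 市, 区 mean province, city, district; "地区", meaning "region", is the heading text.)
What should I do? How can I start scraping from the second a? I tried changing the definition of sites to //div/a[2], but that did not work.
Scrapy beginner asking for help!!!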

6 replies • 2015-04-07 17:57:14 +08:00

1

Septembers

2015-04-07 16:33:48 +08:00 via Android

How is anyone supposed to help without a sample?

2

willdatascience

OP

2015-04-07 16:36:11 +08:00

@Septembers Uh. If I could attach a screenshot, I would have posted the HTML..

3

Septembers

2015-04-07 16:53:40 +08:00 via Android

@willdatascience gist

4

aaaa007cn

2015-04-07 16:55:05 +08:00

//div/a[position()>1]
//div/a/following-sibling::a

5

zjuster

2015-04-07 17:17:41 +08:00

//div/a[2] extracts only the second a node; try /a[position()>1] instead.

For common XPath usage, have a look at w3school; it is all there.
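
The difference between `a[2]` and "everything from the second a onward" can also be seen with the standard library alone. This is a minimal sketch on made-up markup; `xml.etree.ElementTree` supports only a subset of XPath (integer positions such as `[2]`, but not `position()>1`), so the heading is skipped here with a Python slice. That is a different technique from the XPath predicate, and the same slicing idea should carry over to the list returned by Scrapy's `response.xpath('//div/a')`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first <a> is the heading, the rest are data links.
root = ET.fromstring(
    "<root><div><a>地区</a><a>g</a><a>h</a><a>i</a></div></root>"
)

# [2] selects only the second <a> under each <div>.
second_only = [a.text for a in root.findall(".//div/a[2]")]

# Select every <a>, then drop the first one in Python.
all_links = root.findall(".//div/a")
from_second = [a.text for a in all_links[1:]]

print(second_only)
print(from_second)
```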

6

oseau

2015-04-07 17:57:14 +08:00

http://zvon.org/comp/r/tut-XPath_1.html
