Crawlers for Xinwen Lianbo (CCTV evening news), assorted financial news sites, and the NDRC website

hailong0707 · 2015-09-04T01:54:22Z

Disclaimer: these tools are provided for reference and learning only. You bear all risks related to how the data is used.



Development environment

Python 2.7
MySQL
Scrapy 0.24 (Scrapy now has an official 1.0 release; the code runs on it, but emits warnings)
BeautifulSoup 4.x

Crawlers

spider_news_cctv
All Xinwen Lianbo content from 2002 to the present, over 40,000 items in total
https://github.com/hailong0707/spider_news_cctv
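Covering every broadcast "from 2002 to the present" amounts to visiting one page per day. The sketch below shows that enumeration in Python 3 (the projects themselves target Python 2.7). It is not taken from spider_news_cctv: `URL_TEMPLATE` and `daily_urls` are hypothetical names, and the real per-day URL pattern is not given in the post.

```python
from datetime import date, timedelta

# Hypothetical URL template; the actual pattern used by spider_news_cctv is not stated.
URL_TEMPLATE = "https://example.com/xwlb/{:%Y%m%d}.html"


def daily_urls(start, end):
    """Yield one URL per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        # date objects accept strftime codes in str.format specs
        yield URL_TEMPLATE.format(day)
        day += timedelta(days=1)
```

For example, `list(daily_urls(date(2002, 1, 1), date(2002, 1, 3)))` yields three URLs ending in `20020101.html`, `20020102.html` and `20020103.html`. In a Scrapy spider, such a generator could feed `start_requests`.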
spider_news_all
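Whole-site crawling of the relevant sections of Securities Daily (证券日报), Securities Times (证券时报), the Securities Daily website (证券日报网), South China Morning Post, China Business Journal online (中国经营网), The Economic Observer (经济观察报), Caijing (财经网), Securities Times online (证券时报网), China Securities Net (中证网) and Wallstreetcn (华尔街见闻). Securities Daily and Securities Times alone yield a sizable volume, close to 700,000 records.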
https://github.com/hailong0707/spider_news_all
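Whole-site crawling hinges on one step that repeats for every fetched page: extract its links, resolve them to absolute URLs, and keep only those on the same site. The following Python 3 sketch (standard library only) shows that step. It is an illustration, not the code in spider_news_all; `LinkCollector` and `same_site_links` are hypothetical names. In Scrapy itself, the spider's `allowed_domains` attribute and `CrawlSpider` rules with a `LinkExtractor` cover the same ground.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:  # value is None for a bare attribute
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute, fragment-free, de-duplicated links on page_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links, seen = [], set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Drop mailto:/javascript: links, other hosts, and repeats
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A crawler would push the returned links onto its queue and skip any URL it has already fetched.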
spider_news_gov
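Crawls content from China's National Development and Reform Commission (NDRC) and downloads its documents, including parsing the text of Word documents.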
https://github.com/hailong0707/spider_news_gov
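The post does not say how spider_news_gov parses Word files. As a minimal Python 3 sketch, assuming the downloads are .docx (Office Open XML) files, the text can be read with the standard library alone, since a .docx is a ZIP archive whose body lives in `word/document.xml`. Legacy binary .doc files need a different tool. The function name `docx_paragraphs` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by w:p (paragraph) and w:t (text run) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data):
    """Return the text of each w:p paragraph in a .docx file given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join all w:t pieces in order
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Given the bytes of a downloaded file, `docx_paragraphs(data)` returns a list of paragraph strings that could then be stored in MySQL alongside the page metadata.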
spider_news_finance
Crawls data from three financial sites: SinaFinance, FTChinese and CFI
https://github.com/hailong0707/spider_news_finance