Scraping Famous Quotes with Scrapy

Target site for the quotes: http://quotes.toscrape.com/

1 Create the crawler project by entering the following in the terminal:

scrapy startproject quotes

2 Once the project is created, create the spider file quotes.py in the spiders folder with the following content:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class Quotes(CrawlSpider):
    name = "quotes"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ['http://quotes.toscrape.com/']

    # Raw strings keep \d and \w as regex escapes rather than string escapes.
    rules = (
        # Follow pagination links and parse the quotes on each page.
        Rule(LinkExtractor(allow=r'/page/\d+'), callback='parse_quotes', follow=True),
        # Parse each author detail page.
        Rule(LinkExtractor(allow=r'/author/\w+'), callback='parse_author'),
    )

    def parse_quotes(self, response):
        # Each quote sits in an element with class "quote", hence the leading dot.
        for quote in response.css('.quote'):
            yield {
                'content': quote.css('.text::text').extract_first(),
                'author': quote.css('.author::text').extract_first(),
                # A quote can carry several tags, so collect all of them.
                'tags': quote.css('.tag::text').extract(),
            }

    def parse_author(self, response):
        name = response.css('.author-title::text').extract_first()
        author_born_date = response.css('.author-born-date::text').extract_first()
        author_born_location = response.css('.author-born-location::text').extract_first()
        author_description = response.css('.author-description::text').extract_first()
        return {
            'name': name,
            'author_born_date': author_born_date,
            'author_born_location': author_born_location,
            'author_description': author_description,
        }
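The two allow patterns in rules are regular expressions that the link extractor searches for inside each absolute URL. The rules are tried in order, so the first one that matches decides the callback. The sketch below approximates that routing using only the standard re module. The classify helper and the sample URLs are hypothetical illustrations, not part of the spider or of Scrapy itself.

```python
import re

# The same patterns that appear in the spider's rules.
PAGE = re.compile(r'/page/\d+')
AUTHOR = re.compile(r'/author/\w+')


def classify(url):
    """Return the name of the callback a URL would be routed to, or None.

    Hypothetical helper: it mimics an unanchored regex search over the
    URL, checking the rules in the order they are declared.
    """
    if PAGE.search(url):
        return 'parse_quotes'
    if AUTHOR.search(url):
        return 'parse_author'
    return None


# Sample URLs chosen for illustration only.
samples = [
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/author/Albert-Einstein',
    'http://quotes.toscrape.com/tag/love/page/1/',
    'http://quotes.toscrape.com/login',
]

for url in samples:
    print(url, '->', classify(url))
```

Because the search is unanchored, any URL that contains /page/ followed by a digit matches the first pattern, including the tag-listing URL in the samples. If only the main pagination should be followed, a stricter pattern would be one way to narrow it.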

The directory structure is as follows:
[Figure: directory structure of the project]

3 Run the crawler

Run scrapy crawl quotes in the terminal. The result is shown in the figure:
[Figure: terminal output of the crawl]
With that, a simple crawler is complete.
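Because parse_quotes and parse_author yield differently shaped dictionaries, an exported feed mixes both kinds of item. Below is a minimal sketch of separating them afterwards, assuming the crawl was exported as JSON Lines, for example with scrapy crawl quotes -o items.jl. The file name items.jl, the helper split_items, and the abbreviated sample records are all hypothetical.

```python
import json


def split_items(lines):
    """Split JSON Lines records into (quotes, authors).

    Hypothetical helper: a record is treated as a quote if it has the
    'content' key that parse_quotes emits, and as an author otherwise.
    """
    quotes, authors = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        if 'content' in item:
            quotes.append(item)
        else:
            authors.append(item)
    return quotes, authors


# Made-up, abbreviated records standing in for the lines of an exported file.
sample = [
    '{"content": "A placeholder quote.", "author": "Some Author"}',
    '{"name": "Some Author", "author_born_date": null}',
    '{"content": "Another placeholder quote.", "author": "Some Author"}',
]

quotes, authors = split_items(sample)
print(len(quotes), 'quote items,', len(authors), 'author items')
```

With a real export, the same function could be fed an open file object, since iterating over a text file yields one line at a time.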

Article link: https://freeymw.com/article/35154.html
