1. Find the network request

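From the response, you can see that this request returns all of the data for the current page, and that the request URLs for different pages follow a regular pattern.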


2. Fetch the page data with the urllib library

baseurl = 'https://sc.chinaz.com/tupian/paxingdongwutupian'
# i is the page number: page 1 has no suffix, later pages end in "_<i>.html"
if i > 1:
    url = baseurl + "_" + str(i) + ".html"
else:
    url = baseurl + ".html"
response = urllib.request.urlopen(url)
res = response.read().decode('utf-8')
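The page-numbering rule above can be pulled into a small helper so that it can be checked without touching the network. This is only a sketch: the function name `page_url` and the choice to reject page numbers below 1 are assumptions, not part of the original script.

```python
def page_url(baseurl, page):
    """Build the listing URL for a page number (hypothetical helper)."""
    if page < 1:
        # Assumption: the site has no page 0, so fail early with a clear message.
        raise ValueError("page numbers start at 1")
    if page == 1:
        # The first page has no numeric suffix.
        return baseurl + ".html"
    # Later pages append "_<page>" before the extension.
    return baseurl + "_" + str(page) + ".html"


baseurl = 'https://sc.chinaz.com/tupian/paxingdongwutupian'
for page in (1, 2, 3):
    print(page_url(baseurl, page))
```

Keeping the URL rule in one place means a change in the site's numbering scheme only needs to be fixed once.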

3. Parse the list of image URLs with XPath

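tree = etree.HTML(res)
# The images are lazy-loaded, so the address is read from data-original
# (named img_urls here to avoid shadowing the built-in list)
img_urls = tree.xpath('//img[@class = "lazy"]/@data-original')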
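To see what this selection does without fetching a page, here is a standard-library sketch that applies the same idea to a tiny hand-written snippet. It swaps in `xml.etree.ElementTree` for lxml, which only supports a subset of XPath and needs well-formed markup, so it is an illustration rather than a replacement for `etree.HTML`. The sample markup, the host `img.example.com`, and the function name `lazy_image_urls` are all made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a fragment of the listing page.
SAMPLE = """
<div>
  <img class="lazy" data-original="//img.example.com/a.jpg" />
  <img class="logo" src="/static/logo.png" />
  <img class="lazy" data-original="//img.example.com/b.jpg" />
</div>
"""


def lazy_image_urls(markup):
    """Return the data-original value of every <img class="lazy"> element."""
    root = ET.fromstring(markup.strip())
    # ElementTree cannot select attributes directly the way "/@data-original"
    # does in lxml, so the elements are matched first and the attribute is
    # read from each one afterwards.
    return [img.get("data-original") for img in root.findall(".//img[@class='lazy']")]


print(lazy_image_urls(SAMPLE))
```

Only the two `lazy` images are returned; the logo image is skipped because its `class` does not match.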

4. Download the images

# tu_url is the full image URL; a is a running counter used in the file name
urllib.request.urlretrieve(tu_url,'./pictures3/pic'+str(a)+'.jpg')
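Before downloading, it can help to work out which URL goes to which file. The sketch below does only that planning step, with no network access. It assumes the extracted addresses lack a scheme (which is why the full script prefixes them with "http:") and uses `urllib.parse.urljoin` instead, so the image URL inherits the scheme of the page it came from. The function name `plan_downloads` and the host `img.example.com` are hypothetical.

```python
import os
from urllib.parse import urljoin


def plan_downloads(page_url, raw_urls, save_dir="./pictures3", start=1):
    """Pair each image URL with the local path it would be saved to."""
    plan = []
    for n, raw in enumerate(raw_urls, start=start):
        # urljoin resolves "//host/path" against the page URL, reusing its scheme.
        full_url = urljoin(page_url, raw)
        plan.append((full_url, os.path.join(save_dir, f"pic{n}.jpg")))
    return plan


page = "https://sc.chinaz.com/tupian/paxingdongwutupian.html"
for full_url, path in plan_downloads(page, ["//img.example.com/a.jpg"]):
    print(full_url, "->", path)
```

Each pair can then be passed to `urllib.request.urlretrieve(full_url, path)`. The target folder has to exist first, for example via `os.makedirs(save_dir, exist_ok=True)`.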

5. The complete code

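import os
import urllib.request
from lxml import etree

baseurl = 'https://sc.chinaz.com/tupian/paxingdongwutupian'
save_dir = './pictures3'
a = 1  # running counter used to name the downloaded files


def download(url):
    """Download every lazy-loaded image found on one listing page."""
    global a
    response = urllib.request.urlopen(url)
    res = response.read().decode('utf-8')
    tree = etree.HTML(res)
    # The real image address is stored in data-original, not in src
    img_urls = tree.xpath('//img[@class = "lazy"]/@data-original')
    for i in img_urls:
        # The extracted address has no scheme, so prepend one
        tu_url = "http:" + i
        urllib.request.urlretrieve(tu_url, save_dir + '/pic' + str(a) + '.jpg')
        a += 1


if __name__ == '__main__':
    # urlretrieve does not create the target folder, so make sure it exists
    os.makedirs(save_dir, exist_ok=True)
    start = input("Which page do you want to start from? ")
    end = input("Which page do you want to stop at? ")
    for i in range(int(start), int(end) + 1):
        # Page 1 has no suffix; later pages end in "_<i>.html"
        if i > 1:
            url = baseurl + "_" + str(i) + ".html"
        else:
            url = baseurl + ".html"
        download(url)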
