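1. Find the network request
The response shows that this request returns all the data for the current page, and that the request URLs for different pages follow a regular pattern: the first page ends in paxingdongwutupian.html and page n (n > 1) ends in paxingdongwutupian_n.html.
2. Fetch the page data with the urllib library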
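baseurl = 'https://sc.chinaz.com/tupian/paxingdongwutupian'
# i is the 1-based page number
url = ''
if i > 1:
    # pages after the first append "_<page number>" before ".html"
    url = baseurl + "_" + str(i) + ".html"
elif i == 1:
    # the first page has no suffix
    url = baseurl + ".html"
response = urllib.request.urlopen(url)
res = response.read().decode('utf-8')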
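The page-URL rule and the fetch step above can be wrapped into two small helpers. This is only a sketch under assumptions: the names build_page_url and fetch_page are hypothetical, and the User-Agent header and the 10-second timeout are additions that do not appear in the original snippet. They are included because some sites reject urllib's default client string, which may or may not apply here.

```python
import urllib.request

BASE_URL = 'https://sc.chinaz.com/tupian/paxingdongwutupian'


def build_page_url(page, base_url=BASE_URL):
    """Return the listing URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    # Page 1 has no suffix; later pages append "_<n>" before ".html".
    if page == 1:
        return base_url + '.html'
    return base_url + '_' + str(page) + '.html'


def fetch_page(url, timeout=10):
    """Download a page and decode it as UTF-8, as the original snippet does."""
    # Assumption: a browser-like User-Agent; the original sends urllib's default.
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode('utf-8')
```

With these helpers, fetch_page(build_page_url(2)) would request the second listing page. Rejecting page numbers below 1 up front avoids passing an empty URL to urlopen.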
3. Parse the list of image URLs with XPath
tree = etree.HTML(res)
# the real image address is stored in the data-original attribute of each lazy-loaded <img>
img_urls = tree.xpath('//img[@class="lazy"]/@data-original')
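To see what this XPath expression selects without touching the network, it can be tried on a small inline document. The markup, the img.example.com addresses, and the names SAMPLE_HTML and extract_image_urls below are made up for illustration; the real page is only assumed to have a similar shape.

```python
from lxml import etree

# Made-up markup for illustration; the real page is assumed to look similar.
SAMPLE_HTML = '''
<html><body>
  <img class="lazy" data-original="//img.example.com/a.jpg" src="placeholder.gif">
  <img class="lazy" data-original="//img.example.com/b.jpg" src="placeholder.gif">
  <img class="logo" src="//img.example.com/logo.png">
</body></html>
'''


def extract_image_urls(html_text):
    """Return the data-original value of every <img class="lazy"> element."""
    tree = etree.HTML(html_text)
    values = tree.xpath('//img[@class="lazy"]/@data-original')
    # Convert lxml's string results to plain str objects.
    return [str(value) for value in values]


print(extract_image_urls(SAMPLE_HTML))
```

Only the two images whose class attribute is exactly "lazy" are returned. An element carrying several classes would not match this exact comparison; contains(@class, "lazy") is a looser alternative if that turns out to be needed.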
4. Download the images
urllib.request.urlretrieve(tu_url,'./pictures3/pic'+str(a)+'.jpg')
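The download step can be sketched as a few small functions. Everything here beyond urlretrieve itself is an assumption: the helper names are hypothetical, the extracted addresses are assumed to be protocol-relative (starting with "//"), and urljoin is swapped in for manual string concatenation, so the scheme is taken from the page URL rather than hard-coded.

```python
import os
import urllib.request
from urllib.parse import urljoin

PAGE_URL = 'https://sc.chinaz.com/tupian/paxingdongwutupian.html'


def to_absolute_url(src, page_url=PAGE_URL):
    """Resolve a relative or protocol-relative image address against the page URL."""
    return urljoin(page_url, src)


def target_path(index, out_dir='./pictures3'):
    """Build the local file name used in the article: pic<index>.jpg."""
    return os.path.join(out_dir, 'pic' + str(index) + '.jpg')


def download_image(src, index, out_dir='./pictures3'):
    """Save one image and return the path it was written to."""
    # urlretrieve does not create missing directories, so make sure it exists.
    os.makedirs(out_dir, exist_ok=True)
    path = target_path(index, out_dir)
    urllib.request.urlretrieve(to_absolute_url(src), path)
    return path
```

Calling download_image(src, a) for each extracted address, with a counting up from 1, would reproduce the pic1.jpg, pic2.jpg, ... naming used in this article. Creating the directory first matters because urlretrieve raises an error when ./pictures3 does not exist.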
5. The complete code is as follows
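import os
import urllib.request
from lxml import etree

BASE_URL = 'https://sc.chinaz.com/tupian/paxingdongwutupian'
SAVE_DIR = './pictures3'

a = 1  # running index used to name the saved files


def download(url):
    """Download every lazy-loaded image found on one listing page."""
    global a
    response = urllib.request.urlopen(url)
    res = response.read().decode('utf-8')
    tree = etree.HTML(res)
    # the real image address is stored in the data-original attribute
    img_urls = tree.xpath('//img[@class="lazy"]/@data-original')
    for src in img_urls:
        # prepend the scheme (the addresses are assumed to start with "//")
        tu_url = "http:" + src
        urllib.request.urlretrieve(tu_url, SAVE_DIR + '/pic' + str(a) + '.jpg')
        a += 1


if __name__ == '__main__':
    # urlretrieve fails if the target directory does not exist
    os.makedirs(SAVE_DIR, exist_ok=True)
    start = int(input("Which page do you want to start from? "))
    end = int(input("Which page do you want to stop at? "))
    # page numbers start at 1, so anything lower is clamped to 1
    for i in range(max(start, 1), end + 1):
        if i == 1:
            # the first page has no "_<n>" suffix
            url = BASE_URL + ".html"
        else:
            url = BASE_URL + "_" + str(i) + ".html"
        download(url)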
When reposting, please credit the source: 免费源码网 » Python爬取站长素材图片【爬虫学习day.01】 (Scraping images from 站长素材 with Python [crawler study, day 01])