## Goal

Scrape the wallpaper images from 美图天空 (tootk.net) and automatically download them to the local disk.

## Environment

Python 3.4 + BeautifulSoup

## Steps

  1. Open http://www.tootk.net/tupian/bizhi/ and inspect the page structure

  2. Each image entry is a `div` with `class="w170img"`

  3. The `src` and `alt` attributes of the `img` inside it hold the image link and the image name, respectively

  4. With these two values the image can be downloaded
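Steps 2–3 can be sketched in isolation with BeautifulSoup. The HTML snippet below is an assumed stand-in for one entry of the real page markup, not a copy of it:

```python
import bs4

# Assumed stand-in for a single list item on the page
html = '''
<div class="w170img">
  <a href="/tupian/bizhi/1.html">
    <img src="/uploads/sky-lp.jpg" alt="blue sky"/>
  </a>
</div>
'''

soup = bs4.BeautifulSoup(html, "html.parser")
for div in soup.find_all("div", attrs={"class": "w170img"}):
    img = div.img
    print(img.get("src"), img.get("alt"))  # image link and image name
```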

## Code

```python
# Language: Python 3.4
import os
import urllib.request

import bs4

page_sum = 10  # number of pages to download

path = os.path.join(os.getcwd(), 'images')
if not os.path.exists(path):
    os.mkdir(path)  # create the output folder

url = "http://www.tootk.net"  # base URL
headers = {  # pretend to be a browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)'
                  ' Chrome/32.0.1700.76 Safari/537.36'
}

for count in range(page_sum):
    req = urllib.request.Request(
        url=url + "/tupian/bizhi/list_13_" + str(count + 1) + ".html",
        headers=headers
    )  # build the URL for this page
    print(req.full_url)
    content = urllib.request.urlopen(req).read()

    soup = bs4.BeautifulSoup(content, "html5lib")  # parse the page

    liResult = soup.find_all("div", attrs={"class": "w170img"})

    for item in liResult:  # one <div class="w170img"> per image
        image = item.img

        lplink = image.get('src')  # thumbnail link
        title = image.get('alt')   # image name
        link = url + lplink.replace("-lp", "")     # link to the full-size image
        filename = path + os.sep + title + ".jpg"  # local file path

        print(link)

        urllib.request.urlretrieve(link, filename)  # save to disk
```
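The `replace("-lp", "")` step relies on the site's convention that thumbnail paths carry a `-lp` suffix which the full-size image drops. That transform can be isolated as a small helper (the function name is illustrative, not from the original code):

```python
def hd_link(base_url, thumb_src):
    """Turn a thumbnail src like /uploads/xxx-lp.jpg into the
    full-size image URL by dropping the '-lp' suffix."""
    return base_url + thumb_src.replace("-lp", "")

print(hd_link("http://www.tootk.net", "/uploads/sky-lp.jpg"))
# http://www.tootk.net/uploads/sky.jpg
```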

References: