Some folks on the LOC forum shared a website that collects photos of Japanese girls along with links to their Twitter accounts.
Link: Japanese girls' Twitter photos
I'm not that interested in the tweets themselves, so I'll just scrape the images~
Crawler Overview
Environment:
- Python 2.7.9 (can be swapped for Python 3 with minor changes; adapt as needed)
- BeautifulSoup4
- requests
Code:
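Both third-party libraries can be installed via pip (assuming pip is set up for the Python 2 interpreter you're using):

```shell
pip install beautifulsoup4 requests
```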
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import urllib2
import random

def spy(url):
    req = urllib2.Request(url)
    page = urllib2.urlopen(req).read()
    soup = BeautifulSoup(page, "html.parser")
    for row in soup.find_all('div', {'class': 'row'}):
        for photo in row.find_all('div', {'class': 'photo'}):
            outer = photo.find('div', {'class': 'photo-link-outer'})
            for tag in outer.find('a').find_all('img'):
                img = tag.get('src')
                print img
                # random 6-letter suffix so downloads don't overwrite each other
                name = ''.join(random.sample('abcdefghijklmnopqrstuvwxyz', 6))
                downImg(img, name)
    # recurse into the next page; the link disappears on the last page
    nextlink = soup.find('p', {'class': 'go-to-next-page'})
    if nextlink is None:
        return
    pageurl = "http://jigadori.fkoji.com" + nextlink.find('a').get('href')
    spy(pageurl)

def downImg(img, name):
    try:
        r = requests.get(img)
    except Exception, e:
        print "failed to fetch image: %s" % e
        return
    # the ./img directory must exist beforehand
    with open('./img/good%s.jpg' % name, 'wb') as f:
        f.write(r.content)

if __name__ == '__main__':
    spy("http://jigadori.fkoji.com")
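If you'd rather run this on Python 3, the main changes are `urllib.request` in place of `urllib2`, `print()` as a function, and `except Exception as e`. A minimal sketch of the download helper ported to Python 3 (the file naming mirrors the script above; `down_img` and `random_name` are my own names for illustration):

```python
import random
import string


def random_name(length=6):
    # six distinct lowercase letters, as in the original script
    return ''.join(random.sample(string.ascii_lowercase, length))


def down_img(url, name):
    """Fetch `url` and save it as ./img/good<name>.jpg (Python 3 version)."""
    import requests  # same dependency as the Python 2 script
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
    except Exception as e:  # Python 3 exception syntax
        print("failed to fetch image:", e)
        return False
    with open('./img/good%s.jpg' % name, 'wb') as f:
        f.write(r.content)
    return True
```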
Overall Approach
Inspecting the page structure shows a "next page" link at the bottom of the homepage; BeautifulSoup looks it up by class, and the crawler recurses into each next-page URL.
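The pagination lookup can be sketched in isolation. The HTML fragment below is a hypothetical stand-in for the site's real markup, keeping only the class name the script relies on:

```python
from bs4 import BeautifulSoup

# hypothetical fragment mimicking the site's next-page markup
html = '<p class="go-to-next-page"><a href="/?p=2">next</a></p>'

soup = BeautifulSoup(html, "html.parser")
tag = soup.find('p', {'class': 'go-to-next-page'})
next_url = None
if tag is not None:  # absent on the last page
    next_url = "http://jigadori.fkoji.com" + tag.find('a').get('href')
print(next_url)
```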
Image URLs are pulled out the same way, by class.
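The image lookup walks the nested row / photo / photo-link-outer divs and collects each `img` tag's `src`. A self-contained sketch against a hypothetical fragment with the same nesting (the sample URL is made up):

```python
from bs4 import BeautifulSoup

# hypothetical fragment mirroring the row/photo/photo-link-outer nesting
html = '''
<div class="row">
  <div class="photo">
    <div class="photo-link-outer">
      <a href="#"><img src="http://pbs.example/img1.jpg"></a>
    </div>
  </div>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
srcs = []
for row in soup.find_all('div', {'class': 'row'}):
    for photo in row.find_all('div', {'class': 'photo'}):
        outer = photo.find('div', {'class': 'photo-link-outer'})
        for img in outer.find('a').find_all('img'):
            srcs.append(img.get('src'))
print(srcs)
```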
The overall difficulty is low; if you're interested, this site makes a good practice target~
Demo screenshot
Available for all kinds of freelance/contract work; if interested, reach me by email at killnetsec#gmail.com~