This article walks through how to crawl anime images from a website with Python. I found it quite practical, so I'm sharing it here as a reference; hopefully you'll get something out of it.
Target site: https://divnil.com
First, look at how the site loads its data. After opening it, you'll find a "next page" button at the bottom, so the listing is plain pagination. OK, crawling this site is going to be easy.
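Since the listing is paginated, the listing pages can be enumerated up front. The following is only a minimal sketch, assuming that page N (for N ≥ 2) uses the `_N.html` suffix seen in the two listing URLs used in this article; `page_url`, `BASE` and `LISTING` are hypothetical names, and the pattern has not been checked beyond those two URLs.

```python
# Hypothetical helper: build listing-page URLs from the pattern seen in this article.
BASE = "https://divnil.com/wallpaper/iphone8/"
# Percent-encoded listing name, copied from the article's URLs.
LISTING = "%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99"


def page_url(page: int) -> str:
    """Return the URL of listing page `page` (1-based).

    Assumption: page 1 has no suffix, later pages end in `_<page>.html`.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    suffix = "" if page == 1 else "_%d" % page
    return BASE + LISTING + suffix + ".html"


if __name__ == "__main__":
    for n in range(1, 4):
        print(page_url(n))
```

Looping over `page_url(n)` until a request returns a non-200 status would be one way to walk the whole listing, again assuming the pattern holds.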
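Our goal is to get the high-resolution source URL of every image and download the images to disk. Start by opening any image to look at its detail page. Emmm, there is only one image on it.
It looks fairly sharp, so click it to open the image in a new window.
Then download the image. To be honest, the file is pretty small, and I'm worried it isn't the high-resolution original (whatever).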
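1. Get the link to each image's detail page from the main page
These links are easy to get: press F12 to inspect the element, or right-click and view the page source. On mobile, in Chrome and Firefox, put "view-source:" in front of the URL.
For example: view-source:https://www.baidu.com/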
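2. Get the full-size image URL from the detail page
Open the detail page of any image.
Then, with the inspector open, just click the image on the page to jump to its markup.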
3. Download the image using the full-size URL
This part is simple; see the code.
First install the Requests and BeautifulSoup libraries

pip install requests beautifulsoup4
Import the libraries

import requests
from bs4 import BeautifulSoup
import sys
Request the page source

url = "https://divnil.com/wallpaper/iphone8/%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99_2.html"
# Send a browser-like User-Agent with every request
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
}
resp = requests.get(url, headers=headers)
# Stop early if the listing page could not be fetched
if resp.status_code != requests.codes.OK:
    print("Request Error, Code: %d" % resp.status_code)
    sys.exit()
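Then parse out the detail-page links of all images

soup = BeautifulSoup(resp.text, "html.parser")
# All wallpaper links live inside <div id="contents">
contents = soup.find_all("div", id="contents")[0]
wallpapers = contents.find_all("a", rel="wallpaper")
# Collect the (relative) href of every detail page
links = []
for wallpaper in wallpapers:
    links.append(wallpaper['href'])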
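The collected `href` values are relative, and the article later turns them into full URLs by prepending a fixed prefix. As a hedged alternative, the standard library's `urllib.parse.urljoin` can resolve them against the URL of the page they came from. The sketch below uses made-up sample hrefs and a hypothetical helper name `absolutize`; it is not taken from the site.

```python
from urllib.parse import urljoin


def absolutize(page_url, hrefs):
    """Resolve each relative href against the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


if __name__ == "__main__":
    # Hypothetical listing page and hrefs, for illustration only.
    listing = "https://divnil.com/wallpaper/iphone8/list.html"
    for link in absolutize(listing, ["detail_a.html", "detail_b.html"]):
        print(link)
```

One caveat: `urljoin` resolves `../` segments by standard URL rules, which is not the same as stripping `../` and prepending the prefix, as the download code in this article does for the image path. Which of the two matches the site has to be checked against the real pages.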
Next, on each detail page, grab the link to that seemingly high-resolution image (not sure it's really the original) and download it (/funny)

import os

head = "https://divnil.com/wallpaper/iphone8/"
# Create the output folder on first run
if not os.path.exists("./Divnil"):
    os.mkdir("./Divnil")

for url in links:
    url = head + url
    resp = requests.get(url, headers=headers)
    if resp.status_code != requests.codes.OK:
        print("URL: %s REQUESTS ERROR. CODE: %d" % (url, resp.status_code))
        continue
    soup = BeautifulSoup(resp.text, "html.parser")
    # The large image is <img id="main_content"> inside <div id="contents">
    img = soup.find("div", id="contents").find("img", id="main_content")
    # The "original" attribute holds the relative path of the large image
    img_url = head + img['original'].replace("../", "")
    img_name = img['alt']
    print("start download %s ..." % img_url)
    resp = requests.get(img_url, headers=headers)
    if resp.status_code != requests.codes.OK:
        print("IMAGE %s DOWNLOAD FAILED." % img_name)
        continue  # do not write an error page to disk as an image
    with open("./Divnil/" + img_name + ".jpg", "wb") as f:
        f.write(resp.content)
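The file name here comes straight from the image's `alt` text, so a name containing `/` or other characters that are invalid in file names would make `open()` fail or write somewhere unexpected. A minimal sketch of one way to guard against that, assuming it is acceptable to replace such characters with `_`; `safe_filename` is a hypothetical helper, not part of the original script.

```python
import re

# Characters that are invalid in file names on Windows or act as path separators,
# plus ASCII control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(name, default="image"):
    """Replace unsafe characters with '_' and trim surrounding spaces and dots."""
    cleaned = _UNSAFE.sub("_", name).strip(" .")
    # Fall back to a default when nothing usable is left.
    return cleaned or default


if __name__ == "__main__":
    print(safe_filename("anime/wallpaper: 01"))
```

With this helper, the save path could be built as `"./Divnil/" + safe_filename(img_name) + ".jpg"`.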
Done. Here is the complete code

import requests
from bs4 import BeautifulSoup
import sys
import os


class Divnil:
    def __init__(self):
        self.url = "https://divnil.com/wallpaper/iphone8/%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99.html"
        self.head = "https://divnil.com/wallpaper/iphone8/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
        }

    def getImageInfoUrl(self):
        """Collect the detail-page links from the listing page."""
        resp = requests.get(self.url, headers=self.headers)
        if resp.status_code != requests.codes.OK:
            print("Request Error, Code: %d" % resp.status_code)
            sys.exit()
        soup = BeautifulSoup(resp.text, "html.parser")
        contents = soup.find("div", id="contents")
        wallpapers = contents.find_all("a", rel="wallpaper")
        self.links = []
        for wallpaper in wallpapers:
            self.links.append(wallpaper['href'])

    def downloadImage(self):
        """Visit every detail page and save its large image to ./Divnil."""
        if not os.path.exists("./Divnil"):
            os.mkdir("./Divnil")
        for url in self.links:
            url = self.head + url
            resp = requests.get(url, headers=self.headers)
            if resp.status_code != requests.codes.OK:
                print("URL: %s REQUESTS ERROR. CODE: %d" % (url, resp.status_code))
                continue
            soup = BeautifulSoup(resp.text, "html.parser")
            img = soup.find("div", id="contents").find("img", id="main_content")
            img_url = self.head + img['original'].replace("../", "")
            img_name = img['alt']
            print("start download %s ..." % img_url)
            resp = requests.get(img_url, headers=self.headers)
            if resp.status_code != requests.codes.OK:
                print("IMAGE %s DOWNLOAD FAILED." % img_name)
                continue
            # A "/" in the alt text would be treated as a path separator
            if '/' in img_name:
                img_name = img_name.split('/')[1]
            with open("./Divnil/" + img_name + ".jpg", "wb") as f:
                f.write(resp.content)

    def main(self):
        self.getImageInfoUrl()
        self.downloadImage()


if __name__ == "__main__":
    divnil = Divnil()
    divnil.main()
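The script saves every file with a `.jpg` suffix regardless of what the image URL looks like. If the site ever serves other formats, the suffix could instead be derived from the URL path. The sketch below is only an illustration under that assumption: `guess_extension` and its list of accepted suffixes are hypothetical, and the sample URL is made up.

```python
import os
from urllib.parse import urlparse

# Suffixes accepted as image extensions in this sketch (an arbitrary choice).
KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def guess_extension(img_url, default=".jpg"):
    """Return the lower-cased file extension found in the URL path.

    Falls back to `default` when the path has no recognised image suffix.
    The query string is ignored because only the path component is inspected.
    """
    path = urlparse(img_url).path
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in KNOWN_EXTENSIONS else default


if __name__ == "__main__":
    print(guess_extension("https://example.com/img/sample.PNG?size=large"))
```

This only looks at the URL; checking the response's `Content-Type` header would be a more reliable signal, at the cost of a little more code.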