How to Scrape Anime Images from a Website with Python

This article walks through how to scrape anime images from a website with Python. I found the technique quite practical, so I am sharing it here for reference; I hope you get something out of it.


Main Text

Target site: https://divnil.com

First, look at how the site loads its data. Opening it shows a "next page" button at the bottom, which suggests plain pagination, so scraping this site is straightforward.

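Our goal is to get the high-resolution source URL of each image and download the images to the local machine. Start by opening any image to see its detail page. Hmm, there is just a single image.

It looks fairly sharp, so open the image in a new window.

Then download it. Honestly, the file is quite small, so I doubt it is the true high-resolution original (never mind).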

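Next, let's work out where to start.

1. First, get the link to each image's detail page from the main page

These links are easy to find: press F12 to inspect the elements, or right-click and view the page source. In Chrome and Firefox on a phone, prepend "view-source:" to the URL.

For example: view-source:https://www.baidu.com/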

2. Get the full-size image URL from the detail page

Open any image detail page.

Then just click the image on the page with the inspector active to jump to its element in the source.
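To make the target concrete, here is a minimal sketch of pulling that element's attributes out of a page, using only the standard library's html.parser (the article itself uses BeautifulSoup further down). The HTML snippet is a hypothetical stand-in modeled on the selectors used in the article's code (an img with id "main_content" carrying "original" and "alt" attributes); the class name MainImageFinder is made up for this example, and the real page markup may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup, modeled on the selectors used later in the article
SAMPLE_HTML = """
<div id="contents">
  <img id="main_content" original="../images/sample_wallpaper.jpg" alt="sample wallpaper">
</div>
"""


class MainImageFinder(HTMLParser):
    """Remember the attributes of the first <img id="main_content"> tag."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        is_main = tag == "img" and attr_map.get("id") == "main_content"
        if is_main and self.attrs is None:
            self.attrs = attr_map


finder = MainImageFinder()
finder.feed(SAMPLE_HTML)
print(finder.attrs["original"])  # relative path to the full-size image
print(finder.attrs["alt"])       # text later used as the file name
```

The two printed values are exactly what the download step needs: a path to turn into an absolute URL, and a name for the saved file.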

3. Download the image using the full-size image URL

This part is simple; see the code.

First, install the Requests and BeautifulSoup libraries:

pip install requests bs4

Import the libraries:

import requests
from bs4 import BeautifulSoup
import sys

Request the page source:

url = "https://divnil.com/wallpaper/iphone8/%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99_2.html"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
}
resp = requests.get(url, headers=headers)
# Stop early if the listing page could not be fetched
if resp.status_code != requests.codes.OK:
    print("Request Error, Code: %d" % resp.status_code)
    sys.exit()
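The URL above ends in "_2.html", while the complete script at the end of the article requests the same path without any suffix. Assuming that page 1 has no suffix and page N ends in "_N.html" (only pages 1 and 2 are confirmed by the URLs in this article), the listing URLs could be generated with a small helper like the sketch below. The names LISTING_BASE and listing_url are made up for this example.

```python
# Percent-encoded listing path taken from the URLs used in this article
LISTING_BASE = (
    "https://divnil.com/wallpaper/iphone8/"
    "%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99"
)


def listing_url(page):
    """Return the listing URL for a 1-based page number (assumed pattern)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 has no suffix; page N is assumed to end in "_N.html"
    suffix = "" if page == 1 else "_%d" % page
    return LISTING_BASE + suffix + ".html"


for page in range(1, 4):
    print(listing_url(page))
```

Looping over listing_url(page) and stopping at the first non-200 response would be one way to walk every page instead of clicking the "next page" button.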

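Then parse out the detail-page links for all the images:

soup = BeautifulSoup(resp.text, "html.parser")
# All wallpapers sit inside <div id="contents">
contents = soup.find("div", id="contents")
# Each thumbnail is wrapped in an <a rel="wallpaper"> pointing at its detail page
wallpapers = contents.find_all("a", rel="wallpaper")
links = []
for wallpaper in wallpapers:
    links.append(wallpaper['href'])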

Next, on each detail page, grab the link to that seemingly high-resolution image (which may or may not be the real original) and download it (/smirk):

import os

head = "https://divnil.com/wallpaper/iphone8/"
# Create the output directory if it does not exist yet
if not os.path.exists("./Divnil"):
    os.mkdir("./Divnil")

for url in links:
    url = head + url
    resp = requests.get(url, headers=headers)
    if resp.status_code != requests.codes.OK:
        print("URL: %s REQUESTS ERROR. CODE: %d" % (url, resp.status_code))
        continue
    soup = BeautifulSoup(resp.text, "html.parser")
    # The large image is the <img id="main_content"> inside <div id="contents">
    img = soup.find("div", id="contents").find("img", id="main_content")
    # Its "original" attribute holds the image path; strip any "../" before prefixing
    img_url = head + img['original'].replace("../", "")
    img_name = img['alt']
    print("start download %s ..." % img_url)

    resp = requests.get(img_url, headers=headers)
    if resp.status_code != requests.codes.OK:
        print("IMAGE %s DOWNLOAD FAILED." % img_name)
        continue  # do not write a file for a failed download

    with open("./Divnil/" + img_name + ".jpg", "wb") as f:
        f.write(resp.content)
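The snippet above uses the alt text directly as the file name and always appends ".jpg". Alt text can contain characters such as "/" that are not valid in file names, and the image may not be a JPEG. The following is a minimal sketch of one way to handle both, using only the standard library; the helper name safe_filename and the example.com URL are made up for illustration and are not part of the original script.

```python
import os
import re
from urllib.parse import urlparse


def safe_filename(alt_text, img_url, default_ext=".jpg"):
    """Build a filesystem-safe file name from the alt text and the image URL."""
    # Replace path separators and other characters rejected by common filesystems
    stem = re.sub(r'[\\/:*?"<>|]+', "_", alt_text).strip() or "image"
    # Reuse the extension from the URL path when it has one
    ext = os.path.splitext(urlparse(img_url).path)[1] or default_ext
    return stem + ext


print(safe_filename("anime/wallpaper 01", "https://example.com/img/a.png"))
```

With such a helper, the open() call could write to os.path.join("./Divnil", safe_filename(img_name, img_url)) instead of concatenating the raw alt text.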

Done. Here is the complete code:

import requests
from bs4 import BeautifulSoup
import sys
import os


class Divnil:

   def __init__(self):
       self.url = "https://divnil.com/wallpaper/iphone8/%E3%82%A2%E3%83%8B%E3%83%A1%E3%81%AE%E5%A3%81%E7%B4%99.html"
       self.head = "https://divnil.com/wallpaper/iphone8/"
       self.headers = {
           "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
       }
   

   def getImageInfoUrl(self):

       resp = requests.get(self.url, headers=self.headers)
       if resp.status_code != requests.codes.OK:
           print("Request Error, Code: %d"% resp.status_code)
           sys.exit()

       soup = BeautifulSoup(resp.text, "html.parser")
       
       contents = soup.find("div", id="contents")
       wallpapers = contents.findAll("a", rel="wallpaper")
       
       self.links = []
       for wallpaper in wallpapers:
           self.links.append(wallpaper['href'])

   
   def downloadImage(self):

       if os.path.exists("./Divnil") != True:
           os.mkdir("./Divnil")

       for url in self.links:
           
           url = self.head + url
           
           resp = requests.get(url, headers=self.headers)
           if  resp.status_code != requests.codes.OK:
               print("URL: %s REQUESTS ERROR. CODE: %d" % (url, resp.status_code))
               continue
           
           soup = BeautifulSoup(resp.text, "html.parser")
           
           img = soup.find("div", id="contents").find("img", id="main_content")
           img_url = self.head + img['original'].replace("../", "")
           img_name = img['alt']
           
           print("start download %s ..." % img_url)

           resp = requests.get(img_url, headers=self.headers)
           if resp.status_code != requests.codes.OK:
               print("IMAGE %s DOWNLOAD FAILED." % img_name)
               continue

           if '/' in img_name:
               img_name = img_name.split('/')[1]

           with open("./Divnil/" + img_name + ".jpg", "wb") as f:
               f.write(resp.content)


   def main(self):
       self.getImageInfoUrl()
       self.downloadImage()


if __name__ == "__main__":
   divnil = Divnil()
   divnil.main()

That wraps up this article on scraping anime images from a website with Python. I hope it helps; if you found it useful, please share it so more people can see it.
