This article shows how to crawl manga images with Python. It should be a useful reference for anyone interested, so let's walk through it together.
Development environment:
Python 3.6
PyCharm
Target URL
https://www.dmzj.com/info/yaoshenji.html
Code
Import the tools
    import requests
    import os
    import re
    from bs4 import BeautifulSoup
    from contextlib import closing
    from tqdm import tqdm
    import time
Get the chapter links and chapter names
    # Fetch the manga's info page and parse it
    r = requests.get(url=target_url)
    bs = BeautifulSoup(r.text, 'lxml')
    # The chapter links live in the <ul class="list_con_li"> element
    list_con_li = bs.find('ul', class_="list_con_li")
    cartoon_list = list_con_li.find_all('a')
    chapter_names = []
    chapter_urls = []
    for cartoon in cartoon_list:
        href = cartoon.get('href')
        name = cartoon.text
        # Insert at the front so both lists end up in reverse page order
        chapter_names.insert(0, name)
        chapter_urls.insert(0, href)
    print(chapter_urls)
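The loop above reverses the page order by inserting every chapter at the front of the list. A minimal sketch of what that does, and of an equivalent append-free alternative using reversed(), is shown here. The chapter names and example.com links are made up for illustration, and the idea that the page lists the newest chapter first is an assumption, not something the page guarantees.

```python
# Hypothetical (name, href) pairs in the order they appear on the page,
# assumed here to be newest chapter first.
page_order = [
    ('Chapter 3', 'https://example.com/ch3.html'),
    ('Chapter 2', 'https://example.com/ch2.html'),
    ('Chapter 1', 'https://example.com/ch1.html'),
]

# The approach used in this article: insert at index 0 on every iteration.
names_insert, urls_insert = [], []
for name, href in page_order:
    names_insert.insert(0, name)
    urls_insert.insert(0, href)

# An equivalent alternative: walk the page order backwards.
names_rev = [name for name, _ in reversed(page_order)]
urls_rev = [href for _, href in reversed(page_order)]

print(names_insert == names_rev, urls_insert == urls_rev)
```

Both variants give the same lists; the reversed() form simply avoids shifting the whole list on every insert.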
Download the manga
    for i, url in enumerate(tqdm(chapter_urls)):
        print(i, url)
        # Send the chapter URL as the Referer when requesting its images
        download_header = {
            'Referer': url,
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
        }
        # Remove dots from the chapter name before using it as a directory name
        name = chapter_names[i].replace('.', '')
        chapter_save_dir = os.path.join(save_dir, name)
        os.makedirs(chapter_save_dir, exist_ok=True)
        r = requests.get(url=url)
        html = BeautifulSoup(r.text, 'lxml')
        # The image IDs and path segments are read from the first <script> tag
        script_info = str(html.script)
        pics = re.findall(r'\d{13,14}', script_info)
        # IDs have 13 or 14 digits: right-pad to 14 digits for sorting only,
        # so the original ID strings are kept intact for the URL
        pics = sorted(pics, key=lambda x: int(x.ljust(14, '0')))
        chapterpic_hou = re.findall(r'\|(\d{5})\|', script_info)[0]
        chapterpic_qian = re.findall(r'\|(\d{4})\|', script_info)[0]
        for idx, pic in enumerate(pics):
            pic_url = ('https://images.dmzj.com/img/chapterpic/'
                       + chapterpic_qian + '/' + chapterpic_hou + '/' + pic + '.jpg')
            pic_name = '%03d.jpg' % (idx + 1)
            pic_save_path = os.path.join(chapter_save_dir, pic_name)
            print(pic_url)
            response = requests.get(pic_url, headers=download_header)
            print(response)
            if response.status_code == 200:
                with open(pic_save_path, "wb") as file:
                    file.write(response.content)
            else:
                print('Link error')
        # Pause between chapters
        time.sleep(2)
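The URL-building step inside the download loop can be tried out without touching the network. The sketch below pulls it into a small function, build_image_urls, which is a hypothetical helper name and not part of the original code. The sample script text is made up for illustration; real chapter pages will look different, and the regular expressions are the same ones used in the loop above.

```python
import re

IMAGE_BASE = 'https://images.dmzj.com/img/chapterpic/'


def build_image_urls(script_text):
    """Return the chapter's image URLs in page order (hypothetical helper)."""
    # Image IDs are the 13- or 14-digit numbers found in the script text.
    ids = re.findall(r'\d{13,14}', script_text)
    # Sort by the ID right-padded to 14 digits, keeping the original strings.
    ids.sort(key=lambda pic: int(pic.ljust(14, '0')))
    # Path segments: the first 4-digit and first 5-digit fields between pipes.
    qian = re.findall(r'\|(\d{4})\|', script_text)[0]
    hou = re.findall(r'\|(\d{5})\|', script_text)[0]
    return [IMAGE_BASE + qian + '/' + hou + '/' + pic + '.jpg' for pic in ids]


# Made-up script text for illustration only.
sample = 'x|15939407266801|14237|1593940726334|3059|15939407265912|y'
for image_url in build_image_urls(sample):
    print(image_url)
```

Sorting on a padded key rather than rewriting the IDs means a genuine 14-digit ID that happens to end in 0 is never shortened by mistake.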
Create the save directory (run this before the two steps above, since they use save_dir and target_url)
    save_dir = '妖神記'
    # Create the save directory in the current folder if it does not exist yet
    os.makedirs(save_dir, exist_ok=True)
    target_url = "https://www.dmzj.com/info/yaoshenji.html"
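Directory creation happens twice in this article: once for the save directory and once per chapter. A minimal sketch of a single helper covering both cases is shown here, using os.makedirs with exist_ok=True from the standard library. The helper name ensure_chapter_dir, the 'yaoshenji' folder name, and the sample chapter name are assumptions for illustration, and the demo runs inside a temporary directory so it leaves nothing behind.

```python
import os
import tempfile


def ensure_chapter_dir(root, chapter_name):
    """Create root/chapter_name if needed and return its path (hypothetical helper)."""
    # Strip dots from the chapter name, as the download loop does.
    safe_name = chapter_name.replace('.', '')
    path = os.path.join(root, safe_name)
    # exist_ok=True makes the call safe to repeat; missing parents are created too.
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'yaoshenji')
    first = ensure_chapter_dir(root, 'Chapter 1.5')
    second = ensure_chapter_dir(root, 'Chapter 1.5')  # second call is a no-op
    created = sorted(os.listdir(root))

print(created)
```

Calling the helper again for the same chapter does not raise an error, which is useful when a crawl is interrupted and restarted.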