How to implement a web crawler in Python

This article shows how to implement a web crawler in Python. It goes into considerable detail; interested readers can use it as a reference, and hopefully it proves helpful.

Part 1:

Fetch the page:

import requests
url = "https://voice.baidu.com/act/newpneumonia/newpneumonia"
response = requests.get(url)
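
The request above assumes the server answers promptly and with a success status. A slightly more defensive variant might look like the sketch below. The names `fetch_text` and `http_get` are hypothetical and introduced only for illustration; the stub response exists solely so the example runs offline. In the article's setting you would pass `requests.get` as `http_get`.

```python
def fetch_text(url, http_get, timeout=10.0):
    """Fetch a page and return its body as text.

    http_get is any callable shaped like requests.get (a hypothetical
    parameter, injected so the helper can be tried without a network).
    """
    response = http_get(url, timeout=timeout)
    # Raise an exception on an error status instead of parsing an error page
    response.raise_for_status()
    return response.text


class StubResponse:
    """Stand-in for a requests.Response, used only for this offline demo."""

    def __init__(self, text, ok=True):
        self.text = text
        self.ok = ok

    def raise_for_status(self):
        if not self.ok:
            raise RuntimeError("HTTP error")


def stub_get(url, timeout=None):
    # Pretends every URL returns the same tiny page
    return StubResponse("<html><body>hello</body></html>")


page = fetch_text("https://example.com/", stub_get)
print(page)
```

With the real library this becomes `fetch_text(url, requests.get)`; the timeout keeps the script from hanging indefinitely if the server never responds.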

Part 2:

Looking at the page shows where the data lives: it is embedded in a script tag, so XPath can be used to extract it. Import the parser with from lxml import etree, build an HTML object, and run the XPath query. The query returns a list, and its first item holds the entire payload. Next, get the content of component: use the json module to convert the string into a dict (a Python data structure). To get the domestic data, look up caseList inside component.

Here is the code:

from lxml import etree
import json
# Build the HTML object
html = etree.HTML(response.text)
result = html.xpath('//script[@type="application/json"]/text()')
result = result[0]
# json.loads() converts a string into Python data types
result = json.loads(result)
result_in = result['component'][0]['caseList']
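
To try the parsing step without touching the network, the same idea can be sketched on a small inline page. Note that this sketch swaps lxml for the standard library's `html.parser` so that it runs with no extra packages; the class name `JsonScriptExtractor` and the sample HTML are made up, and the sample mirrors only the keys used in this article. It also checks for an empty result, on the assumption that the page layout could change.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect the text of every <script type="application/json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.chunks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script text may arrive in several pieces, so concatenate them
        if self._inside:
            self.chunks[-1] += data


# Made-up sample page that mirrors only the keys used in this article
sample_html = """
<html><head>
<script type="application/json">
{"component": [{"caseList": [{"area": "A", "confirmed": "10", "died": ""}],
                "globalList": []}]}
</script>
</head><body></body></html>
"""

parser = JsonScriptExtractor()
parser.feed(sample_html)
parser.close()

if not parser.chunks:
    raise ValueError("no JSON script tag found; the page layout may have changed")

result = json.loads(parser.chunks[0])
result_in = result["component"][0]["caseList"]
print(result_in[0]["area"], result_in[0]["confirmed"])
```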

Part 3:

Store the domestic data in an Excel spreadsheet using the openpyxl module (import openpyxl). First create a workbook and get a worksheet from it, then name the worksheet and write its header row.

The code is as follows:

import openpyxl
# Create the workbook
wb = openpyxl.Workbook()
# Get the default (active) worksheet
ws = wb.active
ws.title = "Domestic cases"
ws.append(['Province', 'Total confirmed', 'Deaths', 'Recovered', 'Active cases', 'Total confirmed (change)', 'Deaths (change)', 'Recovered (change)', 'Active cases (change)'])
'''
area --> mostly provinces
city --> city
confirmed --> total confirmed
crued --> recovered
relativeTime -->
confirmedRelative --> change in total confirmed
curedRelative --> change in recovered
curConfirm --> active cases
curConfirmRelative --> change in active cases
'''
for each in result_in:
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['curConfirm'],
                 each['confirmedRelative'], each['diedRelative'], each['curedRelative'],
                 each['curConfirmRelative']]
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)
wb.save('./data.xlsx')
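
The inner loop that replaces empty strings with '0' appears again in the international section, so it could be factored into a small helper. The sketch below is one way to do that; `build_row` and `DOMESTIC_KEYS` are hypothetical names, the record is made up, and treating a missing key like an empty value is an assumption that differs from the code above, which would raise a KeyError.

```python
def build_row(record, keys, default="0"):
    """Return the values for keys, replacing empty or missing ones with default."""
    row = []
    for key in keys:
        # Assumption: a missing key is treated the same as an empty string
        value = record.get(key, "")
        row.append(default if value == "" else value)
    return row


DOMESTIC_KEYS = ["area", "confirmed", "died", "crued", "curConfirm",
                 "confirmedRelative", "diedRelative", "curedRelative",
                 "curConfirmRelative"]

# Made-up record: "died" is empty and several keys are missing on purpose
record = {"area": "A", "confirmed": "10", "died": "", "crued": "7"}
print(build_row(record, DOMESTIC_KEYS))
```

Inside the loop this would read `ws.append(build_row(each, DOMESTIC_KEYS))`.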

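Part 4:

Store the international data in Excel as well. The data for other countries is found in globalList inside component. Create one sheet in the workbook per entry, each representing a continent.

The code is as follows: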

data_out = result['component'][0]['globalList']
for each in data_out:
    sheet_title = each['area']
    # Create a new worksheet
    ws_out = wb.create_sheet(sheet_title)
    ws_out.append(['Country', 'Total confirmed', 'Deaths', 'Recovered', 'Active cases', 'Total confirmed (change)'])
    for country in each['subList']:
        list_temp = [country['country'], country['confirmed'], country['died'], country['crued'],
                     country['curConfirm'], country['confirmedRelative']]
        for i in range(len(list_temp)):
            if list_temp[i] == '':
                list_temp[i] = '0'
        ws_out.append(list_temp)
wb.save('./data.xlsx')
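
Because the sheet titles here come straight from scraped data, it may be worth sanitizing them first: Excel limits worksheet names to 31 characters and does not allow the characters \ / ? * [ ] : in them. The helper below is a minimal sketch under that assumption; `safe_sheet_title` and its `fallback` parameter are hypothetical names, not part of openpyxl.

```python
import re

# Characters Excel does not allow in worksheet names
_INVALID_TITLE_CHARS = re.compile(r"[\\/*?:\[\]]")
_MAX_TITLE_LENGTH = 31


def safe_sheet_title(name, fallback="Sheet"):
    """Replace disallowed characters and trim the name to Excel's length limit."""
    cleaned = _INVALID_TITLE_CHARS.sub("_", name).strip()
    return cleaned[:_MAX_TITLE_LENGTH] or fallback


print(safe_sheet_title("North/South America"))
print(safe_sheet_title(""))
```

In the loop above this would be used as `wb.create_sheet(safe_sheet_title(each['area']))`.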

The complete code is as follows:

import requests
from lxml import etree
import json
import openpyxl
 
url = "https://voice.baidu.com/act/newpneumonia/newpneumonia"
response = requests.get(url)
#print(response.text)
# Build the HTML object
html = etree.HTML(response.text)
result = html.xpath('//script[@type="application/json"]/text()')
result = result[0]
# json.loads() converts a string into Python data types
result = json.loads(result)
# Create the workbook
wb = openpyxl.Workbook()
# Get the default (active) worksheet
ws = wb.active
ws.title = "Domestic cases"
ws.append(['Province', 'Total confirmed', 'Deaths', 'Recovered', 'Active cases', 'Total confirmed (change)', 'Deaths (change)', 'Recovered (change)', 'Active cases (change)'])
result_in = result['component'][0]['caseList']
data_out = result['component'][0]['globalList']
'''
area --> mostly provinces
city --> city
confirmed --> total confirmed
crued --> recovered
relativeTime -->
confirmedRelative --> change in total confirmed
curedRelative --> change in recovered
curConfirm --> active cases
curConfirmRelative --> change in active cases
'''
for each in result_in:
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['curConfirm'],
                 each['confirmedRelative'], each['diedRelative'], each['curedRelative'],
                 each['curConfirmRelative']]
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)
# Process the international data
for each in data_out:
    sheet_title = each['area']
    # Create a new worksheet
    ws_out = wb.create_sheet(sheet_title)
    ws_out.append(['Country', 'Total confirmed', 'Deaths', 'Recovered', 'Active cases', 'Total confirmed (change)'])
    for country in each['subList']:
        list_temp = [country['country'], country['confirmed'], country['died'], country['crued'],
                     country['curConfirm'], country['confirmedRelative']]
        for i in range(len(list_temp)):
            if list_temp[i] == '':
                list_temp[i] = '0'
        ws_out.append(list_temp)
wb.save('./data.xlsx')

The results are as follows:

Domestic: [image]

International: [image]

That concludes this walkthrough of implementing a web crawler in Python. Hopefully it is useful and you picked up something new. If you liked the article, feel free to share it so more people can see it.

Title: How to implement a web crawler in Python
Reposted from: http://weahome.cn/article/gcoiei.html
