This article explains how to scrape national healthcare expenditure data with Python. Many people run into questions on this topic in day-to-day work, so the editor went through various sources and put together a simple, practical method. Let's walk through it.
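The whole world is struggling with the pandemic. Different countries have adopted different response strategies and have seen different results. That is what inspired this article: I wanted to look at how much countries spend on healthcare infrastructure and to visualize healthcare expenditure for a number of countries.
Because I could not find a reliable source for the most recent year, I use 2016 data here. The data show clearly which countries spend the most and which spend the least. I had long wanted to try web scraping and data visualization in Python, and this makes a good project. Typing the data into Excel by hand would certainly be faster, but that would waste a valuable chance to practice some skills.
Data science is about using a range of toolkits to solve problems, and web scraping and regular expressions are two areas I need to study. The result is short but not trivial: the project shows how to combine three techniques to solve a data science problem.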
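Requirements
Web scraping has two main parts:
Fetching the data by making an HTTP request
Extracting the important data by parsing the HTML DOM
Libraries and tools
Requests makes it very simple to send HTTP requests.
Pandas is a Python package that provides fast, flexible, and expressive data structures.
Web Scraper (the Scrapingdog API used in the request later on) helps scrape dynamic websites without setting up an automated browser.
Beautiful Soup is a Python library for extracting data from HTML and XML files.
matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.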
Setup
Setup is simple: create a folder and install Beautiful Soup, Requests, matplotlib, and pandas. This assumes Python 3.x is already installed. Run the following commands to create the folder and install the libraries.
mkdir scraper
pip install beautifulsoup4
pip install requests
pip install matplotlib
pip install pandas
Now create a file with any name inside that folder. I used scraping.py. Then import the libraries in that file, as shown:
import pandas as pd
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import requests
What we will scrape: the country name and the per-capita expenditure.
Web scraping
With the scraper set up, make a GET request to the target URL to obtain the raw HTML.
# api_key is left blank here; supply your own Scrapingdog key
r = requests.get("https://api.scrapingdog.com/scrape?api_key=&url=https://data.worldbank.org/indicator/SH.XPD.CHEX.PC.CD?most_recent_value_desc=false&dynamic=true").text
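One detail of the request is worth a closer look: the target URL has its own `?` query string, so pasting it unencoded into the API URL leaves it ambiguous which parameters belong to the World Bank page and which to the scraping API. The sketch below builds the same request URL with the nested URL percent-encoded, using only the standard library. The parameter names `api_key`, `url`, and `dynamic` come from the request above; `API_ENDPOINT`, `TARGET_URL`, `build_request_url`, and the placeholder key are illustrative names, and it is an assumption that `dynamic=true` is meant for the API rather than for the World Bank page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://api.scrapingdog.com/scrape"
TARGET_URL = (
    "https://data.worldbank.org/indicator/SH.XPD.CHEX.PC.CD"
    "?most_recent_value_desc=false"
)


def build_request_url(api_key: str) -> str:
    """Return the API URL with the nested target URL percent-encoded."""
    params = {"api_key": api_key, "url": TARGET_URL, "dynamic": "true"}
    return f"{API_ENDPOINT}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a real key.
request_url = build_request_url("YOUR_API_KEY")

# Decoding the query again shows the target URL survives as one parameter.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["url"][0])
```

Passing the same dictionary as `requests.get(API_ENDPOINT, params=params)` should have the same effect, since Requests encodes `params` itself.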
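The response body is the HTML of the target URL, which we then parse with Beautiful Soup.
soup = BeautifulSoup(r, "html.parser")
country = list()
expense = list()
I use two empty lists to store the country names and each country's expenditure per day. On the page, every country sits inside a div with the class "item", so we collect all of those item divs in a list.
try:
    Countries = soup.find_all("div", {"class": "item"})
except Exception:
    Countries = None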
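A note on the lookup above: `find_all` returns an empty result when nothing matches rather than raising an exception, so a `try`/`except` around it will not catch a changed page layout. A minimal sketch of an explicit check follows. `SAMPLE_HTML` is hypothetical markup that only mimics the "item" rows described in the text (the real page may differ), and `find_items` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the values are not scraped data.
SAMPLE_HTML = """
<div class="item"><div>Country</div><div>Most Recent Year</div><div>Most Recent Value</div></div>
<div class="item"><div>Burundi</div><div>2016</div><div>18.0</div></div>
<div class="item"><div>United States</div><div>2016</div><div>9,870.0</div></div>
"""


def find_items(html: str) -> list:
    """Return every <div class="item">, or raise if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find_all("div", {"class": "item"})
    if not items:  # an empty result means no match; no exception is raised
        raise ValueError("no 'item' divs found; the page layout may have changed")
    return list(items)


items = find_items(SAMPLE_HTML)
# Unclassed child divs of the second item: name, year, value.
first_row = [cell.text for cell in items[1].find_all("div", {"class": None})]
print(len(items), first_row)
```

Failing loudly at this point is a design choice: it avoids a less obvious `IndexError` or `TypeError` later in the loop.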
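The world has roughly 190 countries, so we run a for loop over 190 entries to collect each country's healthcare expenditure:
for i in range(0, 190):
    # Skip Countries[0] and read the unclassed <div> cells of each following item
    cells = Countries[i + 1].find_all("div", {"class": None})
    country.append(cells[0].text.strip())
    # Remove thousands separators, round to whole dollars, then convert to a per-day figure
    expense.append(round(float(cells[2].text.strip().replace(",", ""))) / 365)
Data = {"country": country, "expense": expense}
Because I wanted to see how much these countries spend per day, I divided each annual figure by 365. Dividing the published figures by hand might have been easier, but then there would be nothing to learn. "Data" now looks like this: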
{'country': ['Central African Republic', 'Burundi', 'Mozambique', 'Congo, Dem. Rep.', 'Gambia, The', 'Niger',
 'Madagascar', 'Ethiopia', 'Malawi', 'Mali', 'Eritrea', 'Benin', 'Chad', 'Bangladesh', 'Tanzania', 'Guinea',
 'Uganda', 'Haiti', 'Togo', 'Guinea-Bissau', 'Pakistan', 'Burkina Faso', 'Nepal', 'Mauritania', 'Rwanda',
 'Senegal', 'Papua New Guinea', 'Lao PDR', 'Tajikistan', 'Zambia', 'Afghanistan', 'Comoros', 'Myanmar', 'India',
 'Cameroon', 'Syrian Arab Republic', 'Kenya', 'Ghana', "Cote d'Ivoire", 'Liberia', 'Djibouti', 'Congo, Rep.',
 'Yemen, Rep.', 'Kyrgyz Republic', 'Cambodia', 'Nigeria', 'Timor-Leste', 'Lesotho', 'Sierra Leone', 'Bhutan',
 'Zimbabwe', 'Angola', 'Sao Tome and Principe', 'Solomon Islands', 'Vanuatu', 'Indonesia', 'Vietnam',
 'Philippines', 'Egypt, Arab Rep.', 'Uzbekistan', 'Mongolia', 'Ukraine', 'Sudan', 'Iraq', 'Sri Lanka',
 'Cabo Verde', 'Moldova', 'Morocco', 'Fiji', 'Kiribati', 'Nicaragua', 'Guyana', 'Honduras', 'Tonga', 'Bolivia',
 'Gabon', 'Eswatini', 'Thailand', 'Jordan', 'Samoa', 'Guatemala', 'St. Vincent and the Grenadines', 'Tunisia',
 'Algeria', 'Kazakhstan', 'Azerbaijan', 'Albania', 'Equatorial Guinea', 'El Salvador', 'Jamaica', 'Belize',
 'Georgia', 'Libya', 'Peru', 'Belarus', 'Paraguay', 'North Macedonia', 'Colombia', 'Suriname', 'Armenia',
 'Malaysia', 'Botswana', 'Micronesia, Fed. Sts.', 'China', 'Namibia', 'Dominican Republic', 'Iran, Islamic Rep.',
 'Dominica', 'Turkmenistan', 'South Africa', 'Bosnia and Herzegovina', 'Mexico', 'Turkey', 'Russian Federation',
 'Romania', 'St. Lucia', 'Serbia', 'Ecuador', 'Tuvalu', 'Grenada', 'Montenegro', 'Mauritius', 'Seychelles',
 'Bulgaria', 'Antigua and Barbuda', 'Brunei Darussalam', 'Oman', 'Lebanon', 'Poland', 'Marshall Islands',
 'Latvia', 'Croatia', 'Costa Rica', 'St. Kitts and Nevis', 'Hungary', 'Argentina', 'Cuba', 'Lithuania', 'Nauru',
 'Brazil', 'Panama', 'Maldives', 'Trinidad and Tobago', 'Kuwait', 'Bahrain', 'Saudi Arabia', 'Barbados',
 'Slovak Republic', 'Estonia', 'Chile', 'Czech Republic', 'United Arab Emirates', 'Uruguay', 'Greece',
 'Venezuela, RB', 'Cyprus', 'Palau', 'Portugal', 'Qatar', 'Slovenia', 'Bahamas, The', 'Korea, Rep.', 'Malta',
 'Spain', 'Singapore', 'Italy', 'Israel', 'Monaco', 'San Marino', 'New Zealand', 'Andorra', 'United Kingdom',
 'Finland', 'Belgium', 'Japan', 'France', 'Canada', 'Austria', 'Germany', 'Netherlands', 'Ireland', 'Australia',
 'Iceland', 'Denmark', 'Sweden', 'Luxembourg', 'Norway', 'Switzerland', 'United States', 'World'],
 'expense': [0.043835616438356165, 0.049315068493150684, 0.052054794520547946, 0.057534246575342465,
 0.057534246575342465, 0.06301369863013699, 0.06575342465753424, 0.07671232876712329,
 0.0821917808219178, 0.0821917808219178, 0.0821917808219178, 0.0821917808219178,
 0.08767123287671233, 0.09315068493150686, 0.09863013698630137, 0.10136986301369863,
 0.10410958904109589, 0.10410958904109589, 0.10684931506849316, 0.10684931506849316,
 0.1095890410958904, 0.11232876712328767, 0.1232876712328767, 0.12876712328767123,
 0.13150684931506848, 0.14520547945205478, 0.1506849315068493, 0.1506849315068493,
 0.15342465753424658, 0.15616438356164383, 0.15616438356164383, 0.16164383561643836,
 0.16986301369863013, 0.1726027397260274, 0.17534246575342466, 0.18082191780821918,
 0.18082191780821918, 0.1863013698630137, 0.1863013698630137, 0.1863013698630137,
 0.1917808219178082, 0.1917808219178082, 0.19726027397260273, 0.2,
 0.2136986301369863, 0.21643835616438356, 0.2191780821917808, 0.2356164383561644,
 0.2356164383561644, 0.2493150684931507, 0.25753424657534246, 0.2602739726027397,
 0.2876712328767123, 0.29041095890410956, 0.3013698630136986, 0.30684931506849317,
 0.336986301369863, 0.35342465753424657, 0.3589041095890411, 0.3698630136986301,
 0.3863013698630137, 0.3863013698630137, 0.41643835616438357, 0.4191780821917808,
 0.4191780821917808, 0.43561643835616437, 0.4684931506849315, 0.4684931506849315,
 0.4931506849315068, 0.5150684931506849, 0.5150684931506849, 0.5260273972602739,
 0.547945205479452, 0.5561643835616439, 0.5835616438356165, 0.6027397260273972,
 0.6054794520547945, 0.6082191780821918, 0.6136986301369863, 0.6219178082191781,
 0.6602739726027397, 0.684931506849315, 0.7013698630136986, 0.7123287671232876,
 0.7178082191780822, 0.7342465753424657, 0.7452054794520548, 0.7698630136986301,
 0.8054794520547945, 0.810958904109589, 0.8328767123287671, 0.8438356164383561,
 0.8575342465753425, 0.8657534246575342, 0.8712328767123287, 0.8958904109589041,
 0.8986301369863013, 0.9315068493150684, 0.9753424657534246, 0.9835616438356164,
 0.9917808219178083, 1.0410958904109588, 1.0602739726027397, 1.0904109589041096,
 1.104109589041096, 1.1342465753424658, 1.1369863013698631, 1.1479452054794521,
 1.158904109589041, 1.1726027397260275, 1.2164383561643837, 1.2657534246575342,
 1.284931506849315, 1.284931506849315, 1.3041095890410959, 1.3424657534246576,
 1.3534246575342466, 1.3835616438356164, 1.389041095890411, 1.4136986301369863,
 1.4575342465753425, 1.515068493150685, 1.6356164383561644, 1.6767123287671233,
 1.7068493150684931, 1.7287671232876711, 1.7753424657534247, 1.8136986301369864,
 2.2164383561643834, 2.3315068493150686, 2.3945205479452056, 2.421917808219178,
 2.4356164383561643, 2.5506849315068494, 2.5835616438356164, 2.6164383561643834,
 2.66027397260274, 2.706849315068493, 2.7726027397260276, 2.7835616438356166,
 2.852054794520548, 2.871232876712329, 2.915068493150685, 2.926027397260274,
 3.010958904109589, 3.1424657534246574, 3.1890410958904107, 3.23013698630137,
 3.2465753424657535, 3.263013698630137, 3.621917808219178, 3.6246575342465754,
 3.778082191780822, 4.13972602739726, 4.323287671232877, 4.476712328767123,
 4.586301369863014, 4.934246575342466, 5.005479452054795, 5.024657534246575,
 5.027397260273973, 5.6, 6.3780821917808215, 6.5479452054794525,
 6.745205479452054, 7.504109589041096, 7.772602739726027, 8.054794520547945,
 8.254794520547945, 10.26027397260274, 10.506849315068493, 10.843835616438357,
 11.27945205479452, 11.367123287671232, 11.597260273972603, 11.67945205479452,
 12.213698630136987, 12.843835616438357, 12.915068493150685, 12.991780821917809,
 13.038356164383561, 13.704109589041096, 13.873972602739727, 15.24931506849315,
 15.646575342465754, 17.18082191780822, 20.487671232876714, 26.947945205479453,
 27.041095890410958, 2.8109589041095893]}
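The per-day conversion inside the loop can be pulled out into a small function, which makes the order of operations easier to see: the annual figure is rounded to whole dollars first and only then divided by 365. The sketch below uses plain Python. The function names and the input strings are illustrative assumptions, not values taken from the page.

```python
def daily_expense(raw_value: str) -> float:
    """Convert an annual per-capita figure such as '1,026.4' to dollars per day."""
    annual = float(raw_value.strip().replace(",", ""))  # drop thousands separators
    return round(annual) / 365  # round to whole dollars first, as the loop does


def daily_expense_exact(raw_value: str) -> float:
    """Variant that divides the unrounded annual figure instead."""
    return float(raw_value.strip().replace(",", "")) / 365


# Hypothetical input string, used only to compare the two variants.
print(daily_expense("1,026.4"), daily_expense_exact("1,026.4"))
```

Rounding before dividing is why every value in the output above is an exact multiple of 1/365. The second variant keeps the extra precision if that matters for the analysis.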
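DataFrame
Before plotting, we need to prepare a DataFrame with Pandas. First, let's be clear about what a DataFrame is:
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Creating one is simple and direct:
df = pd.DataFrame(Data, columns=["country", "expense"])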
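Visualization
Most of our time went into collecting and formatting the data. Now it is time to plot. You can use matplotlib and seaborn to visualize the data. If appearance is not a priority, the DataFrame's built-in plotting method shows the result quickly:
df.plot(kind="bar", x="country", y="expense")
plt.show()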
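With roughly 190 bars on a single axis, the country labels of the built-in bar chart are likely to overlap. One possible refinement, sketched below under stated assumptions, is to plot only the highest spenders as horizontal bars and to leave out the "World" row, which is an aggregate rather than a country. The five rows are copied from the data above; `plot_top_spenders` and `df_sample` are illustrative names. The axis label reflects that the World Bank indicator SH.XPD.CHEX.PC.CD is reported in current US dollars per capita.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Five rows copied from the data above, not the full table.
sample = {
    "country": ["Central African Republic", "Burundi", "Switzerland",
                "United States", "World"],
    "expense": [0.043835616438356165, 0.049315068493150684,
                26.947945205479453, 27.041095890410958, 2.8109589041095893],
}


def plot_top_spenders(df: pd.DataFrame, n: int = 3):
    """Draw a horizontal bar chart of the n highest per-day expenses."""
    countries_only = df[df["country"] != "World"]  # drop the aggregate row
    top = countries_only.nlargest(n, "expense").sort_values("expense")
    ax = top.plot(kind="barh", x="country", y="expense", legend=False)
    ax.set_xlabel("Health expenditure per capita per day (US$)")
    ax.set_ylabel("")
    plt.tight_layout()
    return ax


df_sample = pd.DataFrame(sample, columns=["country", "expense"])
ax = plot_top_spenders(df_sample, n=3)
# Call plt.show() afterwards to display the figure in an interactive session.
```

Applied to the full DataFrame, a call such as `plot_top_spenders(df, n=10)` would show only the ten highest values.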
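The conclusion is now clear: in many countries, health expenditure per person is below one US dollar per day. Most of these countries are in Asia and Africa, which suggests that the World Health Organization should pay more attention to them.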
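The "below one dollar a day" observation can also be checked directly against the DataFrame instead of being read off the chart. The sketch below does this for a handful of rows copied from the data above; it is not the full table, so the printed count only describes the sample. The "World" row is dropped first because it is an aggregate rather than a country. The variable names are illustrative.

```python
import pandas as pd

# Rows copied from the data above (not the full table).
sample = {
    "country": ["Central African Republic", "Burundi", "Switzerland",
                "United States", "World"],
    "expense": [0.043835616438356165, 0.049315068493150684,
                26.947945205479453, 27.041095890410958, 2.8109589041095893],
}

df_sample = pd.DataFrame(sample)
countries_only = df_sample[df_sample["country"] != "World"]
below_one = countries_only[countries_only["expense"] < 1.0]
share = len(below_one) / len(countries_only)

print(f"{len(below_one)} of {len(countries_only)} sampled countries "
      "spend under $1 per person per day")
# prints: 2 of 4 sampled countries spend under $1 per person per day
```

Running the same filter on the full `df` gives the count for the whole table.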
Article title: How to Scrape National Healthcare Expenditure Data with Python
Article path: http://weahome.cn/article/psisod.html