Scraping Beike Rental Listings with Python

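I have been planning to move recently, and browsing listings across rental sites quickly became overwhelming. I wondered whether the basic information could be pulled together in one place for easier lookup, so I used Python to scrape it into an Excel file, which makes the listings easy to search.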

1. Extracting information with XPath from lxml

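XPath is a language for locating information in XML documents; it can be used to traverse the elements and attributes of an XML document. XPath and regular expressions (the re module) can do much the same job here, but for markup XPath has clear advantages: (1) it can find information in XML; (2) it also works on HTML; (3) it navigates by elements and attributes rather than by matching raw text.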
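
To make the element-and-attribute navigation concrete, here is a minimal sketch. The HTML fragment, its class names, and the `extract_listings` helper are all made up for illustration; they are not Beike's real markup. Only `etree.HTML` and `xpath` come from lxml.

```python
from lxml import etree

# A tiny, made-up HTML fragment standing in for a listing page.
SAMPLE_HTML = """
<div class="listing">
  <p class="title"><a href="/room/1">Sunny Court 2-bedroom</a></p>
  <span class="price"><em>5200</em> yuan/month</span>
</div>
<div class="listing">
  <p class="title"><a href="/room/2">Lake View studio</a></p>
  <span class="price"><em>3100</em> yuan/month</span>
</div>
"""


def extract_listings(html):
    """Return (title, link, price) tuples, one per listing block."""
    tree = etree.HTML(html)
    rows = []
    # Select every listing container by its class attribute
    for node in tree.xpath('//div[@class="listing"]'):
        # Paths starting with './' are evaluated relative to the current node
        title = node.xpath('./p[@class="title"]/a/text()')[0].strip()
        link = node.xpath('./p[@class="title"]/a/@href')[0]
        price = node.xpath('./span[@class="price"]/em/text()')[0]
        rows.append((title, link, price))
    return rows


if __name__ == "__main__":
    for row in extract_listings(SAMPLE_HTML):
        print(row)
```

Each field is addressed by where it sits in the document tree, so the expressions stay short even when the surrounding text changes.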

2. Saving the information to Excel with the xlsxwriter module

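xlsxwriter is a library for producing Excel files that lets us automate Excel output quickly and in bulk. It can write data, draw charts, and handle most common Excel tasks. Its drawback is that xlsxwriter can only create new files and cannot modify existing ones; if a new file is given the same name as an existing file, the existing file is overwritten.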
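
The overwrite behaviour can be seen in a small sketch. The helper names `write_rows` and `shared_strings`, the file name, and the sample rows are assumptions made for illustration; the check relies on an .xlsx file being a zip archive whose cell strings are stored in `xl/sharedStrings.xml`.

```python
import os
import tempfile
import zipfile

import xlsxwriter


def write_rows(path, header, rows):
    """Create a new workbook at path: a bold header row, then the data rows."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    bold = workbook.add_format({"bold": True})
    worksheet.write_row(0, 0, header, bold)
    for row_index, row in enumerate(rows, start=1):
        worksheet.write_row(row_index, 0, row)
    workbook.close()  # the file is saved when close() is called


def shared_strings(path):
    """Return the raw shared-strings XML stored inside an .xlsx archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.read("xl/sharedStrings.xml").decode("utf-8")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
    write_rows(path, ["community", "price"], [["Old Court", "5200"]])
    # Writing to the same path again replaces the first file entirely
    write_rows(path, ["community", "price"], [["New Court", "3100"]])
    xml = shared_strings(path)
    print("Old Court" in xml, "New Court" in xml)
```

Because the second call builds a fresh workbook, nothing from the first run survives, which is why the scraper below collects every page before writing once.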

3. Scraping approach

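The Beike rental listings span 100 pages in total. We fetch the HTML of each page, extract the fields we need into a dictionary, pool the results from all pages, and finally write the dictionaries to Excel.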
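
The flow above can be sketched offline as follows. `page_urls`, `collect`, and the stand-in fetch and parse functions are hypothetical names used only for this illustration; the URL pattern is the one the scraper uses.

```python
BASE_URL = "https://bj.zu.ke.com/zufang/pg{}/#contentList"


def page_urls(total_pages):
    """Build the URL of every listing page, 1 through total_pages inclusive."""
    # range() excludes its upper bound, hence total_pages + 1
    return [BASE_URL.format(page) for page in range(1, total_pages + 1)]


def collect(urls, fetch, parse):
    """Fetch each URL, parse it into a list of dicts, and pool the results."""
    records = []
    for url in urls:
        records.extend(parse(fetch(url)))
    return records


def fake_fetch(url):
    """Stand-in for a network request so the sketch runs offline."""
    return "<html>{}</html>".format(url)


def fake_parse(html):
    """Stand-in parser that returns one record per page."""
    return [{"source": html}]


if __name__ == "__main__":
    urls = page_urls(100)
    records = collect(urls, fake_fetch, fake_parse)
    print(len(urls), len(records))
    print(urls[0])
    print(urls[-1])
```

Passing the fetch and parse steps in as arguments keeps the loop testable without touching the network.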

4. Scraper source code

# @Author: Rainbowhhy
# @Date  : 19-6-25 6:35 PM


import requests
import time
from lxml import etree
import xlsxwriter


def get_html(page):
    """Fetch the HTML source of one listing page."""
    url = "https://bj.zu.ke.com/zufang/pg{}/#contentList".format(page)
    headers = {
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    # A timeout keeps one stalled request from hanging the whole run
    response = requests.get(url, headers=headers, timeout=10).text
    return response


def parse_html(htmlcode, data):
    """Parse one page of HTML and append one dict per listing to data."""
    content = etree.HTML(htmlcode)
    # '//' (not '///') selects matching elements anywhere in the document
    results = content.xpath('//div[@class="content__article"]/div[1]/div')
    for result in results:
        community = result.xpath('./div[1]/p[@class="content__list--item--title twoline"]/a/text()')[0].replace(
            '\n', '').strip().split()[0]
        address = "-".join(result.xpath('./div/p[@class="content__list--item--des"]/a/text()'))
        # The brand (listing source) line is optional, so fall back to an empty string
        brand = result.xpath('./div/p[@class="content__list--item--brand oneline"]/text()')
        landlord = brand[0].replace('\n', '').strip() if len(brand) > 0 else ""
        postime = result.xpath('./div/p[@class="content__list--item--time oneline"]/text()')[0]
        introduction = ",".join(result.xpath('./div/p[@class="content__list--item--bottom oneline"]/i/text()'))
        price = result.xpath('./div/span/em/text()')[0]
        # Tokens of the description line: area first, layout last, orientation in between
        description = "".join(result.xpath('./div/p[2]/text()')).replace('\n', '').replace('-', '').strip().split()
        area = description[0]
        count = len(description)
        if count == 6:
            orientation = description[1] + description[2] + description[3] + description[4]
        elif count == 5:
            orientation = description[1] + description[2] + description[3]
        elif count == 4:
            orientation = description[1] + description[2]
        elif count == 3:
            orientation = description[1]
        else:
            orientation = ""
        pattern = description[-1]
        # The floor is the second span text when present
        spans = result.xpath('./div/p[2]/span/text()')
        floor = "".join(spans[1].replace('\n', '').strip().split()).strip() if len(spans) > 1 else ""
        date_time = time.strftime("%Y-%m-%d", time.localtime())
        # Store the fields in a dict
        data_dict = {
            "community": community,
            "address": address,
            "landlord": landlord,
            "postime": postime,
            "introduction": introduction,
            "price": '¥' + price,
            "area": area,
            "orientation": orientation,
            "pattern": pattern,
            "floor": floor,
            "date_time": date_time
        }

        data.append(data_dict)


def excel_storage(response):
    """Write the list of listing dicts to an Excel file."""
    # (dict key, column header) pairs, in column order A through K
    columns = [
        ('community', 'Community'),
        ('address', 'Address'),
        ('landlord', 'Listing source'),
        ('postime', 'Posted'),
        ('introduction', 'Notes'),
        ('price', 'Price'),
        ('area', 'Area'),
        ('orientation', 'Orientation'),
        ('pattern', 'Layout'),
        ('floor', 'Floor'),
        ('date_time', 'Date viewed'),
    ]
    workbook = xlsxwriter.Workbook('./beikeHouse.xlsx')
    worksheet = workbook.add_worksheet()
    # Bold format for the header row
    bold_format = workbook.add_format({'bold': True})
    for col, (_, header) in enumerate(columns):
        worksheet.write(0, col, header, bold_format)

    # Data rows start on the second row (index 1)
    for row, item in enumerate(response, start=1):
        for col, (key, _) in enumerate(columns):
            worksheet.write_string(row, col, item[key])
    workbook.close()


def main():
    all_datas = []
    # The site has 100 pages in total; range(1, 101) covers pages 1 through 100
    for page in range(1, 101):
        html = get_html(page)
        parse_html(html, all_datas)
    excel_storage(all_datas)


if __name__ == '__main__':
    main()
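
The chain of count checks in parse_html treats the first description token as the area, the last as the layout, and joins whatever lies between as the orientation. One possible way to express that with slicing is sketched below. `split_description` is a hypothetical helper, not part of the original script, and it differs in two respects: it joins the middle tokens for any token count rather than only 3 to 6, and it returns empty strings for an empty list instead of raising an IndexError. The sample tokens are made up.

```python
def split_description(tokens):
    """Split description tokens into (area, orientation, layout).

    The first token is the area, the last is the layout, and every
    token in between is concatenated into the orientation.
    """
    if not tokens:
        return "", "", ""
    area = tokens[0]
    layout = tokens[-1]
    # The slice is empty when there are fewer than three tokens
    orientation = "".join(tokens[1:-1])
    return area, orientation, layout


if __name__ == "__main__":
    # Made-up tokens in the shape parse_html produces after .split()
    print(split_description(["58㎡", "南", "北", "2室1厅1卫"]))
    print(split_description(["30㎡", "1室0厅1卫"]))
    print(split_description([]))
```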

5. Screenshot of the results

[Figure: Scraping Beike rental listings with Python]


Source: http://weahome.cn/article/ipsjjg.html