This article explains how to scrape store names from Baidu Maps with Python. The method is simple, quick to set up, and practical. Let's walk through it.
The code is as follows:
import requests
import re
import csv
import time


def BusinessFromBaiduDitu(citycode='287', key_word='筛网', pageno=0):
    # Note: this function writes rows through the module-level `writer`
    # (a csv.writer) that the driver script creates before calling it.
    parameter = {
        "newmap": "1",
        "reqflag": "pcmap",
        "biz": "1",
        "from": "webmap",
        "da_par": "direct",
        "pcevaname": "pc4.1",
        "qt": "con",
        "c": citycode,      # city code
        "wd": key_word,     # search keyword
        "wd2": "",
        "pn": pageno,       # page number
        "nn": pageno * 10,
        "db": "0",
        "sug": "0",
        "addr": "0",
        "da_src": "pcmappg.poi.page",
        "on_gel": "1",
        "src": "7",
        "gr": "3",
        "l": "12",
        "tn": "B_NORMAL_MAP",
        # "u_loc": "12621219.536556,2630747.285024",
        "ie": "utf-8",
        # "b": "(11845157.18,3047692.2;11922085.18,3073932.2)",
        # ("b" is probably the geographic coordinates and can be ignored)
        "t": "1468896652886"}
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/56.0.2924.87 Safari/537.36'}
    url = 'http://map.baidu.com/'
    htm = requests.get(url, params=parameter, headers=headers)
    # Convert the literal \uXXXX escapes in the response into real characters
    htm = htm.text.encode('latin-1').decode('unicode_escape')
    pattern = r'(?<=\baddress_norm":"\[).+?(?="ty":)'
    htm = re.findall(pattern, htm)  # split the response into one chunk per result
    for r in htm:
        pattern = r'(?<=\b"\},"name":").+?(?=")'
        name = re.findall(pattern, r)
        # if not name:
        pattern = r'(?<=\b,"name":").+?(?=")'
        name = re.findall(pattern, r)
        # print(name[0])  # name
        pattern = r'.+?(?=")'
        adr = re.findall(pattern, r)
        # Strip the bracketed/parenthesised markup from the address
        pattern = r'\(.+?\['
        address = re.sub(pattern, ' ', adr[0])
        pattern = r'\(.+?\]'
        address = re.sub(pattern, ' ', address)
        # print(address)  # address
        pattern = r'(?<="phone":").+?(?=")'
        phone = re.findall(pattern, r)
        try:
            if phone[0] and '",' != phone[0]:
                phone_list = phone[0].split(sep=',')
                for number in phone_list:
                    # Keep only numbers that start with "1" (mobile numbers)
                    if re.match('1', number):
                        print(citycode + name[0] + ',' + address + ',' + number)
                        writer.writerow((name[0], address, number))
        except IndexError:
            # No name or phone number found in this chunk; skip it
            continue
    print(citycode + ' ' + key_word + ' ' + str(pageno))
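To make the parsing step easier to follow, here is a minimal offline sketch of the two ideas the function relies on: undoing the literal `\uXXXX` escapes, and pulling fields out with lookbehind/lookahead patterns. The `sample` string and the helper names `decode_escapes` and `extract_mobile_numbers` are made up for illustration. The sample is not a real Baidu Maps response, and the actual payload layout may differ.

```python
import re

# Hypothetical fragment, NOT a real Baidu Maps response: it only imitates
# the escaped "\uXXXX" text and the "phone" field the function looks for.
sample = r'"name":"\u5e97\u540d","phone":"13800000000,(028)1234567"'


def decode_escapes(text):
    """Turn literal \\uXXXX sequences into the characters they stand for."""
    return text.encode('latin-1').decode('unicode_escape')


def extract_mobile_numbers(record):
    """Return the entries of the "phone" field that start with '1'."""
    # Lookbehind anchors the match right after "phone":" and the
    # lookahead stops it just before the next double quote.
    phone = re.findall(r'(?<="phone":").+?(?=")', record)
    # An empty phone field followed by another key yields the stray match '",'
    if not phone or phone[0] == '",':
        return []
    return [n for n in phone[0].split(',') if n.startswith('1')]


decoded = decode_escapes(sample)
name = re.findall(r'(?<="name":").+?(?=")', decoded)
numbers = extract_mobile_numbers(decoded)
print(name, numbers)
```

The landline entry is dropped because it does not start with `1`, which is the same filter the function applies with `re.match('1', number)`.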
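Next comes the code that searches for the keywords “丝网” (wire mesh) and “筛网” (sieve mesh), passed as key_word, to collect the data we want. The city code (citycode) has to change as well; the codes come from the city code file.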
# citynumlist is the list of Baidu Maps city codes
citynumlist = ['33', '34', '35',
               # ... (the remaining city codes are elided in the original)
               '370', '371', '372']
keywordlist = ['丝网', '筛网']
start = time.time()
num = 1
# Create the CSV file that stores the data
with open(r'/Users/apple888/PycharmProjects/百度地图/Data/%s.csv' % 'CityData',
          'a+', newline='', encoding='utf-8') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerow(('name', 'address', 'number'))
    for citycode in citynumlist:
        for kw in keywordlist:
            for page in range(10):
                BusinessFromBaiduDitu(citycode=citycode, key_word=kw, pageno=page)
                # Throttle the requests so that Baidu does not block us
                time.sleep(1)
                if num % 20 == 0:
                    time.sleep(2)
                if num % 100 == 0:
                    time.sleep(3)
                if num % 200 == 0:
                    time.sleep(7)
                num = num + 1
end = time.time()
lasttime = int(end - start)
print('Elapsed time: ' + str(lasttime) + 's')
The program ran for about three hours and scraped 1,085 useful records.
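The pacing rules in the main loop can be read as a small pure function, which makes it easy to see how much of the run time is spent waiting. This is only a sketch: `pause_seconds` is a hypothetical helper, not part of the original script, and it assumes the counter `num` increases by one per request.

```python
def pause_seconds(num):
    """Total pause after request number `num`, mirroring the loop's rules."""
    seconds = 1            # base delay after every request
    if num % 20 == 0:
        seconds += 2       # extra pause every 20 requests
    if num % 100 == 0:
        seconds += 3       # extra pause every 100 requests
    if num % 200 == 0:
        seconds += 7       # extra pause every 200 requests
    return seconds


# Total time spent sleeping over the first 200 requests
total = sum(pause_seconds(n) for n in range(1, 201))
print(total)  # 233
```

Under these rules the pauses average a little over one second per request, so most of a multi-hour run would come from the number of requests and the response time rather than from the extra pauses.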
That covers how to scrape store names from Baidu Maps with Python. The best way to get comfortable with it is to try it out yourself.