Python crawling: how to use R to scrape multiple pages in a row

This article explains in detail how to use R to scrape several pages in a row while crawling. I found it quite practical, so I am sharing it for reference, and I hope you get something useful out of it.


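When scraping HTML data that is spread over many pages, it is easy to get stuck on the generic part of the code and never actually move from one page to the next. The way out is to look carefully at what the code does on each pass: go to a page, collect its information, add it to a data frame, and then move on to the next page.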
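The per-page loop described above (visit a page, pull out its rows, add them to a data frame, move on) can be sketched in base R. This is a minimal sketch under stated assumptions: `fetch_page_rows()` is a hypothetical placeholder that builds two dummy rows per page so the example runs offline; in a real scraper it would download and parse page `i`, for example with rvest's `read_html()`.

```r
# Hypothetical placeholder for "go to page i and collect its rows".
# A real scraper would fetch and parse the page here instead.
fetch_page_rows <- function(i) {
  data.frame(page = i, item = paste0("item-", i, "-", 1:2))
}

scrape_pages <- function(pages) {
  rows <- vector("list", length(pages))      # one slot per page
  for (k in seq_along(pages)) {
    rows[[k]] <- fetch_page_rows(pages[k])   # visit the page and collect its rows
  }
  do.call(rbind, rows)                       # stack all pages into one data frame
}

all_rows <- scrape_pages(1:3)
print(all_rows)
```

Collecting the per-page results in a list and binding them once at the end avoids copying an ever-growing data frame on every iteration, which is what calling `rbind()` inside the loop does.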

Example:

Scraping multiple pages the following ways runs into problems:

# Attempt 1
library(purrr)
library(rvest)   # read_html(), html_nodes(), html_text()
url_base <- "https://secure.capitalbikeshare.com/profile/trips/QNURCMF2Q6"
map_df(1:70, function(i) {
  cat(".")   # progress indicator, one dot per page
  # url_base has no %d placeholder, so sprintf() returns the same URL for every i
  pg <- read_html(sprintf(url_base, i))
  data.frame(
    startd   = html_text(html_nodes(pg, ".ed-table__col_trip-start-date")),
    endd     = html_text(html_nodes(pg, ".ed-table__col_trip-end-date")),
    duration = html_text(html_nodes(pg, ".ed-table__col_trip-duration"))
  )
}) -> table
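One reason Attempt 1 keeps hitting the same page is that `url_base` contains no `%d` placeholder, so `sprintf(url_base, i)` returns `url_base` unchanged for every `i`. A minimal sketch of building one URL per page follows; the `example.com` address and the `?page=` query parameter are assumptions for illustration, not the real capitalbikeshare URL scheme.

```r
# Hypothetical paginated URL; "%d" is where sprintf() inserts the page number.
url_base <- "https://example.com/profile/trips?page=%d"

page_urls <- sprintf(url_base, 1:3)   # sprintf() is vectorised over 1:3
print(page_urls)

# The same three URLs built with paste0(), as the solution below does
page_urls2 <- paste0("https://example.com/profile/trips?page=", 1:3)
```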

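# Attempt 2 (with just one data column)
url_base <- "https://secure.capitalbikeshare.com/profile/trips/QNURCMF2Q6"
map_df(1:70, function(i) {
  # `page` is never defined here (nothing is read for page i), and the result is a
  # character vector rather than the data frame that map_df() is meant to combine
  page %>% html_nodes(".ed-table__item_odd") %>% html_text()
}) -> table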
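In Attempt 2, `page` is never created inside the function, so the pipeline has nothing to work on. A minimal sketch of the fix is to parse each page first and then select nodes from it, returning a data frame per page so the results can be stacked. To keep it runnable offline, it parses two made-up HTML snippets (hypothetical stand-ins for pages 1 and 2) instead of fetching real URLs.

```r
library(rvest)

# Made-up HTML standing in for two result pages (illustration only).
toy_pages <- c(
  '<table><tr class="ed-table__item_odd"><td>trip A</td></tr>
          <tr class="ed-table__item_even"><td>trip B</td></tr></table>',
  '<table><tr class="ed-table__item_odd"><td>trip C</td></tr></table>'
)

odd_rows <- lapply(seq_along(toy_pages), function(i) {
  page <- read_html(toy_pages[i])   # create `page` before piping it onward
  data.frame(
    page = i,
    text = page %>% html_nodes(".ed-table__item_odd") %>% html_text()
  )
})
result <- do.call(rbind, odd_rows)  # one data frame covering both pages
print(result)
```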

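Solution (this version logs in once and then walks through the pages of a Stack Overflow user's answer list, but the same loop applies to other paginated sites):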

library(rvest)
# Note: rvest 1.0.0 deprecated html_session(), set_values(), submit_form() and jump_to()
# in favour of session(), html_form_set(), session_submit() and session_jump_to().

# Log in once; `login` must hold the login-page URL, which the original does not show
pgsession <- html_session(login)
pgform <- html_form(pgsession)[[2]]
filled_form <- set_values(pgform, email = "*****", password = "*****")
submit_form(pgsession, filled_form)

# Start with an empty data frame for the final results
results <- data.frame()

for (i in 1:5) {
  url <- "http://stackoverflow.com/users/**********?tab=answers&sort=activity&page="
  url <- paste0(url, i)
  page <- jump_to(pgsession, url)

  # Collect question votes and question title
  summary <- html_nodes(page, "div .answer-summary")
  question <- matrix(html_text(html_nodes(summary, "div"), trim = TRUE), ncol = 2, byrow = TRUE)

  # Find date answered, hyperlink and whether it was accepted
  dateans <- html_node(summary, "span") %>% html_attr("title")
  hyperlink <- html_node(summary, "div a") %>% html_attr("href")
  accepted <- html_node(summary, "div") %>% html_attr("class")

  # Create temporary results for this page, then bind them to the final results
  rtemp <- cbind(question, dateans, accepted, hyperlink)
  results <- rbind(results, rtemp)
}

# Data frame clean-up
names(results) <- c("Votes", "Answer", "Date", "Accepted", "HyperLink")
results$Votes <- as.integer(as.character(results$Votes))
results$Accepted <- ifelse(results$Accepted == "answer-votes default", 0, 1)
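The clean-up step converts `Votes` with `as.integer(as.character(...))` rather than `as.integer(...)` alone. When the scraped columns are factors (as character columns often were by default before R 4.0.0), `as.integer()` on a factor returns the internal level codes instead of the numbers. A small sketch with made-up values shows the difference, along with the 0/1 recoding of `Accepted`; the class string `"answer-votes answered-accepted"` is an assumed example of a non-default value.

```r
# Made-up values standing in for two scraped columns.
votes <- factor(c("10", "2", "7"))
accepted <- c("answer-votes default", "answer-votes answered-accepted", "answer-votes default")

as.integer(votes)                # 1 2 3  (level codes; levels sort as "10", "2", "7")
as.integer(as.character(votes))  # 10 2 7 (the actual vote counts)

# The default class means "not accepted" (0); any other class counts as accepted (1)
accepted_flag <- ifelse(accepted == "answer-votes default", 0, 1)
print(accepted_flag)             # 0 1 0
```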

That is how to scrape multiple pages in a row.

關(guān)于“python爬蟲時(shí)怎么使用R連續(xù)抓取多個(gè)頁面”這篇文章就分享到這里了，希望以上內(nèi)容可以對大家有一定的幫助，使各位可以學(xué)到更多知識(shí)，如果覺得文章不錯(cuò)，請把它分享出去讓更多的人看到。

Reposted from: http://weahome.cn/article/pigicp.html
