An example Python script that collects e-book resources from jb51 (脚本之家) and downloads them automatically

jb51 has a fairly complete collection of resources, so I decided to use Python to collect the information and download the files automatically.

Python has a rich and powerful set of libraries; with urllib, re, and a few others you can easily build a web information collector.
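As a rough illustration of that idea, here is a minimal sketch of the decode-then-extract step in Python 3 (the script in this article targets Python 2). The sample markup, the function names, and the `<h2>` title pattern are assumptions made for illustration; on a real page the bytes would come from something like `urllib.request.urlopen(url).read()`.

```python
import re

# Hypothetical sample markup standing in for a downloaded book page (GBK bytes).
SAMPLE_PAGE = (
    '<h2>Learning Python</h2>'
    '<li>Size: <span itemprop="fileSize">27.2MB</span></li>'
).encode("gbk")


def decode_page(raw, encoding="gbk"):
    """Decode raw page bytes, dropping any bytes that are invalid in the encoding."""
    return raw.decode(encoding, "ignore")


def extract_fields(html):
    """Return the title and file size found in html; a missing field maps to None."""
    patterns = {
        "title": r'<h2[^>]*>(?P<value>.+?)</h2>',
        "filesize": r'<span\s+?itemprop="fileSize">(?P<value>.+?)</span>',
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, html)
        fields[name] = match.group("value") if match else None
    return fields


info = extract_fields(decode_page(SAMPLE_PAGE))
print(info)
```

Returning `None` for a missing field lets the caller decide whether to skip the page or log an error, instead of failing on a pattern that did not match.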

Below is an example script I wrote. It collects every e-book resource in a particular section of a technical website and saves the files locally.

A screenshot of the script running:

[Image: the example script collecting and downloading e-books from jb51]

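While it runs, the script not only prints information to the shell window but also saves a log to a txt file, recording each collected page URL, the book's title and size, the download link on the site's own server, and the Baidu Netdisk download link.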
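One possible way to get the same "print and log" behaviour without writing every message twice is the standard `logging` module. This is only a sketch in Python 3, not what the script in this article does (it calls `print` and `file.write` separately); the helper name `make_logger`, the temporary log location, and the example book details are assumptions.

```python
import logging
import os
import sys
import tempfile


def make_logger(log_path, name="collector"):
    """Create a logger that writes each message to stdout and to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called more than once
    formatter = logging.Formatter("%(message)s")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(log_path, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Hypothetical log location; the article's script writes to a file on the Desktop.
log_path = os.path.join(tempfile.mkdtemp(), "python.txt")
log = make_logger(log_path)
log.info("Page: %s", "http://www.jb51.net/books/438381.html")
log.info("Title: %s, size: %s", "Example Book", "27.2MB")
```

A second logger created the same way, pointed at a separate file, could play the role of the script's error log.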

Example: collecting and downloading the e-book resources in jb51's Python section:

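# -*- coding:utf-8 -*-
import re
import urllib2
import urllib
import sys
import os

# Python 2 only: make utf-8 the default codec for implicit str/unicode conversions.
reload(sys)
sys.setdefaultencoding('utf-8')


def getHtml(url):
    """Fetch a page and return its HTML as a utf-8 encoded string."""
    request = urllib2.Request(url)
    page = urllib2.urlopen(request)
    htmlcontent = page.read()
    # The pages are GBK encoded; re-encode as utf-8 so Chinese text is not garbled.
    htmlcontent = htmlcontent.decode('gbk', 'ignore').encode("utf8", 'ignore')
    return htmlcontent


def report(count, blockSize, totalSize):
    """Progress callback for urllib.urlretrieve: print the percentage downloaded."""
    if totalSize <= 0:
        return  # total size unknown, so there is no percentage to show
    percent = min(100, int(count * blockSize * 100 / totalSize))
    sys.stdout.write("\r%d%%" % percent + ' complete')
    sys.stdout.flush()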
def getBookInfo(url):
    """Extract one book's details from its page, log them, and download the file."""
    htmlcontent = getHtml(url)
    #print "htmlcontent=",htmlcontent  # you should see the output html

    # The opening tag of this pattern was lost in the source; a plain <h2> is assumed.
    regex_title = r'<h2[^>]*>(?P<title>.+?)</h2>'
    title = re.search(regex_title, htmlcontent)
    if title:
        title = title.group("title")
        print "Book title:", title
        file_object.write('Book title: ' + title + '\n')
    else:
        # Without a title there is no file name to save under, so skip this page.
        print "Book title not found"
        file_error.write(url + '\n')
        file_error.write('Book title not found\n\n')
        return

    #<li>书籍大小:<span itemprop="fileSize">27.2MB</span></li>
    filesize = re.search(r'<span\s+?itemprop="fileSize">(?P<filesize>.+?)</span>', htmlcontent)
    if filesize:
        filesize = filesize.group("filesize")
        print "File size:", filesize
        file_object.write('File size: ' + filesize + '\n')

    #<div class="picthumb"><a target="_blank"
    bookimg = re.search(r'<div\s+?class="picthumb"><a href="(?P<bookimg>.+?)" rel="external nofollow" target="_blank"', htmlcontent)
    if bookimg:
        bookimg = bookimg.group("bookimg")
        print "Cover image:", bookimg
        file_object.write('Cover image: ' + bookimg + '\n')

    # The link text below must match the site's markup exactly, so it stays in Chinese.
    #<li><a target="_blank">酷云中国电信下载</a></li>
    downurl1 = re.search('<li><a href="(?P<downurl1>.+?)" rel="external nofollow" target="_blank">酷云中国电信下载</a></li>', htmlcontent)
    if downurl1:
        downurl1 = downurl1.group("downurl1")
        print "Download URL 1:", downurl1
        file_object.write('Download URL 1: ' + downurl1 + '\n')
        sys.stdout.write('\rFetching ' + title + '...\n')
        # Strip spaces and slashes so the title is usable as a file name.
        title = title.replace(' ', '')
        title = title.replace('/', '')
        saveFile = '/Users/superl/Desktop/pythonbook/' + title + '.rar'
        if os.path.exists(saveFile):
            print "This file has already been downloaded!"
        else:
            urllib.urlretrieve(downurl1, saveFile, reporthook=report)
            sys.stdout.write("\rDownload complete, saved as %s" % (saveFile) + '\n\n')
            sys.stdout.flush()
            file_object.write('File downloaded successfully!\n')
    else:
        print "Download URL 1 not found"
        file_error.write(url + '\n')
        file_error.write(title + ": download URL 1 not found; the file was not downloaded automatically!\n")
        file_error.write('\n')

    #<li><a rel="external nofollow" target="_blank">百度网盘下载2</a></li>
    downurl2 = re.search('</a></li><li><a href="(?P<downurl2>.+?)" rel="external nofollow" target="_blank">百度网盘下载2</a></li>', htmlcontent)
    if downurl2:
        downurl2 = downurl2.group("downurl2")
        print "Download URL 2:", downurl2
        file_object.write('Download URL 2: ' + downurl2 + '\n')
    else:
        #file_error.write(url+'\n')
        print "Download URL 2 not found"
        file_error.write(title + ": download URL 2 not found\n")
        file_error.write('\n')

    file_object.write('\n')
    print "\n"


def getBooksUrl(url):
    """Collect every book page linked from one list page and process each of them."""
    htmlcontent = getHtml(url)
    #<ul class="cur-cat-list"><a href="/books/438381.html" rel="external nofollow" class="tit"</ul></div><!--end #content -->
    urls = re.findall('<a href="(?P<urls>.+?)" rel="external nofollow" class="tit"', htmlcontent)
    for url in urls:
        url = "http://www.jb51.net" + url
        print url + "\n"
        file_object.write(url + '\n')
        getBookInfo(url)
        #print "url->", url


if __name__ == "__main__":
    # file_object is the main log; file_error records pages with missing data.
    file_object = open('/Users/superl/Desktop/python.txt', 'w+')
    file_error = open('/Users/superl/Desktop/pythonerror.txt', 'w+')
    pagenum = 3  # number of list pages to walk through
    for pagevalue in range(1, pagenum + 1):
        listurl = "http://www.jb51.net/books/list476_%d.html" % pagevalue
        print listurl
        file_object.write(listurl + '\n')
        getBooksUrl(listurl)
    file_object.close()
    file_error.close()

Article source: http://weahome.cn/article/dijhco.html
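The download step of the script relies on three small pieces of logic: a progress callback, a file name derived from the book title, and list page URLs built from a page number. The sketch below isolates them in Python 3 so they can be checked without touching the network. The helper names `percent_done`, `save_path`, and `list_page_urls` are hypothetical, and the clamping to 100 is an added safeguard rather than something the original text states.

```python
import os
import sys


def percent_done(count, block_size, total_size):
    """Return progress as an integer from 0 to 100, or None if the size is unknown."""
    if total_size <= 0:
        return None
    # The last block is usually partial, so the raw value can exceed 100.
    return min(100, count * block_size * 100 // total_size)


def report(count, block_size, total_size):
    """Progress callback taking (block count, block size, total size)."""
    percent = percent_done(count, block_size, total_size)
    if percent is not None:
        sys.stdout.write("\r%d%% complete" % percent)
        sys.stdout.flush()


def save_path(title, directory, extension=".rar"):
    """Build the local path for a book, dropping spaces and slashes from the title."""
    name = title.replace(" ", "").replace("/", "")
    return os.path.join(directory, name + extension)


def list_page_urls(page_count, template="http://www.jb51.net/books/list476_%d.html"):
    """Return the list page URLs for pages 1 through page_count."""
    return [template % page for page in range(1, page_count + 1)]


for url in list_page_urls(3):
    print(url)
```

In Python 3 a callback with this signature can be passed as `reporthook` to `urllib.request.urlretrieve(url, filename, reporthook=report)`, which mirrors the `urllib.urlretrieve` call in the Python 2 script.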