Automatically Downloading Highly Rated Movies with Python 3.x and Xunlei X

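Chinese New Year is almost here. What is everyone busy with? At year's end the whole office is scrambling for train tickets and stocking up on holiday goods, and with the festive mood in the air everyone is itching to go home. Who has the mind to work? Itching to go home = low output = ten bugs per line of code = boredom. Then I remembered that I had studied Python for a while and that I like watching movies. Clicking into each movie's detail page and downloading them one by one by hand is tedious, so why not write a Python tool that downloads movies automatically? Just thinking about it cured the boredom. Back before everyone had XX memberships, I went to "XX Paradise" for movie resources, and it had most of what I wanted to watch. That settles it: crawl it!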


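Back when I played with Python I crawled quite a few sites, all at the office (Python is not part of the company's business; it was purely for my own amusement). My colleague in ops kept coming over to say: "What are you crawling now? Go read the news, so-and-so just got arrested for scraping! If something happens, it's on you!" Good grief, that scared me out of it. This post crawls a certain "Paradise" site (exactly which one is in the code below). Will I get arrested? It is purely a technical discussion and personal practice, with no commercial use, so it should be fine, right? My hands are trembling a little as I type this...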

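Well, here goes nothing; if I don't walk into hell, who will? First, the end result:

The downloader has a GUI (pretty cool, right?). Enter a root URL and a minimum movie score and it crawls movies automatically. Building it takes the following knowledge: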

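PyCharm installation and use: not much to say here. Fellow code monkeys all know it, and for everyone else I can't really explain. It's just an IDE.
tkinter: Python's GUI library. The painfully crude interface of this tool is built on Tk. You can drop the GUI without affecting the crawl at all; a user interface makes the tool look cooler, but mostly I wanted to learn something new.
Static web page analysis: compared with dynamic sites, crawling a static site is small potatoes. Press F12, or right-click and view the page source; these simple steps reveal the page's layout rules, and you write the crawler according to those rules. So easy.
Data persistence: you don't want to download a movie again the next time you crawl, so store the links you have already downloaded and check each new link against them before downloading, filtering out repeats.
Xunlei X installation: even less to say. What upstanding young person of our times hasn't used Xunlei? Whose hard drive doesn't hold a few "action" movies?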
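The data persistence point can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the author's persist_util module (which the post does not show); the function names and the file location are hypothetical.

```python
import os
import pickle
import tempfile


def load_links(path):
    """Return the set of links saved at `path`, or an empty set if none exist."""
    if not os.path.exists(path):
        return set()
    with open(path, "rb") as f:
        return set(pickle.load(f))


def save_links(path, links):
    """Write the links to `path`, replacing any earlier snapshot."""
    with open(path, "wb") as f:
        pickle.dump(sorted(links), f)


def filter_new(candidates, downloaded):
    """Keep only the candidates that are not in the downloaded set, in order."""
    return [link for link in candidates if link not in downloaded]


if __name__ == "__main__":
    # Hypothetical storage location, used only for this demo.
    store = os.path.join(tempfile.mkdtemp(), "history.pkl")
    save_links(store, {"ftp://example.com/a.mkv"})
    seen = load_links(store)
    print(filter_new(["ftp://example.com/a.mkv", "ftp://example.com/b.mkv"], seen))
```

Saving the whole set after every download keeps the logic simple; for a very large history, an on-disk key-value store such as dbm would avoid rewriting the file each time.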

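That's about it. The implementation details aren't many either: requests + BeautifulSoup, re regular expressions, Python data types, Python threads, persistence libraries such as dbm and pickle, and so on. That is the whole scope of this tool. Of course, Python is object-oriented and programming ideas carry across all languages; that isn't learned overnight and can't be spelled out in words. Work out which of the areas above you are weak in and look them up yourself; I'm just going to paste the code.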
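As a taste of the re part, here is a minimal sketch of pulling a score and download links out of a static detail page. The sample HTML, the "Score:" label and the function name are made up for illustration; the real site's markup, and the author's html_parser module, may look quite different.

```python
import re

# Hypothetical snippet of a movie detail page; the real layout may differ.
SAMPLE_HTML = (
    '<p>Title: Example Movie<br>Score: 8.3/10 from 1,024 users</p>'
    '<a href="ftp://example.com/example-movie.mkv">download</a>'
    '<a href="/html/other.html">other</a>'
)

# A number followed by "/10", e.g. "Score: 8.3/10".
SCORE_RE = re.compile(r"Score:\s*(\d+(?:\.\d+)?)\s*/\s*10")
# Only hrefs that look like download links (ftp, thunder or magnet schemes).
LINK_RE = re.compile(r'href="((?:ftp|thunder|magnet)[^"]+)"')


def parse_movie_info(html, min_score=8.0):
    """Return (score, links); links is empty when the score is missing or too low."""
    match = SCORE_RE.search(html)
    if match is None:
        return None, []
    score = float(match.group(1))
    if score < min_score:
        return score, []
    return score, LINK_RE.findall(html)


if __name__ == "__main__":
    print(parse_movie_info(SAMPLE_HTML, min_score=8.0))
```

Filtering on the score before collecting links means low-rated movies never reach the download step.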

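While on the subject of learning Python, a few more words: when I was learning Python crawling I read the blog of @工匠若水 (https://blog.csdn.net/yanbober). His Python articles are really good and very helpful for anyone with programming experience who has never touched Python; you can get a small project going quickly. Right, on to the code: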

import url_manager
import html_parser
import html_download
import persist_util
from tkinter import *
from threading import Thread
import os
 
class SpiderMain(object):
  def __init__(self):
    self.mUrlManager = url_manager.UrlManager()
    self.mHtmlParser = html_parser.HtmlParser()
    self.mHtmlDownload = html_download.HtmlDownload()
    self.mPersist = persist_util.PersistUtil()
 
  # Load previously saved download links
  def load_history(self):
    history_download_links = self.mPersist.load_history_links()
    if history_download_links is not None and len(history_download_links) > 0:
      for download_link in history_download_links:
        self.mUrlManager.add_download_url(download_link)
        d_log("Loaded history download link: " + download_link)
 
  # Save the download links collected so far
  def save_history(self):
    history_download_links = self.mUrlManager.get_download_url()
    if history_download_links is not None and len(history_download_links) > 0:
      self.mPersist.save_history_links(history_download_links)
 
  def craw_movie_links(self, root_url, score=8):
    count = 0
    self.mUrlManager.add_url(root_url)
    while self.mUrlManager.has_continue():
      try:
        count = count + 1
        url = self.mUrlManager.get_url()
        d_log("craw %d : %s" % (count, url))
        headers = {
          'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36',
          'Referer': url
        }
        content = self.mHtmlDownload.down_html(url, retry_count=3, headers=headers)
        if content is not None:
          doc = content.decode('gb2312', 'ignore')
          movie_urls, next_link = self.mHtmlParser.parser_movie_link(doc)
          if movie_urls is not None and len(movie_urls) > 0:
            for movie_url in movie_urls:
              d_log('movie info url: ' + movie_url)
              content = self.mHtmlDownload.down_html(movie_url, retry_count=3, headers=headers)
              if content is not None:
                doc = content.decode('gb2312', 'ignore')
                movie_name, movie_score, movie_xunlei_links = self.mHtmlParser.parser_movie_info(doc, score=score)
                if movie_xunlei_links is not None and len(movie_xunlei_links) > 0:
                  for xunlei_link in movie_xunlei_links:
                    # Check whether this movie has already been downloaded
                    is_download = self.mUrlManager.has_download(xunlei_link)
                    if not is_download:
                      # Hand movies that have not been downloaded yet to Xunlei
                      d_log('Start downloading ' + movie_name + ', link: ' + xunlei_link)
                      self.mUrlManager.add_download_url(xunlei_link)
                      os.system(r'"D:\迅雷\Thunder\Program\Thunder.exe" {url}'.format(url=xunlei_link))
                      # Persist after every movie so that even an abnormal exit never causes a repeat download
                      self.save_history()
          if next_link is not None:
            d_log('next link: ' + next_link)
            self.mUrlManager.add_url(next_link)
      except Exception as e:
        d_log('Error: ' + str(e))
 
 
def runner(rootLink=None, scoreLimit=None):
  if rootLink is None:
    return
  spider = SpiderMain()
  spider.load_history()
  if scoreLimit is None:
    spider.craw_movie_links(rootLink)
  else:
    spider.craw_movie_links(rootLink, score=float(scoreLimit))
  spider.save_history()
 
# rootLink = 'https://www.dytt8.net/html/gndy/dyzz/index.html'
# rootLink = 'https://www.dytt8.net/html/gndy/dyzz/list_23_207.html'
def start(rootLink, scoreLimit):
  loop_thread = Thread(target=runner, args=(rootLink, scoreLimit,), name='LOOP THREAD')
  #loop_thread.setDaemon(True)
  loop_thread.start()
  #loop_thread.join() # Do not make the main thread wait, or the GUI will freeze
  btn_start.configure(state='disabled')
 
# Append a line to the GUI log and scroll to the end
def d_log(log):
  s = log + '\n'
  txt.insert(END, s)
  txt.see(END)
 
if __name__ == "__main__":
  rootGUI = Tk()
  rootGUI.title('XX Movie Auto-Download Tool')
  # Set the window background color
  black_background = '#000000'
  rootGUI.configure(background=black_background)
  # Get the screen width and height
  screen_w, screen_h = rootGUI.winfo_screenwidth(), rootGUI.winfo_screenheight()
  # Center the window on the screen
  window_x = (screen_w - 640) / 2
  window_y = (screen_h - 480) / 2
  window_xy = '640x480+%d+%d' % (window_x, window_y)
  rootGUI.geometry(window_xy)
 
  lable_link = Label(rootGUI, text='Root URL to parse: ',
            bg='black',
            fg='red',
            font=('宋體', 12),
            relief=FLAT)
  lable_link.place(x=20, y=20)
 
  lable_link_width = lable_link.winfo_reqwidth()
  lable_link_height = lable_link.winfo_reqheight()
 
  input_link = Entry(rootGUI)
  input_link.place(x=20+lable_link_width, y=20, relwidth=0.5)
 
  lable_score = Label(rootGUI, text='Minimum movie score: ',
            bg='black',
            fg='red',
            font=('宋體', 12),
            relief=FLAT)
  lable_score.place(x=20, y=20+lable_link_height+10)
 
  input_score = Entry(rootGUI)
  input_score.place(x=20+lable_link_width, y=20+lable_link_height+10, relwidth=0.3)
 
  btn_start = Button(rootGUI, text='Start', command=lambda: start(input_link.get(), input_score.get()))
  btn_start.place(relx=0.4, rely=0.2, relwidth=0.1, relheight=0.1)
 
  txt = Text(rootGUI)
  txt.place(rely=0.4, relwidth=1, relheight=0.5)
 
  rootGUI.mainloop()
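The listing imports url_manager, html_parser, html_download and persist_util, none of which appear in the post. As a rough guess at what url_manager might contain, here is a minimal sketch whose method names are taken from the calls in the listing; the internals (a deque of pending pages plus two sets) are an assumption, not the author's code.

```python
from collections import deque


class UrlManager:
    """Tracks pages still to crawl and download links already handed to Xunlei."""

    def __init__(self):
        self._pending = deque()   # page URLs waiting to be crawled
        self._seen = set()        # every page URL ever queued
        self._downloaded = set()  # download links already sent to Xunlei

    def add_url(self, url):
        # Queue a page only once, so "next page" loops cannot repeat forever.
        if url and url not in self._seen:
            self._seen.add(url)
            self._pending.append(url)

    def has_continue(self):
        return len(self._pending) > 0

    def get_url(self):
        return self._pending.popleft()

    def add_download_url(self, link):
        self._downloaded.add(link)

    def has_download(self, link):
        return link in self._downloaded

    def get_download_url(self):
        return list(self._downloaded)


if __name__ == "__main__":
    manager = UrlManager()
    manager.add_url("https://example.com/list_1.html")
    manager.add_url("https://example.com/list_1.html")  # duplicate, ignored
    while manager.has_continue():
        print(manager.get_url())
```

Keeping the downloaded links in a set makes the has_download check cheap even after the history grows to thousands of entries.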

