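This post walks through how to write several kinds of URL collector scripts in Python, with the analysis given from a practitioner's point of view. Hopefully you will take something useful away from it.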
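0x02 Writing a ZoomEye API script
ZoomEye is a search engine for cyberspace. It indexes devices and websites on the Internet, along with the services and components they use.
ZoomEye has two detection engines, Xmap and Wmap, which target devices and websites respectively. By probing and fingerprinting around the clock, they label the services and components used by Internet devices and websites. Researchers can use ZoomEye to gauge how widespread a component is and how far the impact of a vulnerability reaches.
Although it is described as a "hacker-friendly" search engine, ZoomEye does not actively attack network devices or websites, and the data it collects is used only for security research. ZoomEye is more like a nautical chart of cyberspace.
Log in first, then obtain the access_token.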
# -*- coding: UTF-8 -*-
# Python 3
import json
import requests

user = input('[-] PLEASE INPUT YOUR USERNAME:')
passwd = input('[-] PLEASE INPUT YOUR PASSWORD:')

def Login():
    # The login endpoint takes the credentials as a JSON body.
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        # No access_token in the reply means the login was rejected.
        return '[-] INFO : USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN'
    return access_token

if __name__ == '__main__':
    print(Login())
Next, following what the API manual says, let's first write a collector for a single page of HOST results.
# -*- coding: UTF-8 -*-
# Python 3
import json
import sys
import requests

user = input('[-] PLEASE INPUT YOUR USERNAME:')
passwd = input('[-] PLEASE INPUT YOUR PASSWORD:')

def Login():
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        # Stop here: without a token the search request cannot be authorized.
        sys.exit('[-] INFO : USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN')
    return access_token

def search():
    # The token goes into the Authorization header with the "JWT " prefix.
    headers = {'Authorization': 'JWT ' + Login()}
    r = requests.get(url='https://api.zoomeye.org/host/search?query=tomcat&page=1', headers=headers)
    response = json.loads(r.text)
    print(response)

if __name__ == '__main__':
    search()
The response carries a huge amount of information, but it is still JSON, so we can pull out just the IP part:
for x in response['matches']:
    print(x['ip'])
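To try the extraction step without calling the API, here is a minimal sketch that runs on a hand-made dictionary. The sample data and the helper name `extract_ips` are assumptions for illustration; only the `matches` list and the `ip` field come from the scripts in this post, where `ip` is a plain string for host results and a list for web results.

```python
def extract_ips(response):
    """Collect IP addresses from a ZoomEye-style search response dict."""
    ips = []
    for match in response.get('matches', []):
        ip = match.get('ip')
        if isinstance(ip, list):
            # web results: 'ip' is a list of addresses
            ips.extend(ip)
        elif ip:
            # host results: 'ip' is a single string
            ips.append(ip)
    return ips


# Hand-made sample (documentation addresses), not real API output.
sample = {'matches': [{'ip': '192.0.2.10'}, {'ip': ['192.0.2.20']}]}
print(extract_ips(sample))
```

Using `.get()` with defaults means a response without `matches` simply yields an empty list instead of raising `KeyError`.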
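With that, single-page HOST collection is done. WEB search works in much the same way, so I leave it for you to analyze; the code for it appears later in this post.
Next, use a for loop to fetch the IPs across multiple pages.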
# -*- coding: UTF-8 -*-
# Python 3
import json
import sys
import requests

def Login():
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        # Stop here: without a token the search requests cannot be authorized.
        sys.exit('[-] INFO : USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN')
    return access_token

def search():
    headers = {'Authorization': 'JWT ' + Login()}
    # range() excludes its upper bound, so add 1 to include the last page.
    for i in range(1, int(PAGECOUNT) + 1):
        r = requests.get(url='https://api.zoomeye.org/host/search?query=tomcat&page=' + str(i), headers=headers)
        response = json.loads(r.text)
        for x in response['matches']:
            print(x['ip'])

if __name__ == '__main__':
    user = input('[-] PLEASE INPUT YOUR USERNAME:')
    passwd = input('[-] PLEASE INPUT YOUR PASSWORD:')
    PAGECOUNT = input('[-] PLEASE INPUT YOUR SEARCH_PAGE_COUNT(eg:10):')
    search()
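The paging logic can also be looked at on its own, with no network access. This is only a sketch: `build_search_urls` and `BASE` are made-up names, and the URL layout is simply the one used in the script above. Python's `range` excludes its upper bound, so covering pages 1 to N takes `range(1, N + 1)`; `urlencode` also takes care of escaping keywords that contain spaces or colons.

```python
from urllib.parse import urlencode

BASE = 'https://api.zoomeye.org'  # host taken from the script above


def build_search_urls(query_type, query, page_count):
    """Return one search URL per page, for pages 1..page_count inclusive."""
    urls = []
    for page in range(1, page_count + 1):
        params = urlencode({'query': query, 'page': page})
        urls.append('%s/%s/search?%s' % (BASE, query_type, params))
    return urls


for url in build_search_urls('host', 'tomcat', 3):
    print(url)
```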
That retrieves the data for however many pages you want. What remains is to round out and tidy up the code:
# -*- coding: UTF-8 -*-
# Python 3
import json
import sys
import requests

BANNER = r"""
 _____                     _____           ____
|__  /___   ___  _ __ ___ | ____|   _  ___/ ___|  ___ __ _ _ __
  / // _ \ / _ \| '_ ` _ \|  _|| | | |/ _ \___ \ / __/ _` | '_ \
 / /| (_) | (_) | | | | | | |__| |_| |  __/___) | (_| (_| | | | |
/____\___/ \___/|_| |_| |_|_____\__, |\___|____/ \___\__,_|_| |_|
                                |___/
"""

def Login(user, passwd):
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        # Stop here: without a token the search requests cannot be authorized.
        sys.exit('[-] INFO : USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN')
    return access_token

def search(queryType, queryStr, PAGECOUNT, user, passwd):
    headers = {'Authorization': 'JWT ' + Login(user, passwd)}
    # range() excludes its upper bound, so add 1 to include the last page.
    for i in range(1, int(PAGECOUNT) + 1):
        # Passing params lets requests URL-encode the keyword.
        r = requests.get(url='https://api.zoomeye.org/' + queryType + '/search',
                         params={'query': queryStr, 'page': i}, headers=headers)
        response = json.loads(r.text)
        try:
            if queryType == "host":
                for x in response['matches']:
                    print(x['ip'])
            if queryType == "web":
                # For web results 'ip' is a list; print its first entry.
                for x in response['matches']:
                    print(x['ip'][0])
        except KeyError:
            print("[ERROR] No hosts found")

def main():
    print(BANNER)
    user = input('[-] PLEASE INPUT YOUR USERNAME:')
    passwd = input('[-] PLEASE INPUT YOUR PASSWORD:')
    PAGECOUNT = input('[-] PLEASE INPUT YOUR SEARCH_PAGE_COUNT(eg:10):')
    queryType = input('[-] PLEASE INPUT YOUR SEARCH_TYPE(eg:web/host):')
    queryStr = input('[-] PLEASE INPUT YOUR KEYWORD(eg:tomcat):')
    # search() logs in by itself, so no separate Login() call is needed here.
    search(queryType, queryStr, PAGECOUNT, user, passwd)

if __name__ == '__main__':
    main()
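0x03 Writing a Shodan API script
Shodan is the scariest search engine on the Internet.
As a CNNMoney article put it, people may regard Google as the most powerful search engine, but Shodan is the scariest one on the Internet.
Unlike Google, Shodan does not search the web for URLs. It goes straight into the back channels of the Internet. Shodan can be described as a "dark" Google, constantly looking for every server, camera, printer, router and so on that is connected to the Internet. Each month Shodan gathers information, day and night, from roughly 500 million servers.
The information Shodan gathers is astonishing. Traffic lights, security cameras, home automation devices and heating systems that are connected to the Internet can all be found easily. Shodan users have found the control system of a water park, a gas station, and even a hotel's wine cooler. Researchers have also used Shodan to locate the command and control systems of a nuclear power plant and a particle-accelerating cyclotron.
What is really notable about Shodan is its ability to find almost anything connected to the Internet. What is really scary is that almost none of these devices have any security protection in place, so they can be accessed at will.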
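淺安 has already written about this, and in plenty of detail.
Link: 基于ShodanApi接口的调用python版 (a Python version of calling the Shodan API)
First, querying through the API directly. Official documentation: http://shodan.readthedocs.io/en/latest/tutorial.html
Each query made this way deducts 1 credit, whereas using the shodan library module does not.
Here is a simple one. It is much the same as the ZoomEye script, so I won't go into detail.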
# -*- coding: UTF-8 -*-
# Python 3
import json
import requests

def getip():
    API_KEY = '*************'  # put your own Shodan API key here
    url = 'https://api.shodan.io/shodan/host/search?key=' + API_KEY + '&query=apache'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    req = requests.get(url=url, headers=headers)
    content = json.loads(req.text)
    # Each match carries the address as a string in 'ip_str'.
    for i in content['matches']:
        print(i['ip_str'])

if __name__ == '__main__':
    getip()
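Since the key is masked in the script above, one option is to keep it out of the source file altogether. The sketch below reads it from an environment variable and builds the request URL with `urlencode`. The variable name `SHODAN_API_KEY` and the helper `build_shodan_url` are assumptions made for this example; the endpoint and the `key`, `query` and `page` parameters belong to the Shodan REST API.

```python
import os
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.shodan.io/shodan/host/search'


def build_shodan_url(query, page=1, env_var='SHODAN_API_KEY'):
    """Build a host-search URL using an API key taken from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending a keyless request.
        raise RuntimeError('set the %s environment variable first' % env_var)
    params = urlencode({'key': api_key, 'query': query, 'page': page})
    return SEARCH_ENDPOINT + '?' + params
```

The returned string can be passed to `requests.get` exactly like the hand-built `url` in the script above, for example `requests.get(build_shodan_url('apache'))`.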
Next is the version based on the shodan module. I am quoting 淺安's code directly, since I am too lazy to write it myself.
Install: pip install shodan
# -*- coding: UTF-8 -*-
# Python 3
import sys
import shodan

API_KEY = 'YOU_API_KEY'  # your Shodan API key

FACETS = [
    ('country', 100),  # top 100 countries by result count; 100 can be changed
]

FACET_TITLES = {
    'country': 'Top 100 Countries',
}

# Require at least one search keyword on the command line.
if len(sys.argv) == 1:
    print('Search Method:Input the %s and then the keyword' % sys.argv[0])
    sys.exit()

try:
    api = shodan.Shodan(API_KEY)
    query = ' '.join(sys.argv[1:])
    print("You Search is:" + query)
    result = api.count(query, facets=FACETS)  # count() is faster than search()
    filename = "搜索" + " " + query + " " + "关键字" + '.txt'
    for facet in result['facets']:
        print(FACET_TITLES[facet])
        for key in result['facets'][facet]:
            countrie = '%s : %s' % (key['value'], key['count'])
            print(countrie)
            # The with block closes the file, so no explicit close() is needed.
            with open(filename, 'a+', encoding='utf-8') as f:
                f.write(countrie + "\n")
    print(" ")
    print("save is " + filename)
    print("Search is Complete.")
except Exception as e:
    print('Error: %s' % e)
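The facet handling can be tried without an API key by feeding it a hand-made result. This is a sketch under assumptions: the sample dictionary is invented, and `format_facet_lines` is a hypothetical helper; the `facets`, `value` and `count` keys are the ones the script above reads.

```python
def format_facet_lines(result, facet='country'):
    """Turn one facet of a count()-style result into 'value : count' lines."""
    lines = []
    for entry in result.get('facets', {}).get(facet, []):
        lines.append('%s : %s' % (entry['value'], entry['count']))
    return lines


# Invented sample in the shape the script above expects.
sample = {
    'total': 15,
    'facets': {'country': [{'value': 'US', 'count': 10},
                           {'value': 'CN', 'count': 5}]},
}
for line in format_facet_lines(sample):
    print(line)
```

Collecting the lines first also makes it easy to write them to the output file in a single `open()` call instead of reopening the file for every country.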
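0x04 Writing a simple Baidu URL collector script
First, crawl the URLs from a single results page. As an example, this collects the URLs for the keyword 阿甫哥哥.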
# -*- coding: UTF-8 -*-
# Python 3
import re
import requests
from bs4 import BeautifulSoup as bs

def getfromBaidu(word):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    # pn is the result offset; 0 is the first page.
    url = 'https://www.baidu.com.cn/s?wd=' + word + '&pn=0'
    html = requests.get(url=url, headers=headers, timeout=5)
    soup = bs(html.content, 'lxml', from_encoding='utf-8')
    # Result links are <a> tags with a data-click attribute and no class.
    bqs = soup.find_all(name='a', attrs={'data-click': re.compile(r'.'), 'class': None})
    for i in bqs:
        try:
            # The href is a Baidu redirect; following it reveals the real URL.
            r = requests.get(i['href'], headers=headers, timeout=5)
        except requests.RequestException:
            continue  # skip targets that time out or refuse the connection
        print(r.url)

if __name__ == '__main__':
    getfromBaidu('阿甫哥哥')
Then comes multi-page crawling, for example the first 10 pages:
# -*- coding: UTF-8 -*-
# Python 3
import re
import requests
from bs4 import BeautifulSoup as bs

def getfromBaidu(word, pageout):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    # pn advances by 10 per page: 0, 10, ..., (pageout - 1) * 10.
    for k in range(0, pageout * 10, 10):
        url = 'https://www.baidu.com.cn/s?wd=' + word + '&pn=' + str(k)
        html = requests.get(url=url, headers=headers, timeout=5)
        soup = bs(html.content, 'lxml', from_encoding='utf-8')
        # Result links are <a> tags with a data-click attribute and no class.
        bqs = soup.find_all(name='a', attrs={'data-click': re.compile(r'.'), 'class': None})
        for i in bqs:
            try:
                # The href is a Baidu redirect; following it reveals the real URL.
                r = requests.get(i['href'], headers=headers, timeout=5)
            except requests.RequestException:
                continue  # skip targets that time out or refuse the connection
            print(r.url)

if __name__ == '__main__':
    getfromBaidu('阿甫哥哥', 10)
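The page-to-offset arithmetic is easy to get off by one, so here it is as a standalone sketch. It assumes, as the script above does, that `pn` is a result offset that advances by 10 per page. `baidu_page_urls` is a hypothetical helper name, and `quote` is added so that keywords with spaces or non-ASCII characters are escaped.

```python
from urllib.parse import quote

RESULTS_PER_PAGE = 10  # step used by the script above


def baidu_page_urls(word, pages):
    """Return the search URL of each of the first `pages` result pages."""
    urls = []
    for page in range(pages):
        offset = page * RESULTS_PER_PAGE  # page 0 -> pn=0, page 1 -> pn=10, ...
        urls.append('https://www.baidu.com.cn/s?wd=%s&pn=%d' % (quote(word), offset))
    return urls


for url in baidu_page_urls('python', 3):
    print(url)
```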
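0x05 [Bonus] Forum auto sign-in script
I actually posted this before, but in case some people missed it, I am sharing it again.
Signing in earns a lot of magic coins. For the other ways to get them, see:
https://bbs.ichunqiu.com/thread-36007-1-1.html
To use it, you only need to change COOKIE to your own.
It signs in automatically at midnight every day. Just leave it running on a server.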
# -*- coding: UTF-8 -*-
# Python 3
import datetime
import re
import time
import requests

def sign():
    url = 'https://bbs.ichunqiu.com/plugin.php?id=dsu_paulsign:sign'
    # Replace these with your own cookies (values are masked here).
    cookie = {
        '__jsluid': '3e29e6c**********8966d9e0a481220',
        'UM_distinctid': '1605f635c78159************016-5d4e211f-1fa400-1605f635c7ac0',
        'pgv_pvi': '4680553472',
        # ******...........
    }
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    r = requests.get(url=url, cookies=cookie, headers=headers)
    # The formhash regex is empty in this copy of the script; put a pattern
    # with one capture group for the formhash value here.
    rows = re.findall(r'', r.text)
    if len(rows) != 0:
        formhash = rows[0]
        print('[-]Formhash is: ' + formhash)
    else:
        print('[-]None formhash!')
        return  # nothing to submit without a formhash
    if '您今天已经签到过了或者签到时间还未开始' in r.text:
        print('[-]Already signed!!')
    else:
        sign_url = 'https://bbs.ichunqiu.com/plugin.php?id=dsu_paulsign:sign&operation=qiandao&infloat=1&inajax=1'
        sign_payload = {
            'formhash': formhash,
            'qdxq': 'fd',
            'qdmode': '2',
            'todaysay': '',
            'fastreply': 0,
        }
        sign_req = requests.post(url=sign_url, data=sign_payload, headers=headers, cookies=cookie)
        if '签到成功' in sign_req.text:
            print('[-]Sign success!!')
        else:
            print('[-]Something error...')

def main(h=0, m=0):
    while True:
        # Poll every 20 seconds until the clock reaches h:m.
        while True:
            now = datetime.datetime.now()
            if now.hour == h and now.minute == m:
                break
            time.sleep(20)
        sign()
        # Wait out the current minute so sign() runs only once per day.
        time.sleep(60)

if __name__ == '__main__':
    main()
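The script above wakes up every 20 seconds to check the clock. An alternative, sketched here as a swapped-in technique rather than the original author's method, is to compute how long it is until the next run and sleep once. `seconds_until` is a hypothetical helper; the `now` parameter exists only so the calculation can be checked with fixed times.

```python
import datetime


def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next occurrence of hour:minute."""
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is right now): use tomorrow's.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


# One minute before midnight there are 60 seconds left.
print(seconds_until(0, 0, now=datetime.datetime(2024, 1, 1, 23, 59, 0)))
```

In the main loop this would replace the inner polling loop with something like `time.sleep(seconds_until(h, m))` followed by `sign()`.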
That wraps up this look at writing various URL collector scripts in Python. If you have run into similar questions, the walkthrough above should serve as a reference.