This article looks at how to send a POST request with Scrapy in Python. It starts with the requests library for comparison, then walks through Scrapy's FormRequest and Request classes and the difference between them.
Sending a POST request with requests
First, let's see how convenient it is to send a POST request with requests.
Requests' simple API means that every HTTP request type is obvious. For example, this is how you send an HTTP POST request:
>>> import requests
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
The data argument accepts a dictionary, and it also accepts a sequence of tuples, which is useful when a key has several values:
>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
To send JSON, serialize the payload yourself:
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
Since version 2.4.2 you can pass the dictionary directly through the json parameter:
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)
In other words, you do not need to transform the parameters yourself. You only decide whether to use data= or json=, and requests takes care of the rest.
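To make the data= versus json= distinction concrete, here is a minimal sketch that uses only the standard library to approximate the two request bodies. The helper names form_body and json_body are made up for this illustration and are not part of requests, whose own encoding code differs in detail.

```python
import json
from urllib.parse import urlencode


def form_body(payload):
    """Approximate the body sent for data=payload (form-encoded)."""
    # doseq=True keeps repeated keys from a sequence of (key, value) tuples.
    return urlencode(payload, doseq=True)


def json_body(payload):
    """Approximate the body sent for json=payload (a JSON document)."""
    return json.dumps(payload)


payload = {'some': 'data'}
print(form_body(payload))  # some=data
print(json_body(payload))  # {"some": "data"}
```

The same dictionary produces two different bodies, so a server that expects one encoding will generally not understand the other.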
Sending a POST request with Scrapy
The Scrapy source shows that requests are sent with GET by default. When we need to send parameters in the request body, or log in, we need a POST request. Take the following spider as an example:
import json

import scrapy
from scrapy.spiders import CrawlSpider


class LaGou(CrawlSpider):
    name = 'myspider'

    def start_requests(self):
        yield scrapy.FormRequest(
            url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
            # formdata plays the role of data= in requests, but it must be
            # key/value pairs whose keys and values are all strings.
            formdata={
                'first': 'true',  # a bool True is not accepted here, although requests allows it
                'pn': '1',  # an int 1 is not accepted here, although requests allows it
                'kd': 'python',
            },
            callback=self.parse,
        )

    def parse(self, response):
        # The endpoint returns JSON; the job list is nested under
        # content -> positionResult -> result.
        datas = json.loads(response.body.decode())['content']['positionResult']['result']
        for data in datas:
            print(data['companyFullName'] + str(data['positionId']))
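Because formdata only takes strings, parameters that start out as bools or ints have to be converted first. The sketch below shows one way to do that. The helper name to_formdata is hypothetical, and the lowercase 'true'/'false' spelling is an assumption based on the value used in the spider above.

```python
def to_formdata(params):
    """Return a copy of params with every key and value converted to str."""
    formdata = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: the target site expects lowercase 'true'/'false'.
            value = 'true' if value else 'false'
        formdata[str(key)] = str(value)
    return formdata


print(to_formdata({'first': True, 'pn': 1, 'kd': 'python'}))
# {'first': 'true', 'pn': '1', 'kd': 'python'}
```

The result can then be passed as formdata=to_formdata(params) when building the FormRequest.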
The official documentation recommends this approach under "Using FormRequest to send data via HTTP POST":
return [FormRequest(url="http://www.example.com/post/action",
                    formdata={'name': 'John Doe', 'age': '27'},
                    callback=self.after_post)]
This uses FormRequest and passes the parameters through formdata, which again is a dictionary.
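Here comes the big pitfall, though. I spent a whole afternoon on it: however I sent the request this way, something went wrong and the response was never the data I wanted.
return scrapy.FormRequest(url, formdata=payload)
After a long search online I finally found an approach that works: send the request with scrapy.Request, serialize the payload as JSON, and the data comes back correctly.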
return scrapy.Request(
    url,
    body=json.dumps(payload),
    method='POST',
    headers={'Content-Type': 'application/json'},
)
Reference: Send Post Request in Scrapy
my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request(
    url,
    method='POST',
    body=json.dumps(my_data),
    headers={'Content-Type': 'application/json'},
)
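If several requests need a JSON body, the three arguments involved can be built in one place. This is only a sketch: json_post_kwargs is a hypothetical helper name, and the block does not import Scrapy, so the spider usage is shown as a comment.

```python
import json


def json_post_kwargs(payload):
    """Build keyword arguments for scrapy.Request that send payload as JSON."""
    return {
        'method': 'POST',
        'body': json.dumps(payload),
        'headers': {'Content-Type': 'application/json'},
    }


kwargs = json_post_kwargs({'field1': 'value1', 'field2': 'value2'})
print(kwargs['body'])
# Inside a spider callback this would be used as:
#     yield scrapy.Request(url, callback=self.parse, **kwargs)
```

Newer Scrapy releases also provide scrapy.http.JsonRequest, which accepts a data= argument and performs the same serialization; check the documentation of your installed version before relying on it.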
The difference between FormRequest and Request
In the documentation the two look almost identical:
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
That is, FormRequest adds one new parameter, formdata, which accepts a dictionary or an iterable of (key, value) tuples containing the form data and turns it into the request body. FormRequest itself inherits from Request:
class FormRequest(Request):

    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'

        super(FormRequest, self).__init__(*args, **kwargs)

        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

###

def _urlencode(seq, enc):
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)
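In the end, the {'key': 'value', 'k': 'v'} we pass is converted to 'key=value&k=v' and sent with the Content-Type application/x-www-form-urlencoded. When formdata is given and no method is specified, the method defaults to POST. A server that expects a JSON body will not parse this form-encoded body, which explains the failure described earlier. Now let's look at Request: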
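The flattening of list values performed by _urlencode can be reproduced with the standard library alone. The function below is an approximation written for this article, not Scrapy's code: urlencode_formdata is a hypothetical name, and it treats only lists and tuples as list-like, which is a simplification of Scrapy's is_listlike check.

```python
from urllib.parse import urlencode


def urlencode_formdata(formdata, encoding='utf-8'):
    """Standard-library approximation of the _urlencode helper shown above."""
    items = formdata.items() if isinstance(formdata, dict) else formdata
    values = [
        (key.encode(encoding), value.encode(encoding))
        for key, vs in items
        # A list or tuple of values yields one key=value pair per element.
        for value in (vs if isinstance(vs, (list, tuple)) else [vs])
    ]
    return urlencode(values, doseq=True)


print(urlencode_formdata({'key': 'value', 'k': 'v'}))        # key=value&k=v
print(urlencode_formdata({'key1': ['value1', 'value2']}))    # key1=value1&key1=value2
```

Like the original, it calls encode on every key and value, so passing an int or a bool raises an error, which matches the restriction noted in the spider example.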
class Request(object_ref):

    def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                 cookies=None, meta=None, encoding='utf-8', priority=0,
                 dont_filter=False, errback=None, flags=None):

        self._encoding = encoding  # this one has to be set first
        self.method = str(method).upper()
The default method is GET, but that does not get in the way: passing method='POST' still sends a POST request. This reminds me of the request function in requests, the basic function on which the other request helpers are built.
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
That is how to send a POST request with Scrapy in Python. If you have run into a similar problem, the analysis above should help you work through it.