Thursday, 17 September 2015

twisted.python.failure.Failure after Scrapy fetch()

I hit this error when I tried to fetch() a scrapy.Request carrying form data in the Scrapy shell:


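import urllib
from scrapy import Request

# No method is given, so Scrapy sends this as a GET with a form-encoded body.
r = Request(url=url,
            body=urllib.urlencode({'formparam1': 'value1'}),
            dont_filter=True)
fetch(r)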

With the following output:
2015-09-17 14:48:03 [scrapy] DEBUG: Retrying <GET http://www.website.com/form.aspx> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-09-17 14:48:03 [scrapy] DEBUG: Retrying <GET http://www.website.com/form.aspx> (failed 2 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-09-17 14:48:03 [scrapy] DEBUG: Gave up retrying <GET http://www.website.com/form.aspx> (failed 3 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]

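Instead, I should have used scrapy.FormRequest, which handles form data natively: given formdata, it URL-encodes the fields into the body, sets the Content-Type header and, unless a method is passed explicitly, issues the request as a POST. The plain Request above defaulted to GET, as the log shows.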

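from scrapy import FormRequest

# payload is assumed to hold the same form field as the failing request
payload = {'formparam1': 'value1'}
r = FormRequest(url=url,
                formdata=payload,
                dont_filter=True)
fetch(r)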

2015-09-17 14:53:07 [scrapy] DEBUG: Redirecting (302) to <GET http://www.website.com/form.aspx> from <POST http://www.website.com/form.aspx>
2015-09-17 14:53:08 [scrapy] DEBUG: Crawled (200) <GET http://www.website.com/form.aspx> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f343c6d1b10>
[s]   item       {}
[s]   r          <POST http://www.website.com/form.aspx>
[s]   request    <POST http://www.website.com/form.aspx>
[s]   response   <200 http://www.website.com/form.aspx>
[s]   settings   <scrapy.settings.Settings object at 0x7f343c6d1a90>
[s]   spider     <Spider 'spider' at 0x7f343ae50f50>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
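
For comparison, here is a rough sketch of what a plain Request would probably need in order to behave like the FormRequest above: an explicit POST method, a form Content-Type header, and the URL-encoded body. The helper name form_post_kwargs is hypothetical and not part of Scrapy, and the sketch uses Python 3's urllib.parse rather than the Python 2 urllib.urlencode used earlier in this post.

```python
from urllib.parse import urlencode  # Python 3; Python 2 spells this urllib.urlencode


def form_post_kwargs(formdata):
    """Return keyword arguments describing a URL-encoded form POST.

    Hypothetical helper for illustration, not part of Scrapy.
    """
    return {
        # scrapy.Request defaults to GET when no method is passed
        "method": "POST",
        # tells the server how to parse the request body
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        # e.g. {'formparam1': 'value1'} becomes 'formparam1=value1'
        "body": urlencode(formdata),
    }


kwargs = form_post_kwargs({"formparam1": "value1"})
print(kwargs["method"], kwargs["body"])
```

In the Scrapy shell these could then be passed along as fetch(Request(url, dont_filter=True, **kwargs)). FormRequest remains the simpler choice, since it takes care of all three details itself.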