Spiders Contracts
This allows you to test each callback of your spider by hardcoding a sample URL and checking various constraints for how the callback processes the response. Each contract is prefixed with an @ and included in the docstring. See the following example:
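(The URL, item counts and field names in this sketch are illustrative placeholders.)

def parse(self, response):
    """This function parses a sample response. Some contracts are mingled
    with this docstring.

    @url http://www.example.com/foo/bar
    @cb_kwargs {"arg1": "value1", "arg2": "value2"}
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """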
This callback is tested using the following built-in contracts:
class scrapy.contracts.default.UrlContract
This contract (@url) sets the sample URL used when checking other contract conditions for this spider. This contract is mandatory. All callbacks lacking this contract are ignored when running the checks:
@url url
class scrapy.contracts.default.CallbackKeywordArgumentsContract
This contract (@cb_kwargs) sets the cb_kwargs attribute for the sample request. It must be a valid JSON dictionary.
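For example (argument name and value are illustrative):

@cb_kwargs {"arg1": "some value"}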
class scrapy.contracts.default.ReturnsContract
This contract (@returns) sets lower and upper bounds for the items and requests returned by the spider. The upper bound is optional:
@returns item(s)|request(s) [min [max]]
class scrapy.contracts.default.ScrapesContract
This contract (@scrapes) checks that all the items returned by the callback have the specified fields:
@scrapes field_1 field_2 ...
Use the check command to run the contract checks.
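For example, assuming a spider named example, the checks can be run with:

scrapy check example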
If you find you need more power than the built-in Scrapy contracts, you can create and load your own contracts in the project by using the SPIDER_CONTRACTS setting:
SPIDER_CONTRACTS = {
    'myproject.contracts.ItemValidate': 10,
}
Each contract must inherit from Contract and can override three methods:
class scrapy.contracts.Contract(method, *args)
Parameters
method (collections.abc.Callable) – callback function to which the contract is associated
args (list) – list of arguments passed into the docstring (whitespace separated)
adjust_request_args(args)
This receives a dict as an argument containing default arguments for the request object. Request is used by default, but this can be changed with the request_cls attribute. If multiple contracts in a chain have this attribute defined, the last one is used. It must return the same dict or a modified version of it.
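A minimal sketch of this hook, assuming a hypothetical @form_data contract that swaps the request class for FormRequest via request_cls:

from scrapy import FormRequest
from scrapy.contracts import Contract


class FormDataContract(Contract):
    """Hypothetical contract that turns the sample request into a POST
    FormRequest built from key=value pairs listed in the docstring.
    @form_data user=foo token=bar
    """

    name = 'form_data'
    request_cls = FormRequest

    def adjust_request_args(self, args):
        # args holds the keyword arguments used to build the sample request;
        # return the same dict or a modified version of it
        args['formdata'] = dict(pair.split('=', 1) for pair in self.args)
        return args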
pre_process(response)
This allows hooking in various checks on the response received from the sample request, before it is passed to the callback.
post_process(output)
This allows processing the output of the callback. Iterators are converted to lists before being passed to this hook.
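A minimal sketch of this hook, assuming a hypothetical @min_items contract that raises ContractFail (described just below) when the callback produces too few results:

from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail


class MinItemsContract(Contract):
    """Hypothetical contract that fails when the callback yields fewer
    results than requested.
    @min_items 1
    """

    name = 'min_items'

    def post_process(self, output):
        # output is the callback output, already converted to a list
        minimum = int(self.args[0])
        if len(output) < minimum:
            raise ContractFail(f'expected at least {minimum} results, got {len(output)}')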
Raise ContractFail from pre_process or post_process if expectations are not met:
class scrapy.exceptions.ContractFail
Error raised in case of a failing contract
Here is a demo contract which checks the presence of a custom header in the response received:
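(The contract name @has_header and the X-CustomHeader header in this sketch are illustrative.)

from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail


class HasHeaderContract(Contract):
    """Demo contract which checks the presence of a custom header
    @has_header X-CustomHeader
    """

    name = 'has_header'

    def pre_process(self, response):
        # self.args holds the header names listed after @has_header
        for header in self.args:
            if header not in response.headers:
                raise ContractFail(f'{header} not present')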
Detecting check runs
When scrapy check is running, the SCRAPY_CHECK environment variable is set to the true string. You can use os.environ to perform any change to your spiders or your settings when scrapy check is used:
import logging
import os

import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'

    def __init__(self):
        if os.environ.get('SCRAPY_CHECK'):
            logging.info('this is a scrapy check run')