Spiders Contracts

    This allows you to test each callback of your spider by hardcoding a sample URL and checking various constraints for how the callback processes the response. Each contract is prefixed with an @ and included in the docstring. See the following example (the sample URL and field names are illustrative):
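
        def parse(self, response):
            """ This function parses a sample response. Some contracts are mingled
            with this docstring.

            @url http://www.example.com/
            @returns items 1 16
            @returns requests 0 0
            @scrapes Title Author Year Price
            """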

    This callback is tested using three built-in contracts:

    class scrapy.contracts.default.UrlContract

    This contract (@url) sets the sample URL used when checking other contract conditions for this spider. This contract is mandatory. All callbacks lacking this contract are ignored when running the checks:

        @url url

    class scrapy.contracts.default.CallbackKeywordArgumentsContract

    This contract (@cb_kwargs) sets the cb_kwargs attribute for the sample request. It must be a valid JSON dictionary.
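
    For example (the argument names and values are illustrative):

        @cb_kwargs {"arg1": "value1", "arg2": "value2"}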

    class scrapy.contracts.default.ReturnsContract

    This contract (@returns) sets lower and upper bounds for the items and requests returned by the spider. The upper bound is optional:

        @returns item(s)|request(s) [min [max]]
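
    For instance, the following lines assert that the callback returns between 1 and 16 items and no requests (the bounds are illustrative):

        @returns items 1 16
        @returns requests 0 0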

    class scrapy.contracts.default.ScrapesContract

    This contract (@scrapes) checks that all the items returned by the callback have the specified fields:

        @scrapes field_1 field_2 ...

    Use the check command to run the contract checks.
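
    For example, to check a single spider by name (the spider name is illustrative):

        scrapy check example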

    If you find you need more power than the built-in Scrapy contracts, you can create and load your own contracts in the project by using the SPIDER_CONTRACTS setting:

        SPIDER_CONTRACTS = {
            'myproject.contracts.ItemValidate': 10,
        }

    Each contract must inherit from Contract and can override three methods:

    class scrapy.contracts.Contract(method, *args)

    • Parameters

      • method (collections.abc.Callable) – callback function to which the contract is associated

      • args (list) – list of arguments passed into the docstring (whitespace separated)

    • adjust_request_args(args)

      This receives a dict as an argument containing the default arguments for the request object. Request is used by default, but this can be changed with the request_cls attribute. If multiple contracts in the chain have this attribute defined, the last one is used. It must return the same dict or a modified version of it (see the sketch after this list).

    • pre_process(response)

      This allows hooking in various checks on the response received from the sample request, before it is passed to the callback.

    • post_process(output)

      This allows processing the output of the callback. Iterators are listified before being passed to this hook.
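
    As a sketch of the first hook, adjust_request_args can be overridden to modify the sample request, for example to add a header (the contract name and header are illustrative):

        from scrapy.contracts import Contract

        class CustomHeaderContract(Contract):
            """ Adds a custom header to the sample request
                @custom_header
            """
            name = 'custom_header'

            def adjust_request_args(self, args):
                # args holds the keyword arguments used to build the
                # sample request; return it, possibly modified
                headers = args.setdefault('headers', {})
                headers['X-CustomHeader'] = 'test'
                return args

    Similarly, post_process can inspect the listified output, raising ContractFail (described below) when a check fails (the contract name is illustrative):

        from scrapy.contracts import Contract
        from scrapy.exceptions import ContractFail

        class MinOutputContract(Contract):
            """ Checks that the callback produced at least N results
                @min_output 1
            """
            name = 'min_output'

            def post_process(self, output):
                # output is a list of everything the callback returned
                # (items and requests)
                expected = int(self.args[0])
                if len(output) < expected:
                    raise ContractFail(
                        'expected at least %d results, got %d'
                        % (expected, len(output)))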

    Raise ContractFail from pre_process or post_process if expectations are not met:

    class scrapy.exceptions.ContractFail

    Error raised in case of a failing contract

    Here is a demo contract which checks the presence of a custom header in the response received (the header name is illustrative):
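
        from scrapy.contracts import Contract
        from scrapy.exceptions import ContractFail

        class HasHeaderContract(Contract):
            """ Demo contract which checks the presence of a custom header
                @has_header X-CustomHeader
            """
            name = 'has_header'

            def pre_process(self, response):
                for header in self.args:
                    if header not in response.headers:
                        raise ContractFail('%s not present' % header)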

    Detecting check runs

    When scrapy check is running, the SCRAPY_CHECK environment variable is set to the string 'true'. You can use os.environ to perform any change to your spiders or your settings when scrapy check is used:

        import os
        import scrapy

        class ExampleSpider(scrapy.Spider):
            name = 'example'

            def __init__(self):
                if os.environ.get('SCRAPY_CHECK'):
                    self._disabled = True  # illustrative: skip work during checks