Architecture

    Instrumentation is written to be safe in production and to add little overhead. For this reason, it propagates only IDs in-band, to tell the receiver there’s a trace in progress. Completed spans are reported to Zipkin out-of-band, similar to how applications report metrics asynchronously.

    For example, when an operation is being traced and it needs to make an outgoing HTTP request, a few headers are added to propagate IDs. Headers are not used to send details such as the operation name.
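
    As a rough illustration of in-band propagation, the sketch below adds ID headers to an outgoing request. It assumes the B3 header names (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled) commonly used by Zipkin instrumentation. The functions new_id and inject_b3_headers are hypothetical helpers, not part of any specific tracer's API.

```python
import secrets


def new_id(num_bytes=8):
    """Return a random lower-hex ID (8 bytes gives 16 hex characters)."""
    return secrets.token_hex(num_bytes)


def inject_b3_headers(headers, trace_id, span_id, parent_id=None, sampled=True):
    """Copy the headers and add only trace identifiers, never span details."""
    out = dict(headers)
    out["X-B3-TraceId"] = trace_id
    out["X-B3-SpanId"] = span_id
    if parent_id is not None:
        out["X-B3-ParentSpanId"] = parent_id
    out["X-B3-Sampled"] = "1" if sampled else "0"
    return out


# The operation name and timing stay out of the headers; only IDs are added.
outgoing = inject_b3_headers({"Accept": "application/json"}, new_id(), new_id())
```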

    The component in an instrumented app that sends data to Zipkin is called a Reporter. Reporters send trace data via one of several transports to Zipkin collectors, which persist trace data to storage. Later, storage is queried by the API to provide data to the UI.
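
    To make the reporter's job more concrete, here is a sketch of encoding one completed span for an HTTP transport. The field names follow Zipkin's v2 JSON span format and /api/v2/spans is the v2 collector path. The host, port, service name, span values, and helper functions are assumptions chosen for illustration.

```python
import json
import urllib.request

# Assumed collector address; 9411 is Zipkin's default port.
ZIPKIN_SPANS_URL = "http://localhost:9411/api/v2/spans"


def encode_spans(spans):
    """Serialize a list of span dicts as a JSON array for the request body."""
    return json.dumps(spans).encode("utf-8")


def build_request(spans, url=ZIPKIN_SPANS_URL):
    """Build (but do not send) the HTTP POST that a reporter would issue."""
    return urllib.request.Request(
        url,
        data=encode_spans(spans),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


span = {
    "traceId": "463ac35c9f6413ad",
    "id": "a2fb4a1d1a96d312",
    "name": "get /foo",
    "kind": "CLIENT",
    "timestamp": 1700000000000000,  # start time, epoch microseconds
    "duration": 25000,              # microseconds
    "localEndpoint": {"serviceName": "frontend"},  # hypothetical service name
    "tags": {"http.path": "/foo"},
}
request = build_request([span])
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```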

    Here’s a diagram describing this flow:

    To see if a tracer or instrumentation library already exists for your platform, see the list of existing instrumentations.

    As mentioned in the overview, identifiers are sent in-band and details are sent out-of-band to Zipkin. In both cases, trace instrumentation is responsible for creating valid traces and rendering them properly. For example, a tracer ensures parity between the data it sends in-band (downstream) and out-of-band (async to Zipkin).

    Here’s an example sequence of HTTP tracing where user code calls the resource /foo. This results in a single span, sent asynchronously to Zipkin after user code receives the HTTP response.
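
    The sequence can be approximated in code as follows. This is a minimal sketch, not a real tracer: traced_get, fake_send, and the plain list standing in for a reporter are all hypothetical, and the B3 header names are an assumption. It shows one client span being recorded after the response arrives, with the same IDs in the headers and in the span.

```python
import secrets
import time


def traced_get(send, path, reported):
    """Call send(path, headers) and record one client span after it returns.

    `send` stands in for any HTTP client; `reported` stands in for a reporter.
    """
    trace_id = secrets.token_hex(8)
    span_id = secrets.token_hex(8)
    # In-band: only identifiers travel with the request.
    headers = {"X-B3-TraceId": trace_id, "X-B3-SpanId": span_id, "X-B3-Sampled": "1"}

    start = time.time()
    try:
        return send(path, headers)
    finally:
        # Out-of-band: the span reuses the IDs that were sent downstream.
        reported.append({
            "traceId": trace_id,
            "id": span_id,
            "name": "get " + path,
            "kind": "CLIENT",
            "timestamp": int(start * 1_000_000),
            "duration": max(1, int((time.time() - start) * 1_000_000)),
        })


seen_headers = {}
spans = []


def fake_send(path, headers):
    """Pretend to be the remote service and remember the headers it received."""
    seen_headers.update(headers)
    return 200


status = traced_get(fake_send, "/foo", spans)
```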

    Trace instrumentation reports spans asynchronously so that delays or failures in the tracing system do not delay or break user code.
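
    One way to get this isolation is a bounded queue drained by a background thread. The sketch below assumes that design; AsyncReporter, its sender callback, and flaky_sender are hypothetical names, and real reporters typically also batch spans and apply timeouts.

```python
import queue
import threading


class AsyncReporter:
    """Queue spans and send them from a background thread (hypothetical class)."""

    def __init__(self, sender, max_queued=1000):
        self._sender = sender  # callable that ships one span to Zipkin
        self._queue = queue.Queue(maxsize=max_queued)
        self.dropped = 0
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def report(self, span):
        """Never block or raise in user code; drop the span if the queue is full."""
        try:
            self._queue.put_nowait(span)
        except queue.Full:
            self.dropped += 1

    def _run(self):
        while True:
            span = self._queue.get()
            try:
                self._sender(span)
            except Exception:
                pass  # a tracing failure must not reach the application
            finally:
                self._queue.task_done()

    def flush(self):
        """Wait until queued spans are processed (useful in tests and at shutdown)."""
        self._queue.join()


sent = []


def flaky_sender(span):
    """Simulate a transport that fails for one particular span."""
    if span["name"] == "bad":
        raise ConnectionError("collector unreachable")
    sent.append(span)


reporter = AsyncReporter(flaky_sender)
reporter.report({"name": "bad"})       # fails in the background, caller unaffected
reporter.report({"name": "get /foo"})  # still delivered
reporter.flush()
```

    Dropping spans when the queue is full trades completeness of trace data for bounded memory and no blocking in the caller.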

    There are 4 components that make up Zipkin:

    • collector
    • storage
    • search
    • web UI

    Once trace data arrives at the Zipkin collector daemon, the collector validates, stores, and indexes it for lookups.
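
    The three collector steps can be pictured with a toy in-memory version. This is only a sketch: InMemoryCollector is a hypothetical class, the validation rule (16 or 32 lower-hex characters for trace IDs, 16 for span IDs) is an assumption based on the v2 span format, and indexing by service name is one illustrative choice.

```python
import re
from collections import defaultdict

TRACE_ID = re.compile(r"[0-9a-f]{16}|[0-9a-f]{32}")  # 64- or 128-bit lower-hex
SPAN_ID = re.compile(r"[0-9a-f]{16}")                # 64-bit lower-hex


class InMemoryCollector:
    """Toy collector that validates, stores, and indexes spans in memory."""

    def __init__(self):
        self.spans_by_trace = defaultdict(list)    # storage
        self.traces_by_service = defaultdict(set)  # index used for lookups

    def accept(self, span):
        """Return True if the span was stored, False if it failed validation."""
        trace_id = str(span.get("traceId", ""))
        span_id = str(span.get("id", ""))
        if not TRACE_ID.fullmatch(trace_id) or not SPAN_ID.fullmatch(span_id):
            return False
        self.spans_by_trace[trace_id].append(span)
        service = span.get("localEndpoint", {}).get("serviceName")
        if service:
            self.traces_by_service[service].add(trace_id)
        return True


collector = InMemoryCollector()
ok = collector.accept({
    "traceId": "463ac35c9f6413ad",
    "id": "a2fb4a1d1a96d312",
    "name": "get /foo",
    "localEndpoint": {"serviceName": "frontend"},  # hypothetical service name
})
rejected = collector.accept({"traceId": "not-hex", "id": "1"})
```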

    Zipkin was initially built to store data on Cassandra since Cassandra is scalable, has a flexible schema, and is heavily used within Twitter. However, we made this component pluggable. In addition to Cassandra, we natively support Elasticsearch and MySQL. Other back-ends might be offered as third-party extensions.

    Once the data is stored and indexed, we need a way to extract it. The query daemon provides a simple JSON API for finding and retrieving traces. The primary consumer of this API is the Web UI.
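
    As an illustration of how a client might address this API, the sketch below builds two request URLs. It assumes the v2 API paths (/api/v2/trace/{traceId} and /api/v2/traces) and a server on localhost:9411; the helper functions and the example service name are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9411/api/v2"  # assumed local Zipkin server


def trace_url(trace_id):
    """URL for retrieving one trace by its ID."""
    return f"{BASE_URL}/trace/{trace_id}"


def search_url(service_name, limit=10):
    """URL for finding recent traces that involve a given service."""
    query = urlencode({"serviceName": service_name, "limit": limit})
    return f"{BASE_URL}/traces?{query}"


one_trace = trace_url("463ac35c9f6413ad")
recent = search_url("frontend")
```

    Fetching either URL with any HTTP client would return JSON, which is what the Web UI consumes.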