InfluxDB schema design

Design to query
Use tags and fields

Good schema design can prevent high series cardinality, resulting in better performing queries. If you notice data reads and writes slowing down or want to learn how cardinality affects performance, see how to resolve high cardinality.

The schemas below demonstrate , tag keys, and that are easy to query.

The and waterQualitySensor schemas illustrate the following guidelines:

Each measurement is a simple name that describes a schema.
Keys don’t repeat within a schema.
Keys .
Fields (humidity, pH, and temperature) store numeric data.
Fields data.
Measurements and keys don’t contain data; tag values and field values will store data.

The following points (formatted as line protocol) use the airSensor and waterQualitySensor schemas:

Store data in or field values, not in , field keys, or . If you design your schema to store data in tag and field values, your queries will be easier to write and more efficient.

In addition, you’ll keep cardinality low by not creating measurements and keys as you write data. To learn more about the performance impact of high series cardinality, see how to resolve high cardinality.

Compare schemas

Compare the following valid schemas represented by line protocol.

Recommended: the following schema stores metadata in separate crop, plot, and region tags. The temp field contains variable numeric data.

Good Measurements schema - Data encoded in tags (recommended)
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000

Bad Measurements schema - Data encoded in the measurement (not recommended)
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000

Not recommended: the following schema stores multiple attributes (crop, and region) concatenated (blueberries.plot-1.north) within the field key.

Bad Keys schema - Data encoded in field keys (not recommended)
-------------
weather_sensor blueberries.plot-1.north.temp=50.1 1472515200000000000
weather_sensor blueberries.plot-2.midwest.temp=49.8 1472515200000000000

Compare queries

Compare the following queries of the and Bad Measurements schemas. The queries calculate the average temp for blueberries in the north region

Easy to query: Good Measurements data is easily filtered by region tag values, as in the following example.

Difficult to query: requires regular expressions to extract plot and region from the measurement, as in the following example.

// Query *Bad Measurements*, data encoded in the measurement (not recommended)
from(bucket:"example-bucket")
    |> range(start:2016-08-30T00:00:00Z)
    |> filter(fn: (r) =>  r._measurement =~ /\.north$/ and r._field == "temp")
    |> mean()

Complex measurements make some queries impossible. For example, calculating the average temperature of both plots is not possible with the Bad Measurements schema.

Keep keys simple

In addition to keeping your keys free of data, follow these additional guidelines to make them easier to query:

Avoid keywords and special characters

Avoid keywords and special characters in keys

To simplify query writing, don’t include reserved keywords or special characters in tag and field keys. If you use in keys, then you’ll have to wrap the keys in double quotes. If you use non-alphanumeric characters in keys, then you’ll have to use bracket notation in [Flux]((/flux/v0.x/).

Avoid duplicate names for tags and fields

Avoid using the same name for a tag key and a within the same schema. Your query results may be unpredictable if you have a tag and a field with the same name.

Use tags and fields

Store unique or frequently changing values as field values.

Store values as tag values if they can be reasonably indexed.
Store values as if the values are used in filter() or functions.
Store values as tag values if the values are shared across multiple data points, i.e. metadata about the field.

Because InfluxDB indexes tags, the query engine doesn’t need to scan every record in a bucket to locate a tag value. For example, consider a bucket that stores data about thousands of users. With userId stored in a field, a query for user abcde requires InfluxDB to scan userId in every row.

from(bucket: "example-bucket")
    |> range(start: -7d)
    |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")

To retrieve data more quickly, filter on a tag to reduce the number of rows scanned. The tag should store data that can be reasonably indexed. The following query filters by the company tag to reduce the number of rows scanned for userId.

from(bucket: "example-bucket")
    |> filter(fn: (r) => r.company == "Acme")
    |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")

Use one tag for each data attribute. If your source data contains multiple data attributes in a single parameter, split each attribute into its own tag. When each tag represents one attribute (not multiple concatenated attributes) of your data, you’ll reduce the need for regular expressions in your queries. Without regular expressions, your queries will be easier to write and more performant.

Compare schemas

Compare the following valid schemas represented by line protocol.

Recommended: the following schema splits location data into plot and region tags.

Not recommended: the following schema stores multiple attributes (plot and region) concatenated within the location tag value (plot-1.north).

Bad Tags schema - Multiple data encoded in a single tag
-------------
weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,location=plot-2.midwest temp=49.8 1472515200000000000

Compare queries

Compare queries of the and Bad Tags schemas. The queries calculate the average temp for blueberries in the north region.

Easy to query: Good Tags data is easily filtered by region tag values, as in the following example.

// Query *Good Tags* schema, data encoded in multiple tags
from(bucket:"example-bucket")
    |> range(start:2016-08-30T00:00:00Z)
    |> filter(fn: (r) =>  r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
    |> mean()

// Query *Bad Tags* schema, multiple data encoded in a single tag
from(bucket:"example-bucket")
    |> range(start:2016-08-30T00:00:00Z)
    |> filter(fn: (r) =>  r._measurement == "weather_sensor" and r.location =~ /\.north$/ and r._field == "temp")

For an overview of the InfluxDB data model, watch the following video:

Building Blocks of the InfluxDB Data Model

Schema design

InfluxDB schema design

Compare schemas

Compare queries

Keep keys simple

Avoid keywords and special characters in keys

Avoid duplicate names for tags and fields

Use tags and fields

Compare schemas

Compare queries

Related