Handle duplicate data points
For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. For example:
# Existing data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000
# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000000
After you submit the new data point, InfluxDB overwrites firstByte with the new field value and leaves the dnsLookup field unchanged:
from(bucket: "example-bucket")
|> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
|> filter(fn: (r) => r._measurement == "web")
Table: keys: [_measurement, host, region]
_time                 _measurement  host   region   dnsLookup  firstByte
--------------------  ------------  -----  -------  ---------  ---------
2019-05-31T00:00:00Z  web           host2  us_west          7         15
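For context, a minimal sketch of these two writes with the influxdb-client Python library might look like the following; the url, token, and org values are placeholder assumptions, and example-bucket comes from the queries in this article.

from influxdb_client import InfluxDBClient, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details (assumptions, not part of the example data)
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Existing data point
write_api.write(
    bucket="example-bucket",
    record="web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000",
    write_precision=WritePrecision.NS,
)

# New data point: same measurement, tag set, and timestamp, so InfluxDB
# unions the field sets and overwrites firstByte with 15.0
write_api.write(
    bucket="example-bucket",
    record="web,host=host2,region=us_west firstByte=15.0 1559260800000000000",
    write_precision=WritePrecision.NS,
)

Because both points belong to the same series and share a timestamp, the second write replaces firstByte rather than producing a second row.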
Preserve duplicate points
Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique.
For example, add a uniq tag to each data point:

# Existing data point
web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000

After writing the new point to InfluxDB:
from(bucket: "example-bucket")
|> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
|> filter(fn: (r) => r._measurement == "web")
Table: keys: [_measurement, host, region, uniq]
_time                 _measurement  host   region   uniq  firstByte  dnsLookup
--------------------  ------------  -----  -------  ----  ---------  ---------
2019-05-31T00:00:00Z  web           host2  us_west     1         24          7

Table: keys: [_measurement, host, region, uniq]
_time                 _measurement  host   region   uniq  firstByte
--------------------  ------------  -----  -------  ----  ---------
2019-05-31T00:00:00Z  web           host2  us_west     2         15
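A similar sketch of the unique-tag approach with the influxdb-client Python library (url, token, and org are again placeholder assumptions):

from influxdb_client import InfluxDBClient, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# The uniq tag value differs between the two points, so each belongs to a
# different series and neither point overwrites the other
write_api.write(
    bucket="example-bucket",
    record=[
        "web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000",
        "web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000",
    ],
    write_precision=WritePrecision.NS,
)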
Increment the timestamp
Increment the timestamp by a nanosecond to enforce the uniqueness of each point. For example:

# Existing data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000001

After writing the new point to InfluxDB:
from(bucket: "example-bucket")
|> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
|> filter(fn: (r) => r._measurement == "web")
Table: keys: [_measurement, host, region]
_time                           _measurement  host   region   firstByte  dnsLookup
------------------------------  ------------  -----  -------  ---------  ---------
2019-05-31T00:00:00.000000000Z  web           host2  us_west         24          7
2019-05-31T00:00:00.000000001Z  web           host2  us_west         15
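A sketch of the timestamp-increment approach, this time building the points with the influxdb-client Python library's Point helper (connection values are again placeholder assumptions):

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

base_ts = 1559260800000000000  # 2019-05-31T00:00:00Z in nanoseconds

# Give the second point a timestamp one nanosecond later so both points persist
points = [
    Point("web").tag("host", "host2").tag("region", "us_west")
        .field("firstByte", 24.0).field("dnsLookup", 7.0)
        .time(base_ts, WritePrecision.NS),
    Point("web").tag("host", "host2").tag("region", "us_west")
        .field("firstByte", 15.0)
        .time(base_ts + 1, WritePrecision.NS),
]
write_api.write(bucket="example-bucket", record=points, write_precision=WritePrecision.NS)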
The output of the example queries in this article has been modified to clearly show the different approaches and results for handling duplicate data.