Ecosystem Integration - Spark IoTDB - 《Apache IoTDB User Guide (V0.12.x)》


mvn clean scala:compile compile install

Maven Dependency

spark-shell user guide

spark-shell --jars spark-iotdb-connector-0.12.0.jar,iotdb-jdbc-0.12.0-jar-with-dependencies.jar
import org.apache.iotdb.spark.db._
val df = spark.read.format("org.apache.iotdb.spark.db").option("url","jdbc:iotdb://127.0.0.1:6667/").option("sql","select * from root").load
df.printSchema()
df.show()

Schema Inference

Take the following TsFile structure as an example. The TsFile schema contains three measurements: status, temperature, and hardware. Their basic information is as follows:

The existing data in the TsFile is as follows:

The wide (default) table form is as follows:

You can also use the narrow table form, which is as follows (see part 4 for how to use the narrow form):

Transform between wide and narrow table

from wide to narrow

import org.apache.iotdb.spark.db._
val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
val narrow_df = Transformer.toNarrowForm(spark, wide_df)

from narrow to wide
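
The reverse direction can be sketched in the same way as the wide-to-narrow step. This is a minimal sketch that assumes the connector's `Transformer` object exposes a `toWideForm(spark, df)` method mirroring `toNarrowForm`, and that `narrow_df` is the narrow-form DataFrame produced in the previous step. The name `restored_wide_df` is chosen here only for illustration. Check the method against the connector version you run.

```scala
import org.apache.iotdb.spark.db._

// Assumption: narrow_df is the narrow-form DataFrame built in the
// "from wide to narrow" step, and Transformer.toWideForm is available
// in the connector jar loaded into spark-shell.
val restored_wide_df = Transformer.toWideForm(spark, narrow_df)

// Inspect the result to confirm the columns are back in wide form.
restored_wide_df.printSchema()
restored_wide_df.show()
```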

Java user guide

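import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.iotdb.spark.db.*;

public class Example {
  public static void main(String[] args) {
    // The master URL and application name are expected to come from spark-submit;
    // set them on the builder with .master(...) and .appName(...) when running standalone.
    SparkSession spark = SparkSession
        .builder()
        .getOrCreate();

    // Read from IoTDB; the result is in wide (default) form.
    Dataset<Row> df = spark.read().format("org.apache.iotdb.spark.db")
        .option("url","jdbc:iotdb://127.0.0.1:6667/")
        .option("sql","select * from root").load();
    df.printSchema();
    df.show();

    // Convert the wide table to narrow form.
    Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df);
    narrowTable.show();
  }
}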

Write Data to IoTDB

You can write data to IoTDB directly, whether the DataFrame contains a wide table or a narrow table.
The numPartition parameter sets the number of partitions. The DataFrame you want to save is repartitioned according to this parameter before the data is written. Each partition opens its own session to write data, which increases the number of concurrent requests.
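
As a rough illustration of the write path, the sketch below reads a wide table with the options used in the read examples and writes it back through the same data source. It rests on several assumptions that are not confirmed by the text above: that writing uses the same format name and `url` option as reading, that `numPartition` is passed as a writer option with that exact key, and that the value `10` is a sensible partition count (it is purely illustrative). In practice the target URL would usually point at a different IoTDB instance than the source.

```scala
import org.apache.iotdb.spark.db._

// Read a wide-form DataFrame, using the same options as the read examples.
val source_df = spark.read
  .format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("sql", "select * from root")
  .load()

// Write the DataFrame to IoTDB.
// Assumption: "numPartition" is supplied as a writer option; 10 is an
// illustrative value, so up to 10 partitions would each open a session.
source_df.write
  .format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("numPartition", "10")
  .save()
```

A larger `numPartition` means more concurrent sessions against the server, so the value is best chosen with the server's capacity in mind rather than raised indefinitely.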