TiSpark Quick Start
To help you try TiSpark quickly, a TiDB cluster installed with TiUP ships with Spark and the TiSpark jar package integrated by default.
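- Spark is deployed by default in the spark directory under the TiDB instance deployment directory.
- The TiSpark jar package is deployed by default in the jars folder under the Spark deployment directory.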
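To confirm the jar location on a deployed node, a minimal shell sketch might look like the following. The function name `find_tispark_jars` and the example path are hypothetical, and the sketch assumes the jar file name contains "tispark"; adjust the pattern if your file is named differently.

```shell
# List TiSpark jars under <spark_dir>/jars.
# Assumption: the jar file name contains "tispark" (case-insensitive).
find_tispark_jars() {
  spark_dir="$1"
  find "$spark_dir/jars" -maxdepth 1 -type f -iname '*tispark*.jar' 2>/dev/null
}

# Example (hypothetical path; replace with your Spark deployment directory):
# find_tispark_jars /path/to/tidb-deploy/spark
```

An empty result suggests that either the directory is wrong or the jar uses a different naming scheme.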
The TiSpark sample data and import scripts are available for download.
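Download the latest JDK 1.8 release from the official Oracle JDK download page. The version downloaded in this example is jdk-8u141-linux-x64.tar.gz.
Extract the archive and set the environment variables to match your JDK deployment directory by editing the ~/.bashrc file, for example: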
export JAVA_HOME=/home/pingcap/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH
Verify that the JDK works:
java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
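The same check can be scripted. The sketch below is only an illustration: `is_java8` is a hypothetical helper name, and it simply looks for a `version "1.8.` string in the output of `java -version`, which is written to stderr rather than stdout.

```shell
# Return success if the given `java -version` output reports a 1.8 runtime.
is_java8() {
  printf '%s\n' "$1" | grep -Eq 'version "1\.8\.'
}

# `java -version` writes to stderr, so redirect it before capturing.
if command -v java >/dev/null 2>&1; then
  if is_java8 "$(java -version 2>&1)"; then
    echo "JDK 1.8 detected"
  else
    echo "java is on PATH but is not 1.8"
  fi
else
  echo "java not found on PATH"
fi
```

If the check reports that java is missing, open a new shell or run `source ~/.bashrc` so the exported variables take effect.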
Import the sample data
wget http://download.pingcap.org/tispark-sample-data.tar.gz && \
tar -zxvf tispark-sample-data.tar.gz && \
cd tispark-sample-data
Edit the TiDB login information in sample_data.sh, for example:
mysql --local-infile=1 -h 192.168.0.2 -P 4000 -u root < dss.ddl
Run the sample_data.sh script.
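Because the import line shown above invokes the mysql client, it may help to confirm that the client is on PATH before running the script. This is a minimal sketch; `require_cmds` is a hypothetical helper name, not part of the sample package.

```shell
# Return success only if every named command is available on PATH.
require_cmds() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      return 1
    fi
  done
}

# The import line in sample_data.sh calls the mysql client.
if require_cmds mysql; then
  echo "mysql client found"
else
  echo "install a MySQL client before running sample_data.sh"
fi
```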
Log in to TiDB and verify that the data includes the TPCH_001 database and the following tables:
mysql -uroot -P4000 -h192.168.0.2
show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| PERFORMANCE_SCHEMA |
| TPCH_001           |
| mysql              |
| test               |
+--------------------+
5 rows in set (0.00 sec)
use TPCH_001;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
show tables;
+--------------------+
| Tables_in_TPCH_001 |
+--------------------+
| CUSTOMER           |
| LINEITEM           |
| NATION             |
| ORDERS             |
| PART               |
| PARTSUPP           |
| REGION             |
| SUPPLIER           |
+--------------------+
8 rows in set (0.00 sec)
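The table check can also be scripted rather than read by eye. The following is a sketch under stated assumptions: `missing_tables` is a hypothetical helper that reads one table name per line from stdin and prints any of the eight expected tables that are absent. The usage line reuses the host and port from the login example above.

```shell
# Print each expected table that is absent from the list read on stdin
# (one table name per line). Prints nothing if all eight are present.
missing_tables() {
  actual=$(cat)
  for t in CUSTOMER LINEITEM NATION ORDERS PART PARTSUPP REGION SUPPLIER; do
    printf '%s\n' "$actual" | grep -qx "$t" || echo "$t"
  done
}

# Usage (-N suppresses the column header row):
# mysql -N -uroot -P4000 -h192.168.0.2 -e 'show tables' TPCH_001 | missing_tables
```

Matching whole lines with `grep -x` keeps PART from being satisfied by PARTSUPP.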
Go to the Spark deployment directory and start spark-shell:
cd spark &&
bin/spark-shell
Then query TiDB tables just as you would with native Spark:
scala> spark.sql("select count(*) from lineitem").show
The result is:
|count(1)|
+--------+
+--------+
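Since TiDB speaks the MySQL protocol, one way to cross-check the count returned by Spark is to run the same count through the mysql client. This is only a sketch: `count_query` is a hypothetical helper, and the connection values are the ones from the login example earlier.

```shell
# Build a row-count query for a table in a given database.
count_query() {
  printf 'select count(*) from %s.%s' "$1" "$2"
}

# Usage (-N drops the column header so only the number is printed):
# mysql -N -uroot -P4000 -h192.168.0.2 -e "$(count_query TPCH_001 LINEITEM)"
```

If the two numbers differ, check that spark-shell is reading from the same TiDB cluster the data was imported into.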
Next, run a slightly more complex Spark SQL query:
scala> spark.sql(
"""select
| l_returnflag,
| l_linestatus,
| sum(l_quantity) as sum_qty,
| sum(l_extendedprice) as sum_base_price,
| sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
| sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
| avg(l_quantity) as avg_qty,
| avg(l_extendedprice) as avg_price,
| avg(l_discount) as avg_disc,
| count(*) as count_order
|from
| lineitem
|where
| l_shipdate <= date '1998-12-01' - interval '90' day
|group by
| l_returnflag,
| l_linestatus
|order by
| l_returnflag,
| l_linestatus
""".stripMargin).show
The result is:
For more examples, see pingcap/tispark-test.