Phoenix Storage Handler for Apache Hive

    Prerequisites

    • Phoenix 4.8.0+
    • Hive 1.2.1+

    Hive Setup

    Make phoenix-<version>-hive.jar available to Hive:

    Step 1: Add to hive-env.sh:
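    The hive-env.sh entry referenced above typically exports HIVE_AUX_JARS_PATH; a minimal sketch (the jar path and version below are placeholders, not part of the original):

```shell
# hive-env.sh -- make the Phoenix-Hive jar visible to the Hive CLI.
# The path and version are illustrative; adjust to your installation.
export HIVE_AUX_JARS_PATH=/path/to/phoenix-4.8.0-hive.jar
```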

    Step 2: Add a property to hive-site.xml so that Hive MapReduce jobs can use the .jar:

    <property>
      <name>hive.aux.jars.path</name>
      <value>file://<path></value>
    </property>

    Table Creation and Deletion

    For INTERNAL tables, Hive manages the lifecycle of the table and its data. When a Hive table is created, a corresponding Phoenix table is also created; once the Hive table is dropped, the Phoenix table is deleted as well.
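    As a sketch, an INTERNAL table might be declared as follows (the table name and columns are illustrative; the TBLPROPERTIES mirror those documented below):

```sql
-- Hypothetical INTERNAL table: Hive creates and drops the backing Phoenix table.
create table phoenix_table (
  i1 int,
  s1 string
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
  "phoenix.table.name" = "phoenix_table",       -- illustrative name
  "phoenix.zookeeper.quorum" = "localhost",
  "phoenix.zookeeper.znode.parent" = "/hbase",
  "phoenix.zookeeper.client.port" = "2181",
  "phoenix.rowkeys" = "i1"
);
```

Dropping this table from Hive would also delete the Phoenix table, unlike the EXTERNAL case described next.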

    For EXTERNAL tables, Hive works with an existing Phoenix table and manages only Hive metadata. Dropping an EXTERNAL table from Hive deletes only Hive metadata but does not delete the Phoenix table.

    An example EXTERNAL table definition:

    create external table ext_table (
      i1 int,
      s1 string,
      f1 float,
      d1 decimal
    )
    STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
    TBLPROPERTIES (
      "phoenix.table.name" = "ext_table",
      "phoenix.zookeeper.quorum" = "localhost",
      "phoenix.zookeeper.znode.parent" = "/hbase",
      "phoenix.zookeeper.client.port" = "2181",
      "phoenix.rowkeys" = "i1",
      "phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
    );
    • phoenix.table.name
      • Specifies the Phoenix table name
      • Default: the same as the Hive table name
    • phoenix.zookeeper.quorum
      • Specifies the ZooKeeper quorum for HBase
      • Default: localhost
    • phoenix.zookeeper.znode.parent
      • Specifies the ZooKeeper parent node for HBase
      • Default: /hbase
    • phoenix.zookeeper.client.port
      • Specifies the ZooKeeper port
      • Default: 2181
    • phoenix.rowkeys
      • The list of columns to be the primary key in a Phoenix table
      • Required
    • phoenix.column.mapping
      • Mappings between column names for Hive and Phoenix. See the Limitations section for details.

    Data Ingestion, Deletions, and Updates

    Data ingestion can be performed through any of the mechanisms that Hive and Phoenix support:

    Phoenix:

    • upsert into table T values (.....);
    • Phoenix CSV BulkLoad tools

    All delete and update operations should be performed on the Phoenix side. See the Limitations section for more details.
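    For instance, a delete or an update against the ext_table example above would be issued through a Phoenix client rather than through Hive (the row values are illustrative):

```sql
-- Run via a Phoenix client (e.g. sqlline), not the Hive CLI.
upsert into ext_table values (1, 'updated', 0.5, 1.0);  -- "update" by re-upserting the same row key
delete from ext_table where i1 = 1;                     -- delete performed on the Phoenix side
```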

    The following options can be set in a Hive command-line interface (CLI) environment.

    Parameter                        Default  Description
    hbase.scan.cache                 100      Number of rows fetched per scan request
    hbase.scan.cacheblock            false    Whether or not to cache blocks
    split.by.stats                   false    If true, mappers use table statistics (one mapper per guide post)
    [hive-table-name].reducer.count  1        Number of reducers. In Tez mode, affects only single-table queries; see Limitations.
    [phoenix-table-name].query.hint  (none)   Hint for the Phoenix query (for example, NO_INDEX)
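    As a sketch, these options can be set per session in the Hive CLI (the table name and values below are illustrative, reusing the ext_table example):

```sql
-- Hive CLI session settings (illustrative values).
set hbase.scan.cache=1000;        -- fetch more rows per scan request
set hbase.scan.cacheblock=true;   -- cache blocks during scans
set split.by.stats=true;          -- one mapper per guide post
set ext_table.reducer.count=4;    -- reducers for the ext_table example
set ext_table.query.hint=NO_INDEX;
```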

    Limitations

    • Hive update and delete operations require transaction manager support on both Hive and Phoenix sides. Related Hive and Phoenix JIRAs are listed in the Resources section.
    • Column mapping does not work correctly when mapping row key columns.
    • MapReduce and Tez jobs always have a single reducer.
    Resources

    • Implementation, accepted by the Apache Phoenix community. The original pull request contains modifications for Hive classes.