PhoenixMapReduceUtil provides several utility methods to set the input and output configuration parameters for the job.

    When a Phoenix table is the source for the Map Reduce job, we can provide a SELECT query, or pass a table name and the specific columns to import. To retrieve data from the table within the mapper class, we need a class that implements DBWritable, which we pass as an argument to the PhoenixMapReduceUtil.setInput method. The custom DBWritable class provides an implementation of readFields(ResultSet rs) that allows us to retrieve the columns for each row. This custom DBWritable class forms the input value to the mapper class.

    Similarly, when writing to a Phoenix table, we use the PhoenixMapReduceUtil.setOutput method to set the output table and the columns.

    The output key class for the job should always be NullWritable, and the output value class should be the custom DBWritable class, which implements the write(PreparedStatement stmt) method to set the column values to upsert.
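    For the example below, a minimal sketch of such a class might look like the following. It assumes the STOCK and STOCK_STATS tables defined in the setup section; the field names and accessor methods are illustrative, not a fixed API.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.Array;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;

    public class StockWritable implements DBWritable, Writable {

        private String stockName;
        private int year;
        private double[] recordings;
        private double maxPrice;

        // Called once per row of the select query; the column labels match the STOCK table
        @Override
        public void readFields(ResultSet rs) throws SQLException {
            stockName = rs.getString("STOCK_NAME");
            year = rs.getInt("RECORDING_YEAR");
            final Array recordingsArray = rs.getArray("RECORDINGS_QUARTER");
            recordings = (double[]) recordingsArray.getArray();
        }

        // Called when upserting into STOCK_STATS; the parameter order follows the columns
        // passed to PhoenixMapReduceUtil.setOutput ("STOCK_NAME,MAX_RECORDING")
        @Override
        public void write(PreparedStatement pstmt) throws SQLException {
            pstmt.setString(1, stockName);
            pstmt.setDouble(2, maxPrice);
        }

        // The Writable methods are left empty in this sketch; the class is only read from
        // Phoenix and written back to Phoenix, never serialized during the shuffle.
        @Override
        public void readFields(DataInput input) throws IOException { }

        @Override
        public void write(DataOutput output) throws IOException { }

        public String getStockName() { return stockName; }

        public int getYear() { return year; }

        public double[] getRecordings() { return recordings; }

        public void setStockName(String stockName) { this.stockName = stockName; }

        public void setMaxPrice(double maxPrice) { this.maxPrice = maxPrice; }
    }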

    Setup

    a) STOCK
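    A plausible definition for the source table, inferred from the select query and the mapper below (the column names come from the query; the types and primary key are assumptions), is:

    CREATE TABLE IF NOT EXISTS STOCK (STOCK_NAME VARCHAR NOT NULL, RECORDING_YEAR INTEGER NOT NULL, RECORDINGS_QUARTER DOUBLE ARRAY[] CONSTRAINT pk PRIMARY KEY (STOCK_NAME, RECORDING_YEAR));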

    b) STOCK_STATS

    CREATE TABLE IF NOT EXISTS STOCK_STATS (STOCK_NAME VARCHAR NOT NULL, MAX_RECORDING DOUBLE CONSTRAINT pk PRIMARY KEY (STOCK_NAME));

    Sample Data
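    A few illustrative rows for the STOCK table; the stock names and quarterly prices below are made-up values:

    UPSERT INTO STOCK VALUES ('AAPL', 2009, ARRAY[85.88, 91.04, 88.50, 90.30]);
    UPSERT INTO STOCK VALUES ('AAPL', 2010, ARRAY[201.21, 211.09, 218.33, 225.01]);
    UPSERT INTO STOCK VALUES ('CSCO', 2009, ARRAY[17.32, 19.25, 18.10, 20.12]);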

    Job Configuration

    final Configuration configuration = HBaseConfiguration.create();
    final Job job = Job.getInstance(configuration, "phoenix-mr-job");

    // We can either specify a selectQuery or ignore it when we would like to retrieve all the columns
    final String selectQuery = "SELECT STOCK_NAME,RECORDING_YEAR,RECORDINGS_QUARTER FROM STOCK ";

    // StockWritable is the DBWritable class that enables us to process the result of the above query
    PhoenixMapReduceUtil.setInput(job, StockWritable.class, "STOCK", selectQuery);

    // Set the target Phoenix table and the columns
    PhoenixMapReduceUtil.setOutput(job, "STOCK_STATS", "STOCK_NAME,MAX_RECORDING");

    job.setMapperClass(StockMapper.class);
    // StockReducer aggregates the per-stock prices and emits a StockWritable (see Stock Reducer below)
    job.setReducerClass(StockReducer.class);
    job.setOutputFormatClass(PhoenixOutputFormat.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(DoubleWritable.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(StockWritable.class);
    TableMapReduceUtil.addDependencyJars(job);
    job.waitForCompletion(true);

    Stock Mapper

    public static class StockMapper extends Mapper<NullWritable, StockWritable, Text, DoubleWritable> {

        private Text stock = new Text();
        private DoubleWritable price = new DoubleWritable();

        @Override
        protected void map(NullWritable key, StockWritable stockWritable, Context context) throws IOException, InterruptedException {
            final String stockName = stockWritable.getStockName();
            final double[] recordings = stockWritable.getRecordings();
            double maxPrice = Double.MIN_VALUE;
            for (double recording : recordings) {
                if (maxPrice < recording) {
                    maxPrice = recording;
                }
            }
            stock.set(stockName);
            price.set(maxPrice);
            context.write(stock, price);
        }

    }

    Stock Reducer
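    A sketch of a reducer consistent with the job configuration above: it reduces the Text/DoubleWritable pairs emitted by the mapper to a NullWritable/StockWritable pair holding the maximum price per stock, using the setters from the StockWritable sketch above.

    public static class StockReducer extends Reducer<Text, DoubleWritable, NullWritable, StockWritable> {

        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> recordings, Context context) throws IOException, InterruptedException {
            // Track the largest max-price value seen across all years for this stock
            double maxPrice = Double.MIN_VALUE;
            for (DoubleWritable recording : recordings) {
                if (maxPrice < recording.get()) {
                    maxPrice = recording.get();
                }
            }
            final StockWritable stock = new StockWritable();
            stock.setStockName(key.toString());
            stock.setMaxPrice(maxPrice);
            // PhoenixOutputFormat upserts the value into STOCK_STATS; the key is ignored
            context.write(NullWritable.get(), stock);
        }

    }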

    Packaging & Running

    • Ensure phoenix-[version]-client.jar is in the classpath of your Map Reduce job jar.
    • To run the job, use the hadoop jar command with the necessary arguments, for example as shown below.
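    A typical invocation might look like the following; the jar name, driver class, and path are placeholders rather than fixed names:

    HADOOP_CLASSPATH=/path/to/phoenix-[version]-client.jar hadoop jar stock-mr-job.jar com.example.mr.StockMaxPriceDriver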