Release 0.57
Note
approx_distinct() should be used in preference to COUNT(DISTINCT ...) whenever an approximate answer is acceptable: it is substantially faster and has no limit on the number of distinct items it can process. COUNT(DISTINCT ...) must transfer every item over the network and keep each distinct item in memory.
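The memory trade-off in the note can be illustrated with a minimal Python sketch. It contrasts an exact distinct count, which keeps every distinct value in a set, with a simplified HyperLogLog estimator that uses a fixed array of 2**p registers regardless of input size. HyperLogLog is the estimator family commonly used for approximate distinct counts, but treat it as an assumption that approx_distinct() works exactly this way. The class, the SHA-1 based hash, and the precision p = 12 are illustrative choices, not Presto's implementation.

```python
import hashlib
import math


def _hash64(value: str) -> int:
    # Stable 64-bit hash taken from the first 8 bytes of SHA-1
    # (an illustrative choice, not the hash Presto uses).
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


def exact_distinct(values):
    # Behaves like COUNT(DISTINCT ...): every distinct value is held in memory.
    return len(set(values))


class HyperLogLog:
    """Simplified HyperLogLog: memory is fixed at 2**p small registers."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = _hash64(value)
        index = h >> (64 - self.p)              # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Partial sketches (e.g. built on different workers, an assumption
        # about deployment) combine by taking the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    values = [f"city-{i % 10000}" for i in range(50000)]
    sketch = HyperLogLog()
    for v in values:
        sketch.add(v)
    print(exact_distinct(values), round(sketch.estimate()))
```

With p = 12 the sketch holds 4,096 small integers whatever the input size, and its expected relative error is roughly 1.04 / sqrt(4096), about 1.6%. Because sketches merge by a register-wise maximum, partial results can be combined without shipping the underlying values.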
All Hive connectors support reading data from Amazon S3. This requires two additional catalog properties for the Hive connector to specify your AWS Access Key ID and Secret Access Key:
hive.s3.aws-access-key=AKIAIOSFODNN7EXAMPLE
hive.s3.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
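Rather than hard-coding the credentials, one option is to render the two S3 properties from the environment when the Hive catalog file is generated. The following is a minimal Python sketch under stated assumptions: the helper render_s3_properties is hypothetical, it reads the conventional AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and it emits the property names hive.s3.aws-access-key and hive.s3.aws-secret-key.

```python
import os


def render_s3_properties(env=None) -> str:
    """Hypothetical helper: render the Hive connector's S3 credential properties.

    Reads the conventional AWS environment variable names (an assumption about
    how credentials are supplied) and returns key=value lines for the catalog file.
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        # Both properties are needed, so fail early instead of writing half a config.
        raise KeyError("both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set")
    lines = [
        f"hive.s3.aws-access-key={access_key}",
        f"hive.s3.aws-secret-key={secret_key}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the lines; appending them to the Hive catalog properties file
    # (whose path depends on your installation) is left to the caller.
    print(render_s3_properties(), end="")
```

Failing when either variable is missing keeps the catalog from being written with only one of the two required properties.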
Allow specifying catalog and schema in the JDBC Driver URL.
Allow certain custom InputFormats to work by propagating Hive serialization properties to the .
Many execution engine performance improvements.
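The JDBC change listed above lets a connection URL carry a default catalog and schema. The Python sketch below assumes the URL form jdbc:presto://host:port/catalog/schema, with catalog and schema both optional; the parser and the PrestoJdbcUrl type are hypothetical illustrations of that layout, not the driver's own code.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

PREFIX = "jdbc:presto://"


@dataclass
class PrestoJdbcUrl:
    host: str
    port: int
    catalog: Optional[str]
    schema: Optional[str]


def parse_jdbc_url(url: str) -> PrestoJdbcUrl:
    """Hypothetical parser for jdbc:presto://host:port[/catalog[/schema]] (assumed form)."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a Presto JDBC URL: {url}")
    # Drop the leading "jdbc:" so urlparse sees "presto://host:port/...".
    parsed = urlparse(url[len("jdbc:"):])
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("host and port are required")
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) > 2:
        raise ValueError("expected at most /catalog/schema in the path")
    catalog = segments[0] if len(segments) > 0 else None
    schema = segments[1] if len(segments) > 1 else None
    return PrestoJdbcUrl(parsed.hostname, parsed.port, catalog, schema)


if __name__ == "__main__":
    print(parse_jdbc_url("jdbc:presto://example.net:8080/hive/sales"))
```

Leaving out the schema, or both segments, yields None for the missing parts, so a client would fall back to naming the catalog and schema in each query.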