1. Installing Hive

    • Install the JDK (omitted here)
    • Install Hadoop (omitted here)
    • Install MySQL (omitted here)

    2. Install Hive

    hadoop@Master:~$ sudo cp -R apache-hive-1.1.1-bin /usr/local/hive
    hadoop@Master:~$ sudo chmod -R 775 /usr/local/hive/
    hadoop@Master:~$ sudo chown -R hadoop:hadoop /usr/local/hive/
    # Edit /etc/profile and add the HIVE_HOME variable
    export HIVE_HOME=/usr/local/hive
    export PATH=$PATH:$HIVE_HOME/bin
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:/usr/local/hive/lib
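    To make the new variables take effect in the current shell, reload the profile and check them; a quick sanity check might look like this (a sketch):

    # reload /etc/profile in the current shell and confirm HIVE_HOME is set
    source /etc/profile
    echo $HIVE_HOME        # should print /usr/local/hive
    which hive             # should resolve to /usr/local/hive/bin/hive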
    # Copy the template files under hive/conf and rename them
    cp hive-env.sh.template hive-env.sh
    cp hive-default.xml.template hive-site.xml
    # In hive-env.sh, point HADOOP_HOME at the Hadoop installation
    HADOOP_HOME=/usr/local/hadoop
    # Edit hive-site.xml to set the MySQL JDBC URL, driver class, username, and password; the modified entries are shown below
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>username to use against metastore database</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
      <description>password to use against metastore database</description>
    </property>
    Here:
    the javax.jdo.option.ConnectionURL parameter is the JDBC connection string Hive uses to reach the metastore database;
    the javax.jdo.option.ConnectionDriverName parameter is the class name of the JDBC driver;
    the javax.jdo.option.ConnectionUserName parameter is the database username;
    the javax.jdo.option.ConnectionPassword parameter is the database password.
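    Before going further, it is worth confirming that the metastore database is reachable with exactly these credentials. A quick check from the shell might look like this (a sketch; it assumes the MySQL host, user, and password configured above):

    # connect to the metastore MySQL server with the configured account
    mysql -h 192.168.1.178 -u hive -p -e "SHOW DATABASES;"
    # enter the password ("hive" above) when prompted; a database listing means
    # the account works, and a "hive" database appears once the metastore creates it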
    3. Set the environment variables in hive-env.sh

    export JAVA_HOME=/usr/lib/jvm
    export HADOOP_HOME=/usr/local/hadoop
    export HIVE_HOME=/usr/local/hive
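    With the variables in place, a version check is a cheap way to confirm the Hive binaries are wired up (a sketch; hive --version works on recent releases and does not need the metastore):

    # print the Hive version without touching the metastore
    hive --version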

    4. Download mysql-connector-java-5.1.27-bin.jar and put it in the $HIVE_HOME/lib directory
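    For example (a sketch; it assumes the jar has already been downloaded into the current directory):

    # copy the MySQL JDBC driver onto Hive's classpath
    sudo cp mysql-connector-java-5.1.27-bin.jar /usr/local/hive/lib/

    Without this jar, Hive cannot load the com.mysql.jdbc.Driver class configured above and every metastore connection will fail. Depending on the Hive version, the metastore schema may also need to be initialized once with the bundled schematool (schematool -dbType mysql -initSchema); with createDatabaseIfNotExist=true, older releases can also create it lazily on first use.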

    5. Create the working directories Hive needs on HDFS

    hadoop fs -mkdir /tmp
    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -chmod g+w /tmp
    hadoop fs -chmod g+w /user/hive/warehouse
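    A quick listing confirms the directories exist with group write permission (a sketch):

    # both directories should show group write permission, e.g. drwxrwxr-x
    hadoop fs -ls /user/hive
    hadoop fs -ls /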

    6. Start Hadoop, then open the Hive shell and run a few commands to check that everything works
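    The commands for starting Hadoop itself are not shown here; on a typical Hadoop 2.x installation they would be something like the following sketch:

    # start HDFS and YARN (scripts live in $HADOOP_HOME/sbin)
    start-dfs.sh
    start-yarn.sh
    jps    # should list NameNode, DataNode, ResourceManager, NodeManager, ...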

    hive
    show databases;
    show tables;

    Hive Usage Examples

    1. A query example

    hive> SHOW TABLES;
    OK
    testuser
    Time taken: 0.707 seconds, Fetched: 1 row(s)
    hive> DESC testuser;
    OK
    id int
    username string
    hive> SELECT * from testuser limit 10;
    OK
    1 sssss
    Time taken: 0.865 seconds, Fetched: 2 row(s)
    hive>
    hive> select count(1) from testuser;
    Query ID = hadoop_20160205004747_9d84aaca-887a-43a0-bad9-eddefe4e2219
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
    Starting Job = job_1454604205731_0001, Tracking URL = http://Master:8088/proxy/application_1454604205731_0001/
    Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1454604205731_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2016-02-05 00:48:11,942 Stage-1 map = 0%, reduce = 0%
    2016-02-05 00:48:19,561 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.38 sec
    2016-02-05 00:48:28,208 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.77 sec
    MapReduce Total cumulative CPU time: 2 seconds 770 msec
    Ended Job = job_1454604205731_0001
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.77 sec HDFS Read: 6532 HDFS Write: 2 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 770 msec
    OK
    2
    Time taken: 35.423 seconds, Fetched: 1 row(s)

    This output shows that the query was compiled into a single MapReduce job. Part of Hive's appeal is that users never have to deal with MapReduce directly; all they need to care about is writing queries in a SQL-like language.
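    The same queries can also be run non-interactively, which is convenient in scripts. A sketch using the testuser table above:

    # run one HiveQL statement from the shell without entering the CLI
    hive -e "SELECT count(1) FROM testuser;"
    # hive -f my_queries.hql would run a file of statements instead (file name illustrative)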

    2. Inserting a large amount of data by running the insert repeatedly

    hive> insert overwrite table testuser
        > select id,count(id)
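    A complete, runnable sketch of this pattern (it assumes the two-column testuser table shown earlier, and uses INSERT INTO, which appends rows, where the INSERT OVERWRITE above replaces them): each run doubles the row count, so a handful of repetitions quickly yields a large table.

    # append a full copy of the table to itself; every run doubles the rows
    hive -e "INSERT INTO TABLE testuser SELECT id, username FROM testuser;"
    # watch the count double after each run
    hive -e "SELECT count(1) FROM testuser;"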