MapReduce does not run and hangs at the Kill Command when executing an INSERT INTO ... SELECT statement in Hive

The HQL statement being executed is as follows:
select sf.userid, sf.query, count(sf.query) sco, tt.co
from (select userid, count(3) co from sogou_query group by userid order by co desc) as tt
join sogou_query as sf on tt.userid = sf.userid
group by sf.userid, sf.query, tt.co
order by tt.co desc, sco desc
limit 10;
The error reported is as follows:
I modified the mapred-site.xml configuration file:
yarn.app.mapreduce.am.resource.mb=700
mapreduce.map.memory.mb=700
mapreduce.reduce.memory.mb=700
I modified the yarn-site.xml configuration file:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
Then I refreshed the modified configuration with the following commands:
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes
Operations in Hive:
Set the Java heap space somewhat larger:
hive> set mapred.map.child.java.opts=-Xmx512m;
hive> set mapred.reduce.child.java.opts=-Xmx512m;
hive> select sf.userid, sf.query, count(sf.query) sco, tt.co from (select userid, count(3) co from sogou_query group by userid order by co desc) as tt join
    > sogou_query as sf on tt.userid = sf.userid group by sf.userid,sf.query,tt.co order by tt.co desc,sco desc limit 10;
Query ID = hive_18_daaa-40e7-b59a-cdfd024ff7c2
Total jobs = 7
Launching Job 1 out of 7
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1_0061, Tracking URL =
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1_0061
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
14:18:28,839 Stage-1 map = 0%,  reduce = 0%
14:18:43,359 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 11.4 sec
14:18:49,726 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 17.37 sec
14:18:51,862 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 19.83 sec
14:19:06,274 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 25.33 sec
14:19:08,456 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 27.99 sec
MapReduce Total cumulative CPU time: 27 seconds 990 msec
Ended Job = job_1_0061
Launching Job 2 out of 7
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1_0062, Tracking URL =
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1_0062
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
14:19:23,835 Stage-2 map = 0%,  reduce = 0%
14:19:41,931 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 8.4 sec
14:19:58,737 Stage-2 map = 100%,  reduce = 72%, Cumulative CPU 14.24 sec
14:19:59,841 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 15.53 sec
MapReduce Total cumulative CPU time: 15 seconds 530 msec
Ended Job = job_1_0062
Stage-10 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Execution log at: /tmp/hive/hive_18_daaa-40e7-b59a-cdfd024ff7c2.log
Starting to launch local task to process map join; maximum memory =
Processing rows:    Hashtable size:    Memory usage:    percentage:
Processing rows:    Hashtable size:    Memory usage:    percentage:
Processing rows:    Hashtable size:    Memory usage:    percentage:
Processing rows:    Hashtable size:    Memory usage:    percentage:
Dump the side-table for tag: 0 with group count: 519876 into file: file:/tmp/hive/9e8-4863-ba21-9ab/hive__14-18-08_927_/-local-10009/HashTable-Stage-8/MapJoin-mapfile10--.hashtable
Uploaded 1 File to: file:/tmp/hive/9e8-4863-ba21-9ab/hive__14-18-08_927_/-local-10009/HashTable-Stage-8/MapJoin-mapfile10--.hashtable ( bytes)
End of local task; Time Taken: 9.896 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 4 out of 7
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1_0063, Tracking URL =
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1_0063
Hadoop job information for Stage-8: number of mappers: 2; number of reducers: 0
14:20:35,408 Stage-8 map = 0%,  reduce = 0%
14:20:57,622 Stage-8 map = 50%,  reduce = 0%, Cumulative CPU 24.99 sec
14:21:10,162 Stage-8 map = 100%,  reduce = 0%, Cumulative CPU 36.35 sec
MapReduce Total cumulative CPU time: 36 seconds 350 msec
Ended Job = job_1_0063
Launching Job 5 out of 7
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1_0064, Tracking URL =
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1_0064
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 1
14:21:24,987 Stage-4 map = 0%,  reduce = 0%
14:21:38,918 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 4.36 sec
14:21:45,155 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 15.73 sec
14:21:59,140 Stage-4 map = 100%,  reduce = 68%, Cumulative CPU 21.18 sec
14:22:02,392 Stage-4 map = 100%,  reduce = 76%, Cumulative CPU 24.01 sec
14:22:05,585 Stage-4 map = 100%,  reduce = 93%, Cumulative CPU 26.83 sec
14:22:06,621 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 28.06 sec
MapReduce Total cumulative CPU time: 28 seconds 60 msec
Ended Job = job_1_0064
Launching Job 6 out of 7
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1_0065, Tracking URL =
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1_0065
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 1
14:22:21,459 Stage-5 map = 0%,  reduce = 0%
14:22:42,444 Stage-5 map = 100%,  reduce = 0%, Cumulative CPU 11.04 sec
14:22:55,938 Stage-5 map = 100%,  reduce = 100%, Cumulative CPU 15.01 sec
MapReduce Total cumulative CPU time: 15 seconds 10 msec
Ended Job = job_1_0065
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2   Cumulative CPU: 27.99 sec   HDFS Read:   HDFS Write:
Stage-Stage-2: Map: 1   Cumulative CPU: 15.53 sec   HDFS Read:   HDFS Write:
Stage-Stage-8: Map: 2   Cumulative CPU: 37.25 sec   HDFS Read:   HDFS Write:
Stage-Stage-4: Map: 2   Cumulative CPU: 28.06 sec   HDFS Read:   HDFS Write:
Stage-Stage-5: Map: 1   Cumulative CPU: 15.01 sec   HDFS Read:   HDFS Write: 718 SUCCESS
Total MapReduce CPU Time Spent: 2 minutes 3 seconds 840 msec
[血腥恐怖女尸图片]
[太平间女尸图片]
[漂亮女尸图片]
[国外恐怖女尸图片]
[凶杀现场女尸图片]
[案发现场女尸图片]
[刑场枪决女死囚]
[停尸房女尸图片]
[女尸图片大全]
[解剖漂亮女尸图片]
Time taken: 288.215 seconds, Fetched: 10 row(s)
Oh, OK, thanks!
雷雷, if I execute it through the Hive CLI there is no problem, but if I run it through Beeline it still reports the error.
What is the Beeline startup command? Let me give it a try.
The Beeline case is strange: the configuration files have all been changed to 700 MB, so why does it still report an error against a 400 MB limit:
Diagnostic Messages for this Task:
Container [pid=24135,containerID=container_1_006] is running beyond physical memory limits. Current usage: 450.7 MB of 400 MB physical memory used; 2.1 GB of 840.0 MB virtual memory used. Killing container.
The newly modified configuration had not taken effect; restarting the HiveServer2 service fixed it.
The log is at /var/log/hive/hive-server2.log on the mycluster-4 machine.
Oh, thanks a lot!
Running MapReduce as the JDBC Client User in HiveServer2
I recently built a web system for querying Hive, similar to the built-in HWI and the systems from Anjuke and Dianping, but with a different implementation: Hive statements are executed not through the CLI but over a JDBC connection to hive-server2. To schedule MapReduce resources per user, a Hive query has to be automatically bound to the current user, that user has to be passed to the YARN server, and the MapReduce job has to run as that user. This article records the problems I ran into while implementing this and how I solved them; if you have better methods or suggestions, feel free to leave a comment.
The cluster runs CDH 4.3 and Kerberos authentication is not enabled.
After finishing this article I received a reply on Weibo pointing out that hive-server2 in CDH 4.3 already implements this feature; my thanks to @单超eric for the help. So you can skip the rest of this article and simply look at how Cloudera's HiveServer2 Impersonation does it.
Starting hive-server2
Let's start with how the hive-server2 service is launched.
If you start the hive-server2 process as a system service, the user that starts hive-server2 is hive and the user that runs MapReduce is also hive. The startup command is:
/etc/init.d/hive-server2 start
If you start the hive-server2 process from the command line (here as root), the user that starts hive-server2 is root and the user that runs MapReduce is also root. The startup command is:
hive --service hiveserver2
Why is this the case? To answer that, we have to look at the hive-server2 startup process.
Looking at the code in HiveServer2.java, hive-server2 starts cliService and then thriftCLIService. The init() method of cliService contains the following code:
public synchronized void init(HiveConf hiveConf) {
  this.hiveConf = hiveConf;
  sessionManager = new SessionManager();
  addService(sessionManager);
  try {
    HiveAuthFactory.loginFromKeytab(hiveConf);
    serverUserName = ShimLoader.getHadoopShims()
        .getShortUserName(ShimLoader.getHadoopShims().getUGIForConf(hiveConf));
  } catch (IOException e) {
    throw new ServiceException("Unable to login to kerberos with given principal/keytab", e);
  } catch (LoginException e) {
    throw new ServiceException("Unable to login to kerberos with given principal/keytab", e);
  }
  super.init(hiveConf);
}
From the code above you can see that during cliService initialization a login is performed (from a keytab) and the user name is obtained:
ShimLoader.getHadoopShims().getUGIForConf(hiveConf)
This call eventually invokes the getUGIForConf method of the HadoopShimsSecure class:
public UserGroupInformation getUGIForConf(Configuration conf) throws IOException {
  return UserGroupInformation.getCurrentUser();
}
The code of UserGroupInformation.getCurrentUser() is as follows:
public synchronized
static UserGroupInformation getCurrentUser() throws IOException {
  AccessControlContext context = AccessController.getContext();
  Subject subject = Subject.getSubject(context);
  if (subject == null || subject.getPrincipals(User.class).isEmpty()) {
    return getLoginUser();
  }
  return new UserGroupInformation(subject);
}
Because the service has only just started, subject is null, so the if branch calls getLoginUser(), whose code is as follows:
public synchronized
static UserGroupInformation getLoginUser() throws IOException {
  if (loginUser == null) {
    try {
      Subject subject = new Subject();
      LoginContext login;
      if (isSecurityEnabled()) {
        login = newLoginContext(HadoopConfiguration.USER_KERBEROS_CONFIG_NAME,
            subject, new HadoopConfiguration());
      } else {
        login = newLoginContext(HadoopConfiguration.SIMPLE_CONFIG_NAME,
            subject, new HadoopConfiguration());
      }
      login.login();
      loginUser = new UserGroupInformation(subject);
      loginUser.setLogin(login);
      loginUser.setAuthenticationMethod(isSecurityEnabled() ?
          AuthenticationMethod.KERBEROS :
          AuthenticationMethod.SIMPLE);
      loginUser = new UserGroupInformation(login.getSubject());
      String fileLocation = System.getenv(HADOOP_TOKEN_FILE_LOCATION);
      if (fileLocation != null) {
        // Load the token storage file and put all of the tokens into the
        // user. Don't use the FileSystem API for reading since it has a lock
        // cycle (HADOOP-9212).
        Credentials cred = Credentials.readTokenStorageFile(
            new File(fileLocation), conf);
        loginUser.addCredentials(cred);
      }
      loginUser.spawnAutoRenewalThreadForUserCreds();
    } catch (LoginException le) {
      LOG.debug("failure to login", le);
      throw new IOException("failure to login", le);
    }
    if (LOG.isDebugEnabled()) {
      LOG.debug("UGI loginUser:" + loginUser);
    }
  }
  return loginUser;
}
Because this is the first call to getLoginUser(), loginUser is null, so a LoginContext is created and its login method is called; login eventually invokes the commit() method of HadoopLoginModule.
(The original article shows a call graph here, from hive-server2 startup down to the execution of HadoopLoginModule's commit() method.)
The key code for determining the login user is in commit(); its logic is:
1. If Kerberos is used, the login user is the Kerberos user (see the official documentation for how hive-server2 logs in with Kerberos).
2. If the Kerberos user is empty and security is not enabled, the user name is taken from the HADOOP_USER_NAME environment variable.
3. If HADOOP_USER_NAME is not set either, the operating-system user is used, that is, the user that started the hive-server2 process.
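As a rough, self-contained illustration of that ordering (this is not the actual Hadoop source; the class, method and parameter names below are invented for readability), the selection behaves roughly like this:

public class LoginUserSelection {

    // Simplified sketch of the user-selection order in HadoopLoginModule.commit().
    // kerberosUser/osUser are stand-ins for the principals found on the JAAS Subject.
    static String pickLoginUser(String kerberosUser, boolean securityEnabled, String osUser) {
        // 1. with Kerberos enabled, the Kerberos principal wins
        if (securityEnabled && kerberosUser != null) {
            return kerberosUser;
        }
        // 2. without security, HADOOP_USER_NAME from the environment wins
        String envUser = System.getenv("HADOOP_USER_NAME");
        if (!securityEnabled && envUser != null) {
            return envUser;
        }
        // 3. otherwise fall back to the OS user that started the process
        return osUser;
    }

    public static void main(String[] args) {
        // With security off and HADOOP_USER_NAME unset, hive-server2 ends up
        // submitting MapReduce as the process owner, e.g. hive, root or june.
        System.out.println(pickLoginUser(null, false, System.getProperty("user.name")));
    }
}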
So hive-server2 performs a login during startup and records the login user; after startup, calling UserGroupInformation.getCurrentUser() again returns that login user, which means every HQL statement sent to hive-server2 ends up running its MapReduce jobs as that one user.
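A minimal sketch of that behaviour (my own test snippet, assuming security is off and no doAs() block is active): once the login has happened inside the hive-server2 JVM, later calls keep returning the same cached user.

import org.apache.hadoop.security.UserGroupInformation;

public class WhoAmI {
    public static void main(String[] args) throws Exception {
        // The first call performs the login and caches the result.
        UserGroupInformation login = UserGroupInformation.getLoginUser();
        // Outside of a doAs() block, getCurrentUser() falls back to the cached
        // login user, so every request handled by this JVM reports the same name.
        UserGroupInformation current = UserGroupInformation.getCurrentUser();
        System.out.println(login.getShortUserName() + " / " + current.getShortUserName());
    }
}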
Submitting the Hive job
Now let's look at how a Hive job is submitted to the YARN server and runs as MapReduce.
To make debugging easier, I configured hive-site.xml, core-site.xml, mapred-site.xml and yarn-site.xml in the Hive source tree in my local Eclipse to connect to a test cluster, added the missing YARN dependencies, fixed the errors in hive-builtins, and then ran the main method of the HiveServer2 class. The login user on my machine is june, so hive-server2 starts as june.
Then I ran the JDBC test class with a simple SQL statement, roughly as follows:
public static void test() {
  try {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://june-mint:10000/default", "", "");
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery("select count(1) from t");
    while (rs.next())
      System.out.println(rs.getString(1));
    rs.close();
    stmt.close();
    conn.close();
  } catch (SQLException se) {
    se.printStackTrace();
  } catch (Exception e) {
    e.printStackTrace();
  }
}
Looking at the YARN monitoring page at http://192.168.56.101:8088/cluster, you can see that the submitted MapReduce job runs as the june user.
How do we change the user that runs the MapReduce job? If you know how Hive submits MapReduce jobs, you know that the job is submitted through org.apache.hadoop.mapred.JobClient. JobClient's init method contains the following code:
public void init(JobConf conf) throws IOException {
  setConf(conf);
  cluster = new Cluster(conf);
  clientUgi = UserGroupInformation.getCurrentUser();
}
JobClient submits the MapReduce job in its submitJobInternal method, with code like this:
Job job = clientUgi.doAs(new PrivilegedExceptionAction<Job>() {
  public Job run() throws IOException, ClassNotFoundException,
      InterruptedException {
    Job job = Job.getInstance(conf);
    job.submit();
    return job;
  }
});
As we saw earlier, hive-server2 performs a login during startup and the login user is june, so clientUgi also corresponds to june, and therefore the submitted MapReduce job also runs as june.
How to modify the source code
From the code above we know that changing how clientUgi is obtained changes the user that submits the job. UserGroupInformation provides the following static method:
public static UserGroupInformation createRemoteUser(String user) {
  if (user == null || "".equals(user)) {
    throw new IllegalArgumentException("Null user");
  }
  Subject subject = new Subject();
  subject.getPrincipals().add(new User(user));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.SIMPLE);
  return result;
}
So we can try this method and modify JobClient's init method as follows:
public void init(JobConf conf) throws IOException {
  setConf(conf);
  cluster = new Cluster(conf);
  if (UserGroupInformation.isSecurityEnabled()) {
    clientUgi = UserGroupInformation.getCurrentUser();
  } else {
    String user = conf.get("myExecuteName", "NoName");
    clientUgi = UserGroupInformation.createRemoteUser(user);
  }
}
With security disabled, this code reads the user name specified by the JDBC client from the configuration (the myExecuteName variable) and then creates a remote UserGroupInformation for it.
Why take the user name from a variable instead of from the JDBC connection?
Since security is not a concern here, the client may specify any user it likes.
The user in the JDBC connection string is not used because that would force a user name to be supplied every time a connection is obtained, which rules out reusing an existing connection pool (a sketch follows).
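For illustration only, here is a sketch of how that plays out with a pool (it assumes connections come from a standard javax.sql.DataSource; pool and currentWebUser are placeholders, not code from the article): the same pooled connection can serve different web users because each request simply issues its own set statement before querying.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class PooledHiveQuery {
    // Run one query on a pooled connection on behalf of the given web user.
    static void queryAs(DataSource pool, String currentWebUser) throws Exception {
        Connection conn = pool.getConnection();
        try {
            Statement stmt = conn.createStatement();
            // Bind this request to the current user before running the query;
            // myExecuteName is the custom Hive variable introduced below.
            stmt.execute("set myExecuteName=" + currentWebUser);
            ResultSet rs = stmt.executeQuery("select count(1) from t");
            while (rs.next())
                System.out.println(rs.getString(1));
            rs.close();
            stmt.close();
        } finally {
            conn.close(); // returns the connection to the pool
        }
    }
}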
After compiling the code and replacing the class file, I re-ran HiveServer2 and the JDBC test class and checked the YARN monitoring page at http://192.168.56.101:8088/cluster (screenshot in the original article). The MapReduce job now runs as the user NoName, because the myExecuteName variable is not found in the JobConf and the default value NoName is used.
Looking at the hive-server2 run log, the job fails; the key exception is:
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=NoName, access=WRITE, inode="/tmp/hive-june/hive__21-18-12_812_949668/_tmp.-ext-10001":june:hadoop:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4705)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4687)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4661)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:2696)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:2663)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:2642)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:610)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename
The exception above occurs because MapReduce creates temporary files while it runs, and the NoName user has no write permission on them; the temporary files belong to the june user. The HDFS files look like this:
[root@edh1 lib]# hadoop fs -ls /tmp/
Found 6 items
drwx------   - june hadoop   01:33 /tmp/hadoop-yarn
drwxr-xr-x   - june hadoop   06:52 /tmp/hive-june
/tmp/hive-june is the HDFS path that Hive uses during execution. It is defined by hive.exec.scratchdir, whose default value is /tmp/hive-${user.name}; the path is obtained in the constructor of the org.apache.hadoop.hive.ql.Context class and created in the execute(DriverContext driverContext) method of the ExecDriver class.
Similar permission problems also appear when HDFS files are renamed and when temporary directories are deleted. To avoid these exceptions, hive.exec.scratchdir must point to a temporary directory belonging to the current user, and the temporary directories must be created, renamed and deleted as the current login user.
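As an aside, and not the fix used here (the author's change to Context follows), one hedged workaround for this class of error would be to pre-create the per-user scratch directory with open permissions, in the spirit of /tmp:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ScratchDirSetup {
    // Pre-create /tmp/hive-<user> so that later writes, renames and deletes done
    // by that user do not hit AccessControlException. The path mirrors the
    // hive.exec.scratchdir default of /tmp/hive-${user.name}.
    static void ensureScratchDir(Configuration conf, String user) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path userScratch = new Path("/tmp/hive-" + user);
        if (!fs.exists(userScratch)) {
            fs.mkdirs(userScratch, new FsPermission((short) 0777));
        }
    }
}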
The temporary directory derived from hive.exec.scratchdir is set in the constructor of the Context class, modified as follows:
String user = conf.get("myExecuteName", "");
if (user != null && user.trim().length() > 0) {
  nonLocalScratchPath = new Path("/tmp/hive-" + user, executionId);
} else {
  nonLocalScratchPath = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIR),
      executionId);
}
Tracking down the code behind all of these operations is rather involved and there are many places to change. Since everything goes through hive-server2 here, it is much simpler to make the change in the corresponding server-side code, for example in the following three methods of the HiveSessionImpl class:
public OperationHandle executeStatement(String statement, Map<String, String> confOverlay) throws HiveSQLException {}
public void cancelOperation(final OperationHandle opHandle) throws HiveSQLException {}
public void closeOperation(final OperationHandle opHandle) throws HiveSQLException {}
The first method runs a SQL statement, the second cancels a running operation, and the third closes the connection.
The modification made in executeStatement is as follows: the call to operation.run(); is replaced with the block below.

if (operation instanceof SQLOperation) {
  String user = hiveConf.getVar(ConfVars.HIVE_SERVER2_MAPREDUCE_USERNAME);
  UserGroupInformation ugi = UserGroupInformation.createRemoteUser(user);
  try {
    ugi.doAs(new PrivilegedExceptionAction<CommandProcessorResponse>() {
      public CommandProcessorResponse run() throws HiveSQLException {
        operation.run();
        return null;
      }
    });
  } catch (IOException e) {
    e.printStackTrace();
  } catch (InterruptedException e) {
    e.printStackTrace();
  }
} else {
  operation.run();
}
A check is added here so that the wrapped code only runs for SQL operations; this guarantees that a UserGroupInformation is created only when the myExecuteName value read from the Hive variables is non-empty.
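cancelOperation and closeOperation presumably need the same treatment, since cancelling a query and cleaning up its scratch files also touch the per-user directories. A hedged sketch of what the wrapping might look like for cancelOperation (the original article does not show this body, and the inner delegation through the OperationManager is my assumption about the Hive code layout of that era):

public void cancelOperation(final OperationHandle opHandle) throws HiveSQLException {
  String user = hiveConf.getVar(ConfVars.HIVE_SERVER2_MAPREDUCE_USERNAME);
  UserGroupInformation ugi = UserGroupInformation.createRemoteUser(user);
  try {
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws HiveSQLException {
        // assumed delegation target; adjust to however the original method cancels
        sessionManager.getOperationManager().cancelOperation(opHandle);
        return null;
      }
    });
  } catch (Exception e) {
    throw new HiveSQLException(e);
  }
}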
myExecuteName is a newly defined Hive variable; the JDBC client uses a set statement to set it to the current login user name before executing SQL. The client code looks like this:
Statement stmt = conn.createStatement();
stmt.execute("set myExecuteName=aaaa");
ResultSet rs = stmt.executeQuery("select count(1) from t");
while (rs.next())
System.out.println(rs.getString(1));
The classes modified above are:
org.apache.hadoop.mapred.JobClient  // reads the user passed from the JDBC client (the value of myExecuteName) from the configuration and runs MapReduce as that user
org.apache.hadoop.hive.ql.Context  // points hive.exec.scratchdir at the temporary directory of the user passed from the JDBC client
org.apache.hive.service.cli.session.HiveSessionImpl  // modifies the methods that run SQL, cancel an operation and close the connection
Testing with the javachen user, the temporary directories on HDFS look like this:
[root@edh1 lib]# hadoop fs -ls /tmp/
Found 7 items
drwx------     01:33 /tmp/hadoop-yarn
drwxr-xr-x   - javachen.com hadoop   07:30 /tmp/hive-javachen.com
drwxr-xr-x     06:52 /tmp/hive-june
drwxr-xr-x     14:13 /tmp/hive-root
drwxrwxrwt     07:30 /tmp/logs
(A screenshot of the YARN monitoring page appears here in the original article.)
Besides this simple test, you also need to verify that the modified code does not affect the original code paths or the behaviour of the Hive CLI.
Enjoy it !