Installing spark-1.6.0-bin-hadoop2.6.tgz: is it OK if the Hadoop version is 2.7?

https://github.com/eyjian
This article assumes Hadoop 2.7.1 is installed in /data/hadoop/current and Spark 1.6.0 in /data/hadoop/spark, where /data/hadoop/spark is a symbolic link pointing to the actual installation directory /data/hadoop/spark-1.6.0-bin-hadoop2.6.
The Spark website is: (the Shark website is: ; Shark has become a module of Spark and no longer needs to be installed separately).
Spark is run in cluster mode here; client mode is not covered.
2. Installing Scala
Martin Odersky of EPFL (École Polytechnique Fédérale de Lausanne) began designing Scala in 2001, building on his work on Funnel.
Scala is a multi-paradigm programming language designed to integrate the features of pure object-oriented programming and functional programming. It runs on the Java Virtual Machine (JVM), is compatible with existing Java programs, and can call Java class libraries. Scala ships with a compiler and class libraries and is released under a BSD license.
Spark is developed in Scala, so install Scala on every node before installing Spark. Scala's website is: , and the download page is: ; this article uses the binary package scala-2.11.7.tgz.
This article installs Scala as the root user (a non-root user works too; plan this in advance) into /data/scala, where /data/scala is a symbolic link pointing to /data/scala-2.11.7.
Installation is very simple: upload scala-2.11.7.tgz to the /data directory and extract it there.
Then create the symbolic link: ln -s /data/scala-2.11.7 /data/scala.
2.3. Setting environment variables
After Scala is installed, it needs to be added to the PATH environment variable. Simply edit /etc/profile and append the following:
export SCALA_HOME=/data/scala
export PATH=$SCALA_HOME/bin:$PATH
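A minimal shell sketch of the whole sequence (run as root, assuming scala-2.11.7.tgz has already been uploaded to /data as described above):

cd /data
tar xzf scala-2.11.7.tgz                  # produces /data/scala-2.11.7
ln -s /data/scala-2.11.7 /data/scala      # stable path referenced by SCALA_HOME
# append the two export lines above to /etc/profile, then reload it:
source /etc/profile
scala -version                            # should report Scala 2.11.7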
3. Installing Spark
Spark is installed as a non-root user; this article uses the hadoop user.
This article uses the pre-built binary package, which is the recommended approach (building from source is extra hassle). The download page is: ; the package used here is spark-1.6.0-bin-hadoop2.6.tgz, which can run directly on YARN.
1) Upload spark-1.6.0-bin-hadoop2.6.tgz to the /data/hadoop directory
2) Extract it: tar xzf spark-1.6.0-bin-hadoop2.6.tgz
3) Create the symbolic link: ln -s spark-1.6.0-bin-hadoop2.6 spark
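The same three steps as a compact shell sketch (run as the hadoop user; the upload location follows the convention above):

cd /data/hadoop
tar xzf spark-1.6.0-bin-hadoop2.6.tgz
ln -s spark-1.6.0-bin-hadoop2.6 spark     # /data/hadoop/spark -> versioned install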
When running Spark on YARN, Spark does not have to be installed on every machine; installing it on a single machine is enough. However, Spark jobs can only be submitted from a machine where it is installed, for the simple reason that the Spark files must be available there.
3.3.1. Editing conf/spark-env.sh
Make a copy of spark-env.sh.template as spark-env.sh, then add the following:
HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
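A hedged sketch of this step, assuming Spark is installed under /data/hadoop/spark as set up above:

cd /data/hadoop/spark/conf
cp spark-env.sh.template spark-env.sh
cat >> spark-env.sh <<'EOF'
HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
EOF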
4. Starting Spark
Because Spark runs on YARN, there is no separate step to start Spark: when the spark-submit command is executed, YARN schedules and runs the Spark application.
4.1. Running the bundled example
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
                   --master yarn --deploy-mode cluster \
                   --driver-memory 4g \
                   --executor-memory 2g \
                   --executor-cores 1 \
                   --queue default \
                   lib/spark-examples*.jar 10
Output of the run:
16/02/03 16:08:33 INFO yarn.Client: Application report for application_8_0007 (state: RUNNING)
16/02/03 16:08:34 INFO yarn.Client: Application report for application_8_0007 (state: RUNNING)
16/02/03 16:08:35 INFO yarn.Client: Application report for application_8_0007 (state: RUNNING)
16/02/03 16:08:36 INFO yarn.Client: Application report for application_8_0007 (state: RUNNING)
16/02/03 16:08:37 INFO yarn.Client: Application report for application_8_0007 (state: RUNNING)
16/02/03 16:08:38 INFO yarn.Client: Application report for application_8_0007 (state: RUNNING)
16/02/03 16:08:39 INFO yarn.Client: Application report for application_8_0007 (state: RUNNING)
16/02/03 16:08:40 INFO yarn.Client: Application report for application_8_0007 (state: FINISHED)
16/02/03 16:08:40 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 10.225.168.251
         ApplicationMaster RPC port: 0
         queue: default
         start time: 5
         final status: SUCCEEDED
         tracking URL: http://hadoop-168-254:8088/proxy/application_8_0007/
         user: hadoop
16/02/03 16:08:40 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 16:08:40 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-7fcc-4d8d-c54c5eac
4.2. Spark SQL CLI
Running spark-sql brings up the interactive Spark SQL CLI. To run it on YARN, set --master to yarn (note that --deploy-mode cluster is not supported, i.e. on YARN the CLI can only run in client mode):
./bin/spark-sql --master yarn
Why can the Spark SQL CLI only run in client mode? It is easy to understand: since it is interactive, you need to see the output, and cluster mode cannot provide that, because in cluster mode the machine the ApplicationMaster runs on is determined dynamically by YARN.
5. Integrating with Hive
Integrating Spark with Hive is very simple and takes only the following steps:
1) Add HIVE_HOME to spark-env.sh, e.g.: export HIVE_HOME=/data/hadoop/hive
2) Copy Hive's hive-site.xml and hive-log4j.properties files into Spark's conf directory.
Once that is done, run spark-sql again to enter the Spark SQL CLI; the command show tables will list the tables created in Hive.
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
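A hedged sketch of the two integration steps plus the CLI invocation (the Hive conf location and the MySQL connector jar, needed when the Hive metastore lives in MySQL, are assumptions based on the paths used in this article):

HIVE_HOME=/data/hadoop/hive
echo "export HIVE_HOME=$HIVE_HOME" >> /data/hadoop/spark/conf/spark-env.sh
cp $HIVE_HOME/conf/hive-site.xml          /data/hadoop/spark/conf/
cp $HIVE_HOME/conf/hive-log4j.properties  /data/hadoop/spark/conf/
cd /data/hadoop/spark/bin
./spark-sql --master yarn --driver-class-path $HIVE_HOME/lib/mysql-connector-java-5.1.38-bin.jar
# then, at the spark-sql prompt, run: show tables;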
6. Java development
A Java programming example for Spark:
package testspark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class SparkSQLHiveOnYarn {
  public static void main(String[] args) throws Exception {
    System.out.println("start");
    SparkConf sparkConf = new SparkConf().setAppName("SparkSQLHiveOnYarnTest");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    HiveContext hc = new HiveContext(ctx.sc());
    hc.sql("use default");  // choose which database to use
    Row[] result = hc.sql("select count(1) from test").collect();
    System.out.println(result[0]);
    ctx.stop();
  }
}
After packaging it into a jar, run it as follows (assuming the jar is placed under /tmp):
spark-submit --master yarn \
             --class testspark.SparkSQLHiveOnYarn \
             --driver-memory 4G \
             --driver-java-options "-XX:MaxPermSize=4G" \
             --verbose \
             --jars $HIVE_HOME/lib/mysql-connector-java-5.1.38-bin.jar \
             /tmp/testspark.jar
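One hedged way to build /tmp/testspark.jar from the source above (the spark-assembly jar name matches the spark-1.6.0-bin-hadoop2.6 distribution used here; the build directory and source file location are arbitrary choices):

mkdir -p /tmp/build/testspark
cp SparkSQLHiveOnYarn.java /tmp/build/testspark/      # source declares "package testspark;"
cd /tmp/build
javac -cp /data/hadoop/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar testspark/SparkSQLHiveOnYarn.java
jar cf /tmp/testspark.jar testspark/*.class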
7. Common errors
7.1. Error 1: unknown queue: thequeue
Running
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
produces the error below; simply change "--queue thequeue" to "--queue default" to fix it.
16/02/03 15:57:36 INFO yarn.Client: Application report for application_8_0004 (state: FAILED)
16/02/03 15:57:36 INFO yarn.Client:
         client token: N/A
         diagnostics: Application application_8_0004 submitted by user hadoop to unknown queue: thequeue
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: thequeue
         start time: 7
         final status: FAILED
         tracking URL: http://hadoop-168-254:8088/proxy/application_8_0004/
         user: hadoop
16/02/03 15:57:36 INFO yarn.Client: Deleting staging directory .sparkStaging/application_8_0004
Exception in thread "main" org.apache.spark.SparkException: Application application_8_0004 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1029)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/02/03 15:57:36 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 15:57:36 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5d02-41be-8b9e-92f4b0f05807
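The root cause is that the YARN scheduler has no queue named thequeue. Besides switching back to --queue default, a hedged alternative is to define the queue in YARN's capacity-scheduler.xml (the property names below are standard CapacityScheduler settings; the capacity percentages are only illustrative) and reload the queue configuration:

# in $HADOOP_CONF_DIR/capacity-scheduler.xml, roughly:
#   yarn.scheduler.capacity.root.queues            = default,thequeue
#   yarn.scheduler.capacity.root.default.capacity  = 70
#   yarn.scheduler.capacity.root.thequeue.capacity = 30
# then ask the ResourceManager to reload the queues:
yarn rmadmin -refreshQueues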
7.2. SPARK_CLASSPATH was detected
SPARK_CLASSPATH was detected (set to '/data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar:').
This is deprecated in Spark 1.0+.
Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
This warning means that setting the SPARK_CLASSPATH environment variable in spark-env.sh is deprecated; switch to the recommended form instead:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
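Equivalently, the classpath can be set once in conf/spark-defaults.conf so it does not have to be passed on every invocation (a hedged sketch; the jar path follows this article's layout, and spark.driver.extraClassPath / spark.executor.extraClassPath are the standard replacements for SPARK_CLASSPATH):

cat >> /data/hadoop/spark/conf/spark-defaults.conf <<'EOF'
spark.driver.extraClassPath    /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
spark.executor.extraClassPath  /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
EOF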
8. Related documents
"HBase 0.98.0 Distributed Installation Guide"
"Hive 0.12.0 Installation Guide"
"ZooKeeper 3.4.6 Distributed Installation Guide"
"Hadoop 2.3.0 Source Code Reverse Engineering"
"Compiling Hadoop 2.4.0 on Linux"
"Accumulo 1.5.1 Installation Guide"
"Drill 1.0.0 Installation Guide"
"Shark 0.9.1 Installation Guide"
For more, follow the author's technical blog:
Setting up Spark 1.6.0 (distributed, based on Hadoop 2.6.0)
This article builds a distributed Spark 1.6.0 cluster on top of a distributed Hadoop 2.6.0 environment.
For setting up the Hadoop 2.6.0 distributed cluster, see:
1. Extract the Spark package with tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz and move it under the /usr/local/spark directory.
Configure Spark's environment variables in ~/.bashrc, save and exit, then run source ~/.bashrc to make them take effect:
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export JRE_HOME=${JAVA_HOME}/jre
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SPARK_HOME=/usr/local/spark/spark-1.6.0-bin-hadoop2.6
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:$PATH
Then run the following commands to copy the .bashrc on master1 to the four workers.
root@master1:~# scp ~/.bashrc root@worker1:~/
root@master1:~# scp ~/.bashrc root@worker2:~/
root@master1:~# scp ~/.bashrc root@worker3:~/
root@master1:~# scp ~/.bashrc root@worker4:~/
Run source ~/.bashrc on each of the four workers so the configuration takes effect.
2. Configure the Spark environment
2.1 Copy conf/spark-env.sh.template to spark-env.sh and edit the configuration.
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# cp spark-env.sh.template spark-env.sh
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# vim spark-env.sh
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_MASTER_IP=master1
export SPARK_WORKER_MEMORY=2g
export SPARK_EXECUTOR_MEMORY=2g
export SPARK_DRIVER_MEMORY=2g
export SPARK_WORKER_CORES=4
Note: the HADOOP_CONF_DIR setting is what lets Spark run in YARN mode; it is critical.
Set SPARK_WORKER_MEMORY, SPARK_EXECUTOR_MEMORY, SPARK_DRIVER_MEMORY and SPARK_WORKER_CORES according to your own cluster.
Configure slaves:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# cp slaves.template slaves
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# vim slaves
# A Spark Worker will be started on each of the machines listed below.
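The hostnames themselves are not shown above; based on the scp commands used throughout this article, the slaves file presumably lists the four workers, one per line:

worker1
worker2
worker3
worker4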
Configure spark-defaults.conf:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# cp spark-defaults.conf.template spark-defaults.conf
# add the following configuration:
spark.executor.extraJavaOptions   -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs://master1:9000/historyserverforSpark
spark.yarn.historyServer.address  master1:18080
spark.history.fs.logDirectory     hdfs://master1:9000/historyserverforSpark
Note: with spark.eventLog.enabled turned on and spark.eventLog.dir configured, all run logs are recorded while the cluster is running, which is convenient for operations.
Use scp to sync the Spark directory configured on master1 to the workers.
root@master1:/usr/local# scp -r spark/ root@worker1:/usr/local/
root@master1:/usr/local# scp -r spark/ root@worker2:/usr/local/
root@master1:/usr/local# scp -r spark/ root@worker3:/usr/local/
root@master1:/usr/local# scp -r spark/ root@worker4:/usr/local/
Then check the /usr/local/ directory on each worker to confirm that Spark has been copied over.
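A quick hedged way to check from master1 (hostnames follow the worker1-worker4 naming used above):

for w in worker1 worker2 worker3 worker4; do
    ssh root@$w "ls -d /usr/local/spark/spark-1.6.0-bin-hadoop2.6"
done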
Create a historyserverforSpark directory on HDFS:
root@master1:/usr/local# hdfs dfs -mkdir /historyserverforSpark
16/01/24 07:46:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root@master1:/usr/local# hdfs dfs -ls /
16/01/24 07:46:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-01-24 07:46 /historyserverforSpark
The newly created directory can also be viewed through the browser.
3. Start Spark
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-master1.out
worker4: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker4.out
worker4: failed to launch org.apache.spark.deploy.worker.Worker:
worker4: full log in /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker4.out
worker1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker1.out
worker3: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker3.out
worker1: failed to launch org.apache.spark.deploy.worker.Worker:
worker1: full log in /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker1.out
worker3: failed to launch org.apache.spark.deploy.worker.Worker:
worker3: full log in /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker3.out
worker2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker2.out
worker2: failed to launch org.apache.spark.deploy.worker.Worker:
worker2: full log in /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker2.out
The output above shows that the Worker nodes failed to start, yet the logs contain no errors; the cause seems to lie with the virtual machines themselves, though exactly where is still unclear.
Stop the Spark cluster with ./sbin/stop-all.sh, delete all the logs under /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs on every node, then start the cluster again; this time it starts successfully.
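A hedged sketch of that cleanup, run from the sbin directory on master1 (hostnames as above; adjust to your own cluster):

./stop-all.sh
for h in master1 worker1 worker2 worker3 worker4; do
    ssh root@$h "rm -f /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/*"
done
./start-all.sh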
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-master1.out
worker3: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker3.out
worker2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker2.out
worker1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker1.out
worker4: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-worker4.out
Use the jps command to confirm that the Master and Worker processes have started:
On master1:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin# jps
4551 ResourceManager
7143 Master
4379 SecondaryNameNode
4175 NameNode
On worker1:
root@worker1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs# jps
4528 Worker
2563 DataNode
2713 NodeManager
Visit http://192.168.112.130:8080/ in a browser to view the console; it shows 4 worker nodes.
At this point the Spark cluster has been set up successfully!
Start the history-server process to record the cluster's runs, so that earlier run information can be recovered even after a restart.
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin# ./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-master1.out
View the History Server at http://192.168.112.130:18080/
Running an example: computing pi
Location: /usr/local/spark/spark-1.6.0-bin-hadoop2.6/examples/src/main/scala/org/apache/spark/examples
The source code is as follows:
// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
// scalastyle:on println
Set the parallelism to 5000 so that the job runs long enough to be watched conveniently in the browser:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin# ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://master1:7077 ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 5000
View the job through the browser:
Result: Pi is roughly 3.
Judging from the printed logs, why does the program start so quickly?
Answer: because Spark uses coarse-grained resource scheduling.
Coarse-grained means that resources are allocated the moment the program starts and initializes; subsequent computation simply uses those resources, with no need to allocate them again for every computation.
Coarse-grained scheduling suits situations with very many jobs that need to reuse resources. One drawback: when parallelism is high and one job runs for a long time while the others finish quickly, resources are wasted.
Fine-grained means resources are allocated only when the program actually computes and are reclaimed immediately after the computation finishes.
Check the run through the History Server:
Installing and verifying spark-2.0.2-bin-hadoop2.7.tgz in single-machine mode on CentOS 7 (VM, x86)
1. VM installation (omitted)
2. Software downloads
spark-2.0.2-bin-hadoop2.7.tgz
jdk-8u77-linux-x64.tar.gz
scala-2.10.4
3. Software installation
a. Install the JDK
#rpm -qa | grep java
#rpm -e --nodeps java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64
...... remove the preinstalled OpenJDK packages
#cp jdk-8u77-linux-x64.tar.gz /root/java/
#tar -zxvf jdk-8u77-linux-x64.tar.gz
#vi /etc/profile    ---- edit the environment variables
export JAVA_HOME=/root/java/jdk1.8.0_77
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#source /etc/profile
b. Install Scala
#wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
#tar -zxvf scala-2.10.4.tgz
#vi /etc/profile
export SCALA_HOME=/root/spark/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin
#source /etc/profile
Note: verify the Java and Scala installations
#java -version
#scala -version
c. Install Spark
#tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz
#cd spark-2.0.2-bin-hadoop2.7/
#bin/spark-shell
[root@localhost spark-2.0.2-bin-hadoop2.7]# bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/12/11 18:49:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/11 18:49:42 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.1.221 instead (on interface eno)
16/12/11 18:49:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/12/11 18:49:52 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.1.221:4040
Spark context available as 'sc' (master = local[*], app id = local-2).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@
4. Installation test
Turn off the firewall.
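A hedged sketch of what this step typically looks like on CentOS 7, together with a minimal smoke test (the firewalld commands and the spark-shell expression are assumptions, not taken from the original text):

# stop and disable the firewall (CentOS 7 ships firewalld)
systemctl stop firewalld
systemctl disable firewalld
# smoke test: inside bin/spark-shell, evaluate a trivial job, e.g.
#   sc.parallelize(1 to 1000).sum()    // expected result: 500500.0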