Installing Spark and implementing WordCount in Scala and Java

    xiaoxiao · 2025-06-21


    Part 1: Install Scala

    1. Download Scala from the official site: scala-2.12.8.tgz (https://www.scala-lang.org/download/)

    2. Use Xftp to upload the archive from your local machine to the CentOS server.

    3. Extract it:

    tar -zxvf scala-2.12.8.tgz -C /opt/module

    4. Rename the directory:

    mv /opt/module/scala-2.12.8 /opt/module/scala

    5. Verify the installation:

    scala -version

    If the Scala version information is printed, the installation succeeded.

    6. Launch the Scala REPL: scala

    Part 2: Install Spark

    1. Download Spark from the official site: spark-2.4.2-bin-hadoop2.7.tgz (https://www.apache.org/dyn/closer.lua/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz)

    2. Use Xftp to upload the archive from your local machine to the CentOS server.

    3. Extract the archive:

    tar -zxvf spark-2.4.2-bin-hadoop2.7.tgz -C /opt/module

    4. Start the Hadoop services first:

    start-all.sh

    5. Start Spark by running start-all.sh under $SPARK_HOME/sbin:

    cd /opt/module/spark/sbin
    ./start-all.sh

    Part 3: Set up Spark in pseudo-distributed mode

    Edit spark-env.sh:

    vi spark-env.sh

    export JAVA_HOME=/usr/java/jdk1.8.0_211-amd64
    export SCALA_HOME=/usr/share/scala
    export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.7
    export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.7.7/etc/hadoop
    export SPARK_MASTER_HOST=bigdata
    export SPARK_MASTER_PORT=7077
    export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native

    Edit /etc/profile:

    vi /etc/profile

    export JAVA_HOME=/usr/java/jdk1.8.0_211-amd64
    export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.7
    export HBASE_HOME=/usr/local/hbase/hbase-1.4.9
    export HIVE_HOME=/usr/local/hive/apache-hive-2.3.4-bin
    export SPARK_HOME=/usr/local/spark/spark-2.4.2-bin-hadoop2.7
    export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
    export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin

    (The PATH line is needed so that commands such as spark-shell and spark-submit can be run from any directory.)

    Apply the changes:

    source /etc/profile

    Go to Spark's sbin directory and run start-all.sh to start Spark:

    ./start-all.sh

    Launch the Spark shell:

    spark-shell

    Part 4: Install sbt

    Reference: http://dblab.xmu.edu.cn/blog/1307-2/

    Part 5: Counting words in a local file

    val textFile = sc.textFile("file:///usr/local/spark/mycode/wordcount/word.txt")
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.collect()
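    Before running this on Spark, the same transformation can be tried on plain Scala collections to see what it computes (a sketch with made-up input lines; groupBy plus sum stands in for Spark's reduceByKey):

```scala
object LocalWordCount {
  def main(args: Array[String]): Unit = {
    // stand-in for the lines that sc.textFile would read
    val lines = Seq("hello spark", "hello scala", "hello world")
    val wordCount = lines
      .flatMap(_.split(" "))   // one element per word
      .map(word => (word, 1))  // pair each word with a count of 1
      .groupBy(_._1)           // local analogue of reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
    wordCount.toSeq.sortBy(_._1).foreach(println)
  }
}
```

    The result for these three lines is hello -> 3 and one occurrence each of scala, spark, and world.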

    Part 6: WordCount as a standalone Scala program

    spark-submit --class "WordCount" /usr/local/spark/mycode/wordcount/target/scala-2.11/simple-project_2.11-4.1.jar

    The corresponding Scala program:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf

    object WordCount {
      def main(args: Array[String]) {
        val inputFile = "file:///usr/local/spark/mycode/wordcount/word.txt"
        val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
        val sc = new SparkContext(conf)
        val textFile = sc.textFile(inputFile)
        val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
        wordCount.foreach(println)
      }
    }
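    The jar passed to spark-submit above can be produced by running sbt package against a build.sbt along these lines (a sketch inferred from the jar name simple-project_2.11-4.1.jar; the name, version, and exact Scala patch release are assumptions — the Spark dependency version should match the installed 2.4.2):

```
// build.sbt — assumed project layout: WordCount.scala under src/main/scala/
name := "Simple Project"
version := "4.1"
scalaVersion := "2.11.12"
// must match the Spark version installed on the cluster
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.2"
```

    Running sbt package in the project root then writes the jar to target/scala-2.11/.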

