8. Spark Standalone Cluster Setup and Verification

    Prerequisite: a working Hadoop cluster environment (see the earlier Hadoop cluster setup article).

    1. Configure spark-env.sh on the master VM

    (1) Create spark-env.sh from the template file

             cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh

    (2) Edit spark-env.sh

            sudo vim /usr/local/spark/conf/spark-env.sh

            export SPARK_MASTER_IP=master           # hostname (or IP) the master binds to

            export SPARK_WORKER_CORES=1             # CPU cores per worker instance

            export SPARK_WORKER_MEMORY=128m         # memory per worker instance

            export SPARK_WORKER_INSTANCES=4         # worker instances started on each node
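            With these settings each worker node runs 4 worker instances, contributing 4 cores and 4 x 128m = 512m of memory per node. Note that on Spark 2.0 and later SPARK_MASTER_IP is deprecated; the equivalent setting there is:

            export SPARK_MASTER_HOST=master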

    2. Copy the Spark installation to data1, data2, and data3 (a consolidated loop version is sketched after step (3) below)

    (1) Copy the Spark installation to data1

             ssh data1                                              # log in to data1

             sudo mkdir /usr/local/spark                            # create the target directory

             sudo chown hduser:hduser /usr/local/spark              # give hduser ownership so it can receive the copy

             exit                                                   # return to master

             sudo scp -r /usr/local/spark hduser@data1:/usr/local   # copy the Spark directory from master

    (2) Copy the Spark installation to data2

             ssh data2

             sudo mkdir /usr/local/spark

             sudo chown hduser:hduser /usr/local/spark

             exit

             sudo scp -r /usr/local/spark hduser@data2:/usr/local

    (3) Copy the Spark installation to data3

             ssh data3

             sudo mkdir /usr/local/spark

             sudo chown hduser:hduser /usr/local/spark

             exit

             sudo scp -r /usr/local/spark hduser@data3:/usr/local
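     The three copies above differ only in the hostname. As referenced in the step heading, a consolidated loop version is sketched below; it assumes passwordless SSH for hduser, that hduser may run sudo on each node, and that hduser can read /usr/local/spark on master (so scp needs no sudo):

         # One pass over all three workers; -t allocates a tty for the sudo password prompt
         for node in data1 data2 data3; do
             ssh -t $node "sudo mkdir -p /usr/local/spark && sudo chown hduser:hduser /usr/local/spark"
             scp -r /usr/local/spark hduser@$node:/usr/local
         done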

    3. Edit the slaves file on the master VM

            (1) Edit the slaves file, listing one worker hostname per line:

               sudo vim /usr/local/spark/conf/slaves

               data1

               data2

               data3
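            Note: on Spark 3.1 and later this file has been renamed to conf/workers; the format (one hostname per line) is unchanged:

               sudo vim /usr/local/spark/conf/workers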

    4. Start the Spark Standalone Cluster

            /usr/local/spark/sbin/start-all.sh       

           or

           /usr/local/spark/sbin/start-master.sh       

          /usr/local/spark/sbin/start-slaves.sh

         Note:

         JAVA_HOME also needs to be set in /usr/local/spark/sbin/spark-config.sh, since daemons launched over ssh do not inherit the login shell's environment.
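         Once the cluster is up, a quick way to verify it (assuming the JDK's jps is on the PATH of each node):

         jps                # on master: a Master process should be listed
         ssh data1 jps      # on each worker node: four Worker processes (one per instance)

         The master web UI at http://master:8080 also lists the registered workers.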

     

      5. Run pyspark

         pyspark --master spark://master:7077 --num-executors 1 --total-executor-cores 3 --executor-memory 512m

         (--total-executor-cores caps the total cores this application may use across the cluster, and --executor-memory sets the memory per executor; --num-executors is a YARN option and is ignored by the standalone master.)
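         Beyond the interactive shell, submitting a small job is a quick end-to-end check. A minimal sketch (the script path /tmp/sum_check.py is just an illustrative choice):

         # Write a tiny PySpark job and submit it to the standalone master:
         cat > /tmp/sum_check.py <<'EOF'
         from pyspark import SparkContext

         sc = SparkContext(appName="SumCheck")
         # Sum the integers 0..999 across the cluster; the expected output is 499500.
         print(sc.parallelize(range(1000)).sum())
         sc.stop()
         EOF

         /usr/local/spark/bin/spark-submit --master spark://master:7077 \
             --total-executor-cores 3 --executor-memory 512m /tmp/sum_check.py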
