Preliminary setup: the Hadoop multi-node cluster
1. Start the Hadoop cluster
start-all.sh
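To confirm the daemons actually came up, a quick jps check helps (a sanity-check sketch; the exact process list depends on your Hadoop version and on which node you run it):

jps
# Typically expected on the master: NameNode, SecondaryNameNode, ResourceManager
# and on each data node: DataNode, NodeManager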
2. Start the Spark Standalone cluster
/usr/local/spark/sbin/start-all.sh
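The Standalone cluster can be verified through the master's web UI, which lists every registered worker (a sketch, assuming the default UI port 8080 and the hostname master used in step 3):

curl -s http://master:8080 | grep -i workers
# or open http://master:8080 in a browser; each slave node should show up as an alive worker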
3. Run IPython Notebook to use Spark
cd ~/pythonwork/ipynotebook
# Run the PySpark driver under IPython in notebook mode, connected to the Standalone master:
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" \
MASTER=spark://master:7077 pyspark \
  --num-executors 1 --total-executor-cores 2 --executor-memory 512m
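Once the notebook server is up and a notebook is opened, the pyspark shell creates a SparkContext connected to spark://master:7077, and the application should appear under Running Applications on the master UI. A quick check from another terminal (a sketch, assuming the driver runs on this machine and uses the default application UI port 4040):

curl -sI http://localhost:4040 | head -1
# An HTTP 200 response means the driver's application UI is serving,
# i.e. the SparkContext was created successfully.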