SparkSQL以yarn-cluster读取hive数据运行的错误排查记录

    xiaoxiao2023-10-26  175

    我事先在IDEA里通过Maven将程序及所依赖的jar打成一个jar包,在三台虚拟机(master、slave1、slave2)里启动了hadoop、mysql、yarn。master和slave节点的$SPARK_HOME\conf已经有hive-site.xml,并且已把mysql连接驱动包放到每个节点的SPARK_CLASSPATH下,然后执行:

    [root@master bin]# ./spark-submit --class com.badou.JiebaKry --master yarn-cluster /home/boya/boya-1.0-SNAPSHOT.jar

    发现执行失败,在yarn UI界面里查看log信息,报如下错误:

    “Table or view not found: ‘mydb’.‘news_noseg’; line 1 pos 27; ”

    于是我就将hive-site.xml和mysql驱动包通过–files和–jars放到提交命令中,执行:

    [root@master conf]# ./spark-submit --class com.badou.JiebaKry --master yarn-cluster --files $HIVE_HOME/conf/hive-site.xml --jars /home/boya/mysql-connector-java-5.1.44-bin.jar /home/boya/boya-1.0-SNAPSHOT.jar

    又失败了,查看log,发现salve报如下错误:

    “WARN HiveMetaStore: Retrying creating default database after error: Error creating transactional connection factory javax.jdo.JDOFatalInternalException: Error creating transactional connection factory” 和

    “Hive:Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost:3306”**

    此时我猜想是slave节点无法访问master节点上的mysql数据库,我注意到错误信息中“mysql://localhost"主机名是localhost,于是我查看master上的hive-site.xml,发现主机名写的是localhost:

    <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value> </property>

    于是我将其修改为master:

    <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value> </property>

    这样使得slave节点读取hive-site.xml时去连接master上的mysql,因为master节点上hive的元数据是存储在mysql里,但是重新启动hive竟然报如下错误:

    “Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). ”

    又经过一番搜寻,可能是mysql的权限问题,于是按照如下步骤:

    [root@master ~]# mysql -u root -p mysql> use mysql; mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'yourpassword' WITH GRANT OPTION; mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'master' IDENTIFIED BY 'yourpassword' WITH GRANT OPTION; mysql> flush privileges; mysql> quit; [root@master ~]# mysql -u root -p

    这里使用grant目的是允许指定host机器上的root用户远程连接到mysql,这样别的机器想要访问master的mysql,就会从user表查询,是否有%,localhost,master等数据值。改好后,重新启动mysql和hive,hive正常启动。

    接下来继续重新执行:

    [root@master conf]# ./spark-submit --class com.badou.JiebaKry --master yarn-cluster --files $HIVE_HOME/conf/hive-site.xml --jars /home/boya/mysql-connector-java-5.1.44-bin.jar /home/boya/boya-1.0-SNAPSHOT.jar

    这次终于不再出任何问题,final status: SUCCEEDED!

    最新回复(0)