Converting Between Spark RDD and DataFrame

    xiaoxiao  2022-07-03

    @羲凡 -- just to live a better life


    Q: In Spark, which method converts an RDD into a DataFrame?  A: .toDF
    Q: In Spark, which method converts a DataFrame into an RDD?  A: .rdd
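Before the full example below, the two conversions can be shown in a minimal sketch. This is not from the original post; the object name and sample data are illustrative, and it assumes Spark is on the classpath and runs in local mode:

```scala
import org.apache.spark.sql.SparkSession

object QuickDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("QuickDemo")
      .master("local[*]")
      .getOrCreate()
    // Needed so .toDF is available on an RDD of tuples
    import spark.implicits._

    val rdd = spark.sparkContext.parallelize(Seq(("扎克", 227), ("赵信", 200)))
    val df  = rdd.toDF("name", "age") // RDD -> DataFrame via .toDF
    val rows = df.rdd                 // DataFrame -> RDD[Row] via .rdd

    df.show()
    spark.stop()
  }
}
```

Note that .rdd yields an RDD[Row]; to get typed values back out, extract fields from each Row, as the full example below does.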

    1. The code
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, SparkSession}

object Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Demo")
      .master("local[*]")
      .getOrCreate()
    val file = "file:///D:\\data\\test.txt"
    val schema = StructType(Array(
      StructField("name", StringType),
      StructField("age", IntegerType)
    ))

    // DataFrame to RDD
    println("=========DataFrame to RDD=========")
    val dataFrame: DataFrame = spark
      .read
      .option("delimiter", ",")
      .schema(schema)
      .csv(file)
    dataFrame.show()
    val rdd: RDD[(String, Int)] = dataFrame
      .rdd
      .map(t => (t.getAs[String](0), t.getAs[Int](1)))
    println("===Result after conversion===")
    rdd.foreach(println)

    // RDD to DataFrame
    println("=========RDD to DataFrame=========")
    import spark.implicits._
    val rdd2: RDD[(String, String)] = spark
      .sparkContext
      .textFile(file)
      .map(t => {
        val arr = t.split(",")
        (arr(0), arr(1))
      })
    rdd2.foreach(println)
    val dataFrame2: DataFrame = rdd2
      .toDF("name", "age")
    println("===Result after conversion===")
    dataFrame2.show()

    spark.stop()
  }
}
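In the code above, .toDF is called on an RDD of plain tuples, so column names must be supplied by hand. An RDD of case-class instances carries its own column names and types, letting .toDF() be called with no arguments. A sketch, assuming the same local-mode setup; the case class Person and the sample rows are illustrative, not from the original post:

```scala
import org.apache.spark.sql.SparkSession

// Field names become column names; field types become column types
case class Person(name: String, age: Int)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CaseClassDemo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = spark.sparkContext
      .parallelize(Seq(Person("扎克", 227), Person("赵信", 200)))
      .toDF() // no column names needed: inferred from Person

    df.printSchema()
    df.show()
    spark.stop()
  }
}
```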
    2. Output
=========DataFrame to RDD=========
+----+---+
|name|age|
+----+---+
| 扎克|227|
| 赵信|200|
| 魔腾|188|
+----+---+

===Result after conversion===
(扎克,227)
(赵信,200)
(魔腾,188)

=========RDD to DataFrame=========
(魔腾,188)
(扎克,227)
(赵信,200)

===Result after conversion===
+----+---+
|name|age|
+----+---+
| 扎克|227|
| 赵信|200|
| 魔腾|188|
+----+---+
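When column names and types should be controlled explicitly rather than inferred, an RDD[Row] can also be turned into a DataFrame with spark.createDataFrame and a StructType schema, the same schema style the main example uses for reading the CSV. A sketch, assuming local mode; the object name and inline sample data are illustrative:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CreateDFDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CreateDFDemo")
      .master("local[*]")
      .getOrCreate()

    // Explicit schema: full control over names, types, nullability
    val schema = StructType(Array(
      StructField("name", StringType),
      StructField("age", IntegerType)
    ))

    val rowRDD = spark.sparkContext
      .parallelize(Seq(("扎克", 227), ("赵信", 200)))
      .map { case (n, a) => Row(n, a) } // each element must match the schema

    val df = spark.createDataFrame(rowRDD, schema) // alternative to .toDF
    df.show()
    spark.stop()
  }
}
```

This route avoids import spark.implicits._ entirely, at the cost of building Row objects by hand.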

    ====================================================================


    If you have any questions about this post, feel free to leave a comment.
