Compressing and decompressing files with Hadoop's compression codecs


    Compression algorithms and their codecs

    Compression format    Corresponding codec class
    DEFLATE               org.apache.hadoop.io.compress.DefaultCodec
    gzip                  org.apache.hadoop.io.compress.GzipCodec
    bzip2                 org.apache.hadoop.io.compress.BZip2Codec
    Snappy                org.apache.hadoop.io.compress.SnappyCodec
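    As a quick sanity check of this mapping, the sketch below (the class name CodecExtensions is mine, not from the original) instantiates each codec by its class name and prints the file extension it appends. Note that SnappyCodec typically requires the native Snappy library to be available before it can actually compress data.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class CodecExtensions {
        public static void main(String[] args) throws ClassNotFoundException {
            // codec class names from the table above
            List<String> codecClassNames = Arrays.asList(
                    "org.apache.hadoop.io.compress.DefaultCodec",
                    "org.apache.hadoop.io.compress.GzipCodec",
                    "org.apache.hadoop.io.compress.BZip2Codec",
                    "org.apache.hadoop.io.compress.SnappyCodec");
            Configuration conf = new Configuration();
            for (String name : codecClassNames) {
                // instantiate the codec by class name, the same way compress() does below
                CompressionCodec codec =
                        (CompressionCodec) ReflectionUtils.newInstance(Class.forName(name), conf);
                // print the default extension each codec appends (.deflate, .gz, .bz2, .snappy)
                System.out.println(name + " -> " + codec.getDefaultExtension());
            }
        }
    }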

    Compression: compress() takes a string argument naming the codec class, creates the corresponding codec object via reflection, and uses it to compress the input file.

    // imports used by compress() and decompress() below
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.CompressionInputStream;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.util.ReflectionUtils;

    public static void compress(String method) throws ClassNotFoundException, IOException {
        File fileIn = new File("adult.data");
        // input stream for the raw file
        FileInputStream in = new FileInputStream(fileIn);
        Class<?> codecClass = Class.forName(method);
        Configuration conf = new Configuration();
        // look up the codec by its class name via reflection
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        File fileOut = new File("adult.data" + codec.getDefaultExtension());
        fileOut.delete();
        // output stream for the compressed file
        FileOutputStream out = new FileOutputStream(fileOut);
        // wrap it in the codec's compressing output stream
        CompressionOutputStream cout = codec.createOutputStream(out);
        // copy the data through the compressing stream
        IOUtils.copyBytes(in, cout, 4096, false);
        in.close();
        cout.close();
    }
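    A minimal way to exercise compress(), assuming it sits in a class with a main method and that adult.data exists in the working directory, might look like this:

    // hypothetical driver, assumed to live in the same class as compress()
    public static void main(String[] args) throws ClassNotFoundException, IOException {
        // produces adult.data.gz and adult.data.bz2 next to the input file
        compress("org.apache.hadoop.io.compress.GzipCodec");
        compress("org.apache.hadoop.io.compress.BZip2Codec");
    }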

    Decompression: when decompressing a file, the codec is usually inferred from the file's extension (here via CompressionCodecFactory).

    public static void decompress(File file) throws IOException {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        // infer the codec from the file extension
        CompressionCodec codec = factory.getCodec(new Path(file.getName()));
        if (codec == null) {
            System.out.println("Cannot find codec for file " + file);
            return;
        }
        // wrap the file in the codec's decompressing input stream
        CompressionInputStream in = codec.createInputStream(new FileInputStream(file));
        FileOutputStream out = new FileOutputStream(new File("adult.data.decompress"));
        // copy the data through the decompressing stream
        IOUtils.copyBytes(in, out, 4096, false);
        in.close();
        out.close();
    }
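    To complete the round trip, a small driver like the sketch below (again assumed to live in the same class) can decompress the adult.data.gz produced earlier; the .gz extension lets CompressionCodecFactory pick GzipCodec, and the output is written to adult.data.decompress.

    // hypothetical driver, assumed to live in the same class as decompress()
    public static void main(String[] args) throws IOException {
        decompress(new File("adult.data.gz"));
    }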