How to convert a .txt file to Hadoop's sequence file format

Date: 2023-07-27

Problem description

To make effective use of MapReduce jobs in Hadoop, I need my data stored in Hadoop's sequence file format. However, the data is currently only available as flat .txt files. Can anyone suggest a way to convert a .txt file to a sequence file?

Recommended answer

The simplest answer is an "identity" job: a job whose mapper passes every input record through unchanged, with SequenceFileOutputFormat as its output format.

In Java, it looks like this:

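    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    // The class name is arbitrary; setJarByClass should point at this driver class.
    public class ConvertText {

        public static void main(String[] args) throws IOException,
                InterruptedException, ClassNotFoundException {

            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Convert Text");
            job.setJarByClass(ConvertText.class);

            // The base Mapper and Reducer classes are identity implementations:
            // every record passes through unchanged.
            job.setMapperClass(Mapper.class);
            job.setReducerClass(Reducer.class);

            // Map-only job; increase if you need sorting or a specific number of output files
            job.setNumReduceTasks(0);

            // TextInputFormat yields (byte offset, line) pairs, which are written out as-is
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);

            // Example paths: the input .txt file(s) and an output directory that must not exist yet
            FileInputFormat.addInputPath(job, new Path("/lol"));
            FileOutputFormat.setOutputPath(job, new Path("/lolz"));

            // Submit, wait for completion, and exit non-zero on failure
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }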
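Because the job uses the base Mapper as an identity mapper, each record in the resulting sequence file should simply be the (LongWritable, Text) pair that TextInputFormat produced: the byte offset at which a line starts, and the line's contents without its terminator. The plain-Java sketch below (no Hadoop dependency; the class `LineRecords` and method `lineRecords` are hypothetical names for illustration) reproduces those pairs for a string. It assumes UTF-8 text with `'\n'` line endings only; Hadoop's line reader also recognizes `'\r'` and `"\r\n"`, which this sketch does not handle.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LineRecords {

    // Hypothetical helper: approximates the (key, value) pairs TextInputFormat
    // would hand to the identity Mapper, ASSUMING UTF-8 text and '\n' endings only.
    // Key = byte offset where the line starts; value = the line without '\n'.
    static List<Map.Entry<Long, String>> lineRecords(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf('\n', start);
            if (end < 0) {
                end = text.length(); // last line has no trailing newline
            }
            String line = text.substring(start, end);
            records.add(new AbstractMap.SimpleEntry<>(offset, line));
            // Advance past the line's UTF-8 bytes plus the 1-byte '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Print one tab-separated (offset, line) pair per record
        for (Map.Entry<Long, String> r : lineRecords("first\nsecond line\n\nthird")) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

For the input in `main`, the keys are 0, 6, 18 and 19, and the third record is the empty line. Offsets are counted in bytes rather than characters, so non-ASCII text shifts later keys accordingly.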
