
CombineFileInputFormat implementation for Hadoop 0.20.205

Date: 2023-09-26


This article covers an implementation of CombineFileInputFormat for Hadoop 0.20.205; it may be a useful reference if you are solving the same problem.

Question

Can someone please point out where I could find an implementation of CombineFileInputFormat (org.apache.hadoop.mapred.lib) that works with Hadoop 0.20.205? The goal is to create large splits from very small log files (text in lines) using EMR.

It is surprising that Hadoop does not ship a default implementation of this class for exactly this purpose, and judging by search results I am not the only one confused by that. I need to compile the class and bundle it in a jar for hadoop-streaming, which, with limited Java knowledge, is something of a challenge.
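For reference, compiling and packaging such a class for hadoop-streaming typically looks like the sketch below. The jar names and paths (hadoop-core-0.20.205.0.jar, combined-input-format.jar, the input/output directories) are assumptions to adapt to your installation, not part of the answer:

```shell
# Compile against the Hadoop 0.20.205 core jar
# (the jar path is an assumption; point it at your installation).
javac -classpath hadoop-core-0.20.205.0.jar CombinedInputFormat.java

# Bundle all generated classes, including the inner record-reader class,
# into one jar that the streaming job can load.
jar cf combined-input-format.jar CombinedInputFormat*.class

# Run the streaming job with the custom input format on the classpath.
hadoop jar hadoop-streaming.jar \
    -libjars combined-input-format.jar \
    -inputformat CombinedInputFormat \
    -input /logs/input \
    -output /logs/output \
    -mapper cat \
    -reducer wc
```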

I already tried the Yetitrails example, with the necessary imports, but I get a compiler error for the next method.

Answer

Here is an implementation I have for you:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

@SuppressWarnings("deprecation")
public class CombinedInputFormat extends CombineFileInputFormat<LongWritable, Text> {

    @SuppressWarnings({ "unchecked", "rawtypes" })
    @Override
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf conf, Reporter reporter) throws IOException {
        // CombineFileRecordReader hands each file in the combined split to a
        // fresh MyCombineFileRecordReader instance, identified by its index.
        return new CombineFileRecordReader(conf, (CombineFileSplit) split, reporter, (Class) MyCombineFileRecordReader.class);
    }

    public static class MyCombineFileRecordReader implements RecordReader<LongWritable, Text> {
        private final LineRecordReader lineRecord;

        public MyCombineFileRecordReader(CombineFileSplit split, Configuration conf, Reporter reporter, Integer index) throws IOException {
            // Rebuild a plain FileSplit for the index-th file of the combined
            // split and delegate line reading to the standard LineRecordReader.
            FileSplit fileSplit = new FileSplit(split.getPath(index), split.getOffset(index), split.getLength(index), split.getLocations());
            lineRecord = new LineRecordReader(conf, fileSplit);
        }

        @Override
        public void close() throws IOException {
            lineRecord.close();
        }

        @Override
        public LongWritable createKey() {
            return lineRecord.createKey();
        }

        @Override
        public Text createValue() {
            return lineRecord.createValue();
        }

        @Override
        public long getPos() throws IOException {
            return lineRecord.getPos();
        }

        @Override
        public float getProgress() throws IOException {
            return lineRecord.getProgress();
        }

        @Override
        public boolean next(LongWritable key, Text value) throws IOException {
            return lineRecord.next(key, value);
        }
    }
}

In your job, first set the parameter mapred.max.split.size according to the size you would like the input files to be combined into. Do something like the following in your run():

...
        if (argument != null) {
            conf.set("mapred.max.split.size", argument);
        } else {
            conf.set("mapred.max.split.size", "134217728"); // 128 MB
        }
...
        conf.setInputFormat(CombinedInputFormat.class);
...
