
        Getting the file name/file data as the Map key/value input when running a Hadoop MapReduce job

        Date: 2023-09-26

                  Problem description

                  I went through the question "How to get Filename/File Contents as key/value input for MAP when running a Hadoop MapReduce Job?". Although it explains the concept, I have not been able to turn it into working code.

                  Basically, I want the file name as the key and the file data as the value. To that end I wrote a custom RecordReader, as recommended in that question, but I don't understand how to obtain the file name as the key inside this class. Also, while writing the custom FileInputFormat class, I couldn't work out how to return the custom RecordReader I had written.

                  The RecordReader code is:

                  import java.io.IOException;
                  import org.apache.hadoop.io.Text;
                  import org.apache.hadoop.mapreduce.InputSplit;
                  import org.apache.hadoop.mapreduce.RecordReader;
                  import org.apache.hadoop.mapreduce.TaskAttemptContext;
                  
                  public class CustomRecordReader extends RecordReader<Text, Text> {
                  
                      private static final String LINE_SEPARATOR = System.getProperty("line.separator");
                  
                      private StringBuffer valueBuffer = new StringBuffer("");
                      private Text key = new Text();
                      private Text value = new Text();
                      private RecordReader<Text, Text> recordReader;
                  
                      public SPDRecordReader(RecordReader<Text, Text> recordReader) {
                          this.recordReader = recordReader;
                      }
                  
                      @Override
                      public void close() throws IOException {
                          recordReader.close();
                      }
                  
                      @Override
                      public Text getCurrentKey() throws IOException, InterruptedException {
                          return key;
                      }
                  
                      @Override
                      public Text getCurrentValue() throws IOException, InterruptedException {
                          return value;
                      }
                  
                      @Override
                      public float getProgress() throws IOException, InterruptedException {
                          return recordReader.getProgress();
                      }
                  
                      @Override
                      public void initialize(InputSplit arg0, TaskAttemptContext arg1)
                              throws IOException, InterruptedException {
                          recordReader.initialize(arg0, arg1);
                      }
                  
                      @Override
                      public boolean nextKeyValue() throws IOException, InterruptedException {
                  
                          if (valueBuffer.equals("")) {
                              while (recordReader.nextKeyValue()) {
                                  valueBuffer.append(recordReader.getCurrentValue());
                                  valueBuffer.append(LINE_SEPARATOR);
                              }
                              value.set(valueBuffer.toString());
                              return true;
                          }
                          return false;
                      }
                  
                  }
                  

                  And the incomplete FileInputFormat class is:

                  import java.io.IOException;
                  import org.apache.hadoop.fs.FileSystem;
                  import org.apache.hadoop.fs.Path;
                  import org.apache.hadoop.io.Text;
                  import org.apache.hadoop.mapred.FileInputFormat;
                  import org.apache.hadoop.mapred.InputSplit;
                  import org.apache.hadoop.mapred.JobConf;
                  import org.apache.hadoop.mapred.RecordReader;
                  import org.apache.hadoop.mapred.Reporter;
                  
                  public class CustomFileInputFormat extends FileInputFormat<Text, Text> {
                  
                      @Override
                      protected boolean isSplitable(FileSystem fs, Path filename) {
                          return false;
                      }
                  
                      @Override
                      public RecordReader<Text, Text> getRecordReader(InputSplit arg0, JobConf arg1,
                              Reporter arg2) throws IOException {
                          return null;
                      }
                  }
                  

                  Recommended answer

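                  Put the following code in your CustomRecordReader class. It uses the old org.apache.hadoop.mapred API (JobConf, FileSplit, next/createKey/createValue), the same API your CustomFileInputFormat is written against, so the class should implement org.apache.hadoop.mapred.RecordReader<Text, Text> rather than extend the org.apache.hadoop.mapreduce one.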

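                  // Needs org.apache.hadoop.io.LongWritable and, from org.apache.hadoop.mapred:
                  // FileSplit, JobConf, LineRecordReader.
                  private LineRecordReader lineReader;
                  
                  // LineRecordReader produces LongWritable keys (the byte offset of each line).
                  private LongWritable lineOffset;
                  
                  private String fileName;
                  
                  public CustomRecordReader(JobConf job, FileSplit split) throws IOException {
                      lineReader = new LineRecordReader(job, split);
                      lineOffset = lineReader.createKey();
                      fileName = split.getPath().getName();
                  }
                  
                  public boolean next(Text key, Text value) throws IOException {
                      // read the next line into value; the offset key is discarded
                      if (!lineReader.next(lineOffset, value)) {
                          return false;
                      }
                  
                      // use the file name as the key instead of the line offset
                      key.set(fileName);
                  
                      return true;
                  }
                  
                  public Text createKey() {
                      return new Text("");
                  }
                  
                  public Text createValue() {
                      return new Text("");
                  }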
                  

                  Remove the SPDRecordReader constructor. Its name does not match the class name (CustomRecordReader), so it does not compile.
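Separately from the constructor, the guard `valueBuffer.equals("")` in the question's nextKeyValue() looks like it can never be true. StringBuffer does not override equals, so comparing it with a String falls back to an identity check, and the reader would return false on the first call. The stand-alone sketch below illustrates the difference; the class name EmptyBufferCheck and its methods are hypothetical and not part of the original code.

```java
// Hypothetical helper, not part of the question or the answer.
final class EmptyBufferCheck {

    // Mirrors the guard from the question: StringBuffer compared with a String.
    // StringBuffer inherits Object.equals, so this is an identity comparison.
    static boolean equalsEmptyString(StringBuffer buffer) {
        return buffer.equals("");
    }

    // Tests emptiness by length, which is what the guard presumably intended.
    static boolean isEmpty(StringBuffer buffer) {
        return buffer.length() == 0;
    }
}
```

If the buffer-based approach is kept, checking `valueBuffer.length() == 0` (or a separate boolean flag) is one way to express "nothing has been emitted yet".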

                  And put this code in your CustomFileInputFormat class:

                  public RecordReader<Text, Text> getRecordReader(
                    InputSplit input, JobConf job, Reporter reporter)
                    throws IOException {
                  
                      reporter.setStatus(input.toString());
                      return new CustomRecordReader(job, (FileSplit)input);
                  }
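Note that a reader built on LineRecordReader emits one record per line, each keyed by the file name. If the whole file should arrive as a single value, as the question describes, the reader has to drain its input and emit exactly once. The following is a Hadoop-free sketch of that read-once pattern, written against plain java.io so it can be run on its own. The class WholeFileRecordModel and its methods are hypothetical; a real implementation would put the same logic inside next(Text key, Text value) of the record reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical, Hadoop-free model: one (fileName, contents) record per input.
final class WholeFileRecordModel {

    private final String fileName;
    private final BufferedReader lines;
    private boolean emitted = false;
    private String key;
    private String value;

    WholeFileRecordModel(String fileName, Reader source) {
        this.fileName = fileName;
        this.lines = new BufferedReader(source);
    }

    // Returns true exactly once, after draining every line into the value.
    boolean next() {
        if (emitted) {
            return false;
        }
        StringBuilder buffer = new StringBuilder();
        try {
            String line;
            while ((line = lines.readLine()) != null) {
                buffer.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        key = fileName;
        value = buffer.toString();
        emitted = true;
        return true;
    }

    String getKey() {
        return key;
    }

    String getValue() {
        return value;
    }
}
```

Because isSplitable already returns false in CustomFileInputFormat, each file reaches a single reader, which is what makes the one-record-per-file approach workable.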
                  
