Advantages of using NullWritable in Hadoop

Date: 2023-09-27

Question

What are the advantages of using NullWritable for null keys/values over using null texts (i.e. new Text(null))? I found the following in the book "Hadoop: The Definitive Guide":

  NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes
  are written to, or read from, the stream. It is used as a placeholder; for example, in
  MapReduce, a key or a value can be declared as a NullWritable when you don’t need
  to use that position—it effectively stores a constant empty value. NullWritable can also
  be useful as a key in SequenceFile when you want to store a list of values, as opposed
  to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling
  NullWritable.get()
I do not clearly understand how the output is written when NullWritable is used. Will there be a single constant value at the beginning of the output file indicating that the keys or values of this file are null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how are null texts actually serialized?
Thanks,
Venkat
Recommended answer
The key/value types must be given at runtime, so anything writing or reading NullWritables will know ahead of time that it will be dealing with that type; there is no marker or anything in the file. And technically the NullWritables are "read", it's just that "reading" a NullWritable is actually a no-op. You can see for yourself that there's nothing at all written or read:
NullWritable nw = NullWritable.get();
ByteArrayOutputStream out = new ByteArrayOutputStream();
nw.write(new DataOutputStream(out));
System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"

ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
nw.readFields(new DataInputStream(in)); // works just fine

And as for your question about new Text(null), again, you can try it out:
Text text = new Text((String)null);
ByteArrayOutputStream out = new ByteArrayOutputStream();
text.write(new DataOutputStream(out)); // throws NullPointerException
System.out.println(Arrays.toString(out.toByteArray()));

Text cannot handle a null String at all.
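Since a null Text is not an option, the realistic alternative to NullWritable is an empty Text, and the difference then comes down to bytes per record. The following is a minimal sketch of that comparison, not Hadoop code: SketchWritable, NullLike and TextLike are hypothetical stand-ins written for this example so that it runs without Hadoop on the classpath. TextLike assumes the layout used by Hadoop's Text for short strings (a one-byte length followed by the UTF-8 bytes); real container formats such as SequenceFile add their own per-record framing on top, which this sketch ignores.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for illustration only; these are NOT the Hadoop classes.
public class PlaceholderSizeSketch {

    /** Stand-in for the write half of Hadoop's Writable interface. */
    interface SketchWritable {
        void write(DataOutput out) throws IOException;
    }

    /** Mimics NullWritable: an immutable singleton that serializes to zero bytes. */
    static final class NullLike implements SketchWritable {
        private static final NullLike INSTANCE = new NullLike();
        private NullLike() {}
        static NullLike get() { return INSTANCE; }
        @Override
        public void write(DataOutput out) {
            // Intentionally empty: nothing is written to the stream.
        }
    }

    /** Simplified Text-like value: one length byte, then the UTF-8 bytes. */
    static final class TextLike implements SketchWritable {
        private final byte[] utf8;
        TextLike(String s) {
            utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > 127) {
                // Assumption of this sketch: only short strings, so the length fits in one byte.
                throw new IllegalArgumentException("sketch supports at most 127 bytes");
            }
        }
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeByte(utf8.length);
            out.write(utf8);
        }
    }

    /** Serializes the same key/value pair `records` times and returns the total byte count. */
    static int serializedSize(SketchWritable key, SketchWritable value, int records) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int i = 0; i < records; i++) {
                key.write(out);
                value.write(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        TextLike value = new TextLike("hello");
        // Zero-length placeholder key: only the value bytes are written.
        System.out.println(serializedSize(NullLike.get(), value, 1000));   // prints 6000
        // Empty-string placeholder key: one extra length byte per record.
        System.out.println(serializedSize(new TextLike(""), value, 1000)); // prints 7000
    }
}
```

In this sketch the empty-string key costs one byte per record while the NullLike key costs nothing, which is the kind of saving the book's "zero-length serialization" remark points to. The reader must still be told the key type in advance, because nothing in the byte stream marks the key as absent.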
