My program looks like this:
import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TopKRecord extends Configured implements Tool {
    public static class MapClass extends Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];
            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    public int run(String args[]) throws Exception {
        Job job = new Job();
        job.setJarByClass(TopKRecord.class);
        job.setMapperClass(MapClass.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setJobName("TopKRecord");
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int ret = ToolRunner.run(new TopKRecord(), args);
        System.exit(ret);
    }
}
The data looks like this:
"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
On running this program, I see the following on the console:
12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
I believe that the class types in my Mapper are mapped correctly.
Please let me know what I am doing wrong here.
When you read a file with an M/R program (using the default TextInputFormat), the input key of your mapper is the byte offset of the line in the file, while the input value is the full line. For example, the very first line of your file arrives with key 0 and the whole header line as the value.
So what's happening here is that you're trying to receive that line offset as a Text object, which is wrong; you need a LongWritable instead so that Hadoop doesn't complain about the type.
Try this instead:
public class TopKRecord extends Configured implements Tool {
    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];
            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }
    ...
}
Also, one thing in your code you might want to reconsider: you're creating two Text objects for every record you process. You should create these two objects just once, at the start, and then set their values inside your mapper using the set method. This will save you a lot of time if you're processing a decent amount of data.
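For illustration, here is a minimal sketch of that reuse pattern (the outKey and outValue field names are my own, not from your code):

// (imports go at the top of TopKRecord.java)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
    // Allocate the output objects once; context.write() serializes them
    // right away, so reusing the same instances across records is safe.
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String year = fields[1];
        String claims = fields[8];
        if (claims.length() > 0 && !claims.startsWith("\"")) {
            outKey.set(year);      // overwrite the buffers instead of
            outValue.set(claims);  // allocating new Text objects per record
            context.write(outKey, outValue);
        }
    }
}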