Python Hadoop Streaming error "ERROR streaming.StreamJob: Job not Successful!" with stack trace ExitCodeException exitCode=134

Date: 2023-09-12
This article explains how to handle the Python Hadoop Streaming error "ERROR streaming.StreamJob: Job not Successful!" with the stack trace ExitCodeException exitCode=134. It should be a useful reference for anyone facing the same problem.

Problem Description

I am trying to run a Python script for sentiment analysis on a Hadoop cluster using Hadoop Streaming. The same script runs properly on my local machine and produces output. To run it on the local machine I use this command:

    $ cat /home/MB/analytics/Data/input/* | ./new_mapper.py
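
For a fuller local test, the shuffle phase can be approximated by sorting the mapper output before piping it into the reducer; a minimal sketch, assuming new_reducer.py reads the mapper's tab-separated lines from stdin:

    $ cat /home/MB/analytics/Data/input/* | ./new_mapper.py | sort | ./new_reducer.py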

To run it on the Hadoop cluster I use the command below:

    $ hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.2.0.jar \
        -mapper "python $PWD/new_mapper.py" \
        -reducer "$PWD/new_reducer.py" \
        -input /user/hduser/Test_04012015_Data/input/* \
        -output /user/hduser/python-mr/out-mr-out

The sample code of my script is:

    #!/usr/bin/env python
    import re   # the original excerpt used re without importing it
    import sys

    # NOTE: classifier, feature_select, referenceSets and testSets are assumed
    # to be defined elsewhere in the full script; they are not shown here.

    def main(argv):
        i = 0  # counter; it was used uninitialised in the original excerpt
        for line in sys.stdin:
            fields = line.split(',')
            # strip punctuation from the text column (the '|' separators in the
            # original character class were redundant)
            t_text = re.sub(r'[?$.!,;]', r'', fields[7])
            # the original pattern r"[w']+" was missing the backslash in \w
            words = re.findall(r"[\w']+", t_text.rstrip())
            predicted = classifier.classify(feature_select(words))
            i = i + 1
            referenceSets[predicted].add(i)
            testSets[predicted].add(i)
            # emit "text<TAB>label" for the streaming framework
            print fields[7] + '\t' + predicted

    if __name__ == "__main__":
        main(sys.argv)

The stack trace of the exception is:

    15/04/22 12:55:14 INFO mapreduce.Job: Task Id : attempt_1429611942931_0010_m_000001_0, Status : FAILED
    Error: java.io.IOException: Stream closed
        at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
    ...

    Exit code: 134
    Exception message: /bin/bash: line 1:  1691 Aborted (core dumped) /usr/lib/jvm/java-7-oracle-cloudera/bin/java
        -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx525955249
        -Djava.io.tmpdir=/yarn/nm/usercache/hduser/appcache/application_1429611942931_0010/container_1429611942931_0010_01_000016/tmp
        -Dlog4j.configuration=container-log4j.properties
        -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1429611942931_0010/container_1429611942931_0010_01_000016 -Dyarn.app.container.log.filesize=0
        -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.0.122 48725 attempt_1429611942931_0010_m_000006_1 16 > /var/log/hadoop-yarn/container/application_1429611942931_0010/container_1429611942931_0010_01_000016/stdout 2> /var/log/hadoop-yarn/container/application_1429611942931_0010/container_1429611942931_0010_01_000016/stderr
    ....

    15/04/22 12:55:47 ERROR streaming.StreamJob: Job not Successful!
    Streaming Command Failed!
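
For context, exit code 134 means the child process was killed by SIGABRT (134 = 128 + signal 6), which matches the "Aborted (core dumped)" message above; this is easy to confirm from any shell:

    $ bash -c 'kill -ABRT $$'; echo $?   # prints 134 (128 + SIGABRT)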
                  

I tried to look at the logs, but in Hue it shows me this error. Please suggest what is going wrong.
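
Assuming log aggregation is enabled on the cluster (the post does not say), the task logs can also be fetched from the command line instead of Hue, using the application id that appears in the stack trace:

    $ yarn logs -applicationId application_1429611942931_0010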

Recommended Answer

It looks like you forgot to add the file new_mapper.py to your job.

Basically, your job tries to run the Python script new_mapper.py, but this script is missing on the server running your mapper.

You must add this file to your job, using the option -file <local_path_to_your_file>.
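
A minimal sketch of the corrected command, assuming both scripts sit in the directory the job is submitted from (-file ships each listed file into the job's working directory on every task node, after which it can be referenced by its bare name):

    $ hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.2.0.jar \
        -file $PWD/new_mapper.py \
        -file $PWD/new_reducer.py \
        -mapper new_mapper.py \
        -reducer new_reducer.py \
        -input /user/hduser/Test_04012015_Data/input/* \
        -output /user/hduser/python-mr/out-mr-out

The scripts also need to be executable (chmod +x new_mapper.py new_reducer.py) so that their #!/usr/bin/env python shebang can launch them directly.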

See the documentation and an example here: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html#Streaming_Command_Options

This concludes the article on the Python Hadoop Streaming error "ERROR streaming.StreamJob: Job not Successful!" with stack trace ExitCodeException exitCode=134. We hope the recommended answer helps.
