
      Running MRJob from an IPython notebook

      Date: 2023-09-12

                This post covers how to run an mrjob job from an IPython notebook.

                Problem description

                I'm trying to run the mrjob example from an IPython notebook:

                from mrjob.job import MRJob


                class MRWordFrequencyCount(MRJob):

                    def mapper(self, _, line):
                        yield "chars", len(line)
                        yield "words", len(line.split())
                        yield "lines", 1

                    def reducer(self, key, values):
                        yield key, sum(values)
                

                Then I run it with this code:

                mr_job = MRWordFrequencyCount(args=["testfile.txt"])
                with mr_job.make_runner() as runner:
                    runner.run()
                    for line in runner.stream_output():
                        key, value = mr_job.parse_output_line(line)
                        print(key, value)
                

                and get this error:

                TypeError: <module '__main__' (built-in)> is a built-in class
                

                Is there a way to run mrjob from an IPython notebook?

                Recommended answer

                I haven't found the "perfect way" yet, but one thing you can do is create a notebook cell that uses the %%file magic to write the cell contents to a file:

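                %%file wordcount.py
                from mrjob.job import MRJob

                class MRWordFrequencyCount(MRJob):

                    def mapper(self, _, line):
                        # Emit one partial count per statistic for every input line.
                        yield "chars", len(line)
                        yield "words", len(line.split())
                        yield "lines", 1

                    def reducer(self, key, values):
                        # Sum the partial counts collected for each key.
                        yield key, sum(values)

                # Lets the file also be run as a script (python wordcount.py <input>).
                if __name__ == '__main__':
                    MRWordFrequencyCount.run()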
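
As background on why moving the class into a file helps: the message in the question has the same shape as the TypeError that Python's `inspect.getfile` raises for a class whose module has no `__file__`, which is the situation for classes defined directly in a notebook. That mrjob looks up the job's source file this way is an assumption here, not something the answer states. The sketch below reproduces only the standard-library behaviour; `fake_notebook_main` and `JobDefinedInNotebook` are hypothetical names used for illustration.

```python
import inspect
import sys
import types

# Hypothetical stand-in for a notebook's __main__: a module object without __file__.
fake_notebook = types.ModuleType("fake_notebook_main")
sys.modules[fake_notebook.__name__] = fake_notebook


class JobDefinedInNotebook:
    """Placeholder for a job class typed straight into a notebook cell."""


# Pretend the class was defined inside the file-less module.
JobDefinedInNotebook.__module__ = fake_notebook.__name__

try:
    inspect.getfile(JobDefinedInNotebook)
    lookup_error = None
except TypeError as exc:
    # No source file can be located for the class.
    lookup_error = str(exc)
finally:
    # Remove the stand-in module again so the demo leaves no trace.
    del sys.modules[fake_notebook.__name__]

print(lookup_error)
```

Once the class lives in wordcount.py, the imported module carries a `__file__`, so a lookup of this kind has a path to return.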
                

                Then have mrjob run that file in a later cell:

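                import importlib

                import wordcount
                importlib.reload(wordcount)  # pick up edits made to wordcount.py since the last import

                mr_job = wordcount.MRWordFrequencyCount(args=['example.txt'])
                with mr_job.make_runner() as runner:
                    runner.run()
                    # stream_output()/parse_output_line() are the older mrjob API; newer
                    # releases use mr_job.parse_output(runner.cat_output()) instead.
                    for line in runner.stream_output():
                        key, value = mr_job.parse_output_line(line)
                        print(key, value)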
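
For a quick sanity check of what the job should print, the same map and reduce logic can be reproduced in plain Python without mrjob. This is only a sketch of the computation, not of how mrjob schedules it; `local_word_frequency_count` is a hypothetical helper, and the lines are passed without trailing newlines, which is assumed to match what the mapper receives.

```python
from collections import defaultdict


def mapper(line):
    # Same three pairs the MRWordFrequencyCount mapper yields per input line.
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def local_word_frequency_count(lines):
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # "shuffle" step: group values by key
    # "reduce" step: sum the values collected for each key
    return {key: sum(values) for key, values in grouped.items()}


expected = local_word_frequency_count(["to be or not to be", "that is the question"])
print(expected)
```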
                

                Notice that I called my file wordcount.py and that I import the class MRWordFrequencyCount from the wordcount module; the file name and the module name have to match. Also, Python caches imported modules, so when you change wordcount.py, IPython will not reload the module but will keep using the old, cached one. That's why I put the reload() call in there.
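
The caching behaviour can be seen with the standard library alone. The sketch below writes a throwaway module (the name `reload_demo_module` is made up for this example), edits it on disk, and shows that a second `import` keeps the old contents until `importlib.reload` is called.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    module_file = Path(tmp) / "reload_demo_module.py"

    module_file.write_text("VERSION = 'first'\n")
    importlib.invalidate_caches()
    import reload_demo_module
    first_seen = reload_demo_module.VERSION

    # Edit the file on disk (a different length, so stale bytecode cannot be reused).
    module_file.write_text("VERSION = 'second edit'\n")

    import reload_demo_module  # no effect: the module is served from sys.modules
    cached_seen = reload_demo_module.VERSION

    importlib.reload(reload_demo_module)  # re-executes the file's current contents
    reloaded_seen = reload_demo_module.VERSION

    sys.path.remove(tmp)

print(first_seen, cached_seen, reloaded_seen)
```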

                Reference: https://groups.google.com/d/msg/mrjob/CfdAgcEaC-I/8XfJPXCjTvQJ

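                Update (shorter)
                For a shorter second notebook cell, you can run the job by invoking the shell from within the notebook. This assumes the script ends with the usual if __name__ == '__main__': MRWordFrequencyCount.run() block. Do not name the script mrjob.py, since that would shadow the mrjob package on import:

                ! python wordcount.py example.txt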
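
Outside the notebook's `!` shell escape, roughly the same thing can be done with `subprocess`, which also makes the output available as Python values. The sketch below is an illustration under assumptions: `run_job_script` is a hypothetical helper, and it expects the script to print one JSON key, a tab, and one JSON value per line, which is assumed to be mrjob's default output format. To stay runnable without mrjob installed, it calls a small stand-in script that prints the same format; in practice the path would point at wordcount.py.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_job_script(script_path, input_path):
    """Run `python script_path input_path` and parse tab-separated JSON lines."""
    completed = subprocess.run(
        [sys.executable, str(script_path), str(input_path)],
        capture_output=True, text=True, check=True,
    )
    results = {}
    for line in completed.stdout.splitlines():
        if not line.strip():
            continue
        key, value = line.split("\t", 1)
        results[json.loads(key)] = json.loads(value)
    return results


# Stand-in for a real mrjob script (hypothetical); it only mimics the output format.
STAND_IN_SCRIPT = '''\
import json
import sys

with open(sys.argv[1]) as handle:
    lines = handle.read().splitlines()
counts = {
    "chars": sum(len(line) for line in lines),
    "words": sum(len(line.split()) for line in lines),
    "lines": len(lines),
}
for key, value in counts.items():
    print(json.dumps(key) + "\\t" + json.dumps(value))
'''

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "standin_wordcount.py"
    script.write_text(STAND_IN_SCRIPT)
    sample = Path(tmp) / "example.txt"
    sample.write_text("to be or not to be\nthat is the question\n")
    demo_counts = run_job_script(script, sample)

print(demo_counts)
```

Using `sys.executable` keeps the child process on the same interpreter as the notebook kernel, which a bare `python` on the shell path does not guarantee.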
                

                Reference: http://jupyter.cs.brynmawr.edu/hub/dblank/public/Jupyter%20Magics.ipynb
