    Efficient file reading in Python when splitting on '\n'

      Date: 2023-05-26

                This post describes how to read files efficiently in Python when each file needs to be split on '\n'; the discussion should be a useful reference for anyone hitting the same problem.

                Problem description


                I've traditionally been reading in files with:

                file = open(fullpath, "r")
                allrecords = file.read()
                delimited = allrecords.split('\n')
                for record in delimited[1:]:
                    record_split = record.split(',')
                

                with open(os.path.join(txtdatapath, pathfilename), "r") as data:
                  datalines = (line.rstrip('\n') for line in data)
                  for record in datalines:
                    split_line = record.split(',')
                    if len(split_line) > 1:
                

                But it seems when I process these files in a multiprocessing pool I get a MemoryError. How can I best read in files line by line, when the text file I'm reading needs to be split on '\n'?
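                A minimal, self-contained sketch of the memory-friendly pattern (the sample path and contents below are invented just for the demo): iterating over the file object reads one line at a time instead of loading everything with read():

```python
import os
import tempfile

# Create a small stand-in file (hypothetical data, just for illustration).
path = os.path.join(tempfile.gettempdir(), "sample_records.txt")
with open(path, "w") as f:
    f.write("header\n1,2,3\n4,5,6\n")

records = []
with open(path, "r") as data:
    next(data)                      # skip the header, like delimited[1:]
    for line in data:               # lazy: one line in memory at a time
        record = line.rstrip("\n")  # strip the trailing newline
        if record:
            records.append(record.split(","))

print(records)  # [['1', '2', '3'], ['4', '5', '6']]
```

                Because the generator never materializes the whole file, peak memory stays proportional to one line rather than to the file size.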

                Here is the multiprocessing code:

                pool = Pool()
                fixed_args = (targetdirectorytxt, value_dict)
                varg = ((filename,) + fixed_args for filename in readinfiles)
                op_list = pool.map_async(PPD_star, list(varg), chunksize=1)     
                while not op_list.ready():
                  print("Number of files left to process: {}".format(op_list._number_left))
                  time.sleep(60)
                op_list = op_list.get()
                pool.close()
                pool.join()
                

                Here is the error log:

                Exception in thread Thread-3:
                Traceback (most recent call last):
                  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
                    self.run()
                  File "C:\Python27\lib\threading.py", line 763, in run
                    self.__target(*self.__args, **self.__kwargs)
                  File "C:\Python27\lib\multiprocessing\pool.py", line 380, in _handle_results
                    task = get()
                MemoryError
                

                I'm trying to install pathos as Mike has kindly suggested, but I'm running into issues. Here is my install command:

                pip install https://github.com/uqfoundation/pathos/zipball/master --allow-external pathos --pre
                

                But here are the error messages that I get:

                Downloading/unpacking https://github.com/uqfoundation/pathos/zipball/master
                  Running setup.py (path:c:\users\xxx\appdata\local\temp\2\pip-1e4saj-build\setup.py) egg_info for package from https://github.com/uqfoundation/pathos/zipball/master
                
                Downloading/unpacking ppft>=1.6.4.5 (from pathos==0.2a1.dev0)
                  Running setup.py (path:c:\users\xxx\appdata\local\temp\2\pip_build_jptyuser\ppft\setup.py) egg_info for package ppft
                
                    warning: no files found matching 'python-restlib.spec'
                Requirement already satisfied (use --upgrade to upgrade): dill>=0.2.2 in c:\python27\lib\site-packages\dill-0.2.2-py2.7.egg (from pathos==0.2a1.dev0)
                Requirement already satisfied (use --upgrade to upgrade): pox>=0.2.1 in c:\python27\lib\site-packages\pox-0.2.1-py2.7.egg (from pathos==0.2a1.dev0)
                Downloading/unpacking pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)
                  Could not find any downloads that satisfy the requirement pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)
                  Some externally hosted files were ignored (use --allow-external pyre to allow).
                Cleaning up...
                No distributions at all found for pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)
                
                Storing debug log for failure in C:\Users\xxx\pip\pip.log
                

                I'm installing on Windows 7 64-bit. In the end I managed to install with easy_install.

                But now I have a failure, as I cannot open that many files:

                Finished reading in Exposures...
                Reading Samples from:  C:\XXXXXXXXX
                Traceback (most recent call last):
                  File "events.py", line 568, in <module>
                    mdrcv_dict = ReadDamages(damage_dir, value_dict)
                  File "events.py", line 185, in ReadDamages
                    res = thpool.amap(mppool.map, [rstrip]*len(readinfiles), files)
                  File "C:\Python27\lib\site-packages\pathos-0.2a1.dev0-py2.7.egg\pathos\multiprocessing.py", line 230, in amap
                    return _pool.map_async(star(f), zip(*args)) # chunksize
                  File "events.py", line 184, in <genexpr>
                    files = (open(name, 'r') for name in readinfiles[0:])
                IOError: [Errno 24] Too many open files: 'C:\xx.csv'
                

                Currently, using the multiprocessing library, I am passing parameters and dictionaries into my function, opening a mapped file, and then outputting a dictionary. Here is an example of how I currently do it; what would be the smart way to do this with pathos?

                def PP_star(args_flat):
                    return PP(*args_flat)
                
                def PP(pathfilename, txtdatapath, my_dict):
                    # ... (body elided in the question) ...
                    return com_dict
                
                fixed_args = (targetdirectorytxt, my_dict)
                varg = ((filename,) + fixed_args for filename in readinfiles)
                op_list = pool.map_async(PP_star, list(varg), chunksize=1)
                
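                The star-unpacking pattern above can be exercised end to end with the standard library's thread-based pool (used here only as a stand-in, since PP's real body is elided in the question; the file names and fixed arguments below are invented):

```python
from multiprocessing.dummy import Pool  # thread-based stand-in for Pool()

def PP(pathfilename, txtdatapath, my_dict):
    # Placeholder body: the real PP would read the file and build com_dict.
    return (pathfilename, txtdatapath, len(my_dict))

def PP_star(args_flat):
    return PP(*args_flat)  # unpack the (filename, dir, dict) tuple

readinfiles = ["a.csv", "b.csv"]        # hypothetical file list
fixed_args = ("some/dir", {"key": 1})   # hypothetical fixed arguments
varg = [(filename,) + fixed_args for filename in readinfiles]

with Pool(2) as pool:
    op_list = pool.map_async(PP_star, varg, chunksize=1).get()

print(op_list)  # [('a.csv', 'some/dir', 1), ('b.csv', 'some/dir', 1)]
```

                The star wrapper exists because map_async passes each work item as a single argument, so the tuple has to be unpacked inside the worker.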

                How can I do this with pathos.multiprocessing?

                Recommended answer

                Suppose we have file1.txt:

                hello35
                1234123
                1234123
                hello32
                2492wow
                1234125
                1251234
                1234123
                1234123
                2342bye
                1234125
                1251234
                1234123
                1234123
                1234125
                1251234
                1234123
                

                file2.txt:

                1234125
                1251234
                1234123
                hello35
                2492wow
                1234125
                1251234
                1234123
                1234123
                hello32
                1234125
                1251234
                1234123
                1234123
                1234123
                1234123
                2342bye
                

                and so on, through file5.txt:

                1234123
                1234123
                1234125
                1251234
                1234123
                1234123
                1234123
                1234125
                1251234
                1234125
                1251234
                1234123
                1234123
                hello35
                hello32
                2492wow
                2342bye
                

                I'd suggest using a hierarchical parallel map to read your files quickly. A fork of multiprocessing (called pathos.multiprocessing) can do this.

                >>> import pathos
                >>> thpool = pathos.multiprocessing.ThreadingPool()
                >>> mppool = pathos.multiprocessing.ProcessingPool()
                >>> 
                >>> def rstrip(line):
                ...     return line.rstrip()
                ... 
                # get your list of files
                >>> fnames = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt']
                >>> # open the files
                >>> files = (open(name, 'r') for name in fnames)
                >>> # read each file in asynchronous parallel
                >>> # while reading and stripping each line in parallel
                >>> res = thpool.amap(mppool.map, [rstrip]*len(fnames), files)
                >>> # get the result when it's done
                >>> res.ready()
                True
                >>> data = res.get()
                >>> # if not using a files iterator -- close each file by uncommenting the next line
                >>> # files = [file.close() for file in files]
                >>> data[0]
                ['hello35', '1234123', '1234123', 'hello32', '2492wow', '1234125', '1251234', '1234123', '1234123', '2342bye', '1234125', '1251234', '1234123', '1234123', '1234125', '1251234', '1234123']
                >>> data[1]
                ['1234125', '1251234', '1234123', 'hello35', '2492wow', '1234125', '1251234', '1234123', '1234123', 'hello32', '1234125', '1251234', '1234123', '1234123', '1234123', '1234123', '2342bye']
                >>> data[-1]
                ['1234123', '1234123', '1234125', '1251234', '1234123', '1234123', '1234123', '1234125', '1251234', '1234125', '1251234', '1234123', '1234123', 'hello35', 'hello32', '2492wow', '2342bye']
                

                However, if you want to check how many files you have left to finish, you might want to use an "iterated" map (imap) instead of an "asynchronous" map (amap). See this post for details: Python multiprocessing - tracking the progress of a pool.map operation
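                A sketch of the progress-tracking idea with an iterated map, using the stdlib thread pool as a stand-in for pathos (with pathos, the same loop works over the pool's imap):

```python
from multiprocessing.dummy import Pool  # thread pool; same idea as pathos pools

def work(x):
    return x * x

items = list(range(10))
done = 0
results = []
with Pool(4) as pool:
    for res in pool.imap(work, items):  # yields results one by one, in order
        results.append(res)
        done += 1
        # e.g. print("{} of {} done".format(done, len(items)))

print(results)  # squares of 0..9
```

                Unlike map_async, which only reports _number_left, the imap loop gives you a natural place to print progress after every completed item.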

                Get pathos here: https://github.com/uqfoundation

