        Updating a PyQt5 GUI in the main thread from Scrapy signals

        Date: 2023-08-04

                  This article walks through an approach to updating a PyQt5 GUI in the main thread based on signals emitted by Scrapy.

                  Problem description

                  I have a very basic spider that looks like the followall spider from scrapy testspiders.

                  import re
                  
                  import scrapy.signals
                  from scrapy.http import Request, HtmlResponse
                  from scrapy.linkextractors import LinkExtractor
                  from six.moves.urllib.parse import urlparse
                  
                  from page import Page
                  
                  
                  class ZenSpider( scrapy.Spider ) :
                      name = 'followall'
                      custom_settings = {
                          'CLOSESPIDER_PAGECOUNT' : 2,
                          "FEEDS" : {
                              "items.csv" : {"format" : "csv"},
                          },
                      }
                  
                      def __init__(self, **kw) :
                          super( ZenSpider, self ).__init__( **kw )
                          url = kw.get( 'url' ) or kw.get( 'domain' ) or 'http://scrapinghub.com/'
                          if not url.startswith( 'http://' ) and not url.startswith( 'https://' ) :
                              url = 'http://%s/' % url
                          self.url = url
                           # Escape the dot so only a literal "www." prefix is stripped.
                           self.allowed_domains = [re.sub(r'^www\.', '', urlparse(url).hostname)]
                          self.link_extractor = LinkExtractor()
                  
                      def start_requests(self):
                          return [Request(self.url, callback=self.parse, dont_filter=True)]
                  
                      def parse(self, response):
                          """Parse a PageItem and all requests to follow
                  
                          @url http://www.scrapinghub.com/
                          @returns items 1 1
                          @returns requests 1
                          @scrapes url title foo
                          """
                          page = self._get_item(response)
                          r = [page]
                          r.extend(self._extract_requests(response))
                          return r
                  
                      def _get_item(self, response):
                          items = []
                          item = Page(
                              url=response.url,
                              size=str( len( response.body ) ),
                              status=response.status,
                              # content_type=response.request.headers.get('Content-Type'),
                              # encoding=response.request.headers.get('encoding'),
                              # referer=response.request.headers.get('Referer'),
                          )
                          self._set_title( item, response )
                          self._set_description( item, response )
                          return item
                  
                      def _extract_requests(self, response):
                          r = []
                          if isinstance(response, HtmlResponse):
                              links = self.link_extractor.extract_links( response )
                              r.extend( Request( x.url, callback=self.parse ) for x in links )
                          return r
                  
                      def _set_title(self, page, response) :
                          if isinstance( response, HtmlResponse ) :
                              title = response.xpath( "//title/text()" ).extract()
                              if title :
                                  page['title'] = title[0]
                  
                      def _set_description(self, page, response) :
                          if isinstance( response, HtmlResponse ) :
                              description = response.xpath( "//meta[@name='description']/@content" ).extract()
                              if description :
                                  page['description'] = description[0]
                  
                  

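                  The `allowed_domains` derivation in `__init__` above can be exercised on its own. The following is a standalone sketch (the function name `allowed_domain` is mine, and it uses Python 3's `urllib.parse` directly instead of `six.moves`):

```python
import re
from urllib.parse import urlparse

def allowed_domain(url):
    # Normalize bare domains the same way the spider's __init__ does.
    if not url.startswith('http://') and not url.startswith('https://'):
        url = 'http://%s/' % url
    # Strip a leading "www." so extracted links still match the domain.
    return re.sub(r'^www\.', '', urlparse(url).hostname)

print(allowed_domain('www.google.com.au'))         # google.com.au
print(allowed_domain('https://scrapinghub.com/'))  # scrapinghub.com
```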

                  I am calling this spider from the script below. The spider is run using the CrawlerRunner class; when it scrapes an item, the signals.item_scraped signal connected via p.signals.connect fires, which calls the crawler_results method and prints the scraped item.


                  As far as I understand, I cannot move the crawling into its own class, because then the signal won't work with PyQt5.

                  import scrapy
                  from PyQt5 import QtWidgets, QtCore, QtGui
                  from PyQt5.QtCore import QRunnable, pyqtSlot, QThread, pyqtSignal, QTimer
                  from PyQt5.QtWidgets import QTableWidgetItem, QLabel
                  from scrapy import signals
                  from scrapy.crawler import CrawlerProcess, CrawlerRunner
                  from twisted.internet import reactor
                  from scrapy.utils.log import configure_logging
                  
                  from Layout import Ui_MainWindow
                  from ZenSpider import ZenSpider
                  
                  
                  class MainWindow( QtWidgets.QMainWindow, Ui_MainWindow ) :
                  
                      def __init__(self, parent=None) :
                          super(MainWindow, self).__init__()
                  
                          self.setupUi( self )
                          self.pushButton.pressed.connect( self.on_url_entered )
                  
                      def crawler_results(self, item) :
                          print( "SCRAPED AN ITEM" )
                          ##Do Something here ##
                  
                      def on_url_entered(self) :
                          # global userInput
                          # userInput = self.urlbar.text()
                          configure_logging()
                          runner = CrawlerRunner()
                          runner.crawl(ZenSpider, domain="google.com.au")
                          for p in runner.crawlers :
                              p.signals.connect(self.crawler_results, signal=signals.item_scraped)
                          reactor.run()
                  
                  if __name__ == "__main__" :
                      app = QtWidgets.QApplication( [] )
                      main_window = MainWindow()
                      main_window.show()
                      app.exec_()
                  
                  


                  I have a layout with a simple QTableWidget and a push button.

                  # -*- coding: utf-8 -*-
                  
                  # Form implementation generated from reading ui file 'basic.ui'
                  #
                  # Created by: PyQt5 UI code generator 5.14.2
                  #
                  # WARNING! All changes made in this file will be lost!
                  
                  
                  from PyQt5 import QtCore, QtGui, QtWidgets
                  
                  
                  class Ui_MainWindow(object):
                      def setupUi(self, MainWindow):
                          MainWindow.setObjectName("MainWindow")
                          MainWindow.resize(1034, 803)
                          self.centralwidget = QtWidgets.QWidget(MainWindow)
                          self.centralwidget.setObjectName("centralwidget")
                          self.tableWidget = QtWidgets.QTableWidget(self.centralwidget)
                          self.tableWidget.setGeometry(QtCore.QRect(140, 200, 831, 401))
                          self.tableWidget.setObjectName("tableWidget")
                          self.tableWidget.setColumnCount(1)
                          self.tableWidget.setRowCount(0)
                          item = QtWidgets.QTableWidgetItem()
                          self.tableWidget.setHorizontalHeaderItem(0, item)
                          self.pushButton = QtWidgets.QPushButton(self.centralwidget)
                          self.pushButton.setGeometry(QtCore.QRect(880, 610, 89, 25))
                          self.pushButton.setObjectName("pushButton")
                          MainWindow.setCentralWidget(self.centralwidget)
                          self.statusbar = QtWidgets.QStatusBar(MainWindow)
                          self.statusbar.setObjectName("statusbar")
                          MainWindow.setStatusBar(self.statusbar)
                  
                          self.retranslateUi(MainWindow)
                          QtCore.QMetaObject.connectSlotsByName(MainWindow)
                  
                      def retranslateUi(self, MainWindow):
                          _translate = QtCore.QCoreApplication.translate
                          MainWindow.setWindowTitle(_translate("MainWindow", "MainWindow"))
                          item = self.tableWidget.horizontalHeaderItem(0)
                          item.setText(_translate("MainWindow", "URL"))
                          self.pushButton.setText(_translate("MainWindow", "Start"))
                  
                  
                  if __name__ == "__main__":
                      import sys
                      app = QtWidgets.QApplication(sys.argv)
                      MainWindow = QtWidgets.QMainWindow()
                      ui = Ui_MainWindow()
                      ui.setupUi(MainWindow)
                      MainWindow.show()
                      sys.exit(app.exec_())
                  


                  When I hit the push button I can see the crawler running and entering the crawler_results method, as it prints the scraped item. The spider returns each item with the following values:

                  {'size': '164125',
                   'status': 200,
                   'title': 'Google Advanced Search',
                   'url': 'https://www.google.com.au/advanced_search?hl=en-AU&authuser=0'}
                  


                  Page is simply my scrapy Item:

                  import scrapy
                  
                  class Page(scrapy.Item):
                      url = scrapy.Field()
                      size = scrapy.Field()
                      status = scrapy.Field()
                       title = scrapy.Field()
                       # Needed by _set_description() in ZenSpider; without this
                       # field, assigning item['description'] raises KeyError.
                       description = scrapy.Field()
                  
                  


                  My question is how to feed this data into the GUI and have it refresh automatically for as long as the spider runs: every time an item is scraped, the GUI should update, and then the spider should continue.


                  I have so far explored

                  1. Using scrapy deferreds, with little luck.
                  2. Slots/signals, but I could not update the GUI.
                  3. A QTimer function updating the GUI every second, but again without any result.

                  Any help is greatly appreciated.

                  Answer


                  You have to install a reactor compatible with the Qt event loop, for example using:

                  • qt5reactor (python -m pip install qt5reactor), or
                  • qt-reactor (python -m pip install qt-reactor).
                  import sys
                  
                  from PyQt5 import QtWidgets, QtCore, QtGui
                  
                  import qt5reactor
                  # import qreactor
                  
                  from scrapy import signals
                  from scrapy.crawler import CrawlerRunner
                  from scrapy.utils.log import configure_logging
                  
                  import twisted
                  
                  from Layout import Ui_MainWindow
                  from ZenSpider import ZenSpider
                  
                  
                  class MainWindow(QtWidgets.QMainWindow, Ui_MainWindow):
                      def __init__(self, parent=None):
                          super(MainWindow, self).__init__()
                  
                          self.setupUi(self)
                          self.pushButton.pressed.connect(self.on_url_entered)
                          self.tableWidget.horizontalHeader().setSectionResizeMode(
                              QtWidgets.QHeaderView.ResizeToContents
                          )
                  
                      def crawler_results(self, item):
                          row = self.tableWidget.rowCount()
                  
                          url = item["url"]
                  
                          it = QtWidgets.QTableWidgetItem(url)
                          self.tableWidget.insertRow(row)
                          self.tableWidget.setItem(row, 0, it)
                  
                      def on_url_entered(self):
                          configure_logging()
                          runner = CrawlerRunner()
                          runner.crawl(ZenSpider, domain="google.com.au")
                          for p in runner.crawlers:
                              p.signals.connect(self.crawler_results, signal=signals.item_scraped)
                  
                      def closeEvent(self, event):
                          super(MainWindow, self).closeEvent(event)
                          twisted.internet.reactor.stop()
                  
                  
                  if __name__ == "__main__":
                      app = QtWidgets.QApplication([])
                  
                      qt5reactor.install()
                      # qreactor.install()
                  
                      main_window = MainWindow()
                      main_window.show()
                      twisted.internet.reactor.run()
                  

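                  If more of the Page fields should appear in the table, one option (my sketch, not part of the original answer) is to turn each scraped item into a list of cell strings first; this GUI-independent step can be tested on its own. The COLUMNS order and the item_to_row helper are assumptions of mine, mirroring the Page fields shown earlier:

```python
# Assumed column order, mirroring the Page item fields shown earlier.
COLUMNS = ('url', 'title', 'status', 'size')

def item_to_row(item):
    # Missing fields become empty cells instead of raising KeyError;
    # this works for plain dicts and for scrapy.Item, which supports "in".
    return [str(item[key]) if key in item else '' for key in COLUMNS]

print(item_to_row({'size': '164125', 'status': 200,
                   'title': 'Google Advanced Search',
                   'url': 'https://www.google.com.au/advanced_search'}))
```

                  Inside crawler_results, each cell string would then become a QTableWidgetItem placed with self.tableWidget.setItem(row, col, ...) after insertRow(row), with the table's columnCount raised to len(COLUMNS) in setupUi.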