使用 JavaScript 和任何其他可用技术执行 从 Google Chrome 扩展程序中对当前未打开的标签页进行网页抓取 的最佳选项是什么?也接受其他 JavaScript 库.
What are the best options for performing Web Scraping of a not currently open tab from within a Google Chrome Extension with JavaScript and whatever more technologies are available. Other JavaScript-libraries are also accepted.
重要的是掩盖抓取行为,使其表现得像正常的网络请求.没有 AJAX 或 XMLHttpRequest 的迹象,例如 X-Requested-With: XMLHttpRequest 或 Origin.
The important thing is to mask the scraping to behave like a normal web-request. No indications of AJAX or XMLHttpRequest, like X-Requested-With: XMLHttpRequest or Origin.
必须可以从 JavaScript 访问抓取的内容,以便在扩展程序中进行进一步操作和呈现,最有可能作为字符串.
The scraped content must be accessible from JavaScript for further manipulation and presentation within the extension, most probably as a string.
在任何 WebKit/Chrome 特定的 API 中是否有任何钩子可用于发出正常的网络请求并获取操作结果?
Are there any hooks in any WebKit/Chrome-specific API:s that can be used to make a normal web-request and get the results for manipulation?
var pageContent = getPageContent(url); // TODO: Implement
var items = $(pageContent).find('.item');
// Display items with further selections
使用磁盘上的本地文件进行这项工作的奖励积分,用于初始调试.但如果这是唯一的一点就是停止解决方案,那么请忽略奖励积分.
Bonus-points to make this work from a local file on disk, for initial debugging. But if that is the only point is stopping a solution, then disregard the bonus-points.
尝试使用 XHR2 responseType = "document" 并使用 (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type"))a href="https://gist.github.com/1129031" rel="noreferrer">我的 text/html 补丁.有关我如何检测 responseType 的示例,请参阅 https://gist.github.com/1138724= "document 支持(在从 text/html blob 创建的对象 URL 上同步检查 response === null).
Attempt to use XHR2 responseType = "document" and fall back on (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type")) with my text/html patch. See https://gist.github.com/1138724 for an example of how I detect responseType = "document support (synchronously checking response === null on an object URL created from a text/html blob).
使用 Chrome WebRequest API 隐藏 X-Requested-With 等标题.
Use the Chrome WebRequest API to hide X-Requested-With, etc. headers.
这篇关于Google Chrome 扩展中的网页抓取(JavaScript + Chrome API)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!
即使在调用 abort (jQuery) 之后,浏览器也会等待Browser waits for ajax call to complete even after abort has been called (jQuery)(即使在调用 abort (jQuery) 之后,浏览器也会等待 ajax 调用
JavaScript innerHTML 不适用于 IE?JavaScript innerHTML is not working for IE?(JavaScript innerHTML 不适用于 IE?)
XMLHttpRequest 无法加载,请求的资源上不存在“AXMLHttpRequest cannot load, No #39;Access-Control-Allow-Origin#39; header is present on the requested resource(XMLHttpRequest 无法加载,请求的资
XHR HEAD 请求是否有可能不遵循重定向 (301 302)Is it possible for XHR HEAD requests to not follow redirects (301 302)(XHR HEAD 请求是否有可能不遵循重定向 (301 302))
XMLHttpRequest 206 部分内容XMLHttpRequest 206 Partial Content(XMLHttpRequest 206 部分内容)
XMLHttpRequest 的 getResponseHeader() 的限制?Restrictions of XMLHttpRequest#39;s getResponseHeader()?(XMLHttpRequest 的 getResponseHeader() 的限制?)