I want to write a Greasemonkey script that, while you are on URL_1, fetches and parses the whole HTML page of URL_2 in the background in order to extract a text element from it.
To be specific, I want to download the whole page's HTML (a Rotten Tomatoes page) in the background, store it in a variable, and then use getElementsByClassName(...)[0] to extract the text I want from the element with the class name "critic_consensus".
I found this on MDN: HTML in XMLHttpRequest. Working from it, I ended up with the following code, which unfortunately does not work:
    var xhr = new XMLHttpRequest();
    xhr.onload = function() {
        // The class name must be passed as a string.
        alert(this.responseXML.getElementsByClassName("critic_consensus")[0].innerHTML);
    };
    xhr.open("GET", "http://www.rottentomatoes.com/m/godfather/", true);
    xhr.responseType = "document";
    xhr.send();
It shows this error message when I run it in Firefox Scratchpad:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://www.rottentomatoes.com/m/godfather/. This can be fixed by moving the resource to the same domain or enabling CORS.
PS. The reason I don't use the Rotten Tomatoes API is that they've removed the critics consensus from it.
For cross-origin requests, where the fetched site has not helpfully set a permissive CORS policy, Greasemonkey provides the GM_xmlhttpRequest() function. (Most other userscript engines also provide this function.)
GM_xmlhttpRequest is expressly designed to allow cross-origin requests.
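As a rough illustration of the request side, here is a minimal sketch that wraps a GM_xmlhttpRequest-style call in a Promise. The helper name `gmFetchText` and its `requestFn` parameter are hypothetical and not part of Greasemonkey; the request function is passed in only so the sketch can also be run outside a userscript engine. It assumes the response object exposes `status` and `responseText`.

```javascript
// Hypothetical helper (not a Greasemonkey API): wraps a GM_xmlhttpRequest-style
// function in a Promise that resolves with the response body as text.
// Inside a userscript, pass GM_xmlhttpRequest itself as `requestFn`.
function gmFetchText(requestFn, url) {
    return new Promise(function (resolve, reject) {
        requestFn({
            method: "GET",
            url: url,
            onload: function (response) {
                // Treat non-2xx statuses as failures rather than parsing an error page.
                if (response.status >= 200 && response.status < 300) {
                    resolve(response.responseText);
                } else {
                    reject(new Error("HTTP status " + response.status + " for " + url));
                }
            },
            onerror: function () { reject(new Error("Network error for " + url)); },
            onabort: function () { reject(new Error("Request aborted for " + url)); },
            ontimeout: function () { reject(new Error("Request timed out for " + url)); }
        });
    });
}
```

In a userscript with `@grant GM_xmlhttpRequest`, this could be called as `gmFetchText(GM_xmlhttpRequest, "http://www.rottentomatoes.com/m/godfather/")` and the returned Promise chained with `.then(...)`.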
To get your target information, parse the response with a DOMParser. Do not use jQuery methods to parse it, since they can cause extraneous images, scripts, and objects to load, slowing things down or crashing the page.
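The parsing step on its own can be sketched as a small function. The name `extractTextByClass` and its optional `parser` argument are hypothetical; the argument defaults to a browser `DOMParser` and exists only so the sketch can be exercised with a stand-in outside a browser. A document produced by `DOMParser` is not attached to the page, so it can be queried without rendering it.

```javascript
// Hypothetical helper: parse an HTML string into a detached document and return
// the trimmed text of the first element with the given class, or null if none.
function extractTextByClass(html, className, parser) {
    // Fall back to the browser's DOMParser when no stand-in is supplied.
    parser = parser || new DOMParser();
    var doc = parser.parseFromString(html, "text/html");
    var node = doc.getElementsByClassName(className)[0];
    // An empty collection yields undefined for [0], so check before reading textContent.
    return node ? node.textContent.trim() : null;
}
```

Returning null for a missing element lets the caller decide what to do if the target site changes its markup, instead of throwing inside the onload handler.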
Here's a complete script that illustrates the process:
    // ==UserScript==
    // @name        _Parse Ajax Response for specific nodes
    // @include     http://stackoverflow.com/questions/*
    // @require     http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js
    // @grant       GM_xmlhttpRequest
    // ==/UserScript==

    GM_xmlhttpRequest ( {
        method:     "GET",
        url:        "http://www.rottentomatoes.com/m/godfather/",
        onload:     function (response) {
            var parser = new DOMParser ();
            /* IMPORTANT!
                1) For Chrome, see
                https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#DOMParser_HTML_extension_for_other_browsers
                for a work-around.
                2) jQuery.parseHTML() and similar are bad because it causes images, etc., to be loaded.
            */
            var doc = parser.parseFromString (response.responseText, "text/html");
            var node = doc.getElementsByClassName ("critic_consensus")[0];
            if ( ! node) {
                // The target page's markup may have changed.
                console.error ('**** critic_consensus element not found');
                return;
            }
            // Insert as text, not markup, so the fetched content cannot inject HTML.
            $("body").prepend ($('<h1>').text (node.textContent) );
        },
        onerror:    function (e) {
            console.error ('**** error ', e);
        },
        onabort:    function (e) {
            console.error ('**** abort ', e);
        },
        ontimeout:  function (e) {
            console.error ('**** timeout ', e);
        }
    } );