I'm trying to programmatically find the final destination of a Bing link:
https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1
In a browser, this redirects to https://www.danielshvac.com/
However, when I try to find that site by following the redirect from the first URL to the second with requests, I see no redirect at all.
What is going on here, and how can I find the final destination of these bing.com/ck/a links?
Code:
import requests

r = requests.get('https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1')
print(r.url)  # prints the same bing.com/ck/a URL that was requested
Based on this SO answer:
import requests

response = requests.get('https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1')
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected")  # this is printed
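Update: Reading the link's content via curl, I can see that it returns an HTML document with a script that performs the redirect in the browser. I guess that's why there is no real (HTTP) redirect.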
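There are two different kinds of redirect here. An HTTP redirect is a 3xx status plus a Location header; requests follows it and records it in response.history. A client-side redirect is done by JavaScript, and requests never sees it because it does not run scripts. The sketch below is a rough heuristic for telling the two apart. redirect_kind is a hypothetical helper, and the "window.location" substring check is an assumption. It only covers script redirects like the one on this page and ignores other mechanisms such as <meta http-equiv="refresh">. Call it with a response fetched using allow_redirects=False, so that a 3xx response is not followed before you inspect it.

```python
from typing import Mapping


def redirect_kind(status_code: int, headers: Mapping[str, str], body: str) -> str:
    """Heuristic (hypothetical helper): guess how a response redirects.

    Returns:
      "http"       - 3xx status with a Location header; requests follows these
                     automatically and lists them in response.history.
      "javascript" - the body assigns window.location in a script; requests
                     never executes scripts, so it cannot follow this.
      "none"       - neither pattern was found.
    """
    # Header names are case-insensitive, so normalise them before checking.
    lower_headers = {k.lower(): v for k, v in headers.items()}
    if 300 <= status_code < 400 and "location" in lower_headers:
        return "http"
    # ASSUMPTION: a plain substring check is enough for pages like Bing's.
    if "window.location" in body:
        return "javascript"
    return "none"


# Usage (requires network access):
# import requests
# r = requests.get(bing_url, allow_redirects=False)
# print(redirect_kind(r.status_code, r.headers, r.text))
```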
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="referrer" content="origin-when-cross-origin">
<script>//<![CDATA[
var s = false;
function l() {
setTimeout(f, 10000);
if (document.referrer) {
try {
var pm = /(^|&|\?)px=([^&]*)(&|$)/i;
var px = window.location.href.match(pm);
var rs = document.referrer;
if (px != null) {
if (rs.match(pm))
rs = rs.replace(pm, "$1px=" + px[2] + "$3");
else if (rs.indexOf("?") != -1)
rs = rs + "&px=" + px[2];
else
rs = rs + "?px=" + px[2];
}
history.replaceState({}, "Bing", rs);
window.addEventListener("pageshow", function(e) { if (e.persisted || (typeof window.performance != "undefined" && window.performance.navigation.type === 2)) window.location.reload(); });
s = true;
setTimeout(r, 10);
return;
} catch (e) {}
}
r();
}
function r() {
var u = "https://www.danielshvac.com/";
if (s)
window.location.href = u;
else
window.location.replace(u);
}
function f() {
document.getElementById("fb").style.display = "block";
}
//]]>
</script>
</head>
<body onload="l()">
<div id="fb" style="display: none">
Please <a href="https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=F">click here</a> if the page does not redirect automatically ...
</div>
</body>
</html>
Now I'm trying to figure out how to execute this and get the link.
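Running the script may not be necessary at all. The u query parameter of this particular link is a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw. It looks like the prefix a1 followed by the destination URL in unpadded URL-safe Base64, which decodes to https://www.danielshvac.com/. That format is inferred from this single example and is not documented by Bing. Treat the decoder below as a best-effort shortcut under that assumption. decode_bing_u_param is a hypothetical helper name, and it returns None whenever the parameter does not fit the pattern.

```python
import base64
from typing import Optional
from urllib.parse import parse_qs, urlparse


def decode_bing_u_param(bing_url: str) -> Optional[str]:
    """Hypothetical helper: decode the `u` parameter of a bing.com/ck/a link.

    ASSUMPTION: `u` is "a1" followed by unpadded URL-safe Base64 of the target
    URL. This is inferred from one example, not from any documented format.
    """
    values = parse_qs(urlparse(bing_url).query).get("u")
    if not values or not values[0].startswith("a1"):
        return None
    encoded = values[0][2:]
    # Base64 input length must be a multiple of 4, so restore the "=" padding.
    encoded += "=" * (-len(encoded) % 4)
    try:
        return base64.urlsafe_b64decode(encoded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # binascii.Error (bad Base64) is a subclass of ValueError.
        return None


if __name__ == "__main__":
    link = (
        "https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ"
        "&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27"
        "&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1"
    )
    print(decode_bing_u_param(link))  # https://www.danielshvac.com/
```

If Bing changes the encoding, this silently stops working. Fetching the page and reading the URL out of the script is the more direct route.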
As the curl output shows, the script in the HTML document already contains the destination URL, so you can simply extract it with one line of Python:
r.text.split('var u = "')[1].split('";')[0]  # -> 'https://www.danielshvac.com/'
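This splits the response content (the same HTML you got from curl) where the u variable is initialized, then splits again at the end of its string literal, so you are left with only the destination URL.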
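The same idea can be written a little more defensively with a regular expression. A page that lacks the expected variable then yields None instead of raising an IndexError. This sketch assumes Bing keeps emitting a line of the form var u = "..."; exactly as in the curl output. extract_bing_destination is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches `var u = "https://...";` as seen in the script of the Bing page.
# ASSUMPTION: Bing keeps this variable name and double-quoted literal; if the
# page changes, the pattern stops matching and the function returns None.
_DEST_RE = re.compile(r'var\s+u\s*=\s*"([^"]+)"')


def extract_bing_destination(html: str) -> Optional[str]:
    """Return the destination URL embedded in a bing.com/ck/a page, or None."""
    match = _DEST_RE.search(html)
    return match.group(1) if match else None


# Usage (requires network access):
# import requests
# r = requests.get(bing_url)
# print(extract_bing_destination(r.text))
```

Note that the pattern requires the name to be exactly u, so the other variables in the script (var s, var pm, var px) are not matched.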