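How do I scrape a page that loads more content as you scroll down, using Python?
My crawler only gets the first part of the page; whatever loads on scrolling down never shows up.
For example, www.vmgirls.com
Thanks for the invite. No chit-chat, here is the code: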
# for shisi
# time 2020.2.26
import requests

# The AJAX endpoint the page calls when you scroll down
url = "https://www.vmgirls.com/wp-admin/admin-ajax.php"
# Form-encoded body; fields are separated by "&", and "paged" is the page number
payload = "append=list-home&paged=5&action=ajax_load_posts&query=&page=home"
# Headers copied from the captured browser request (keys must be quoted strings)
headers = {
    "authority": "www.vmgirls.com",
    "pragma": "no-cache",
    "cache-control": "no-cache",
    "accept": "text/html, */*; q=0.01",
    "x-requested-with": "XMLHttpRequest",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "origin": "https://www.vmgirls.com",
    "sec-fetch-site": "same-origin",
    "sec-fetch-mode": "cors",
    "referer": "https://www.vmgirls.com/",
    # "br" dropped: requests only decodes Brotli when the brotli package is installed
    "accept-encoding": "gzip, deflate",
    "accept-language": "zh-CN,zh;q=0.9,und;q=0.8",
    # Cookie and token come from the captured session and may have expired
    "cookie": "__cfduid=db15bfa028f71d189ffa0c5ddf47b1d481582594282; Hm_lvt_a5eba7a40c339f057e1c5b5ac4ab4cc9=1582594284; Hm_lpvt_a5eba7a40c339f057e1c5b5ac4ab4cc9=1582594284; _ga=GA1.2.1131286326.1582594284; _gid=GA1.2.83351297.1582594284; _gat_gtag_UA_127463675_2=1",
    "postman-token": "14e3a70d-c1f6-08a5-f2ea-8b00e1a48d63",
}
response = requests.request("POST", url, data=payload, headers=headers)
# The response body is an HTML fragment containing the posts for that page
print(response.text)
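Change the paged=5 parameter and you can page through the results. After that, pull the images out of the returned HTML, which shouldn't be too hard.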
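As a rough sketch of that paging idea, the code below varies `paged` and collects `<img>` URLs from each returned HTML fragment. The helper names (`build_payload`, `extract_image_urls`, `crawl`), the trimmed header set, and the guess that image URLs sit in `src` or `data-src` attributes are my own assumptions, not something confirmed against the site; the form fields are as I read them from the payload string above.

```python
from html.parser import HTMLParser

AJAX_URL = "https://www.vmgirls.com/wp-admin/admin-ajax.php"


def build_payload(page):
    """Form fields for one page; only `paged` changes between requests."""
    return {
        "append": "list-home",
        "paged": str(page),
        "action": "ajax_load_posts",
        "query": "",
        "page": "home",
    }


class ImgCollector(HTMLParser):
    """Collects image URLs from <img> tags (assumes data-src or src holds them)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Prefer data-src, which lazy-loading pages often use (an assumption here)
        url = attr_map.get("data-src") or attr_map.get("src")
        if url:
            self.urls.append(url)


def extract_image_urls(html):
    """Return every image URL found in an HTML fragment, in document order."""
    parser = ImgCollector()
    parser.feed(html)
    return parser.urls


def crawl(first_page, last_page):
    """Fetch pages first_page..last_page and return all image URLs found."""
    import requests  # imported here so the helpers above work without it

    headers = {
        "x-requested-with": "XMLHttpRequest",
        "referer": "https://www.vmgirls.com/",
        "user-agent": "Mozilla/5.0",
    }
    found = []
    for page in range(first_page, last_page + 1):
        resp = requests.post(AJAX_URL, data=build_payload(page),
                             headers=headers, timeout=10)
        resp.raise_for_status()
        found.extend(extract_image_urls(resp.text))
    return found
```

Calling `crawl(1, 5)` would request the first five pages in turn. If the site rejects the trimmed headers, fall back to the full header set from the captured request.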
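I'll update if this gets upvotes... It has 5 upvotes now, so here is the update.
The code above was generated automatically. The method is described here:
四十不是十四: Pick up a small skill in 60 seconds: quickly generating standard POST code for a crawler (zhuanlan.zhihu.com)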
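How did I work this out? Hold on to one idea: if something shows up on the page in front of you, it must have been downloaded to your machine at some point. So anything you can see on the page can be scraped.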
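One way to put that idea into practice, as a sketch: open the browser's developer tools, scroll until new posts load, export the Network panel as a HAR file, and list the POST requests recorded in it. The function names and the file name `vmgirls.har` are illustrative assumptions; the field layout follows the HAR 1.2 format.

```python
import json


def list_post_requests(har):
    """Return (url, body) for every POST request recorded in a HAR dict."""
    results = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if request.get("method") != "POST":
            continue
        # postData is absent when a POST has no body
        body = request.get("postData", {}).get("text", "")
        results.append((request.get("url", ""), body))
    return results


def load_har(path):
    """Read a HAR export; utf-8-sig tolerates an optional byte-order mark."""
    with open(path, encoding="utf-8-sig") as fh:
        return json.load(fh)
```

Running `list_post_requests(load_har("vmgirls.har"))` should surface the request that fires when you scroll, along with the form body to replay.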