Analyzing the Efficiency of an Async Crawler with PyCharm Profile
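I'm fairly busy today, so this is a quick filler post.
The code below comes from the project mentioned in this video. The GitHub link is: https://github.com/mikeckennedy/async-await-jetbrains-webcast
The first piece of code is below. It is just an ordinary crawler that fetches pages one by one in a for loop. Original source.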
import requests
import bs4
from colorama import Fore


def main():
    get_title_range()
    print("Done.")


def get_html(episode_number: int) -> str:
    print(Fore.YELLOW + f"Getting HTML for episode {episode_number}", flush=True)

    url = f"https://talkpython.fm/{episode_number}"
    # Blocking call: the program waits here until the response arrives.
    resp = requests.get(url)
    resp.raise_for_status()

    return resp.text


def get_title(html: str, episode_number: int) -> str:
    print(Fore.CYAN + f"Getting TITLE for episode {episode_number}", flush=True)
    soup = bs4.BeautifulSoup(html, "html.parser")
    header = soup.select_one("h1")
    if not header:
        return "MISSING"

    return header.text.strip()


def get_title_range():
    # Please keep this range pretty small to not DDoS my site.
    for n in range(185, 200):
        # Each episode is downloaded and parsed before the next one starts.
        html = get_html(n)
        title = get_title(html, n)
        print(Fore.WHITE + f"Title found: {title}", flush=True)


if __name__ == "__main__":
    main()
This code took 37 s to run. Next, we use PyCharm's profiler to see exactly which parts take the most time.
Click Profile (file name).
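For readers without PyCharm at hand, a similar breakdown can be obtained from the standard library's cProfile module. The following is only a minimal sketch under stated assumptions, not the setup used in this post: `fake_get_html`, `fake_get_title`, `crawl`, and `profile_crawl` are hypothetical names, and `time.sleep` stands in for the real network request so the example runs offline. The sleep duration and the episode range are arbitrary choices.

```python
import cProfile
import io
import pstats
import time


def fake_get_html(episode_number: int) -> str:
    # Hypothetical stand-in for get_html: sleeps instead of doing a real HTTP request.
    time.sleep(0.05)
    return f"<h1>Episode {episode_number}</h1>"


def fake_get_title(html: str) -> str:
    # Hypothetical stand-in for get_title: plain string handling instead of bs4.
    return html.replace("<h1>", "").replace("</h1>", "")


def crawl() -> list:
    # Same sequential shape as get_title_range: fetch, then parse, one page at a time.
    titles = []
    for n in range(185, 188):
        html = fake_get_html(n)
        titles.append(fake_get_title(html))
    return titles


def profile_crawl() -> str:
    # Record every function call made while crawl() runs.
    profiler = cProfile.Profile()
    profiler.enable()
    crawl()
    profiler.disable()

    # Render the ten entries with the highest cumulative time into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_crawl())
```

In the printed table, the cumulative-time column for `fake_get_html` should account for nearly all of the run, which is the kind of hotspot a profiler view is meant to reveal for the blocking download step.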