[Question] Crawler can't retrieve data (BLOOMBERG) snakei14702 PTT批踢踢實業坊

Author: snakei14702 (sun) 2019-05-03 17:30:17

I wrote the short script below to scrape data from two financial websites:
from bs4 import BeautifulSoup
from urllib.request import urlopen
html11=urlopen('https://www.bloomberg.com/quote/INDU:IND')
soup=BeautifulSoup(html11,'html.parser')
print(soup.find_all('div'))
html22=urlopen('https://www.cnbc.com/quotes/?symbol=AAPL&qsearchterm=aapl')
soup=BeautifulSoup(html22,'html.parser')
print(soup.find_all('div'))
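html11 is the Bloomberg quote page. The strange thing is that when I view the page source in Chrome there are clearly plenty of 'div' tags, but when I run the script this is all I get back: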
[<div id="px-captcha"></div>, <div id="block_uuid">Block reference ID: </div>]
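html22 is the CNBC quote page, which doesn't have this problem; with a little filtering I found the data I wanted.
I'd like to ask the more experienced people here how to solve this small problem.
Thanks a lot!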

Author: tlaceruse 2019-05-03 18:03:00

Bloomberg has been blocking crawlers for a long time. You'll need to experiment with several parameters in the request headers.
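As a rough sketch of what trying different request headers could look like with urllib: the header values below (User-Agent, Accept, Accept-Language) are assumptions chosen to resemble a desktop browser, not values known to get past Bloomberg's check, and the helper names build_request, looks_blocked, and fetch are made up for illustration. The block check only looks for the px-captcha div seen in the output above, so treat it as a heuristic.

```python
from urllib.request import Request, urlopen

# Assumed browser-like header values; none of them come from the thread,
# and there is no guarantee they are enough to avoid the captcha page.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=None):
    """Return a Request that carries browser-like headers instead of urllib's defaults."""
    merged = dict(BROWSER_HEADERS)
    merged.update(headers or {})  # caller-supplied headers override the defaults
    return Request(url, headers=merged)


def looks_blocked(html):
    """Heuristic: the block page shown in the post contains a div with id 'px-captcha'."""
    return 'id="px-captcha"' in html


def fetch(url, headers=None, timeout=10):
    """Fetch url and return (html, blocked). May raise urllib.error.HTTPError."""
    with urlopen(build_request(url, headers), timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return html, looks_blocked(html)
```

Usage would be something like html, blocked = fetch('https://www.bloomberg.com/quote/INDU:IND'); if blocked is still True, try passing other headers (for example a Referer) as the second argument before handing html to BeautifulSoup. If the quote data is filled in by JavaScript after the page loads, changing headers alone may not be enough.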
