[Question] Scraping the Pchome stock website (PTT 批踢踢實業坊)

Author: s8607142004 (挖哩勒) 2021-12-08 22:13:33

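Hello everyone on the board,
I have just started learning web scraping
and want to scrape the concept-stock list from Pchome Stock.
The URL is:
https://pchome.megatime.com.tw/group/sto3
Here is my code: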
import time
import requests
from bs4 import BeautifulSoup
header = {'Referer': 'http://pchome.megatime.com.tw/stock/sto3/',
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'}
url = "https://pchome.megatime.com.tw/group/sto3"
r = requests.post(url,header)
r.encoding = 'UTF-8'
sp = BeautifulSoup(r.text, 'html5lib')
sp
In the browser's developer tools, the sto3 Document contains the data I need, but the scraped result only contains the following lines:
<html><head>
</head>
<body>
<form action="https://pchome.megatime.com.tw/group/sto3" id="submit_form" method="post" name="submit_form">
<input name="is_check" type="hidden" value="1"/>
</form>
<script type="text/javascript">
document.getElementById('submit_form').submit();
</script>
</body></html>
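I found an earlier post saying that this is caused by incorrect header settings:
https://pttdigit.com/python/M.1485354796.A.810.html
However, even after setting my headers by analogy with the method described in that post, it still does not work.
I also tried pyppeteer, but could not get the data that way either.
I would appreciate it if anyone on the board could point me in the right direction.
Thanks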
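
The response quoted above is an interstitial page: a form with one hidden field (is_check=1) that the browser submits automatically through JavaScript. requests does not run JavaScript, so the script never fires. As a minimal sketch, the form's target and hidden fields can be read out of that HTML. The class name HiddenFormParser and the function parse_interstitial are made up for illustration, and the standard-library html.parser is used here instead of BeautifulSoup only so that the snippet runs without extra packages.

```python
from html.parser import HTMLParser

# The interstitial page quoted in the post above.
INTERSTITIAL = """<html><head>
</head>
<body>
<form action="https://pchome.megatime.com.tw/group/sto3" id="submit_form" method="post" name="submit_form">
<input name="is_check" type="hidden" value="1"/>
</form>
<script type="text/javascript">
document.getElementById('submit_form').submit();
</script>
</body></html>"""


class HiddenFormParser(HTMLParser):
    """Hypothetical helper: collect the first form's action URL and its hidden inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            # Remember where the browser would submit the form.
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("type") == "hidden":
            # Hidden inputs are the data the browser would send along.
            self.fields[attrs.get("name")] = attrs.get("value", "")


def parse_interstitial(html):
    """Return (action_url, hidden_fields) for the auto-submitting form."""
    parser = HiddenFormParser()
    parser.feed(html)
    return parser.action, parser.fields


action, fields = parse_interstitial(INTERSTITIAL)
print(action)
print(fields)
```

In principle the result could be replayed with requests.post(action, data=fields, headers=header). Whether the server then returns the full page is an assumption that this thread does not confirm.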

Author: Woqeker (窩顆ker) 2021-12-10 02:42:00

Didn't the first comment on that post say that requests can't be used?

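Author: blc (Anemos) 2021-12-10 20:30:00

Referer means the URL you came from; it is not where you put the URL you want to connect to.
Sorry, I got that wrong.
Try removing the trailing / from the Referer.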

Author: s8607142004 (挖哩勒) 2021-12-13 18:07:00

In the end, passing headers=header made it work.
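
The reason the keyword matters: the second positional parameter of requests.post is data, so requests.post(url, header) sends the dict as a form body and the request goes out without the custom Referer and User-Agent. The sketch below shows the difference without touching the network by building prepared requests. The variable names as_data and as_headers are made up for illustration.

```python
import requests

url = "https://pchome.megatime.com.tw/group/sto3"
header = {
    "Referer": "http://pchome.megatime.com.tw/stock/sto3/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
    ),
}

# Same effect as requests.post(url, header): the dict is treated as form data.
as_data = requests.Request("POST", url, data=header).prepare()

# Same effect as requests.post(url, headers=header): the dict becomes HTTP headers.
as_headers = requests.Request("POST", url, headers=header).prepare()

print("positional dict -> headers:", dict(as_data.headers))
print("positional dict -> body:   ", as_data.body)
print("headers=header  -> headers:", dict(as_headers.headers))
print("headers=header  -> body:   ", as_headers.body)
```

With the positional call, the Referer and User-Agent values end up URL-encoded in the request body. With headers=header they appear as real request headers, which matches the fix reported above.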
