[Question] Web scraping question | iftrush | PTT批踢踢實業坊

Author: iftrush (綾絹姊) 2018-07-17 10:11:56

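I'm new to web scraping.
I'm currently scraping a dictionary site, and I've already gotten definitions successfully through its web API.
Now suppose I want to scrape the entry for "apple" without using the API.
From the page source, I can see that the definition is in the content attribute of this tag: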
<meta name="twitter:description" content=" "/>
How can I use findAll or find to locate this tag,
and then print the definition inside the quotes of content?
My code so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup

def DictRequest(word):
    html = urlopen("https://www.merriam-webster.com/dictionary/" + word)
    bsobj = BeautifulSoup(html.read(), 'html')
    meaning = bsobj.findAll('meta', name = 'twitter:description')

Running it gives:

TypeError: find_all() got multiple values for argument 'name'
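Why the error happens: in BeautifulSoup, the first parameter of find_all() (findAll() is its older alias) is the tag name, and that parameter is itself called name (the signature begins find_all(name=None, attrs={}, ...)). Passing 'meta' positionally already fills name, so the keyword name='twitter:description' is a second value for the same argument. The sketch below reproduces this with a hypothetical stand-in function, not bs4 itself, that only copies those leading parameters.

```python
# Hypothetical stand-in with the same leading parameters as
# BeautifulSoup's find_all(name=None, attrs={}, ...). Not the real bs4 code.
def find_all(name=None, attrs=None, **kwargs):
    return name, attrs, kwargs


def reproduce_error():
    """Call the stand-in the way the question does and return the error text."""
    try:
        # "meta" already binds to `name`, so name=... is a second value for it.
        find_all("meta", name="twitter:description")
    except TypeError as exc:
        return str(exc)
    return "no error"


print(reproduce_error())

# Passing the HTML attribute through `attrs` avoids the clash.
print(find_all("meta", attrs={"name": "twitter:description"}))
```

The **kwargs catch-all does not help here: Python binds name as a named parameter before any leftover keywords reach **kwargs.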

Author: TitanEric (泰坦) 2018-07-17 10:55:00

I'd suggest wrapping the urlopen call in a with statement.
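A minimal sketch of that suggestion. It assumes the page is UTF-8 encoded, and fetch_html is a hypothetical helper name. The response object returned by urlopen supports the with statement, so the connection is closed when the block exits, even if reading raises an exception.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and return its body as text.

    The with statement closes the response even if read() fails.
    Assumes UTF-8; a real page may declare another charset.
    """
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

Usage would look like html = fetch_html("https://www.merriam-webster.com/dictionary/" + word), and the returned string can be passed straight to BeautifulSoup.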

Author: iftrush (綾絹姊) 2018-07-17 10:28:00

Is there a way to get only the text inside the quotes? Right now I do meaning = str(meaning) and then return meaning[15:-53]. Is there another method I could use?
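Slicing the string form is fragile, because the offsets break as soon as the tag's text changes. A BeautifulSoup Tag can be indexed like a dict to read an attribute: tag["content"], or tag.get("content"), which returns None instead of raising KeyError when the attribute is missing. The sketch below assumes the result of findAll/find_all is stored in meaning as in the question. SAMPLE_HTML and get_description are hypothetical, standing in for the real page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page source.
SAMPLE_HTML = '<head><meta name="twitter:description" content="a round fruit"/></head>'


def get_description(html):
    """Return the content attribute of the twitter:description meta tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like ResultSet of Tag objects.
    meaning = soup.find_all("meta", attrs={"name": "twitter:description"})
    if not meaning:
        return None
    # Read the attribute directly instead of slicing str(meaning).
    return meaning[0].get("content")


print(get_description(SAMPLE_HTML))
```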

Author: bibo9901 (function(){})() 2018-07-17 10:17:00

.select('meta[name="twitter:description"]')[0]
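That reply uses a CSS attribute selector. A small sketch of the same idea: select returns a list, so [0] raises IndexError when nothing matches, while select_one returns the first match or None. The sample markup and the function name description_via_css are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page source.
SAMPLE_HTML = '<head><meta name="twitter:description" content="a round fruit"/></head>'


def description_via_css(html):
    """Find the meta tag with a CSS attribute selector and return its content."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one gives the first matching Tag, or None if there is no match.
    tag = soup.select_one('meta[name="twitter:description"]')
    return tag["content"] if tag is not None else None


print(description_via_css(SAMPLE_HTML))
```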

Author: coeric ( ) 2018-07-17 10:57:00

findAll('meta', attrs={'name':'twitter:description'}) Personally I'm used to attrs; I find CSS selector shorthand like # and . harder to remember.
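To connect the two styles in this thread, here is a sketch that matches the same tags with an attrs filter and with a CSS selector. In CSS, #main means id="main" and .entry means class="entry". The markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<head><meta name="twitter:description" content="a round fruit"/></head>
<body><div id="main"><p class="entry">apple</p></div></body>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The HTML attribute "name" must go through attrs, because find_all's own
# `name` parameter is already taken by the tag name "meta".
meta_by_attrs = soup.find_all("meta", attrs={"name": "twitter:description"})
meta_by_css = soup.select('meta[name="twitter:description"]')

# "#main" in CSS is the same as attrs={"id": "main"}.
main_by_attrs = soup.find_all(attrs={"id": "main"})
main_by_css = soup.select("#main")

# "p.entry" in CSS is the same as a <p> whose class list contains "entry".
entry_by_attrs = soup.find_all("p", attrs={"class": "entry"})
entry_by_css = soup.select("p.entry")

print(meta_by_attrs == meta_by_css, main_by_attrs == main_by_css)
print(entry_by_css[0].get_text())
```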