[Question] BeautifulSoup find_all cannot find a matching tag | stanley2k | PTT (批踢踢實業坊)

Author: stanley2k (使單力) 2016-05-05 18:23:24

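Hi everyone,
I'm currently learning Python + BeautifulSoup + lxml.
As an exercise, I read entries from a list and then, for each entry, check whether a matching tag exists in some XML data.
For example, the XML contains only <centos>: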
<centos>
<name>centos</name>
<version>7</version>
<download-url>http://ftp.ksu.edu.tw/pub/CentOS/7/isos/x86_64/CentOS-7-x86_64-DVD-1511.iso</download-url>
</centos>
I then use the code below to read the XML file and try to determine whether the corresponding tag was found or not:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open(os.xml))
os = "fedora"
for item in soup.findAll( os ):
    print item.tag,item,attrib
    if item == "":
        print "OS %s not exist in DB"
    else:
        print "OS %s exist in DB"
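It doesn't seem to execute: with os = "centos" it picks up the matching data, but with os = "fedora" it doesn't.
My questions:
1. What is the correct way to check for this? The bs site says findall returns an empty string when the tag isn't found, but I don't quite understand how to test for an empty string. Comparing with == "" doesn't seem to work.
2. Also, running the script produces the following error. How do I resolve it?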
/usr/local/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "html.parser")
markup_type=markup_type))
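I searched through earlier posts first, but none of the fixes I found worked, for example BeautifulSoup(markup, "xml").
Sorry if this is a very basic question, and thanks for any guidance :D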

Author: yeh6 2016-05-05 19:22:00

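Isn't it simply because there is no fedora tag? When nothing matches, it returns an empty list, not an empty string. Also, the check should be soup.findAll(os) == [], not a test on item.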
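
To illustrate the point about the empty list, here is a minimal sketch. It assumes Python 3 syntax and an inline sample string instead of the original file; XML_DOC and os_exists are hypothetical names introduced only for this example. Because a for loop over an empty result never runs its body, the check has to be made on the result of find_all itself rather than on each item.

```python
from bs4 import BeautifulSoup

# Sample data mirroring the <centos> block from the question (shortened).
XML_DOC = """
<centos>
  <name>centos</name>
  <version>7</version>
</centos>
"""


def os_exists(markup, os_name):
    """Return True if at least one <os_name> tag exists in markup."""
    # html.parser ships with Python; it is used here only to keep
    # this sketch free of extra dependencies.
    soup = BeautifulSoup(markup, "html.parser")
    matches = soup.find_all(os_name)  # list-like; empty when nothing matches
    return len(matches) > 0


for name in ("centos", "fedora"):
    if os_exists(XML_DOC, name):
        print("OS %s exist in DB" % name)
    else:
        print("OS %s not exist in DB" % name)
```

Writing `if not matches:` would work just as well as comparing the length, since an empty list is falsy in Python.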

Author: octantis (@.@) 2016-05-05 23:57:00

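You can use len() to check whether the list is empty. The warning appears because you didn't specify which parser to use, so it defaults to the built-in html.parser and prints the warning. However, html.parser doesn't support XML, so you need to install the lxml package before you can use BeautifulSoup(markup, "lxml") or BeautifulSoup(markup, "xml").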
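
A rough sketch of naming the parser explicitly, which is what silences the warning. It assumes Python 3 and that lxml may or may not be installed; make_soup is a hypothetical helper, and the fallback to html.parser is only there so the example still runs without lxml.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with an explicitly named parser, so no warning is raised."""
    try:
        # "xml" selects lxml's XML parser and requires the lxml package.
        return BeautifulSoup(markup, "xml")
    except FeatureNotFound:
        # Assumption for this sketch: fall back to the built-in HTML parser
        # when lxml is missing. It is not a true XML parser.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<centos><name>centos</name><version>7</version></centos>")
print(len(soup.find_all("centos")))  # number of <centos> tags found
print(len(soup.find_all("fedora")))  # 0 when the tag is absent
```

If BeautifulSoup(markup, "xml") fails on a given machine, a missing lxml installation is the first thing worth checking.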
