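How do you use CSS selectors in a Python crawler? This article works through the question in detail, in the hope of giving readers who want to solve it a simpler and more practical approach.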
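CSS selectors
This is another way of searching that does much the same job as find_all(). In CSS, a tag name is written with no prefix, a class name is prefixed with ".", and an id is prefixed with "#".
We can filter elements with the same syntax here. The method to use is soup.select(), which returns a list.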
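To make the comparison with find_all() concrete, here is a minimal sketch. The markup is a cut-down, hypothetical sample, and the built-in "html.parser" is used only so that the snippet does not depend on lxml; neither choice comes from the article.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
</p>
"""

# "html.parser" ships with Python; the article itself uses "lxml".
soup = BeautifulSoup(html, "html.parser")

# Tag name: no prefix.
by_tag_css = soup.select("a")
by_tag_find = soup.find_all("a")

# Class name: "." prefix.
by_class_css = soup.select(".sister")
by_class_find = soup.find_all(class_="sister")

# id: "#" prefix.
by_id_css = soup.select("#link2")
by_id_find = soup.find_all(id="link2")

# Each pair should contain the same elements.
print(by_tag_css == by_tag_find)
print(by_class_css == by_class_find)
print(by_id_css == by_id_find)
```

Both styles find the same elements in this sample; select() simply lets you express the condition as a single CSS string.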
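(1) Searching by tag name
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.select("title"))
print(soup.select("b"))
print(soup.select("a"))
Output
[<title>The Dormouse's story</title>]
[<b>The Dormouse's story</b>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]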
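(2) Searching by class name
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.select(".title"))
Output
[<p class="title"><b>The Dormouse's story</b></p>]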
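(3) Searching by id
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.select("#link1"))
Output
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]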
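(4) Combined search
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.select("p #link1"))
Output
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]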
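The space in a combined selector is the CSS descendant combinator: "p #link1" means "the element with id link1 somewhere inside a p". A minimal sketch of how the left-hand part narrows the match, assuming a cut-down, hypothetical version of the sample document and the built-in "html.parser" (chosen here only to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down markup for illustration only.
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story"><a class="sister" id="link1">Elsie</a></p>
"""
soup = BeautifulSoup(html, "html.parser")

# #link1 inside any <p>.
inside_any_p = soup.select("p #link1")

# #link1 inside a <p> whose class is "story" (tag and class on the same node, no space).
inside_story = soup.select("p.story #link1")

# #link1 inside a <p> whose class is "title": the link is not in that paragraph.
inside_title = soup.select("p.title #link1")

print(len(inside_any_p), len(inside_story), len(inside_title))
```

In this sample the first two selectors find the link and the third finds nothing, because the link lives in the "story" paragraph.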
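(5) Searching by attribute
A selector can also include an attribute, which must be enclosed in square brackets. Note that the attribute and the tag belong to the same node, so there must be no space between them; otherwise nothing will match.
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.select("a[class='sister']"))
Output
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]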
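The no-space rule for attributes can be checked directly. In the sketch below the markup is a hypothetical, shortened sample and "html.parser" stands in for lxml; with a space, the selector asks for an element with class "sister" nested inside an a tag, which this document does not contain.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup for illustration only.
html = """
<p class="story">
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
<a class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# No space: the attribute test applies to the <a> node itself.
same_node = soup.select("a[class='sister']")

# With a space: looks for a descendant of <a> that has class "sister".
descendant = soup.select("a [class='sister']")

print(len(same_node))
print(len(descendant))
```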
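Likewise, attribute selectors can be combined with the forms above: parts that refer to different nodes are separated by a space, and parts that refer to the same node are written without one.
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.select("p a[class='sister']"))
Output
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]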
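Tag, class, id and attribute tests can all be stacked on one node and then chained across nodes with spaces. The following is a rough sketch under the same assumptions as before (hypothetical shortened markup, "html.parser" instead of lxml), not a list of every combination that is possible:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup for illustration only.
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
<a class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Two nodes: a <p class="story">, then an <a> with class="sister" inside it.
in_story = soup.select("p.story a[class='sister']")

# Same attribute test, but the outer node is <p class="title">, which holds no links.
in_title = soup.select("p.title a[class='sister']")

# Three tests on the same <a> node: tag, class and an attribute, with no spaces.
one_link = soup.select("p a.sister[id='link2']")

print(len(in_story), len(in_title), len(one_link))
```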
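(6) Getting the content
Every select() call above returns a list, so you can iterate over the result and call get_text() on each item to get its text.
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.select("p a[class='sister']"))
for item in soup.select("p a[class='sister']"):
    print(item.get_text())
Output
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

Lacie
Tillie
Note: <!-- Elsie --> is a comment, so get_text() prints nothing for the first link.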
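If a crawler needs to tell "empty because the content is a comment" apart from ordinary text, one option is to inspect the .string attribute, which is a Comment object in that case. This is a small sketch with hypothetical one-line markup and the built-in "html.parser"; the rows list is just a helper name introduced for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical markup: the first link holds only a comment, the second holds text.
html = (
    '<p class="story">'
    '<a class="sister" id="link1"><!-- Elsie --></a>'
    '<a class="sister" id="link2">Lacie</a>'
    "</p>"
)
soup = BeautifulSoup(html, "html.parser")

rows = []
for item in soup.select("a.sister"):
    text = item.get_text()                         # comments are left out of get_text()
    is_comment = isinstance(item.string, Comment)  # True when the only child is a comment
    rows.append((item["id"], text, is_comment))

for row in rows:
    print(row)
```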
That concludes this walkthrough of how to use CSS selectors in a Python crawler. Hopefully the material above is of some help.