from bs4 import BeautifulSoup

html = '''
<div>
    <h1><title>this is a story</title></h1>
    <p class="title" name="dromouse">
        <b>The Dormouse's story</b>
        aaaaa
    </p>
    <p class="title" name="dromouse" title='new'><b>The Dormouse's story</b>a</p>
    <p class="story">
        <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>
        <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
        <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    </p>
    <p class="story">good</p>
    <ul id="ulone">
        <li>x01</li>
        <li>y02</li>
        <li>z03</li>
    </ul>
    <div class='div11'>
        <ul id="ultwo">
            <li>a0001</li>
            <li>b0002</li>
            <li>c0003</li>
        </ul>
    </div>
</div>
'''

soup = BeautifulSoup(html, 'lxml')
print(soup.find_all('p',attrs={'class':'title'}))
(1) Get a tag object
print(soup.h1)
(2) Get the text string inside a tag:
print(soup.h1.text)
print(soup.h1.get_text())
tit = soup.find('h1').get_text()
print(tit)
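A minimal sketch of how get_text() can be tuned when the raw text carries stray whitespace. The snippet variable is a hypothetical fragment modeled on the first p tag above, and html.parser is used here instead of lxml only so the example needs no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the first <p> of the sample document
snippet = '<p class="title"> <b>The Dormouse\'s story</b> aaaaa </p>'
p = BeautifulSoup(snippet, 'html.parser').p

raw_text = p.get_text()                       # same string as p.text, whitespace kept
clean_text = p.get_text(strip=True)           # each text piece stripped, then concatenated
joined_text = p.get_text(' | ', strip=True)   # stripped pieces joined with a separator

print(repr(raw_text))
print(clean_text)
print(joined_text)
```

The separator form is usually the more readable choice when a tag mixes nested tags and loose text, since plain strip=True glues the pieces together.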
(3) Get all p tags in soup; the result is a list
print(soup.find_all('p'))
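The list returned by find_all can be capped with limit, and it is empty (not None) when nothing matches, whereas find returns None. A short sketch, assuming a hypothetical three-paragraph snippet and the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, not the full sample document
snippet = '<p class="title">one</p><p class="story">two</p><p class="story">three</p>'
doc = BeautifulSoup(snippet, 'html.parser')

all_p = doc.find_all('p')                # list-like ResultSet with every match
first_two = doc.find_all('p', limit=2)   # stop after two matches
no_match = doc.find_all('table')         # empty list when nothing matches
first_or_none = doc.find('table')        # find() returns None instead

print(len(all_p), len(first_two), no_match, first_or_none)
```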
(4) Multi-level queries: find_all returns a list, so index into it to reach the element you want
print(soup.find_all('ul')[0].find_all('li'))
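The indexed lookup above can also be written as a loop over the li tags, or as a CSS selector through select(). The sketch below assumes a hypothetical snippet containing only the two lists from the sample document, parsed with html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: just the two <ul> lists from the sample document
snippet = '''
<ul id="ulone"><li>x01</li><li>y02</li><li>z03</li></ul>
<div class="div11"><ul id="ultwo"><li>a0001</li><li>b0002</li><li>c0003</li></ul></div>
'''
doc = BeautifulSoup(snippet, 'html.parser')

# Chained find_all: take the first <ul>, then collect the text of its <li> children
first_ul = doc.find_all('ul')[0]
items = [li.get_text() for li in first_ul.find_all('li')]

# CSS selector alternative: every <li> under a <ul> inside div.div11
nested = [li.get_text() for li in doc.select('div.div11 ul li')]

print(items)
print(nested)
```

A selector avoids depending on the position of the list in the document, which can matter if the page layout changes.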
(5) Get a tag's attributes
print(soup.a.attrs['href'])
tag.get('attr') returns the value of the attr attribute on the tag:
for link in soup.find_all('a'):
    print(link.get('href'))
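The two access styles differ when the attribute is missing: indexing raises KeyError, while get() returns None or a supplied default. A small sketch, assuming a hypothetical single-link snippet parsed with html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the first link of the sample document
snippet = '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
a = BeautifulSoup(snippet, 'html.parser').a

href = a['href']                        # shorthand for a.attrs['href']
missing = a.get('title')                # attribute absent, so None
fallback = a.get('title', 'no title')   # default value instead of None
classes = a['class']                    # class is multi-valued, so a list comes back

try:
    a['title']                          # indexing a missing attribute raises KeyError
    raised = False
except KeyError:
    raised = True

print(href, missing, fallback, classes, raised)
```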
(6) Get objects by a specified attribute
print(soup.find('ul', id='ulone'))
print(soup.find_all('ul', id='ulone'))  # the result is a list
print(soup.find_all('p',attrs={'class':'title'}))
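The attrs dictionary is one of several equivalent filters. Since class is a reserved word in Python, the keyword form is class_, and an HTML attribute literally called name has to go through attrs because name is already the first parameter of find_all. A sketch under those assumptions, using a hypothetical reduced snippet and html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a few tags taken from the sample document
snippet = '''
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">good</p>
<ul id="ulone"><li>x01</li></ul>
'''
doc = BeautifulSoup(snippet, 'html.parser')

by_attrs = doc.find_all('p', attrs={'class': 'title'})   # dictionary form
by_kwarg = doc.find_all('p', class_='title')             # keyword form, note the underscore
by_name_attr = doc.find_all(attrs={'name': 'dromouse'})  # HTML "name" attribute needs attrs

print(by_attrs == by_kwarg)
print(by_name_attr[0].name)
```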
