Getting Tags and Their Content: A Detailed Guide

    xiaoxiao, 2022-07-14

    from bs4 import BeautifulSoup

    # Sample document used by the examples in this post
    html = '''
    <div>
      <h1><title>this is a story</title></h1>
      <p class="title" name="dromouse">
        <b>The Dormouse's story</b>
        aaaaa
      </p>
      <p class="title" name="dromouse" title='new'><b>The Dormouse's story</b>a</p>
      <p class="story">
        <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>
        <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
        <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
      </p>
      <p class="story">good</p>
      <ul id="ulone">
        <li>x01</li>
        <li>y02</li>
        <li>z03</li>
      </ul>
      <div class='div11'>
        <ul id="ultwo">
          <li>a0001</li>
          <li>b0002</li>
          <li>c0003</li>
        </ul>
      </div>
    </div>
    '''

    # The 'lxml' parser requires the lxml package to be installed
    soup = BeautifulSoup(html, 'lxml')


    (1) Get a tag object (the first matching tag):

    print(soup.h1)
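Attribute-style access such as soup.h1 is shorthand for soup.find('h1'): it returns only the first matching tag, or None when the document has no such tag. A minimal sketch, assuming a simplified document (not the post's full HTML) and the built-in 'html.parser' so that no extra parser package is needed:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical document for illustration only
html = '<div><h1>this is a story</h1><p>one</p><p>two</p></div>'
soup = BeautifulSoup(html, 'html.parser')

first_p = soup.p            # only the first <p>, not both
same_h1 = soup.find('h1')   # equivalent to soup.h1
missing = soup.h2           # there is no <h2>, so this is None

print(first_p.get_text())
print(soup.h1 == same_h1)
print(missing)
```

Checking for None before calling methods on the result avoids an AttributeError when a tag is absent.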

    (2) Get the text string inside a tag (.text and get_text() are equivalent):

    print(soup.h1.text)
    print(soup.h1.get_text())
    tit = soup.find('h1').get_text()
    print(tit)
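Text extracted this way keeps the whitespace from the source HTML, which is often unwanted. The sketch below shows the strip and separator arguments of get_text() on a paragraph shaped like the first p tag above; the shortened html string and the use of 'html.parser' are assumptions made to keep the example self-contained:

```python
from bs4 import BeautifulSoup

# Shortened copy of the first <p> from the sample document
html = '<p class="title"><b>The Dormouse\'s story</b>\n    aaaaa </p>'
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')

raw = p.get_text()                               # whitespace kept as-is
clean = p.get_text(strip=True)                   # each piece stripped, then joined
joined = p.get_text(separator=' ', strip=True)   # pieces joined with a space

print(repr(raw))
print(repr(clean))
print(repr(joined))
```

Passing a separator together with strip=True is usually the more readable choice, because it keeps text from neighbouring tags from running together.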

    (3) Get all p tags in soup; the result is a list:

    print(soup.find_all('p'))
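Because find_all returns a list-like result, it can be counted, iterated, and sliced like any Python list, and it is simply empty when nothing matches. A small sketch, assuming a reduced three-paragraph document and the built-in 'html.parser':

```python
from bs4 import BeautifulSoup

# Reduced, hypothetical document for illustration only
html = ('<div><p class="title">a</p><p class="title">b</p>'
        '<p class="story">good</p></div>')
soup = BeautifulSoup(html, 'html.parser')

paragraphs = soup.find_all('p')
count = len(paragraphs)                       # number of <p> tags found
texts = [p.get_text() for p in paragraphs]    # text of each <p>
first_two = soup.find_all('p', limit=2)       # stop after two matches
none_found = soup.find_all('table')           # no match gives an empty list

print(count, texts, len(first_two), len(none_found))
```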

    (4) Nested queries. find_all returns a list, so use an index to pick the element you want, then search inside it:

    print(soup.find_all('ul')[0].find_all('li'))
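Indexing works, but it depends on the order of the ul tags in the page. As a sketch of an alternative, the second list can be reached by first finding its enclosing div and then searching inside that tag. The html below is a trimmed copy of the two lists from the sample document, parsed with the built-in 'html.parser' as an assumption:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two lists from the sample document
html = '''
<ul id="ulone"><li>x01</li><li>y02</li><li>z03</li></ul>
<div class="div11">
  <ul id="ultwo"><li>a0001</li><li>b0002</li><li>c0003</li></ul>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Index into the list of <ul> tags, then search inside the chosen one
first_items = [li.get_text() for li in soup.find_all('ul')[0].find_all('li')]

# Narrow the search step by step instead of relying on position
second_ul = soup.find('div', class_='div11').find('ul')
second_items = [li.get_text() for li in second_ul.find_all('li')]

print(first_items)
print(second_items)
```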

    (5) Get an attribute of a tag:

    print(soup.a.attrs['href'])

    tag.get('attr') returns the value of the attr attribute of tag, or None if the tag has no such attribute:

    for link in soup.find_all('a'):
        print(link.get('href'))
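The difference between the two access styles matters when an attribute is missing: dictionary-style access raises KeyError, while get returns None or a default. A minimal sketch, assuming a modified copy of the links above in which the third a tag has no href (a hypothetical change made only to show the failure case), parsed with the built-in 'html.parser':

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the sample links: the third <a> has no href
html = '''
<p class="story">
  <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
  <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
  <a class="sister" id="link3">Tillie</a>
</p>
'''
soup = BeautifulSoup(html, 'html.parser')

first_href = soup.a['href']           # same value as soup.a.attrs['href']
first_class = soup.a['class']         # class is multi-valued, so this is a list
hrefs = [link.get('href') for link in soup.find_all('a')]

last = soup.find('a', id='link3')
fallback = last.get('href', 'no-link')   # default used when href is absent
try:
    last['href']
    raised = False
except KeyError:
    raised = True                        # dictionary-style access fails loudly

print(first_href, first_class, hrefs, fallback, raised)
```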

    (6) Get objects by a given attribute. find returns the first match; find_all returns a list:

    print(soup.find('ul',id='ulone'))
    print(soup.find_all('ul',id='ulone'))  # the result is a list

    print(soup.find_all('p',attrs={'class':'title'}))
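The attrs dictionary is most useful for attributes that cannot be passed as keyword arguments. In the sample document the p tags carry a name attribute, and name is already the first parameter of find_all (the tag name), so it has to go through attrs. The sketch below uses a trimmed copy of the sample document and the built-in 'html.parser', both assumptions made to keep it self-contained:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document
html = '''
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="title" name="dromouse" title="new"><b>The Dormouse's story</b>a</p>
<p class="story">good</p>
<ul id="ulone"><li>x01</li></ul>
'''
soup = BeautifulSoup(html, 'html.parser')

ul = soup.find('ul', id='ulone')           # a single Tag
missing = soup.find('ul', id='nope')       # None when nothing matches

titles = soup.find_all('p', class_='title')               # class_ avoids the keyword
by_name = soup.find_all('p', attrs={'name': 'dromouse'})  # name must go through attrs
new_only = soup.find_all('p', attrs={'class': 'title', 'title': 'new'})

print(ul['id'], missing, len(titles), len(by_name), len(new_only))
```

Several attributes in one attrs dictionary must all match, which is how new_only selects just the second paragraph.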
