How to parse web page information with BeautifulSoup in Python

1378 views | Published 6 years ago

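This article shows, by example, how to use BeautifulSoup in Python to parse information from a web page.

The Python code below (written for Python 2 and BeautifulSoup 3) finds all the links on a web page, then searches the span tags and prints the contents of every span whose class is titletext.

The code is as follows: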

# Import the library used to query a website
import urllib2

# Specify the URL you want to query
url = "http://www.python.org"

# Query the website and read the returned HTML into the variable 'page'
page = urllib2.urlopen(url).read()

# Import the Beautiful Soup class used to parse the data returned from the website
from BeautifulSoup import BeautifulSoup

# Parse the HTML in the 'page' variable and store it in Beautiful Soup format
soup = BeautifulSoup(page)

# soup.head is the head tag and soup.head.title is the title tag
print soup.head
print soup.head.title

# To print the length of the page, use the len function
print len(page)

# Find every link (a tag) and store the result in a new variable
tags = soup.findAll('a')

# Print all the links
print tags

# Get all titles and print the contents of each title
titles = soup.findAll('span', attrs={'class': 'titletext'})
for title in titles:
    print title.contents
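
The script above targets Python 2 and BeautifulSoup 3. For readers on Python 3, the same two lookups might be sketched as follows. This is only a sketch under stated assumptions: it assumes the Beautiful Soup 4 package (imported as bs4), which the original article does not use, and it parses a made-up HTML string rather than a live page. The names SAMPLE_HTML and extract_links_and_titles are hypothetical and exist only for illustration.

```python
# Assumption: Beautiful Soup 4 is installed (package name "beautifulsoup4").
from bs4 import BeautifulSoup

# Hypothetical sample page, used here in place of a live request.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/about">About</a>
    <a href="/downloads">Downloads</a>
    <span class="titletext">First title</span>
    <span class="other">Not a title</span>
    <span class="titletext highlight">Second title</span>
  </body>
</html>
"""


def extract_links_and_titles(html):
    """Return (href values of all a tags, text of all titletext spans)."""
    # "html.parser" is the parser bundled with the Python standard library.
    soup = BeautifulSoup(html, "html.parser")

    # find_all is the Beautiful Soup 4 spelling of findAll.
    links = [a.get("href") for a in soup.find_all("a")]

    # class_ matches any span whose class list includes "titletext",
    # so a span with class="titletext highlight" is matched as well.
    spans = soup.find_all("span", class_="titletext")
    titles = [span.get_text(strip=True) for span in spans]
    return links, titles


links, titles = extract_links_and_titles(SAMPLE_HTML)
print(links)
print(titles)
```

To run this against a real page, the HTML string could instead come from urllib.request.urlopen(url).read(), the Python 3 counterpart of the urllib2 call used in the original script.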
