Scraping rendered web pages with Selenium and PhantomJS or Chrome

    xiaoxiao, 2026-03-02

    First, install Selenium with pip (pip install selenium).
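Since the PhantomJS and Chrome sections rely on different driver classes, it can help to confirm which Selenium version pip actually installed. The sketch below is only an illustration and assumes Python 3; the helper names `selenium_version` and `supports_phantomjs` are made up for this example. It relies on the fact that `webdriver.PhantomJS` exists only in Selenium 3.x and earlier.

```python
import importlib.util


def selenium_version():
    """Return the installed Selenium version string, or None if it is missing."""
    if importlib.util.find_spec("selenium") is None:
        return None
    import selenium
    return selenium.__version__


def supports_phantomjs():
    """Return True if this Selenium install still ships the PhantomJS driver class."""
    if selenium_version() is None:
        return False
    from selenium import webdriver
    return hasattr(webdriver, "PhantomJS")


if __name__ == "__main__":
    print("Selenium version:", selenium_version())
    print("PhantomJS driver available:", supports_phantomjs())
```

If the second line prints False, the PhantomJS section needs an older Selenium release, while the Chrome section should still apply.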

    I. PhantomJS

    1. Download the PhantomJS archive, extract it, and add the path of its bin folder to the PATH environment variable.

    2. Code

    # coding=utf-8
    # Requires Selenium 3.x or earlier: webdriver.PhantomJS was removed in Selenium 4.
    import io

    from selenium import webdriver


    def getHtml(url):
        # Start PhantomJS. executable_path can be omitted if phantomjs is on PATH.
        driver = webdriver.PhantomJS(executable_path='/home/lhy/phantomjs-1.9.8-linux-x86_64/bin/phantomjs')
        try:
            driver.get(url)
            html = driver.page_source  # HTML after the page's JavaScript has run
        finally:
            driver.quit()  # always shut down the PhantomJS process
        # page_source is text, so write it as UTF-8 (works on Python 2 and 3).
        with io.open("phonesinfo2.txt", "w", encoding="utf-8") as fo:
            fo.write(html)
        return html

    II. Chrome browser

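    1. The Chrome browser must be installed.

    2. Download the Chrome driver, chromedriver (its version must match the installed Chrome).

    3. Add the driver to the PATH environment variable (ideally by editing /etc/profile, so the change is permanent).

    4. Code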

    # coding=utf-8
    import io

    from selenium import webdriver


    def getHtml(url):
        # chromedriver is located through the PATH environment variable (step 3).
        driver = webdriver.Chrome()
        try:
            driver.get(url)
            html = driver.page_source  # HTML after the page's JavaScript has run
        finally:
            driver.quit()  # always close the browser
        # page_source is text, so write it as UTF-8 (works on Python 2 and 3).
        with io.open("phonesinfo2.txt", "w", encoding="utf-8") as fo:
            fo.write(html)
        return html

    Note: a Chrome browser window opens while the script runs.
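If the visible window is unwanted, for example on a server without a display, Chrome can usually be started in headless mode. This is a minimal sketch, not the original author's code. It assumes a Chrome release that supports the `--headless` flag and a Selenium version whose `webdriver.Chrome` accepts the `options=` keyword (older releases used `chrome_options=`). The names `get_html_headless` and `HEADLESS_ARGS` are hypothetical.

```python
# Command-line switches passed to Chrome; the window size is an arbitrary choice.
HEADLESS_ARGS = ["--headless", "--window-size=1280,800"]


def get_html_headless(url, extra_args=()):
    """Load url in headless Chrome and return the rendered HTML."""
    # Imported here so the module can be loaded even where Selenium is absent.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in list(HEADLESS_ARGS) + list(extra_args):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # close the browser even if loading fails
```

Calling `get_html_headless("https://example.com")` should return the same kind of string as `getHtml`, without a window appearing on screen.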

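Both setups fail at startup if the driver executable cannot be found, so a quick PATH check before launching Selenium can save some debugging. This is a small sketch using only the Python 3 standard library; `find_driver` is an illustrative helper name, not part of Selenium.

```python
import shutil


def find_driver(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


# Check the two driver binaries this post relies on.
for name in ("phantomjs", "chromedriver"):
    path = find_driver(name)
    print(name, "->", path if path else "not found on PATH")
```

If a driver is reported as not found, revisit the PATH step of the matching section, or pass the full path explicitly as the PhantomJS example does with `executable_path`.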