Scraping images from some odd websites (single process)

    xiaoxiao2025-06-21  19

    I've lost count of how many days I've been learning web scraping. Today I scraped some sites of the you-know-what kind. There's plenty of redundant code; this is posted purely for discussion.

    import requests
    from lxml import etree

    class Sprider():
        """Fetch a page and pass the returned HTML to the parser."""

        def __init__(self, url):
            self.url = url

        def sprider(self):
            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36",
            }
            # verify=False skips certificate verification (see the SSL error below).
            # Note: the original read `requests.get(url, ...)` with a stray trailing
            # comma, which referenced an undefined name and returned a one-tuple.
            html = requests.get(self.url, headers=headers, verify=False).text
            return html

        def resolving(self, html):
            """Parse the page source and return the image URLs and names."""
            html = etree.HTML(html)
            content = html.xpath("//div//a/img/@data-original")
            name = html.xpath("//div//a/img/@alt")
            return content, name

        def spriderPicture(self, url, name):
            """Download one image and save it under its alt text."""
            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36",
            }
            data = requests.get(url, headers=headers, verify=False).content
            # A raw string avoids backslash-escape surprises in the Windows path.
            with open(r"D:\专业学习\Python\pythonGet\爬取javbuff" + "\\" + name + ".jpg", "wb") as f:
                f.write(data)
            print("ok")

    if __name__ == "__main__":
        # 1. Read the URL to scrape
        # 2. Fetch it, parse the source, and collect the image URLs into a list
        # 3. Loop over the list and download each image
        url = input("Enter the URL to scrape: ")
        sprider = Sprider(url)
        html = sprider.sprider()
        pictureUrlList, nameList = sprider.resolving(html)
        for picture_url, name in zip(pictureUrlList, nameList):
            sprider.spriderPicture(picture_url, name)
        print("over")

    But then a problem appeared:

    requests.exceptions.SSLError: HTTPSConnectionPool(host='imgb.xboot.bid', port=443): Max retries exceeded with url: /digital/video/rbb00152/rbb00152ps.jpg (Caused by SSLError(SSLError("bad handshake: SysCallError(10054, 'WSAECONNRESET')")))

    So how do we fix it? Passing verify=False tells requests to skip certificate verification, which gets past the handshake error, but it produces a new warning:

    InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning)

    Following the urllib3 docs linked in the warning resolves it.
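The fix those docs describe is a single call to urllib3's documented `disable_warnings` helper before making any unverified requests. A minimal sketch:

```python
import urllib3

# Silence the InsecureRequestWarning that urllib3 emits for every
# request made with verify=False. Only do this when you have
# deliberately chosen to skip certificate verification.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# From here on, requests.get(url, verify=False) runs without the warning.
```

Placing this at the top of the script, right after the imports, keeps the download loop's output clean.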
