UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 30073: invalid continuation byte

    xiaoxiao  2022-07-05  173

    Problem code:

    import requests
    from retrying import retry  # third-party package that provides @retry

    @retry(stop_max_attempt_number=10)  # call again (up to 10 times) if an exception is raised
    def _get_url_content(self, start_url):
        # get_proxies_requests() and get_header() are the author's own helpers (not shown)
        proxies = get_proxies_requests(start_url)
        random_header = get_header()
        add_header = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            'Host': 'www.chinanews.com',
            'Referer': start_url,
            'Upgrade-Insecure-Requests': '1',
        }
        # merge the random header with the fixed ones (fixed keys win on conflict)
        last_header = dict()
        last_header.update(random_header)
        last_header.update(add_header)
        html = requests.get(start_url, headers=last_header, proxies=proxies,
                            timeout=10, allow_redirects=False)
        # a non-200 status raises AssertionError, which makes @retry try again
        assert html.status_code == 200
        return html
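    The @retry decorator above comes from the third-party retrying package: whenever the call raises (a failed assert on a non-200 status, a timeout, a proxy error), the function is called again, up to 10 times in total, and if every attempt fails the last exception propagates to the caller. Below is a stdlib-only sketch of roughly that behavior; retry_on_exception and flaky_fetch are hypothetical names, and retrying's wait and other stop options are not modeled.

```python
import functools


def retry_on_exception(max_attempts):
    """Hypothetical stand-in for retrying's @retry(stop_max_attempt_number=N):
    call the function again whenever it raises, up to max_attempts calls in
    total, then re-raise the last exception. No delay between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the last error propagate
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_exception(max_attempts=10)
def flaky_fetch():
    # Simulates a request whose status check fails on the first two tries.
    calls["n"] += 1
    if calls["n"] < 3:
        raise AssertionError("status code was not 200")
    return "page body"


print(flaky_fetch(), calls["n"])  # page body 3
```

    One caveat of the original check: assert statements are removed when Python runs with -O, so under that flag a non-200 response would be returned instead of retried.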

    Calling code:

    try:
        html = self._get_url_content(start_url=start_url)
    except Exception:
        # every retry attempt failed (non-200 status, timeout, proxy error, ...)
        html = ''
    if html != '':
        html_str = html.content.decode('utf8')

    Error output:

    H:\python3.5.2\python.exe F:/shining_future/spider_bogger/a044_news_spider/a14_chinanews_spider/chinanews_v1.py
    Spider started
    200
    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "H:\python3.5.2\lib\threading.py", line 914, in _bootstrap_inner
        self.run()
      File "H:\python3.5.2\lib\threading.py", line 862, in run
        self._target(*self._args, **self._kwargs)
      File "F:/shining_future/spider_bogger/a044_news_spider/a14_chinanews_spider/chinanews_v1.py", line 223, in get_url_content
        list_page_news_lists = self.get_data_from_response(orginal_html=html, year=year, month_today=month_today)
      File "F:/shining_future/spider_bogger/a044_news_spider/a14_chinanews_spider/chinanews_v1.py", line 92, in get_data_from_response
        html_str = orginal_html.content.decode('utf8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 30073: invalid continuation byte
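    Why byte 0xd5: in UTF-8, 0xd5 is a valid lead byte for a two-byte character, so the decoder expects the next byte to be a continuation byte (0x80 to 0xBF); when it is not, it reports "invalid continuation byte". A common source of such bytes on Chinese sites is GBK/GB2312 text. That this page contained GBK is an assumption, not something the traceback proves. A minimal reproduction under that assumption (first_utf8_error is a hypothetical helper):

```python
def first_utf8_error(data: bytes):
    """Return (offending byte, offset, reason) for the first UTF-8 decode
    error in data, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start plays the role of "position 30073" in the traceback;
        # err.reason is the text after the last colon of the message.
        return data[err.start], err.start, err.reason
    return None


# Assumption for illustration: the bytes come from a GBK-encoded page.
gbk_bytes = "中文新闻".encode("gbk")
byte, pos, reason = first_utf8_error(gbk_bytes)
print(hex(byte), pos, reason)  # 0xd6 0 invalid continuation byte

# A 0xd5 lead byte followed by a non-continuation byte, as in the traceback:
print(first_utf8_error(b"ok \xd5\xe2"))  # (213, 3, 'invalid continuation byte')
```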

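    Fix:

    Pass an error handler as the second argument of decode(). With 'ignore', bytes that are not valid UTF-8 are dropped instead of raising UnicodeDecodeError, so the characters they encoded are lost ('replace' would leave U+FFFD in their place instead):

        html_str = html.content.decode('utf8', 'ignore')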
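    'ignore' is fine when only a few stray bytes are bad, but if the page (or part of it) is really in another encoding such as GBK, the Chinese text ends up dropped or garbled. A sketch that tries the declared charset first may be more robust. decode_html, its regular expressions, and the utf-8 then gbk fallback order are illustrative assumptions, not part of requests:

```python
import re

# Matches e.g. <meta charset="gbk"> or <meta ... content="text/html; charset=gb2312">
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)
HEADER_CHARSET = re.compile(r'charset=([A-Za-z0-9_-]+)', re.IGNORECASE)


def decode_html(body: bytes, content_type: str = "") -> str:
    """Hypothetical helper: decode an HTML body using the charset declared in
    the Content-Type header or a <meta> tag, then fall back to UTF-8 and GBK,
    and finally to UTF-8 with U+FFFD replacement so it never raises."""
    candidates = []
    header_match = HEADER_CHARSET.search(content_type)
    if header_match:
        candidates.append(header_match.group(1))
    meta_match = META_CHARSET.search(body[:4096])  # declarations usually sit in <head>
    if meta_match:
        candidates.append(meta_match.group(1).decode("ascii"))
    candidates += ["utf-8", "gbk"]  # assumed fallbacks for Chinese news pages
    for charset in candidates:
        try:
            return body.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name: try the next one
    return body.decode("utf-8", errors="replace")
```

    With requests, a call could look like decode_html(html.content, html.headers.get('Content-Type', '')). requests also sets Response.encoding from the Content-Type header and offers Response.apparent_encoding, a guess based on the body, which can serve the same purpose.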