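Please check it out: Bilibili's recent youth manifesto film, dedicated to the new generation.

I. Find the comment link
II. Write the code
  1. Request the page with the requests library
  2. Find all comments and store them in a two-dimensional list
  3. Save all data in the two-dimensional list to a text file
  4. Generate a custom word cloud
Full code
Result image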
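He Bing, a National First-Class Actor, takes the stage and delivers the youth manifesto Houlang (《后浪》, "the rising waves"), recognizing, praising, and encouraging the younger generation. In the youth montage cut together from uploaders' videos, the light that belongs to young people is shining. "You are lucky to meet such an era, but the era is luckier to meet you."

Let's use Python to scrape the bullet comments (danmaku) on Houlang and see what the "rising waves" are actually commenting on.

I. Find the comment link

Open the Houlang playback page on Bilibili, press F12, then refresh the page.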
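The comment link is in the request marked in red; the Request URL above the blue line is the comment link: https://api.bilibili.com/x/v1/dm/list.so?oid=188273397. Note: next to Hide data URLs, select All as the filter.

II. Write the code

1. Request the page with the requests library

    import requests

    def getHtmlText(url):
        """Fetch url and return the response body as text, or '' on failure."""
        try:
            response = requests.get(url)
            response.raise_for_status()
            # Decode the raw bytes as UTF-8; otherwise the text comes out garbled.
            data = response.content.decode('utf-8')
            return data
        except (requests.RequestException, UnicodeDecodeError):
            return ''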
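In the two comment links that appear in this post, only the oid query parameter differs. As a small sketch under that assumption, the helpers below build the link from an oid and read the oid back out of a copied Request URL, using only the standard library. The names build_danmaku_url and extract_oid are illustrative and not part of any Bilibili library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the Request URL in this post; treating oid as the only
# required parameter is an assumption based on the links shown here.
DANMAKU_ENDPOINT = "https://api.bilibili.com/x/v1/dm/list.so"


def build_danmaku_url(oid):
    """Return the comment link for the given oid."""
    return DANMAKU_ENDPOINT + "?" + urlencode({"oid": oid})


def extract_oid(url):
    """Return the oid query parameter of a comment link, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("oid")
    return values[0] if values else None


url = build_danmaku_url(188273397)
print(url)
print(extract_oid(url))
```

Keeping the oid separate from the endpoint makes it easier to point the same script at a different video.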
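2. Find all comments and store them in a two-dimensional list

Every comment is the text inside a <d> tag, and all <d> tags sit under the <i> tag. Use BeautifulSoup from the bs4 library to parse the page obtained in step 1, then find the text under each <d> tag and store it in a list. (A regular expression would also work here.)

    from bs4 import BeautifulSoup

    def fillList(html, danmu_list):
        """Append each comment to danmu_list as a one-element row."""
        soup = BeautifulSoup(html, 'html.parser')
        itotal = soup.find('i')      # the tag that holds every comment
        if itotal is None:           # empty or unexpected page
            return
        for dtotal in itotal.find_all('d'):
            danmu_list.append([dtotal.text])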
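The regular-expression route mentioned above could look roughly like the following. This is a minimal sketch, not the author's code: extract_danmaku is an illustrative name, and the sample string (including its p attributes) is a made-up stand-in for the real response.

```python
import html
import re

# Matches one <d ...>text</d> element; re.S lets a comment span several lines.
D_TAG = re.compile(r"<d\b[^>]*>(.*?)</d>", re.S)


def extract_danmaku(xml_text):
    """Return the text of every <d> tag as a list of one-element rows."""
    # html.unescape turns entities such as &amp; back into plain characters.
    return [[html.unescape(m)] for m in D_TAG.findall(xml_text)]


# Hypothetical sample shaped like the <i>/<d> structure described above.
sample = '<i><d p="1">前浪你好</d><d p="2">A &amp; B</d></i>'
rows = extract_danmaku(sample)
print(rows)
```

The result has the same two-dimensional shape as the list built with BeautifulSoup, so it can be passed to the later steps unchanged.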
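3. Save all data in the two-dimensional list to a text file

    def down_danmu(danmu_list):
        """Append every comment in danmu_list to 后浪弹幕.txt, one per line."""
        with open('后浪弹幕.txt', 'a', encoding='utf-8-sig') as f:
            for row in danmu_list:
                # Strip the [] left by str() and end each line with a newline.
                # The two replace lines are optional, depending on the data.
                s = str(row).replace('[', '').replace(']', '') + '\n'
                # Strip single quotes and commas.
                s = s.replace("'", '').replace(',', '')
                f.write(s)
        print("File written successfully!")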
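Converting each row with str() and then stripping brackets, quotes, and commas also removes those characters when they are part of a comment's own text. As an alternative sketch, each row's single string can be written directly. The name save_rows, the path parameter, and the sample rows are all illustrative.

```python
import os
import tempfile


def save_rows(rows, path):
    """Append the first item of each row to path, one comment per line."""
    with open(path, "a", encoding="utf-8-sig") as f:
        for row in rows:
            f.write(row[0] + "\n")


# Made-up rows that contain a comma and brackets on purpose.
rows = [["后浪, 加油"], ["[doge]"]]
demo_path = os.path.join(tempfile.mkdtemp(), "danmu_demo.txt")
save_rows(rows, demo_path)

# Reading with utf-8-sig drops the byte order mark written at the file start.
with open(demo_path, encoding="utf-8-sig") as f:
    saved = f.read()
print(saved)
```

Written this way, punctuation inside a comment reaches the file untouched.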
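4. Generate a custom word cloud

    import numpy as np
    import PIL.Image as image
    import wordcloud

    def showPic():
        # Load a custom image with PIL and convert it to a NumPy array for the mask.
        img = np.array(image.open('bili.jpg'))
        # utf-8-sig drops the byte order mark that down_danmu writes to the file.
        with open('后浪弹幕.txt', 'r', encoding='utf-8-sig') as f:
            text = f.read()
        w = wordcloud.WordCloud(font_path="msyh.ttc", mask=img, scale=15, stopwords=' ',
                                width=1000, height=700, background_color='white')
        w.generate(text)
        w.to_file("Houlang.png")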
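Because each line of the text file holds one comment, counting identical lines shows which comments are repeated most before any image is drawn. The following is a minimal sketch with collections.Counter; count_lines is an illustrative name and the sample text is made up. The wordcloud library's generate_from_frequencies method accepts a mapping like this in place of raw text, though whether it gives a better picture than generate(text) for this data has not been tested here.

```python
from collections import Counter


def count_lines(text):
    """Count how often each non-empty line occurs in text."""
    lines = (line.strip() for line in text.splitlines())
    return Counter(line for line in lines if line)


# Made-up sample standing in for the contents of 后浪弹幕.txt.
sample_text = "奔涌吧\n后浪\n奔涌吧\n\n后浪\n奔涌吧\n"
freq = count_lines(sample_text)
print(freq.most_common(2))
```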
Full code

    import requests
    from bs4 import BeautifulSoup
    import wordcloud
    import PIL.Image as image
    import numpy as np

    def getHtmlText(url):
        """Fetch url and return the response body as text, or '' on failure."""
        try:
            response = requests.get(url)
            response.raise_for_status()
            html = response.content.decode('utf-8')
            return html
        except (requests.RequestException, UnicodeDecodeError):
            return ''

    def fillList(html, danmu_list):
        """Append the text of every <d> tag to danmu_list as a one-element row."""
        soup = BeautifulSoup(html, 'html.parser')
        itotal = soup.find('i')
        if itotal is None:
            return
        for dtotal in itotal.find_all('d'):
            danmu_list.append([dtotal.text])

    def down_danmu(danmu_list):
        """Append every comment to 后浪弹幕.txt, one per line."""
        with open('./后浪弹幕.txt', 'a', encoding='utf-8-sig') as f:
            for row in danmu_list:
                s = str(row).replace('[', '').replace(']', '') + '\n'
                s = s.replace("'", '').replace(',', '')
                f.write(s)
        print("File written successfully!")

    def showPic():
        """Render the saved comments as a word cloud shaped by bili.jpg."""
        img = np.array(image.open('bili.jpg'))
        with open('后浪弹幕.txt', 'r', encoding='utf-8-sig') as f:
            text = f.read()
        w = wordcloud.WordCloud(font_path="msyh.ttc", mask=img, scale=15, stopwords=' ',
                                width=1000, height=700, background_color='white')
        w.generate(text)
        w.to_file("Houlang.png")

    List = []
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=186803402'
    html = getHtmlText(url)
    fillList(html, List)
    down_danmu(List)
    showPic()