

Douban TOP250 crawler and data analysis project in practice: pyecharts

Published: 2023/12/10

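This article, collected and compiled by 生活随笔, introduces the Douban TOP250 crawler and data analysis project with pyecharts. The editor thought it was quite good and is sharing it here for reference.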

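Python data analysis and visualization project: Douban TOP250 crawler and data analysis in practice

Crawler part

Not written yet.

Data analysis part

Today I'm tidying up the code. Honestly, I adapted other people's approaches and don't feel I've fully mastered them yet, but being able to use them is what counts; the point is learning to apply the ideas to new problems. Think of this as one of my homework assignments while learning data analysis. I originally wanted to add a pie chart and other charts too, but I decided not to be greedy; I'll leave a few gaps for you to fill.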

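1. Import all the modules first

Here we use pyecharts for visualization, pandas (pd) for data cleaning, jieba for Chinese word segmentation, collections for counting, and re for regular-expression matching.

**Note:** I'll share the data through a Baidu Netdisk link. I use absolute paths because I work in VS Code and never solved the relative-path problem there, even after searching online. If anyone knows the fix, please leave a comment (a detailed one). I'm using the new 1.x version of pyecharts; keeping up with the times and trying new things is a pleasure in itself.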
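One way to avoid hard-coded absolute paths is sketched below. This is an untested suggestion, not the author's setup. It resolves files relative to the script's folder when `__file__` exists, and falls back to the current working directory inside a notebook, where `__file__` is not defined. The folder name `douban_HTML` comes from the paths used in this article; the helper `data_path` is hypothetical.

```python
from pathlib import Path


def data_path(filename: str, folder: str = "douban_HTML") -> Path:
    """Hypothetical helper: build <base>/<folder>/<filename> without an absolute path.

    base is the directory of the running script when __file__ is defined,
    otherwise the current working directory (the usual case in Jupyter).
    """
    try:
        base = Path(__file__).resolve().parent
    except NameError:
        base = Path.cwd()
    return base / folder / filename


# Usage (assuming douban_HTML sits next to the script or notebook):
# df = pd.read_csv(data_path("douban250_2.csv"), header=0, names=["title", "info", "score", "people"])
```

If you keep Windows absolute paths, write them as raw strings such as `r"D:\Python\douban_HTML\douban250_2.csv"`. A plain string with sequences like `\P` or `\d` triggers `DeprecationWarning: invalid escape sequence`.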

```python
import jieba
import collections
import re
from pyecharts.charts import WordCloud, Page, Bar, Line, Boxplot, TreeMap, Scatter
from pyecharts.globals import SymbolType
from pyecharts import options as opts
from pyecharts.globals import ThemeType, CurrentConfig
import pandas as pd
```

Let's start with a word cloud that has nothing to do with this project. Mainly, I like the song 一粒红尘, and the Douban data I scraped doesn't include reviews (to be honest, I found the crawler code on CSDN).

```python
# Absolute paths: change them for your machine (raw strings keep the backslashes literal)
with open(r"D:\Python\douban_HTML\一粒红尘评论.txt", encoding="utf-8") as f:
    data = f.read()
# keep only runs of Chinese characters
newdata = re.findall('[\u4e00-\u9fa5]+', data, re.S)
newdata = ' '.join(newdata)
# precise mode (cut_all=False, jieba's default); cut_all=True would be full mode
txt_split = jieba.cut(newdata, cut_all=False)
result_list = []
with open(r"D:\Python\douban_HTML\停用词大全.txt", encoding="utf-8") as f:
    # strip the trailing newline before adding, otherwise no stop word ever matches
    stop_words = {line.strip() for line in f if line.strip()}
# keep words that are not stop words and are longer than one character
for word in txt_split:
    if word not in stop_words and len(word) > 1:
        result_list.append(word)
word_count = collections.Counter(result_list)
word_count_top100 = word_count.most_common(100)
word1 = WordCloud(init_opts=opts.InitOpts(width='600px', height='400px', theme=ThemeType.WONDERLAND))
word1.add('Frequency', data_pair=word_count_top100, word_size_range=[5, 100],
          textstyle_opts=opts.TextStyleOpts(font_family='cursive'), shape=SymbolType.DIAMOND)
word1.set_global_opts(
    title_opts=opts.TitleOpts('Word cloud of 一粒红尘 comments'),
    toolbox_opts=opts.ToolboxOpts(is_show=False, orient='vertical'),
    tooltip_opts=opts.TooltipOpts(is_show=True, background_color='red', border_color='yellow'),
)
# word1.render("yilihongchen_wordcloud.html")
word1.render_notebook()
```

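A quick summary of the approach:

1. Read the text file (the code uses open(); pandas isn't needed here).

2. Use re to match the content you want; here we match Chinese characters.

3. Segment the text with jieba, here in precise mode.

4. Going a step further: keep your own stop-word .txt file to refine the segmentation by removing unwanted or irrelevant words.

5. replace() is the usual way to strip unwanted characters; as the name says, it substitutes one string for another.

6. The final data must be stored in Python lists so that pyecharts can read it; otherwise you get errors. The x-axis data of a line chart should also be strings (str). Try making both x and y ints and see what the chart looks like.

7. Consult the pyecharts Chinese documentation and add components yourself. The charts below use several built-in themes (ThemeType.MACARONS, ThemeType.DARK, ThemeType.WONDERLAND).

8. Format the code to avoid indentation errors.

9. Check the result. I recommend Jupyter: install the Jupyter extension in VS Code, and its .ipynb notebooks show the output right away.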
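Steps 2, 4 and 5 can be sketched in plain Python. jieba is left out on purpose so the sketch runs on its own: the `tokens` argument stands in for the output of `jieba.cut`. The function names are hypothetical. The key detail is stripping each stop-word line before adding it to the set. If the trailing `\n` stays on, no segmented word can ever match a stop word.

```python
import collections
import re


def chinese_only(text: str) -> str:
    """Keep only runs of CJK characters (U+4E00 to U+9FA5), joined by spaces."""
    return " ".join(re.findall("[\u4e00-\u9fa5]+", text))


def load_stop_words(lines) -> set:
    """Hypothetical helper: one stop word per line, newline stripped, blank lines skipped."""
    return {line.strip() for line in lines if line.strip()}


def top_words(tokens, stop_words, n=100):
    """Drop stop words and single characters, then return the n most common (word, count) pairs."""
    kept = [w for w in tokens if w not in stop_words and len(w) > 1]
    return collections.Counter(kept).most_common(n)


tokens = ["红尘", "的", "红尘", "一粒", "啊"]  # stand-in for jieba.cut output
print(top_words(tokens, load_stop_words(["的\n", "了\n"])))  # [('红尘', 2), ('一粒', 1)]
```

The `(word, count)` pairs that `most_common` returns are the same shape that `WordCloud.add(data_pair=...)` receives in the cell above.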

Now for the main part: first read the data source, douban250_2.csv.

```python
df = pd.read_csv(r"D:\Python\douban_HTML\douban250_2.csv", header=0, names=["title", "info", "score", "people"])
```

```python
df.head(5)  # show the first 5 rows
```

Output:

| | title | info | score | people |
|---|---|---|---|---|
| 0 | 肖申克的救赎 | 1994 / 美国 / 犯罪 剧情 | 9.7 | 2186881人评价 |
| 1 | 霸王别姬 | 1993 / 中国大陆 中国香港 / 剧情 爱情 同性 | 9.6 | 1622453人评价 |
| 2 | 阿甘正传 | 1994 / 美国 / 剧情 爱情 | 9.5 | 1650384人评价 |
| 3 | 这个杀手不太冷 | 1994 / 法国 美国 / 剧情 动作 犯罪 | 9.4 | 1834962人评价 |
| 4 | 泰坦尼克号 | 1997 / 美国 / 剧情 爱情 灾难 | 9.4 | 1604815人评价 |

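From the data we can analyze movie titles, years, countries, genres and the number of ratings:

1. Counts by year, country, genre and score

2. Movie rankings and score rankings

3. Chinese vs. foreign movies

4. Movies vs. number of ratings

You can also sketch a relationship diagram yourself to find new angles, for example ratio comparisons (well suited to a pie chart). I recommend draw.io for diagrams and ProcessOn for mind maps.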

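Once you have a plan, you'll find the data isn't very friendly, which means it still needs cleaning:

1. In Excel, select the info column and split the cells in bulk with / as the delimiter. The split values are written into the columns on the right, so move the info column to the far right first.

2. Or clean it with pandas. You'll find that Excel doesn't handle the info column well anyway, because many movies have several genres.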
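The pandas route boils down to parsing one `info` string at a time. A minimal sketch, assuming the layout in the sample rows above: "year / countries / genres", with non-breaking spaces (U+00A0) around the slashes and ordinary spaces between several countries or genres. Taking the countries from the second-to-last field and the genres from the last field is my assumption. It should also tolerate an entry that lists extra release years. `parse_info` is a hypothetical name.

```python
def parse_info(info: str):
    """Split one Douban `info` string into (year, countries, genres).

    Assumed layout: "year / countries / genres", with U+00A0 around the slashes.
    """
    parts = [p.strip() for p in info.replace("\xa0", " ").split("/")]
    year = parts[0].replace("(中国大陆)", "").strip()
    countries = parts[-2].split()  # second-to-last field
    genres = parts[-1].split()     # last field
    return year, countries, genres


print(parse_info("1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情"))  # ('1994', ['美国'], ['犯罪', '剧情'])
```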

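2. First, a bar chart of the release years in the Douban TOP250

Here I define a few lists up front so I don't mix them up, since all the data gets combined at the end; the goal is to keep things tidy:

dom: the extracted years (year)

dom1: data source for bar, the first chart

dom2: data source for bar2, the second chart

df_last: the rows for Chinese movies (country == '中国'), used by the boxplot

df_last1: the rows for foreign movies; together with df_last it is reused directly in the China vs. foreign score comparison

doms2: data for bar3, the top 10 countries by movie count

dom3: data source for bar4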

```python
# Bar chart of release years
dom = []
for i in df['info']:
    # the year is the first '/' field; drop the '(中国大陆)' tag some entries carry
    dom.append(i.split("/")[0].replace('(中国大陆)', "").strip())
df['year'] = dom
place = df.groupby(['year'])['year'].agg(['count'])
place.reset_index(inplace=True)
place_last = place.sort_index()
dom1 = place_last.sort_values('year', ascending=True)
x = list(dom1['year'])
y = list(dom1['count'])
# Draw the bar chart
bar = (
    Bar(init_opts=opts.InitOpts(theme=ThemeType.MACARONS))  # width/height can be set here too
    .add_xaxis(x)
    .add_yaxis("Douban TOP250 movies per year", y)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="", subtitle=""),
        # zoom slider
        datazoom_opts=opts.DataZoomOpts(),
        # legend settings
        legend_opts=opts.LegendOpts(
            pos_left='center',  # 'left'/'center'/'right', or a percentage
            pos_top='5%',
            orient='horizontal',  # 'horizontal' or 'vertical'
            textstyle_opts=opts.TextStyleOpts(font_size=16, color='skyblue', font_family='Times New Roman'),
        ),
        yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(color='skyblue')),
        # show the toolbox
        # toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    # .render('douban_top250_years.html')
)
bar.render_notebook()
```

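A brief summary:

1. split on / to get a list, strip to remove spaces, and replace to remove the special value. Look at the data carefully: it's really better to fix that one unusual entry by hand, otherwise handling it in code is too much trouble, and there's no need to chase perfection. It is row 51 of the csv; you can also look it up on the Douban Top 250 list.

2. Create the new year column, then group by it and count, which produces a count column. The result is a pandas DataFrame; check it yourself with type().

3. Reset the index and sort (ascending by default). Look up the details yourself; my knowledge here is limited.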
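For comparison, the same group, count and sort steps can be done without pandas using `collections.Counter`. This sketch only illustrates the idea; the cell above does not use it, and `count_by_year` is a hypothetical name. The years stay strings, in line with tip 6 about string x-axis values.

```python
from collections import Counter


def count_by_year(years):
    """Return (x, y): the distinct years in ascending order and the number of movies per year.

    Years stay strings such as "1994"; for four-digit years, string order equals numeric order.
    """
    counts = Counter(years)
    x = sorted(counts)
    y = [counts[year] for year in x]
    return x, y


x, y = count_by_year(["1994", "1993", "1994", "1997"])
print(x, y)  # ['1993', '1994', '1997'] [1, 2, 1]
```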

3. Douban movie score counts. This is also a bar chart; try building it yourself first, applying what you learned above.

```python
plase_s = df.groupby(['score'])['score'].agg(['count'])
plase_s.reset_index(inplace=True)
plase_s_last = plase_s.sort_index()
dom2 = plase_s_last.sort_values('score', ascending=True)
x = list(dom2['score'])
y = list(dom2['count'])
# Draw the bar chart
bar2 = (
    Bar(init_opts=opts.InitOpts(theme=ThemeType.MACARONS, width="600px", height="400px"))
    .add_xaxis(x)
    .add_yaxis("Douban TOP250 score counts", y)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="", subtitle=""),
        datazoom_opts=opts.DataZoomOpts(),
        # legend settings
        legend_opts=opts.LegendOpts(
            pos_left='center',  # 'left'/'center'/'right', or a percentage
            pos_top='5%',
            orient='vertical',  # 'horizontal' or 'vertical'
            textstyle_opts=opts.TextStyleOpts(font_size=16, font_family='Times New Roman'),  # color='skyblue'
        ),
        yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts()),
        # show the toolbox
        # toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    # .render('douban_top250_scores.html')
)
bar2.render_notebook()
```

4. Movie genre distribution: a treemap. This one is a little harder.

```python
# Build the treemap data
(domt1, domt2, domt3) = ([], [], [])
for i in df['info']:
    # the third '/' field holds the genres; remove \xa0 and one irregular entry
    i = i.split('/')[2].replace('\xa0', '').replace(' 1978(中国大陆) ', '')
    domt1.append(i)
# collect the distinct genres
for j in domt1:
    for genre in j.split(" "):  # don't call this variable `re`: it would shadow the re module
        if genre not in domt2:
            domt2.append(genre)
# count how many movies list each genre
for k in domt2:
    num = 0
    for j in domt1:
        for genre in j.split(' '):
            if genre == k:
                num += 1
    domt3.append(num)
# TreeMap data: a list of {"name": genre, "value": count} dicts
domt4 = [{"name": name, "value": value} for name, value in zip(domt2, domt3)]
# Treemap
treemap = (
    TreeMap(init_opts=opts.InitOpts())  # e.g. width="600px", height="400px"
    .add(
        series_name="Movie genres",
        data=domt4,
        visual_min=300,
        leaf_depth=1,
        label_opts=opts.LabelOpts(position="inside"),  # position="inside" centres the labels
    )
    .set_global_opts(
        legend_opts=opts.LegendOpts(is_show=True),
        title_opts=opts.TitleOpts(title="", subtitle="2020/11", pos_left="center"),
    )
    # .render("douban_top250_genres.html")
)
treemap.render_notebook()
```

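Approach:

1. First split the info data into a list, then split each element of that list again.

2. domt2 starts as an empty list; a membership check appends every genre not seen yet, which gives the list of genre categories.

3. Then count with a second condition: for each genre, loop over every movie and count the matches. The logic runs the opposite way to how you might first think about it, so sketch it as a mind map first; for me, understanding it well enough to recognize the pattern next time is enough.

4. Build dictionaries. This is mainly about the output format: the TreeMap data is a list of {"name": ..., "value": ...} items.

5. Put the data into a list, which echoes the earlier point that the data source has to be a list.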
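Steps 2 to 4 amount to counting how often each genre appears and wrapping the counts as name/value dictionaries. Below is a shorter sketch that reaches the same result with `collections.Counter`. The function name is hypothetical, and the genres within one entry are assumed to be space-separated, as in the code above. Using `split()` without an argument also skips the empty strings that `split(" ")` can produce.

```python
from collections import Counter


def genre_treemap_data(genre_strings):
    """Count space-separated genres; return [{"name": genre, "value": count}, ...], most common first."""
    counts = Counter(genre for entry in genre_strings for genre in entry.split())
    return [{"name": name, "value": value} for name, value in counts.most_common()]


print(genre_treemap_data(["犯罪 剧情", "剧情 爱情"]))
# [{'name': '剧情', 'value': 2}, {'name': '犯罪', 'value': 1}, {'name': '爱情', 'value': 1}]
```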

5. Chinese vs. foreign movies per year: about as hard as the treemap

```python
(c1, c2, c3, c4) = ([], [], [], [])
for i in df['info']:
    country = i.split("/")[1].split(' ')[0].strip()
    # Douban labels Taiwan and Hong Kong as '中国台湾' and '中国香港'
    if country in ['中国大陆', '中国台湾', '中国香港']:
        c1.append('中国')
    else:
        c1.append('外国')
df['country'] = c1
x_c = list(dom1['year'])  # the years were extracted for the first bar chart; reuse them

# Chinese movies: count per year, sorted by year
df_last = df.loc[df['country'] == '中国']
place = df_last.groupby(['year'])['year'].agg(['count'])
place.reset_index(inplace=True)
place_c_last = place.sort_index()
c3 = place_c_last.sort_values('year', ascending=True)

# Foreign movies: count per year, sorted by year
df_last1 = df.loc[df['country'] == '外国']
place = df_last1.groupby(['year'])['year'].agg(['count'])
place.reset_index(inplace=True)
place_c_last = place.sort_index()
c4 = place_c_last.sort_values('year', ascending=True)

# y values aligned to x_c: that year's count, or 0 if the year has no movie
cn_counts = dict(zip(c3['year'], c3['count']))
fg_counts = dict(zip(c4['year'], c4['count']))
c5 = [int(cn_counts.get(year, 0)) for year in x_c]  # Chinese movies
c6 = [int(fg_counts.get(year, 0)) for year in x_c]  # foreign movies

# Draw the line chart
line = (
    Line(init_opts=opts.InitOpts(theme=ThemeType.DARK))  # width/height also go in InitOpts
    .add_xaxis(x_c)
    .add_yaxis("China", c5)
    .add_yaxis("Foreign", c6)
    # .set_colors(["orange"])  # line colour
    .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="", subtitle=""),
        datazoom_opts=opts.DataZoomOpts(),
        # legend settings
        legend_opts=opts.LegendOpts(
            pos_left='center',  # 'left'/'center'/'right', or a percentage
            pos_top='5%',
            orient='vertical',  # 'horizontal' or 'vertical'
            textstyle_opts=opts.TextStyleOpts(font_size=16, font_family='Times New Roman'),  # color='skyblue'
        ),
        yaxis_opts=opts.AxisOpts(),
        # toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    # .render('douban_top250_china_vs_foreign_years.html')
)
line.render_notebook()
```

**Approach:** I'll leave this one for you to fill in; I'm being a bit lazy! Written on Tuesday, 2020-11-24.
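Since the explanation is left open, here is one reading of the cell above, sketched without pandas:

1. Label each movie as Chinese or foreign by its first listed country.
2. Count the movies per year for each group.
3. Line both series up against the same list of years, using 0 for years with no movie. A `Counter` returns 0 for missing keys.

The set of Chinese region labels is an assumption about how Douban writes them, and the function name is hypothetical.

```python
from collections import Counter

# Assumed labels for movies counted as Chinese
CHINA_LABELS = {"中国大陆", "中国台湾", "中国香港"}


def yearly_counts_by_region(rows, all_years):
    """rows: (year, first_country) pairs.

    Returns two lists aligned to all_years (Chinese counts, foreign counts), 0 where a year has none.
    """
    china = Counter(year for year, country in rows if country in CHINA_LABELS)
    foreign = Counter(year for year, country in rows if country not in CHINA_LABELS)
    return [china[y] for y in all_years], [foreign[y] for y in all_years]


rows = [("1993", "中国大陆"), ("1994", "美国"), ("1994", "中国香港"), ("1997", "美国")]
print(yearly_counts_by_region(rows, ["1993", "1994", "1997"]))  # ([1, 1, 0], [0, 1, 1])
```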

6. Scores of Chinese vs. foreign movies

```python
x_axis = ['China', 'Foreign']
y_axis = [list(df_last['score']), list(df_last1['score'])]
# Draw the box plot
boxplot = Boxplot(init_opts=opts.InitOpts(theme=ThemeType.DARK))  # width/height can go in InitOpts too
boxplot = (
    boxplot.add_xaxis(x_axis)
    .add_yaxis(series_name="Score", y_axis=boxplot.prepare_data(y_axis))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="", subtitle=""),
        # legend settings
        legend_opts=opts.LegendOpts(
            pos_left='center',  # 'left'/'center'/'right', or a percentage
            pos_top='5%',
            orient='vertical',  # 'horizontal' or 'vertical'
            textstyle_opts=opts.TextStyleOpts(font_size=16, color='white', font_family='Times New Roman'),
        ),
        yaxis_opts=opts.AxisOpts(max_=10, min_=8, type_="value", name="Score", name_gap=5),
        # toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    # .render('douban_top250_china_vs_foreign_scores.html')
)
boxplot.render_notebook()
```

**Approach:** Just reuse df_last and df_last1 (the Chinese and foreign subsets) built in the previous step, and tune the chart yourself.
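A box plot draws five numbers per group: minimum, lower quartile, median, upper quartile and maximum. `Boxplot.prepare_data` turns each score list into those values. The sketch below is only a rough illustration using the standard library's `statistics.quantiles` (Python 3.8+). It is not pyecharts' own code, whose quartile interpolation may differ slightly. `five_number_summary` is a hypothetical name.

```python
import statistics


def five_number_summary(values):
    """[min, Q1, median, Q3, max] using statistics.quantiles' default 'exclusive' method."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return [min(values), q1, median, q3, max(values)]


print(five_number_summary([8.0, 8.5, 9.0, 9.5, 10.0]))  # [8.0, 8.25, 9.0, 9.75, 10.0]
```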

7. Horizontal bar chart: top 10 countries/regions by number of movies

```python
doms = []
# first listed country/region of each movie
for i in df['info']:
    country = i.split('/')[1].split(' ')[0].strip()
    doms.append(country)
df['country'] = doms  # note: replaces the 中国/外国 labels set for the line chart
# count and sort
place_message = df.groupby(['country'])
place_com = place_message['country'].agg(['count'])
place_com.reset_index(inplace=True)
place_com_last = place_com.sort_index()
doms2 = place_com_last.sort_values('count', ascending=False)[0:10]
# horizontal bar chart
attr = list(doms2['country'])
v1 = list(doms2['count'])
attr.reverse()
v1.reverse()
print(attr)
print(v1)
bar3 = (
    Bar(init_opts=opts.InitOpts(theme=ThemeType.WONDERLAND))  # width/height can go in InitOpts
    .add_xaxis(attr)
    .add_yaxis("count", v1)
    .reversal_axis()
    .set_series_opts(label_opts=opts.LabelOpts(position="right"))
    # .set_colors(["orange"])  # bar colour
    .set_global_opts(
        title_opts=opts.TitleOpts(title="", subtitle="", pos_left="center"),
        # legend settings
        legend_opts=opts.LegendOpts(pos_left='center', pos_top='middle', orient='vertical'),
        xaxis_opts=opts.AxisOpts(
            # axis label settings
            axislabel_opts=opts.LabelOpts(font_size=12, font_family='Times New Roman'),  # rotate=45
        ),
        yaxis_opts=opts.AxisOpts(),
        # toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    # .render('douban_top250_top10_countries.html')
)
bar3.render_notebook()
```

In the author's run, the two print calls gave the top 10 in ascending order: Germany 5, Taiwan (China) 6, Italy 6, France 8, South Korea 11, mainland China 15, the UK 17, Hong Kong (China) 19, Japan 32 and the USA 112.

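**Note:** Calling .reversal_axis() swaps the axes and turns a bar chart into a horizontal bar chart. The list.reverse() calls only flip the order of the data, so that the largest value ends up at the top.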
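The list reversal matters because, once the axes are swapped, the category axis runs upwards from the origin, so the first category is drawn at the bottom. Here is a pandas-free sketch of preparing the top-N lists; the helper name is hypothetical.

```python
from collections import Counter


def top_n_for_horizontal_bar(labels, n=10):
    """Return (names, counts) for the n most common labels, smallest first,
    so that a horizontal bar chart shows the largest bar at the top."""
    pairs = Counter(labels).most_common(n)  # largest first
    pairs.reverse()
    names = [name for name, _ in pairs]
    counts = [count for _, count in pairs]
    return names, counts


print(top_n_for_horizontal_bar(["美国", "日本", "美国", "法国", "美国", "日本"], n=2))
# (['日本', '美国'], [2, 3])
```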

8. Number of ratings in the Douban TOP250

```python
(dom1, dom2) = ([], [])  # note: reuses (overwrites) the names from the earlier bar charts
# clean the data: number of raters as its own column
for i in df['people']:
    dom1.append(i.replace('人评价', ''))
df['people_last'] = dom1
# movie title column (the scraped title is used as-is)
for j in df['title']:
    dom2.append(j)
df['title_last'] = dom2
# convert to integers
df["people_last"] = df["people_last"].astype("int")
# sort and keep the top 10
dom3 = df[['title_last', 'people_last']].sort_values('people_last', ascending=False)[0:10]
# horizontal bar chart
attr = list(dom3["title_last"])
v1 = list(dom3['people_last'])
attr.reverse()
v1.reverse()
print(attr)
print(v1)
bar4 = (
    Bar(init_opts=opts.InitOpts(theme=ThemeType.WONDERLAND))  # width/height can go in InitOpts
    .add_xaxis(attr)
    .add_yaxis("Douban TOP10 by number of ratings (2019)", v1)
    .reversal_axis()
    .set_series_opts(label_opts=opts.LabelOpts(position="right"))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="", subtitle=""),
        # legend settings
        legend_opts=opts.LegendOpts(
            pos_left='center',  # 'left'/'center'/'right', or a percentage
            pos_top='bottom',
            orient='vertical',  # 'horizontal' or 'vertical'
            textstyle_opts=opts.TextStyleOpts(font_size=16, font_family='Times New Roman'),  # color='skyblue'
        ),
        # axis label settings
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=12, font_family='Times New Roman')),  # rotate=45
        yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=12, font_family='Times New Roman')),
        # toolbox_opts=opts.ToolboxOpts(is_show=False),
    )
    # .render('douban_top250_raters.html')
)
bar4.render_notebook()
```

9. Scatter plot across rating dimensions

```python
(dom2, dom3) = ([], [])
# rank list: "1" .. "250"
dom1 = ["{}".format(i) for i in range(1, 251)]
# scores
for i in df['score']:
    dom2.append(i)
# number of raters
for i in df['people']:
    dom3.append(i.replace('人评价', ''))
# rows of [rank, score, raters]
data = [list(i) for i in zip(dom1, dom2, dom3)]
x_lst = [int(v[0]) for v in data]
y_lst = [v[1] for v in data]
extra_data = [int(v[2]) for v in data]
sc = (
    Scatter(init_opts=opts.InitOpts(theme=ThemeType.WONDERLAND))  # width/height can go in InitOpts
    .add_xaxis(x_lst)
    .add_yaxis("Score", y_lst)
    # .add_yaxis("Number of ratings", extra_data)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="", subtitle=""),
        visualmap_opts=opts.VisualMapOpts(
            # dimension is the index of the data dimension to map (0 = rank, 1 = score),
            # an int rather than a data list; series_index stays at its default (all series)
            dimension=1,
            is_show=True,
            type_="size",
            min_=min(y_lst),
            max_=max(y_lst),
            range_text=['High', 'Low'],
            orient="horizontal",
        ),
        xaxis_opts=opts.AxisOpts(
            interval=max(x_lst) - min(x_lst),
            grid_index=0,
            split_number=50,
            # boundary_gap='',
            # axis tick settings (is_inside=True would put the ticks inside)
            axistick_opts=opts.AxisTickOpts(is_show=True),
            # axis line settings (e.g. width=50, color='black')
            axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts()),
        ),
        yaxis_opts=opts.AxisOpts(
            min_=8,
            max_=10,
            axistick_opts=opts.AxisTickOpts(),  # is_show=False, is_inside=True
            axislabel_opts=opts.LabelOpts(font_size=16, font_family='Times New Roman', color='skyblue'),
        ),
        # toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    # .render('douban_top250_rank_score_raters.html')
)
sc.render_notebook()
```

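**Note:** This chart has a problem: it doesn't fully do what I intended. It was meant to show three dimensions (rank, score and number of ratings), but with the current version I don't know how to draw a flat three-dimensional scatter plot. What you see above is a two-dimensional scatter plot that could be improved further.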
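One common way to show a third dimension in a flat scatter plot is to encode it as marker size. The sketch below only shows the linear mapping behind that idea. How to feed per-point sizes into pyecharts (for example through an ECharts `symbolSize` callback) is left open here, and `scale_sizes` is a hypothetical helper.

```python
def scale_sizes(values, min_size=5.0, max_size=30.0):
    """Map each value linearly onto [min_size, max_size]; if all values are equal, use the midpoint."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(min_size + max_size) / 2 for _ in values]
    return [min_size + (v - lo) / (hi - lo) * (max_size - min_size) for v in values]


print(scale_sizes([10, 20, 30], min_size=5, max_size=25))  # [5.0, 15.0, 25.0]
```

Applied to the number of ratings (`extra_data` above), this would give one marker size per movie.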

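Finally, we put all nine charts together. The pyecharts Chinese documentation explains the options:

overlap: combines several chart types in one chart, such as bar and line, and can give the chart several y axes (usually sharing one x axis).

grid: places several charts in one container as a single whole.

page: puts multiple charts into the same HTML file, where (with the draggable layout) they can be moved around freely.

So here we use Page.

10. Calling Page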

```python
page = Page(layout=Page.DraggablePageLayout)
page.add(bar, bar2, bar3, bar4, line, sc, word1, treemap, boxplot)
page.render_notebook()
page.render('render.html')
# Page.save_resize_html('render.html', cfg_file='chart_config.json')
```

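**Note:** Keep the last line commented out at first. Open the generated render.html, arrange the charts, and save the resulting chart_config.json to your machine. Then run the last line, which produces resize_render.html. The two files differ; try both and compare.

Common errors and fixes

UndefinedError: 'int object' has no attribute 'endswith': this happens when a chart's width or height is given as an int. Pass strings such as "600px" instead.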
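A small guard against this error, assuming your sizes sometimes come as numbers: convert them to pixel strings before passing them to `opts.InitOpts`. `css_size` is a hypothetical helper, not part of pyecharts.

```python
def css_size(value) -> str:
    """Return a CSS size string: 600 -> "600px"; strings such as "600px" or "80%" pass through unchanged."""
    if isinstance(value, str):
        return value
    return f"{value}px"


print(css_size(600), css_size("80%"))  # 600px 80%
# e.g. opts.InitOpts(width=css_size(600), height=css_size(400))
```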

I've put all the project resources here; please download them yourself!

Download link: https://pan.baidu.com/s/10gBk44MpTimBYHaZ7D6jeQ
Extraction code: love

GitHub repository: https://github.com/git123hub121/Python—TOP250.git

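Building the visualization dashboard

Find a template online. Don't worry, I've included one.

Fill the template with the corresponding charts. It's best to tune each chart's parameters first so you don't have to debug over and over. I won't go into more detail here.

Final result: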
