
Data Visualization: World Bank Data (1960-2017)

Published: 2023/12/29


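For this assignment I selected and downloaded the dataset The World Bank Data by Indicators 1960-2017, and chose Jupyter Notebooks (Python) as my visualization tool.

The dataset is very large but well structured (organized as tables), and it bundles more than 20 cleaned sub-datasets. I chose two of them, climate-change and health, for exploratory analysis. Python today offers many excellent tools for computation and visualization, such as NumPy, Pandas, Matplotlib, and Pyecharts.

1. Carbon dioxide emissions

The file climate-change.csv contains the emissions of various greenhouse gases for countries around the world from 1960 to 2014.

To explore how emissions of one greenhouse gas, carbon dioxide, have developed, I decided to start by visualizing the CO2 emissions of the world's major countries.

First, select and clean the data, for example by choosing the countries and by correcting or removing obviously wrong values: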

import pandas as pd
import matplotlib.pyplot as plt

# SimHei is only needed if labels contain Chinese text; the second setting
# keeps the minus sign rendering correctly with that font
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

raw_data = pd.read_csv('climate-change.csv')
# Years removed during cleaning
dismiss_years = [1960, 1970, 1980, 1990]


def select_country(country_name):
    """Return one country's CO2 rows, without zero values or dismissed years, sorted by year."""
    data = raw_data.loc[raw_data['Country Name'] == country_name,
                        ['Country Name', 'Year', 'CO2 emissions (kt)']]
    data = data.loc[data['CO2 emissions (kt)'] != 0]
    data = data.loc[~data['Year'].isin(dismiss_years)]
    return data.sort_values('Year')


China_CO2_emission_data = select_country('China')
US_CO2_emission_data = select_country('United States')
India_CO2_emission_data = select_country('India')
Japan_CO2_emission_data = select_country('Japan')
UK_CO2_emission_data = select_country('United Kingdom')
France_CO2_emission_data = select_country('France')
Russia_CO2_emission_data = select_country('Russian Federation')
Germany_CO2_emission_data = select_country('Germany')
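The cleaning step applies the same three filters to every country. As a minimal sketch, the same selection can be done once for all countries, assuming the column names used above (`Country Name`, `Year`, `CO2 emissions (kt)`). The function name `select_emissions` and the demo rows are hypothetical and are not taken from the dataset.

```python
import pandas as pd

COLUMNS = ['Country Name', 'Year', 'CO2 emissions (kt)']


def select_emissions(raw, countries, dismiss_years):
    """Keep the chosen countries, drop zero readings and the dismissed years."""
    mask = (
        raw['Country Name'].isin(countries)
        & (raw['CO2 emissions (kt)'] != 0)
        & ~raw['Year'].isin(dismiss_years)
    )
    return raw.loc[mask, COLUMNS].sort_values(['Country Name', 'Year'])


# Made-up rows, only to show the shape of the result.
demo = pd.DataFrame({
    'Country Name': ['China', 'China', 'China', 'Japan', 'Brazil'],
    'Year': [1962, 1961, 1970, 1961, 1961],
    'CO2 emissions (kt)': [440.0, 550.0, 0.0, 280.0, 49.0],
})
cleaned = select_emissions(demo, ['China', 'Japan'], [1960, 1970, 1980, 1990])
print(cleaned)
```

A single long-format frame like `cleaned` can then be split per country with `groupby('Country Name')` when plotting.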

Then, visualize the CO2 emissions of the world's major countries:

plt.close('all')
plt.figure(figsize=(10.0, 8.0))

# (data, line color, legend label) for each country
series = [
    (China_CO2_emission_data, 'red', 'China'),
    (US_CO2_emission_data, 'blue', 'United States'),
    (India_CO2_emission_data, 'green', 'India'),
    (Russia_CO2_emission_data, 'pink', 'Russia'),
    (Japan_CO2_emission_data, 'purple', 'Japan'),
    (Germany_CO2_emission_data, 'black', 'Germany'),
    (UK_CO2_emission_data, 'brown', 'United Kingdom'),
    (France_CO2_emission_data, 'orange', 'France'),
]
for data, color, label in series:
    plt.plot(data['Year'], data['CO2 emissions (kt)'],
             lw=2, ls='-', color=color, label=label)

# Data from 1961 to 2014
plt.title('CO2 emissions of major countries')
plt.xlabel('Year')
plt.ylabel('CO2 emissions / kt')
plt.xlim(1960, 2015)
plt.ylim(0, 1.2e7)
plt.grid(which='major', axis='both', color='black', linestyle='--', alpha=0.2)
plt.legend(loc='upper left')

plt.savefig('img1.jpg')
plt.show()

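Visualization result (saved as img1.jpg):

The chart shows that some Western countries have fairly stable CO2 emissions, while others grew steadily through the last century and leveled off in this one. Most Asian countries other than Japan have kept increasing their emissions, and the increase accelerated after the turn of the century.

China's CO2 emissions began to grow rapidly after 2000. By 2014, China had become the world's largest emitter, with emissions nearly twice those of the second largest, the United States. From 2012 onward, however, the growth of China's emissions slowed markedly.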
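The slowdown mentioned above can be checked numerically as well as visually. The following is a small sketch that computes year-over-year growth with pandas `pct_change`; the helper name `yearly_growth` and the values are made up for illustration and are not the real emission figures.

```python
import pandas as pd


def yearly_growth(emissions):
    """Return year-over-year growth in percent for a Series indexed by year."""
    return emissions.sort_index().pct_change() * 100


# Made-up values in kt, chosen only to illustrate a slowdown.
demo = pd.Series({2010: 100.0, 2011: 110.0, 2012: 121.0, 2013: 123.42})
growth = yearly_growth(demo)
print(growth.round(1))
```

Applied to the cleaned data, the input would be something like `China_CO2_emission_data.set_index('Year')['CO2 emissions (kt)']`, assuming the variable names from the code above.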

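Does economic growth necessarily mean higher CO2 emissions?

The dataset does not include GDP, a key indicator of economic performance, so I separately downloaded a GDP dataset covering the world's countries from 1960 to 2019 from the World Bank website.

After selecting and cleaning the data, I plotted the CO2 emissions and the GDP of major countries in the same figure: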

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

raw_emission_data = pd.read_csv('climate-change.csv')
dismiss_years = [1960, 1970, 1980, 1990]

raw_gdp_data = pd.read_csv('gdp.csv')
# Keep only 'Country Name' and the year columns 1961-2014
raw_gdp_data = raw_gdp_data.drop(
    ['Country Code', 'Indicator Name', 'Indicator Code',
     '1960', '2015', '2016', '2017', '2018', '2019', '2020'], axis=1)
years = np.arange(1961, 2015)


def select_gdp(country_name):
    """Return one country's GDP for 1961-2014 as a 1-D array."""
    row = raw_gdp_data.loc[raw_gdp_data['Country Name'] == country_name].drop(['Country Name'], axis=1)
    return row.values.reshape(row.shape[1])


def select_emission(country_name):
    """Return one country's CO2 rows, without zero values or dismissed years, sorted by year."""
    data = raw_emission_data.loc[raw_emission_data['Country Name'] == country_name,
                                 ['Country Name', 'Year', 'CO2 emissions (kt)']]
    data = data.loc[data['CO2 emissions (kt)'] != 0]
    data = data.loc[~data['Year'].isin(dismiss_years)]
    return data.sort_values('Year')


# (legend label, dataset country name, line color)
countries = [
    ('China', 'China', 'red'),
    ('United States', 'United States', 'blue'),
    ('India', 'India', 'green'),
    ('Japan', 'Japan', 'purple'),
]

plt.close('all')
fig = plt.figure(figsize=(10.0, 8.0))
ax1 = fig.add_subplot(111)

# Left axis, solid lines: CO2 emissions
for label, name, color in countries:
    emission = select_emission(name)
    ax1.plot(emission['Year'], emission['CO2 emissions (kt)'],
             lw=2, ls='-', color=color, label=label + ' CO2 emissions')

ax1.set_xlabel('Year')
ax1.set_xlim(1960, 2015)
ax1.set_ylabel('CO2 emissions / kt')
ax1.set_ylim(0, 1.6e7)
ax1.grid(which='major', axis='both', color='black', linestyle='--', alpha=0.2)
ax1.legend(loc='upper left')

# Right axis, dashed lines: GDP
ax2 = ax1.twinx()
for label, name, color in countries:
    ax2.plot(years, select_gdp(name), lw=1, ls='--', color=color, label=label + ' GDP')

ax2.set_ylabel('GDP / US$')
ax2.set_ylim(0, 2.0e13)
ax2.grid(which='major', axis='both', color='black', linestyle='--', alpha=0.2)
ax2.legend(loc='upper left', bbox_to_anchor=(0.25, 1))

# Data from 1961 to 2014
plt.title('CO2 emissions and GDP of major countries')

plt.savefig('img2.jpg')
plt.show()

Visualization result (saved as img2.jpg):

In China and India, CO2 emissions and GDP rise markedly together. In the United States and Japan, CO2 emissions stay at a fairly stable level while GDP continues to grow markedly.
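One way to put a number on this contrast is emissions per unit of GDP, often called carbon intensity. This metric is not computed in the original analysis; the sketch below uses a hypothetical helper `carbon_intensity` and made-up values, and assumes emissions in kt and GDP in US dollars as in the two datasets.

```python
import pandas as pd


def carbon_intensity(emissions_kt, gdp_usd):
    """CO2 emitted (kt) per million US dollars of GDP, aligned on the year index."""
    return emissions_kt / (gdp_usd / 1e6)


# Made-up values: emissions grow by 10% while GDP doubles.
years = [2000, 2010]
emissions = pd.Series([1000.0, 1100.0], index=years)  # kt
gdp = pd.Series([2.0e9, 4.0e9], index=years)          # US$
intensity = carbon_intensity(emissions, gdp)
print(intensity)
```

A falling intensity with rising GDP corresponds to the pattern described for the United States and Japan, whereas emissions and GDP rising at a similar pace leave the intensity roughly flat.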

Taking history into account, my explanation is this: since the start of this century, rising labor costs in Western countries have pushed most of their industrial production to Asian countries with cheap labor and resources, while less polluting services and high-tech industries have gradually become the pillars of Western economies. The result is a pattern in which developing countries grow their economies through polluting industry, while developed countries grow theirs through services and high technology.

2. Population of China

The file health.csv contains the male and female population of China from 1960 to 2017.

To explore how China's population has grown, I cleaned and visualized the data:

import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

raw_data = pd.read_csv('health.csv')
# Years whose values are replaced by the mean of the neighboring years
change_years = [1970, 1980, 1990]
columns = ['Population, female', 'Population, male']

# China population data, without 1960
China_population_data = raw_data.loc[raw_data['Country Name'] == 'China', ['Year'] + columns]
China_population_data = China_population_data.loc[China_population_data['Year'] != 1960]

for year in change_years:
    pre_year_data = China_population_data.loc[China_population_data['Year'] == year - 1, columns].values
    next_year_data = China_population_data.loc[China_population_data['Year'] == year + 1, columns].values
    China_population_data.loc[China_population_data['Year'] == year, columns] = (pre_year_data + next_year_data) / 2

China_population_data.sort_values('Year', inplace=True)

plt.close('all')
plt.figure(figsize=(10.0, 8.0))

years = China_population_data['Year'].values
China_male_population = China_population_data['Population, male'].values
China_female_population = China_population_data['Population, female'].values

# Stacked bars: male at the bottom, female on top
width = 0.8
male_bar = plt.bar(years, China_male_population, width, color='royalblue', label='Male')
female_bar = plt.bar(years, China_female_population, width,
                     bottom=China_male_population, color='hotpink', label='Female')

# Data from 1961 to 2017
plt.title('Population of China')
plt.xlabel('Year')
plt.ylabel('Population')
plt.grid(which='major', axis='both', color='black', linestyle='--', alpha=0.2)
plt.legend(loc='upper left')

plt.savefig('img3.jpg')
plt.show()
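Replacing a year with the mean of its two neighbors is the same as linear interpolation across a one-year gap. As an alternative sketch, the replacement can be written with pandas `interpolate`; the helper name `fill_years_linearly` and the demo numbers are hypothetical, and the approach assumes the rows are consecutive years.

```python
import numpy as np
import pandas as pd


def fill_years_linearly(frame, years, columns):
    """Replace the given years with values interpolated from their neighbors."""
    out = frame.sort_values('Year').set_index('Year')
    out[columns] = out[columns].astype(float)
    out.loc[years, columns] = np.nan                      # blank out the suspect years
    out[columns] = out[columns].interpolate(method='linear')
    return out.reset_index()


# Made-up numbers; 1970 holds a placeholder value that should be replaced.
columns = ['Population, female', 'Population, male']
demo = pd.DataFrame({
    'Year': [1969, 1970, 1971],
    'Population, female': [100.0, 0.0, 110.0],
    'Population, male': [104.0, 0.0, 116.0],
})
fixed = fill_years_linearly(demo, [1970], columns)
print(fixed)
```

With `method='linear'`, pandas treats the rows as equally spaced, which for consecutive years gives exactly the neighbor average used in the loop above.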

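Visualization result (saved as img3.jpg):

The chart shows two clear turning points in China's population growth: one between 1970 and 1980, the other between 1990 and 2000.

From further reading I learned that during the Fourth Five-Year Plan (1971 to 1975) the government promoted the slogan "one is not too few, two is just right, three is too many", and that from 1995 it advocated "late marriage and late childbearing, fewer and healthier births". The timing of both policies matches the two turning points in the chart fairly well, which suggests that family planning policy had a very large effect on China's population.

3. Population of the world's countries

The file health.csv contains the population of the world's countries from 1960 to 2017.

To present the population of the world's countries more clearly, I first used a histogram to summarize the 2017 population of each country: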

import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

raw_data = pd.read_csv('health.csv')
country_codes = pd.read_csv('country code.csv')

# True for rows whose country code appears in 'country code.csv'
case = [code in country_codes.values for code in raw_data['Country Code'].values]

population_data = raw_data.loc[case].loc[raw_data['Year'] == 2017, ['Population, total']]
population_data = population_data.loc[population_data['Population, total'] != 0]

plt.hist(population_data.values, bins=10, edgecolor='black', facecolor='royalblue')

plt.title('Histogram of country populations, 2017')
plt.xlabel('Population')
plt.ylabel('Frequency')
plt.grid(which='major', axis='both', color='black', linestyle='--', alpha=0.2)

plt.savefig('img6.jpg')
plt.show()

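Result (saved as img6.jpg):

The vast majority of countries (more than 200) have fewer than 200 million people. Moreover, the smallest population (Nauru, 13,649) and the largest (China, 1,386,395,000) differ by a factor of more than 100,000, so the populations of the world's countries are extremely unevenly distributed.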
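The factor quoted above follows directly from the two figures. The short sketch below checks it and, as a suggestion that is not part of the original analysis, builds logarithmically spaced bin edges that would spread such skewed data more evenly across a histogram.

```python
import numpy as np

smallest = 13_649          # Nauru, 2017 (figure quoted in the text)
largest = 1_386_395_000    # China, 2017 (figure quoted in the text)

ratio = largest / smallest
print(f'largest / smallest = {ratio:,.0f}')

# Eleven edges give ten bins, each covering the same span on a log scale.
bins = np.logspace(np.log10(smallest), np.log10(largest), num=11)
print(bins.round(0))
```

Passing `bins=bins` to `plt.hist` together with `plt.xscale('log')` would be one way to use these edges.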

Next, I visualized the 30 most populous countries together with the combined population of all remaining countries:
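The code for this chart is not included in the article. A minimal sketch of the data preparation, assuming a Series of populations indexed by country name, could look like this; the helper name `top_n_with_other` and the demo values are hypothetical.

```python
import pandas as pd


def top_n_with_other(populations, n):
    """Return the n largest entries plus one 'Other countries' total."""
    ordered = populations.sort_values(ascending=False)
    top = ordered.iloc[:n]
    other = pd.Series({'Other countries': ordered.iloc[n:].sum()})
    return pd.concat([top, other])


# Made-up populations for four placeholder countries.
demo = pd.Series({'A': 50, 'B': 300, 'C': 20, 'D': 100})
summary = top_n_with_other(demo, 2)
print(summary)
```

Passing `summary.index` and `summary.values` to `plt.barh`, in the way Code B does, would then draw the bars.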

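Visualization result (figure not included):

The two most populous countries, China and India, each have more than 1.3 billion people, while the third, the United States, has fewer than 400 million, so each of the first two is more than four times larger. The gap between these two and less populous countries is, of course, wider still.

Such extreme disparity makes the data hard to visualize (a way to deal with this is discussed later).

The bar chart above makes it easy to compare the populations of the top 30 countries, but it does not show clearly what share of the world population each country holds. I therefore also drew the data as a pie chart (for readability I used only the top 10 countries):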
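The pie chart code is not included in the article either. A sketch of how the shares could be computed, under the assumption that the input is a Series of populations indexed by country name, is given below; `population_shares` and the demo values are hypothetical.

```python
import pandas as pd


def population_shares(populations, n):
    """Percentage of the total held by the n largest entries and by the rest."""
    ordered = populations.sort_values(ascending=False)
    shares = ordered.iloc[:n] / ordered.sum() * 100
    shares['Other countries'] = 100 - shares.sum()
    return shares


# Made-up populations for four placeholder countries.
demo = pd.Series({'A': 600, 'B': 300, 'C': 60, 'D': 40})
shares = population_shares(demo, 2)
print(shares)
```

The result can be drawn with `plt.pie(shares.values, labels=shares.index, autopct='%.1f%%')`.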

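Visualization result (figure not included):

This shows directly what share of the world population each country holds. China and India each account for more than 1/6 of the world population, and together for more than 1/3.

Finally, I used Pyecharts to show the population of each country as a color on a world map (Code A):

The result was poor: apart from China and India in red, most other countries were blue-green or green.

This is caused by the extreme spread of the data. On the color bar, China and India sit at the top while all other countries are crowded together at the bottom, so the color differences between countries are small and the middle of the color range is hardly used.

This suggested taking the logarithm of the population, which greatly reduces the differences between countries.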
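A small numeric sketch shows why the logarithm helps. It rescales values to the interval from 0 to 1, as a color bar does, before and after taking log10. The first and last populations are the figures quoted earlier; the two in the middle are illustrative round numbers, and `rescale` is a hypothetical helper.

```python
import numpy as np


def rescale(values):
    """Map values linearly onto [0, 1], like positions on a color bar."""
    return (values - values.min()) / (values.max() - values.min())


populations = np.array([1_386_395_000, 325_000_000, 5_000_000, 13_649])

linear = rescale(populations)
logged = rescale(np.log10(populations))
print(linear.round(3))   # positions on the color bar for raw counts
print(logged.round(3))   # positions after taking log10
```

On the raw scale, a country of 5 million people sits almost at the very bottom of the bar, while on the log scale it lands near the middle.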

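The following two charts compare the populations of the world's countries before and after taking the logarithm (Code B):

Original data (saved as img7_1.jpg):

After taking the logarithm (saved as img7_2.jpg):

After taking the logarithm, the differences between countries are much smoother. I therefore visualized the log-transformed populations on the map:

This looks much better than before.

There was still an obvious weakness, though: most of the map was red, orange, or yellow, and blue and green covered only a very small part of it.

On analysis, the reason is that most countries with large territories also tend to have large populations, so they are colored yellow, orange, or red, which makes the map look almost entirely yellow, orange, or red.

The improvement is to let only the top few dozen countries define the range of the color bar, and to give all remaining countries the color of the minimum value. Those remaining countries then share one color, but their populations are mostly of the order of 10⁴ to 10⁶ (tens of thousands to a few million), so the differences among them are small compared with countries of tens or hundreds of millions of people. The approach is therefore reasonably justified.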
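The idea can be illustrated independently of the map library. In the sketch below, every value under a chosen floor is raised to that floor, so all smaller countries share the lowest color. The helper `clamp_to_top` and the numbers are hypothetical; Code A achieves the effect by setting the lower bound of the color bar rather than by changing the data.

```python
import numpy as np


def clamp_to_top(log_values, keep):
    """Raise every value below the keep-th largest one up to that value."""
    ordered = np.sort(log_values)[::-1]
    floor = ordered[keep - 1]
    return np.clip(log_values, floor, None), floor


# Made-up log10 populations for five placeholder countries.
log_values = np.array([9.1, 8.5, 7.0, 6.0, 4.1])
clamped, floor = clamp_to_top(log_values, keep=3)
print(floor)
print(clamped)
```

The color bar then only has to span the range from `floor` to the maximum, which leaves more of its colors for the larger countries.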

Visualization result (figure not included):

After repeated comparison, I settled on using the top 150 countries to define the range of the color bar. This improved the map further.

Code A:

from pyecharts.charts import Map
from pyecharts import options as opts
import pandas as pd
import numpy as np

raw_data = pd.read_csv('health.csv')
country_codes = pd.read_csv('country code.csv')

# True for rows whose country code appears in 'country code.csv'
case = [code in country_codes.values for code in raw_data['Country Code'].values]

population_data = raw_data.loc[case].loc[raw_data['Year'] == 2017, ['Country Name', 'Population, total']]
population_data = population_data.loc[population_data['Population, total'] != 0]
population_data.sort_values('Population, total', ascending=False, inplace=True)

# World Bank country names -> names used by the pyecharts world map
NAME_FIXES = {
    'Russian Federation': 'Russia',
    'Egypt, Arab Rep.': 'Egypt',
    'Congo, Dem. Rep.': 'Dem. Rep. Congo',
    'Iran, Islamic Rep.': 'Iran',
    'Czech Republic': 'Czech Rep.',
    'Slovak Republic': 'Slovakia',
    'Yemen, Rep.': 'Yemen',
    'Korea, Rep.': 'Korea',
    'Korea, Dem. People’s Rep.': 'Dem. Rep. Korea',
    'Kyrgyz Republic': 'Kyrgyzstan',
    'Bosnia and Herzegovina': 'Bosnia and Herz.',
    'Macedonia, FYR': 'Macedonia',
    'South Sudan': 'S. Sudan',
    'Central African Republic': 'Central African Rep.',
    'Congo, Rep.': 'Congo',
    'Venezuela, RB': 'Venezuela',
    'Dominican Republic': 'Dominican Rep.',
    'Syrian Arab Republic': 'Syria',
    'Equatorial Guinea': 'Eq. Guinea',
    "Cote d'Ivoire": "Côte d'Ivoire",
}

number = -1  # slice end: [0:-1] keeps every country except the last one

country_names = [NAME_FIXES.get(name, name)
                 for name in population_data['Country Name'].values[0:number]]
populations = population_data['Population, total'].values[0:number]


def render_map(values, min_, path):
    """Render a world map colored by values; the color bar spans [min_, values[0]]."""
    data = list(zip(country_names, values))
    m = (
        Map()
        .add('', data, maptype='world', is_map_symbol_show=False)
        .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(title=''),
            visualmap_opts=opts.VisualMapOpts(max_=values[0], min_=min_),
        )
    )
    m.render(path)


# Original version: raw population counts, color bar starting at 0
render_map(populations.tolist(), 0, 'World.html')

# First improvement: log10 of the population, color bar spanning the full range
log_populations = np.log10(populations).tolist()
render_map(log_populations, log_populations[-1], 'World_improved_v1.html')

# Second improvement: the lower end of the color bar is the value at index 150
render_map(log_populations, log_populations[150], 'World_improved_v2.html')

Code B:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

raw_data = pd.read_csv('health.csv')
country_codes = pd.read_csv('country code.csv')

# True for rows whose country code appears in 'country code.csv'
case = [code in country_codes.values for code in raw_data['Country Code'].values]

population_data = raw_data.loc[case].loc[raw_data['Year'] == 2017, ['Country Name', 'Population, total']]
population_data = population_data.loc[population_data['Population, total'] != 0]
population_data.sort_values('Population, total', ascending=False, inplace=True)

# Alternative: number = 100 with figsize = (15.0, 30.0) shows the top 100 only
number = -1
figsize = (15.0, 40.0)
width = 0.8

all_names = population_data['Country Name'].values
all_populations = population_data['Population, total'].values

# Countries before index `number` get their own bar; the rest are summed into one.
# The slice [number:] (rather than [number:-1]) keeps the last country in the sum,
# so the 'Other countries' bar is never 0 and log10 stays finite.
country_names = np.append(all_names[0:number], 'Other countries')
country_populations = np.append(all_populations[0:number], np.sum(all_populations[number:]))


def plot_bars(values, title, path):
    """Draw a horizontal bar chart with the largest country at the top."""
    plt.close('all')
    plt.figure(figsize=figsize)
    plt.barh(np.flipud(country_names), np.flipud(values), width, color='royalblue')
    plt.gca().xaxis.set_ticks_position('top')
    plt.xticks(fontname='Arial', fontsize=12)
    plt.yticks(fontname='Arial', fontsize=12)
    plt.title(title)
    plt.grid(which='major', axis='both', color='black', linestyle='--', alpha=0.2)
    plt.savefig(path)
    plt.show()


# Original data
plot_bars(country_populations, 'Population of major countries, 2017', 'img7_1.jpg')

# After taking log10
plot_bars(np.log10(country_populations),
          'Order of magnitude of the population of major countries, 2017', 'img7_2.jpg')
