Statistics for Business and Economics (13th Edition, Python) Notes: Chapters 1-2

Published: 2023/12/20 python 38 豆豆


Contents

Chapter 1 Data and Statistics
- 1.1 Applications in Business and Economics
- 1.2 Data
- 1.3 Data Sources
- 1.4 Descriptive Statistics
- 1.5 Statistical Inference
- 1.6 Analytics
- 1.7 Big Data and Data Mining
- 1.8 Computers and Statistical Analysis
- 1.9 Ethical Guidelines for Statistical Practice

Chapter 2 Descriptive Statistics 1: Tabular and Graphical Displays
- 2.1 Summarizing Data for a Categorical Variable
  - Bar chart with examples
  - Pie chart with examples
- 2.2 Summarizing Data for a Quantitative Variable
  - Single variable: dot plot
  - Single variable: histogram
  - Single variable: cumulative distribution (displot)
  - Single variable: stem-and-leaf display
- 2.3 Summarizing Data for Two Variables Using Tables
  - Crosstabulation
- 2.4 Summarizing Data for Two Variables Using Graphical Displays
  - Scatter diagram and trendline
  - Side-by-side bar chart and stacked bar chart
- 2.5 Data Visualization: Best Practices in Creating Effective Graphical Displays
  - Creating effective graphical displays

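The first time I read this book, I already had a foundation from university courses, so I focused on the technical content and skipped the seemingly simple fundamentals. This is probably a common failing among beginners: they concentrate on practical material and neglect the basics. That approach does help sustain interest and carries newcomers far enough to get started, after which they pick up the fundamentals through practice. Still, it is better to recognize the importance of the basics on the first pass and master as much of them as possible. The best way to do that is to work the exercises.
My original goal was to learn data analysis. Yet when practitioners said that the most important knowledge in data analysis is "descriptive statistics", I realized I had filed it away as shallow material and swallowed it without digesting it.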

Chapter 1 Data and Statistics

1.1 Applications in Business and Economics

Accounting, finance, marketing, production, economics, information systems

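1.2 Data

Data, data set, elements, variables, observations, categorical data, categorical variable, quantitative data, quantitative variable, cross-sectional data, time series data
**1.2.2 Scales of Measurement**
The nominal, ordinal, interval, and ratio scales nest in that order: each scale has all the properties of the one before it. On an ordinal scale, addition and subtraction are meaningless; on an interval scale, multiplication and division are meaningless. Only the interval and ratio scales carry a unit of measurement.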
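
As a small arithmetic illustration of why ratios are not meaningful on an interval scale, the sketch below compares two temperatures in degrees Celsius (an interval scale, whose zero point is arbitrary) and in kelvins (a ratio scale). The two temperature values are assumptions chosen for illustration; they do not come from the book.

```python
# Interval vs. ratio scale: differences survive a change of zero point,
# ratios do not. The two temperatures are made-up illustration values.
c_low, c_high = 10.0, 20.0                        # degrees Celsius (interval scale)
k_low, k_high = c_low + 273.15, c_high + 273.15   # kelvins (ratio scale)

diff_c = c_high - c_low     # difference on the interval scale
diff_k = k_high - k_low     # the same difference on the ratio scale
ratio_c = c_high / c_low    # looks like "twice as hot", but is an artifact of the zero point
ratio_k = k_high / k_low    # ratio measured from a true zero

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The differences agree on both scales, while the Celsius ratio of 2 shrinks to roughly 1.035 once the values are measured from a true zero.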

1.3 Data Sources

Sources: existing sources, observational studies, and experiments. Points to watch: time and cost, and data acquisition errors.

1.4 Descriptive Statistics

Statistical methods that summarize data in tabular, graphical, or numerical form.

1.5 Statistical Inference

Population, sample, census, sample survey
A major contribution of statistics is using sample data to make estimates and test hypotheses about the characteristics of a population. This is statistical inference.
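
The idea of estimating a population characteristic from a sample can be sketched with the standard library alone. The population below is simulated (a hypothetical set of 10,000 values), and the sample size of 100 is likewise an arbitrary assumption; the point is only that the sample mean approximates the population mean without a census.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated values (made up for illustration).
population = [random.gauss(100, 15) for _ in range(10_000)]

# A census would use every element; a sample survey uses only a subset.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)  # the parameter (usually unknown)
sample_mean = statistics.mean(sample)          # the point estimate
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error {standard_error:.2f})")
```

The standard error gives a rough sense of how far the sample mean is likely to sit from the population mean.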

1.6 Analytics

Analytics includes:
- Descriptive analytics: analysis of past data, BI, or retrospective review
- Predictive analytics: prediction, or identifying the effect of one variable on another
- Prescriptive analytics: the set of analytical techniques that yield a best course of action, i.e. guidance for action under real-world constraints

1.7 Big Data and Data Mining

Big data: volume, velocity, and variety (the three Vs)
Data mining: automatically extracting predictive information from large databases

1.8 Computers and Statistical Analysis

1.9 Ethical Guidelines for Statistical Practice

Statistics is the art and science of collecting, analyzing, presenting, and interpreting data.

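Chapter 2 Descriptive Statistics 1: Tabular and Graphical Displays

2.1 Summarizing Data for a Categorical Variable

Frequency distribution, relative frequency distribution, percent frequency distribution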
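
The three distributions differ only in how the counts are scaled. Below is a minimal sketch using `collections.Counter`; the list of purchases is invented for illustration. With pandas, `Series.value_counts()` and `Series.value_counts(normalize=True)` give the frequency and relative frequency directly.

```python
from collections import Counter

# Hypothetical sample of soft-drink purchases (categorical data).
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Diet Coke", "Coke", "Pepsi", "Sprite"]

frequency = Counter(purchases)                       # frequency distribution
n = sum(frequency.values())
relative = {k: v / n for k, v in frequency.items()}  # relative frequency = count / n
percent = {k: 100 * r for k, r in relative.items()}  # percent frequency = relative * 100

print(f"{'Category':<10}{'Freq':>6}{'Rel.':>8}{'Pct.':>8}")
for category, count in frequency.most_common():
    print(f"{category:<10}{count:>6}{relative[category]:>8.2f}{percent[category]:>7.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick sanity check on any hand-built table.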

Bar chart with examples

A bar chart depicts a frequency, relative frequency, or percent frequency distribution. For a categorical variable, the bars should be separated by gaps.
Basic usage of matplotlib.pyplot.bar (the matplotlib documentation has more examples):

    from matplotlib import pyplot as plt

    x, y = [5, 8, 10], [12, 16, 6]
    x2, y2 = [6, 9, 11], [6, 15, 7]
    plt.bar(x, y, align='center')
    plt.bar(x2, y2, color='g', align='center')   # second series in green
    plt.title('Bar graph')
    plt.ylabel('Y axis')
    plt.xlabel('X axis')
    plt.show()

Polar bar chart:

    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(19680801)
    N = 20
    theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)   # bar positions (angles)
    radii = 10 * np.random.rand(N)                            # bar heights
    width = np.pi / 4 * np.random.rand(N)                     # bar widths
    colors = plt.cm.viridis(radii / 10.)
    ax = plt.subplot(111, projection='polar')
    ax.bar(theta, radii, width=width, bottom=0.0, color=colors, alpha=0.5)
    plt.show()

seaborn.barplot (the seaborn documentation has examples) is much simpler:

    import seaborn as sns
    tips = sns.load_dataset("tips")   # example dataset
    ax = sns.barplot(x="day", y="total_bill", hue="sex", data=tips)

Pie chart with examples

A pie chart depicts a relative frequency or percent frequency distribution. People judge differences in length better than differences in angle, so it is best to label the proportions.
matplotlib.pyplot.pie (the documentation has examples). Three examples I personally like, with code:

Basic pie with one exploded slice:

    import matplotlib.pyplot as plt

    labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
    sizes = [15, 30, 45, 10]
    explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice (i.e. 'Hogs')
    fig1, ax1 = plt.subplots()
    ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
    ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    plt.show()

Pie labeled with both percentages and absolute amounts:

    import numpy as np
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 3), subplot_kw=dict(aspect="equal"))
    recipe = ["375 g flour", "75 g sugar", "250 g butter", "300 g berries"]
    data = [float(x.split()[0]) for x in recipe]
    ingredients = [x.split()[-1] for x in recipe]

    def func(pct, allvals):
        # Convert a percentage back into an absolute amount in grams.
        absolute = int(pct / 100. * np.sum(allvals))
        return "{:.1f}%\n({:d} g)".format(pct, absolute)

    wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data), textprops=dict(color="w"))
    ax.legend(wedges, ingredients, title="Ingredients", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
    plt.setp(autotexts, size=8, weight="bold")
    ax.set_title("Matplotlib bakery: A pie")
    plt.show()

Donut with annotated wedges:

    import numpy as np
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 3), subplot_kw=dict(aspect="equal"))
    recipe = ["225 g flour", "90 g sugar", "1 egg", "60 g butter", "100 ml milk", "1/2 package of yeast"]
    data = [225, 90, 50, 60, 100, 5]
    wedges, texts = ax.pie(data, wedgeprops=dict(width=0.5), startangle=-40)
    bbox_props = dict(boxstyle="square,pad=0.3", fc="w", ec="k", lw=0.72)
    kw = dict(arrowprops=dict(arrowstyle="-"), bbox=bbox_props, zorder=0, va="center")
    for i, p in enumerate(wedges):
        ang = (p.theta2 - p.theta1) / 2. + p.theta1   # mid-angle of the wedge
        y = np.sin(np.deg2rad(ang))
        x = np.cos(np.deg2rad(ang))
        horizontalalignment = {-1: "right", 1: "left"}[int(np.sign(x))]
        connectionstyle = "angle,angleA=0,angleB={}".format(ang)
        kw["arrowprops"].update({"connectionstyle": connectionstyle})
        ax.annotate(recipe[i], xy=(x, y), xytext=(1.35 * np.sign(x), 1.4 * y),
                    horizontalalignment=horizontalalignment, **kw)
    ax.set_title("Matplotlib bakery: A donut")
    plt.show()

For plotting with pandas, a single function should be enough. Its parameters:

    DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False,
                   sharex=None, sharey=False, layout=None, figsize=None,
                   use_index=True, title=None, grid=None, legend=True, style=None,
                   logx=False, logy=False, loglog=False, xticks=None, yticks=None,
                   xlim=None, ylim=None, rot=None, xerr=None, secondary_y=False,
                   sort_columns=False, **kwds)

Examples: Matplotlib examples
Examples: Seaborn Example gallery

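2.2 Summarizing Data for a Quantitative Variable

Number of classes, class width, class limits, class midpoint, relative frequency distribution, percent frequency distribution, cumulative frequency distribution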
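
The terms above fit together as follows: pick a number of classes, derive an approximate class width as (largest value - smallest value) / number of classes, round it up to a convenient value, then count the observations per class. The sketch below is one way to do this in plain Python. The data values, the choice of 5 classes, and the starting limit of 10 are all assumptions for illustration, and the inclusive class limits assume integer-valued data.

```python
import math
from itertools import accumulate

# Hypothetical integer-valued observations (invented for illustration).
values = [11, 13, 14, 14, 15, 16, 17, 18, 18, 19,
          20, 21, 22, 22, 23, 25, 27, 28, 31, 34]

num_classes = 5                                                # assumed number of classes
width = math.ceil((max(values) - min(values)) / num_classes)  # approximate width, rounded up
start = 10                                                     # convenient lower limit <= min(values)

# Class limits, e.g. (10, 14), (15, 19), ...: both limits are inclusive.
classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(num_classes)]
freq = [sum(lo <= v <= hi for v in values) for lo, hi in classes]
cum = list(accumulate(freq))                                   # cumulative frequency

for (lo, hi), f, c in zip(classes, freq, cum):
    midpoint = (lo + hi) / 2                                   # class midpoint
    print(f"{lo}-{hi}  mid={midpoint:>4}  freq={f}  rel={f / len(values):.2f}  cum={c}")
```

The last cumulative frequency equals the number of observations, which confirms that every value fell into exactly one class.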

Single variable: dot plot

Emulated with matplotlib.pyplot.scatter and seaborn.swarmplot:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib.ticker import MultipleLocator

    fig, ax = plt.subplots(1, 2, figsize=(12, 2))
    np.random.seed(1900)
    x = np.random.randint(1, 99, size=20)
    data = pd.DataFrame(x, columns=['x'])
    # y = running count of each value, so repeated values stack vertically
    data['y'] = data.groupby('x').cumcount() + 1

    # Left panel: dot plot emulated with a scatter plot
    ax[0].scatter(data['x'], data['y'])
    ax[0].yaxis.set_major_locator(MultipleLocator(1))    # integer ticks for the counts
    ax[0].xaxis.set_major_locator(MultipleLocator(10))

    # Right panel: swarmplot spreads overlapping points automatically
    sns.swarmplot(x='x', data=data, ax=ax[1])
    plt.show()

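Single variable: histogram

The principle is the same as for a bar chart, except that the quantitative variable is grouped into classes and there are no gaps between the bars.

    from matplotlib import pyplot as plt
    import numpy as np

    np.random.seed(1900)
    x = np.random.randint(1, 99, size=50)
    plt.hist(x, bins=[0, 20, 40, 60, 80, 100])   # explicit class limits
    plt.show()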

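Single variable: cumulative distribution (displot)

With matplotlib, a cumulative distribution requires computing the cumulative counts yourself (or passing cumulative=True to plt.hist). seaborn's distribution functions can draw four distribution plots in one go (gallery example: Distribution plot options). Note that these four panels show histograms and density estimates; seaborn.ecdfplot draws the cumulative curve itself.

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set(style="white", palette="muted", color_codes=True)
    rs = np.random.RandomState(10)
    f, axes = plt.subplots(2, 2, figsize=(7, 7), sharex=True)
    sns.despine(left=True)
    d = rs.normal(size=100)

    # distplot is deprecated since seaborn 0.11; histplot, kdeplot and rugplot replace it
    sns.histplot(x=d, color="b", ax=axes[0, 0])                             # histogram only
    sns.kdeplot(x=d, color="r", ax=axes[0, 1])                              # density curve
    sns.rugplot(x=d, color="r", ax=axes[0, 1])                              # plus rug ticks
    sns.kdeplot(x=d, fill=True, color="g", ax=axes[1, 0])                   # filled density
    sns.histplot(x=d, kde=True, stat="density", color="m", ax=axes[1, 1])   # histogram + density
    plt.setp(axes, yticks=[])
    plt.tight_layout()
    plt.show()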

Single variable: stem-and-leaf display

I have not found a stem-and-leaf library yet, so I implemented it by hand. Sample output, followed by the code:

    0 | 6 9 8 4
    1 | 6 3 7 3 6 1 2
    2 | 5 5 9 2
    3 | 2 8 0 4
    4 | 9 9
    5 | 1 5 2 4 9 8 6
    6 | 3 6 2
    7 | 3 2 1 2
    8 | 9 4 1 3 0 7 7 1 9 3 1
    9 | 6 2 7 8

    import numpy as np

    np.random.seed(2019)
    data = np.random.randint(1, 99, size=50)
    # Stem = tens digit, leaf = units digit
    stems = sorted({x // 10 for x in data})
    for m in stems:
        # Leaves stay in order of appearance; wrap in sorted() to rank-order them
        leaves = [n % 10 for n in data if n // 10 == m]
        print(m, '|', *leaves)

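2.3 Summarizing Data for Two Variables Using Tables

Simpson's paradox: conclusions drawn from aggregated data can be the reverse of those drawn from the unaggregated data. The cause is that the unaggregated groups carry unequal weights.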
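
A small numeric sketch makes the reversal concrete. All figures below are invented for illustration: two people, A and B, each handle "easy" and "hard" cases, but in very different proportions. A has the higher success rate within each group, yet B has the higher rate once the groups are combined, because A's cases are mostly hard and B's mostly easy.

```python
# (successes, attempts) per group; all numbers are invented for illustration.
results = {
    "A": {"easy": (18, 20), "hard": (48, 80)},
    "B": {"easy": (68, 80), "hard": (10, 20)},
}

def rate(successes, attempts):
    """Success rate for one (successes, attempts) pair."""
    return successes / attempts

def overall(person):
    """Success rate after aggregating all groups for one person."""
    successes = sum(s for s, _ in results[person].values())
    attempts = sum(n for _, n in results[person].values())
    return successes / attempts

# Unaggregated comparison: A wins within each group.
for group in ("easy", "hard"):
    a, b = rate(*results["A"][group]), rate(*results["B"][group])
    print(f"{group:<5} A={a:.2f}  B={b:.2f}  -> A is better: {a > b}")

# Aggregated comparison: the conclusion reverses.
print(f"total A={overall('A'):.2f}  B={overall('B'):.2f}  "
      f"-> A is better: {overall('A') > overall('B')}")
```

The aggregated rate is a weighted average of the group rates, with the number of attempts as weights, which is why unequal group sizes can flip the comparison.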

Crosstabulation

I used pandas.crosstab to mimic the table in the book:

    import numpy as np
    import pandas as pd

    np.random.seed(900)
    y = np.random.randint(0, 3, size=300)
    z = np.random.randint(11, 49, size=300)
    data = pd.DataFrame({'Quality Rating': y, 'Meal Price': z})
    # Map the numeric codes to labels (assigning back avoids a chained in-place replace)
    data['Quality Rating'] = data['Quality Rating'].map({0: 'Good', 1: 'Very Good', 2: 'Excellent'})
    bins = [10, 19, 29, 39, 49]
    # pd.cut intervals are right-closed: (10, 19], (19, 29], ...
    data['Meal Price'] = pd.cut(data['Meal Price'], bins, labels=['10~19', '20~29', '30~39', '40~49'])
    print(pd.crosstab(data['Quality Rating'], data['Meal Price'], margins=True, margins_name='Total'))
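
To see what a crosstabulation computes, here is a hand-rolled sketch in plain Python: it counts each (row, column) pair, adds the margins, and derives row percentages. The ten observations are invented for illustration and are unrelated to the random data above.

```python
from collections import Counter

# Hypothetical (quality rating, meal price class) observations.
observations = [
    ("Good", "10~19"), ("Good", "20~29"), ("Very Good", "20~29"),
    ("Very Good", "30~39"), ("Excellent", "30~39"), ("Excellent", "40~49"),
    ("Good", "10~19"), ("Very Good", "20~29"), ("Excellent", "30~39"),
    ("Very Good", "10~19"),
]
rows = ["Good", "Very Good", "Excellent"]
cols = ["10~19", "20~29", "30~39", "40~49"]

cell = Counter(observations)   # joint frequency of each (row, column) pair
row_total = {r: sum(cell[(r, c)] for c in cols) for r in rows}
col_total = {c: sum(cell[(r, c)] for r in rows) for c in cols}

# Print the table with row and column totals in the margins.
print(f"{'':<10}" + "".join(f"{c:>7}" for c in cols) + f"{'Total':>7}")
for r in rows:
    print(f"{r:<10}" + "".join(f"{cell[(r, c)]:>7}" for c in cols) + f"{row_total[r]:>7}")
print(f"{'Total':<10}" + "".join(f"{col_total[c]:>7}" for c in cols) + f"{len(observations):>7}")

# Row percentages: the share of each price class within one quality rating.
row_pct = {r: [100 * cell[(r, c)] / row_total[r] for c in cols] for r in rows}
```

The margins are themselves the frequency distributions of each variable taken alone, so the row totals and the column totals must both sum to the number of observations.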

2.4 Summarizing Data for Two Variables Using Graphical Displays

Scatter diagram and trendline

A good-looking scatter diagram (in matplotlib, the trendline has to be computed with numpy.polyfit):

    import matplotlib.pyplot as plt
    import numpy as np

    np.random.seed(19680801)
    x = np.arange(0.0, 50.0, 2.0)
    y = x ** 1.3 + np.random.rand(*x.shape) * 30.0
    s = np.random.rand(*x.shape) * 800 + 500   # marker sizes
    colors = np.random.rand(*x.shape)
    plt.figure(figsize=(12, 6))
    plt.scatter(x, y, s, c=colors, alpha=0.5, marker=r'$\clubsuit$', label="Luck")
    p1 = np.poly1d(np.polyfit(x, y, 1))        # degree-1 least-squares fit
    l1 = plt.plot(x, p1(x), 'r--', label='trendline')
    plt.xlabel("Leprechauns")
    plt.ylabel("Gold")
    plt.legend(loc='upper left')
    plt.show()
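
A degree-1 `np.polyfit(x, y, 1)` returns the slope and intercept of the least-squares line. As a sketch of what that fit computes, the same two numbers can be obtained by hand: slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2) and intercept = y_bar - slope * x_bar. The five data points below are invented for illustration.

```python
# Least-squares trendline y = intercept + slope * x, computed by hand.
# The data points are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # co-variation of x and y
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # variation of x

slope = sxy / sxx
intercept = y_bar - slope * x_bar
trend = [intercept + slope * xi for xi in x]                    # fitted values on the line

print(f"trendline: y = {intercept:.3f} + {slope:.3f} x")
```

The residuals of a least-squares line with an intercept sum to zero, which makes a convenient check on a hand computation.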

With seaborn the plots can be even fancier (sns.jointplot takes up too much space, so it is not drawn):

    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set()
    fig, axes = plt.subplots(2, 2, figsize=(12, 6))
    tips = sns.load_dataset("tips")
    sns.scatterplot(x="total_bill", y="tip", hue="time", data=tips, ax=axes[0, 0])
    sns.residplot(x="total_bill", y="tip", data=tips, ax=axes[0, 1])
    sns.regplot(x="size", y="total_bill", data=tips, x_jitter=.1, ax=axes[1, 1])
    # lmplot is figure-level: it draws on its own new figure, not on `axes`
    sns.lmplot(x="size", y="total_bill", hue="day", col="day", data=tips,
               height=6, aspect=.4, x_jitter=.1)
    # sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg",
    #               xlim=(0, 60), ylim=(0, 12), color="m", height=7)
    plt.show()

Side-by-side bar chart and stacked bar chart

Building these composite charts in matplotlib is a bit involved, so here are the relevant gallery examples:
Stacked Bar Graph
Grouped bar chart with labels
Discrete distribution as horizontal bar chart
First, plotting with pandas. This reuses the numbers from the simulated table in 2.3, this time aggregating with groupby, then adding a total row and transposing:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    pd.set_option('display.precision', 1)   # decimal places shown
    np.random.seed(900)
    y = np.random.randint(0, 3, size=300)
    z = np.random.randint(11, 49, size=300)
    data = pd.DataFrame({'Quality Rating': y, 'Meal Price': z})
    data['Quality Rating'] = data['Quality Rating'].map({0: 'Good', 1: 'Very Good', 2: 'Excellent'})
    bins = [10, 19, 29, 39, 49]
    price_class = pd.cut(data['Meal Price'], bins, labels=['10~19', '20~29', '30~39', '40~49'])
    counts = data.groupby(['Quality Rating', price_class], observed=False).size().unstack()
    pct = counts / counts.sum() * 100        # column percentages within each price class
    table = pct.copy()
    table.loc['Total'] = pct.sum()           # total row for display only, not for the plot
    print(table)
    pct.T.plot(kind='bar', stacked=True)     # plot without the total row
    plt.show()

Grouped bar charts: with seaborn you write less code and get more charts:

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set(style="darkgrid")
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
    tips = sns.load_dataset("tips")
    sns.countplot(y="day", hue="sex", data=tips, ax=ax1)
    sns.barplot(x="day", y="total_bill", data=tips, ax=ax2)
    # catplot and FacetGrid are figure-level: each creates its own figure
    sns.catplot(x="sex", y="total_bill", hue="smoker", col="time", data=tips,
                kind="bar", height=4, aspect=.7)
    g = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
    bins = np.linspace(0, 60, 13)
    g.map(plt.hist, "total_bill", color="steelblue", bins=bins)
    plt.show()

Stacked bar chart:

    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set(style="whitegrid")
    f, ax = plt.subplots(figsize=(15, 6))
    crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False)
    # Draw the totals first, then overlay the alcohol-involved share on the same bars
    sns.set_color_codes("pastel")
    sns.barplot(y="total", x="abbrev", data=crashes, label="Total", color="b")
    sns.set_color_codes("muted")
    sns.barplot(y="alcohol", x="abbrev", data=crashes, label="Alcohol-involved", color="b")
    ax.legend(ncol=2, loc="upper right", frameon=True)
    # The values are on the y axis here, so the limit and label belong to y
    ax.set(ylim=(0, 24), xlabel="", ylabel="Automobile collisions per billion miles")
    sns.despine(left=True, bottom=True)
    plt.show()

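2.5 Data Visualization: Best Practices in Creating Effective Graphical Displays

Creating effective graphical displays

1. Give the display a clear and concise title.
2. Keep the display simple; do not use three dimensions when two are sufficient.
3. Clearly label each axis and give the units of measure.
4. If color is used to distinguish categories, make sure the colors are distinct.
5. If multiple colors or line types are used, provide a legend and place it close to the data it represents.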
