Hotel Booking Demand Analysis

Published: 2023/12/10 | 豆豆

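Analysis workflow

Understand the data

Analyze the problem and form hypotheses

Data cleaning: handle missing values and outliers

Exploratory analysis: combined with visualization

Draw conclusions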

1. Understanding the data

ProfileReport from pandas_profiling gives a quick overview of the data:

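import pandas as pd
import pandas_profiling  # newer releases are published as ydata-profiling (import ydata_profiling)

# Forward slashes throughout, so no backslash is read as an escape sequence
file_path = "F:/jupyter/kaggle/數據集/1、Hotel booking demand酒店預訂需求/hotel_booking_demand.csv"
hb_df = pd.read_csv(file_path)
# hb_df.profile_report()
pandas_profiling.ProfileReport(hb_df)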
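
If the profiling package is not available, a rough overview can be assembled with plain pandas. The sketch below is only an illustration on a made-up three-column frame (`toy` and its values are assumptions, not the real dataset); it lists the dtype, missing count, and distinct-value count of each column.

```python
import pandas as pd

# Toy stand-in for the booking table; the real data has many more columns.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "lead_time": [10, 200, 35, 0],
    "country": ["PRT", "GBR", None, "PRT"],
})

# One row per column: dtype, number of missing values, number of distinct values.
overview = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "missing": toy.isnull().sum(),
    "unique": toy.nunique(),
})
print(overview)
```

This gives far less detail than a full profile report, but it is usually enough to spot columns that need cleaning.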

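After a quick look at the distribution of each field, I plan to analyze the data from the following three angles:

Analysis directions

Hotel operations: cancellations, occupancy, price per person per night, and price per person per night by month

Guests: country of origin, meal choice, length of stay, and lead time

Booking channels: prices across market segments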

2. Data cleaning

2.1 Missing values

missing_counts = hb_df.isnull().sum()
missing_counts[missing_counts != 0]

This identifies the fields that contain missing values.

Output:

children         4
country        488
agent        16340
company     112593

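Handling approach:

Assume a missing value in agent means no agency was specified, i.e. NaN = 0

Fill country with the mode of that field

Fill children with the mode of that field

Drop company outright, because too many values are missing and the information is too scattered (too many distinct values)

The code is as follows: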

hb_new = hb_df.copy(deep=True)
hb_new.drop("company", axis=1, inplace=True)

# Assign the filled columns back instead of calling fillna(inplace=True) on a column
hb_new["agent"] = hb_new["agent"].fillna(0)
hb_new["children"] = hb_new["children"].fillna(hb_new["children"].mode()[0])
hb_new["country"] = hb_new["country"].fillna(hb_new["country"].mode()[0])
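
To see what these rules do, here is a minimal sketch on a four-row toy frame. The values are invented for illustration; only the column names follow the dataset. After applying the same three rules, no missing values should remain.

```python
import numpy as np
import pandas as pd

# Invented rows that reproduce the four kinds of gaps described above.
toy = pd.DataFrame({
    "children": [0.0, np.nan, 0.0, 2.0],
    "country": ["PRT", None, "PRT", "GBR"],
    "agent": [9.0, np.nan, 240.0, np.nan],
    "company": [np.nan, np.nan, 40.0, np.nan],
})

cleaned = toy.drop(columns="company")          # too sparse to be useful
cleaned["agent"] = cleaned["agent"].fillna(0)  # NaN is read as "no agency"
for col in ["children", "country"]:
    # mode()[0] is the most frequent value of the column
    cleaned[col] = cleaned[col].fillna(cleaned[col].mode()[0])

remaining = int(cleaned.isnull().sum().sum())
print(remaining)
```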

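2.2 Outliers

The outliers in this dataset are the records whose total number of guests (adults + children + babies) is 0. In addition, since it was stated earlier that "SC" and "Undefined" in "meal" are the same category, they also need to be merged.

The code is as follows: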

hb_new["children"] = hb_new["children"].astype(int)
hb_new["agent"] = hb_new["agent"].astype(int)

# "Undefined" and "SC" are the same meal category
hb_new["meal"] = hb_new["meal"].replace("Undefined", "SC")

# Handle outliers:
# drop the rows where adults + children + babies == 0
zero_guests = list(hb_new["adults"] + hb_new["children"] + hb_new["babies"] == 0)
# hb_new.info()
hb_new.drop(hb_new.index[zero_guests], inplace=True)
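
The same two steps can also be written with a boolean mask, which avoids building a Python list. This is a sketch on invented rows, not the author's code; `toy` and `kept` are illustrative names.

```python
import pandas as pd

# Invented rows: the second booking has no guests at all.
toy = pd.DataFrame({
    "adults": [2, 0, 1, 0],
    "children": [0, 0, 1, 0],
    "babies": [0, 0, 0, 1],
    "meal": ["BB", "Undefined", "SC", "HB"],
})

# Merge the two equivalent meal labels.
toy["meal"] = toy["meal"].replace("Undefined", "SC")

# Keep only bookings with at least one guest.
total_guests = toy[["adults", "children", "babies"]].sum(axis=1)
kept = toy[total_guests > 0].reset_index(drop=True)
print(kept)
```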

At this point the data is essentially clean, and the exploratory analysis with visualization can begin.

3. Exploratory analysis (with visualization)

3.1 Hotel operations

3.1.1 Cancellations and occupancy

import matplotlib.pyplot as plt

# Use a font that can render the Chinese labels
plt.rcParams["font.sans-serif"] = ["SimHei"]
plt.rcParams["font.serif"] = ["SimHei"]

# Count bookings by cancellation status for each hotel
rh_iscancel_count = hb_new[hb_new["hotel"]=="Resort Hotel"].groupby(["is_canceled"])["is_canceled"].count()
ch_iscancel_count = hb_new[hb_new["hotel"]=="City Hotel"].groupby(["is_canceled"])["is_canceled"].count()

rh_cancel_data = pd.DataFrame({"hotel": "度假酒店",
                               "is_canceled": rh_iscancel_count.index,
                               "count": rh_iscancel_count.values})
ch_cancel_data = pd.DataFrame({"hotel": "城市酒店",
                               "is_canceled": ch_iscancel_count.index,
                               "count": ch_iscancel_count.values})
iscancel_data = pd.concat([rh_cancel_data, ch_cancel_data], ignore_index=True)

# Pie chart: share of total bookings per hotel
hotel_counts = hb_new["hotel"].value_counts()
# Take the legend labels from the plotted data so they always match the wedges
hotel_labels = hotel_counts.index.map({"City Hotel": "城市酒店", "Resort Hotel": "度假酒店"})

plt.figure(figsize=(12, 8))
w, t, autotexts = plt.pie(hotel_counts, autopct="%.2f%%", textprops={"fontsize": 18})
plt.title("酒店總預定數分布", fontsize=16)
plt.legend(w, hotel_labels, loc="upper right", fontsize=14)
# plt.savefig("F:/文章/酒店總預定數分布.png")
plt.show()

This produces the figure:

import numpy as np

plt.figure(figsize=(12, 8))
cmap = plt.get_cmap("tab20c")
outer_colors = cmap(np.arange(2)*4)
inner_colors = cmap(np.array([1, 2, 5, 6]))

# Inner ring: not canceled vs. canceled
w, t, at = plt.pie(hb_new["is_canceled"].value_counts(), autopct="%.2f%%",
                   textprops={"fontsize": 18}, radius=0.7, wedgeprops=dict(width=0.3),
                   pctdistance=0.75, colors=outer_colors)
plt.legend(w, ["未取消預定", "取消預定"], loc="upper right",
           bbox_to_anchor=(0, 0, 0.2, 1), fontsize=12)

# Outer ring: each status split by hotel, in the order
# city/not canceled, resort/not canceled, city/canceled, resort/canceled.
# ravel() flattens the (4, 1) array because plt.pie expects 1-D input.
val_array = np.array([
    iscancel_data.loc[(iscancel_data.hotel == h) & (iscancel_data.is_canceled == c), "count"].values
    for c in (0, 1) for h in ("城市酒店", "度假酒店")
]).ravel()
w2, t2, at2 = plt.pie(val_array, autopct="%.2f%%", textprops={"fontsize": 16}, radius=1,
                      wedgeprops=dict(width=0.3), pctdistance=0.85, colors=inner_colors)
plt.title("不同酒店預定情況", fontsize=16)

# Annotate each outer wedge with its hotel name
bbox_props = dict(boxstyle="square,pad=0.3", fc="w", ec="k", lw=0.72)
kw = dict(arrowprops=dict(arrowstyle="-", color="k"), bbox=bbox_props, zorder=3, va="center")
text = ["城市酒店", "度假酒店", "城市酒店", "度假酒店"]
for i, p in enumerate(w2):
    ang = (p.theta2 - p.theta1) / 2. + p.theta1
    y = np.sin(np.deg2rad(ang))
    x = np.cos(np.deg2rad(ang))
    horizontalalignment = {-1: "right", 1: "left"}[int(np.sign(x))]
    connectionstyle = "angle, angleA=0, angleB={}".format(ang)
    kw["arrowprops"].update({"connectionstyle": connectionstyle})
    plt.annotate(text[i], xy=(x, y), xytext=(1.15*np.sign(x), 1.2*y),
                 horizontalalignment=horizontalalignment, **kw, fontsize=18)

As the charts show, the City Hotel receives more bookings than the Resort Hotel, but its cancellation rate is also relatively higher.
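
The same comparison can be read off as numbers rather than from a chart. The sketch below uses invented rows purely to show the calculation (the counts are not the dataset's): because is_canceled is 0 or 1, its mean per hotel is the cancellation rate.

```python
import pandas as pd

# Invented bookings: 5 for the city hotel, 4 for the resort hotel.
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 0, 1, 0, 0, 0],
})

# count = number of bookings, mean of a 0/1 column = share that was canceled
summary = toy.groupby("hotel")["is_canceled"].agg(bookings="count", cancel_rate="mean")
print(summary)
```

On the real data, replacing `toy` with `hb_new` would give the booking counts and cancellation rates behind the two pie charts.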

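3.1.2 Price per person

Next, we can look at the price per person to see how the two hotels are doing.

Because babies are so young, they are left out of the price-per-person calculation:
$\text{adr\_pp} = \frac{\text{adr}}{\text{adults} + \text{children}}$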
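
The plotting code in this section uses a column `adr_pp` and a frame `full_data_guests`, neither of which is defined in the article. A minimal sketch of how they might be built, assuming `full_data_guests` is simply the non-canceled subset (an assumption based on the surrounding text) and shown here on invented rows:

```python
import numpy as np
import pandas as pd

# Invented rows; on the real data this would be the cleaned hb_new frame.
toy = pd.DataFrame({
    "is_canceled": [0, 0, 1, 0],
    "adr": [100.0, 90.0, 120.0, 60.0],
    "adults": [2, 1, 2, 0],
    "children": [0, 2, 0, 0],
    "babies": [0, 0, 0, 1],
})

# Babies are excluded from the divisor, as in the formula above.
paying = toy["adults"] + toy["children"]
# Replace 0 with NaN so a booking with only babies does not divide by zero.
toy["adr_pp"] = toy["adr"] / paying.replace(0, np.nan)

# Assumed definition: bookings that were not canceled.
full_data_guests = toy[toy["is_canceled"] == 0].copy()
print(full_data_guests[["adr", "adr_pp"]])
```
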
Now let's look at the average hotel price in each month. The code is as follows:

import seaborn as sns

# Price per person per night, by month
room_price_monthly = full_data_guests[["hotel", "arrival_date_month", "adr_pp"]].copy()

ordered_months = ["January", "February", "March", "April", "May", "June", "July", "August",
                  "September", "October", "November", "December"]
month_che = ["一月", "二月", "三月", "四月", "五月", "六月", "七月", "八月", "九月", "十月", "十一月", "十二月"]

# Map English month names to Chinese and keep them in calendar order
room_price_monthly["arrival_date_month"] = pd.Categorical(
    room_price_monthly["arrival_date_month"].map(dict(zip(ordered_months, month_che))),
    categories=month_che, ordered=True)
room_price_monthly["hotel"] = room_price_monthly["hotel"].replace(
    {"City Hotel": "城市酒店", "Resort Hotel": "度假酒店"})
room_price_monthly.head(15)

plt.figure(figsize=(12, 8))
# ci="sd" draws a standard-deviation band; seaborn >= 0.12 spells this errorbar="sd"
sns.lineplot(x="arrival_date_month", y="adr_pp", hue="hotel", data=room_price_monthly,
             hue_order=["城市酒店", "度假酒店"], ci="sd", size="hotel", sizes=(2.5, 2.5))
plt.title("不同月份人均居住價格/晚", fontsize=16)
plt.xlabel("月份", fontsize=16)
plt.ylabel("人均居住價格/晚", fontsize=16)
# plt.savefig("F:/文章/不同月份人均居住價格每晚")

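Output figure:

I think combining this with the average guest volume over 2015-2017 (that is, the number of non-canceled bookings per month) gives a clearer picture:

Note: because we want a mean, the total number of bookings for each month must be divided by the number of times that month occurs in 2015-2017 before plotting.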
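
Rather than hard-coding how often each month occurs, the divisor can be derived from the data. The sketch below assumes a year column named `arrival_date_year` (not shown in the article) and uses invented rows; it divides each month's bookings by the number of distinct years in which that month appears.

```python
import pandas as pd

# Invented bookings: July appears in three years, March in two.
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2016, 2017, 2016, 2017, 2016, 2016],
    "arrival_date_month": ["July", "July", "July", "March", "March", "March", "July"],
})

bookings = toy.groupby("arrival_date_month").size()
# Number of distinct years in which each month occurs
years_seen = toy.groupby("arrival_date_month")["arrival_date_year"].nunique()
# Both Series share the month index, so the division aligns by month
avg_per_year = bookings / years_seen
print(avg_per_year)
```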

# Monthly guest volume
rh_bookings_monthly = full_data_guests[full_data_guests.hotel=="Resort Hotel"].groupby("arrival_date_month")["hotel"].count()
ch_bookings_monthly = full_data_guests[full_data_guests.hotel=="City Hotel"].groupby("arrival_date_month")["hotel"].count()

rh_bookings_data = pd.DataFrame({"arrival_date_month": list(rh_bookings_monthly.index),
                                 "hotel": "度假酒店",
                                 "guests": list(rh_bookings_monthly.values)})
ch_bookings_data = pd.DataFrame({"arrival_date_month": list(ch_bookings_monthly.index),
                                 "hotel": "城市酒店",
                                 "guests": list(ch_bookings_monthly.values)})
full_booking_monthly_data = pd.concat([rh_bookings_data, ch_bookings_data], ignore_index=True)

ordered_months = ["January", "February", "March", "April", "May", "June", "July", "August",
                  "September", "October", "November", "December"]
month_che = ["一月", "二月", "三月", "四月", "五月", "六月", "七月", "八月", "九月", "十月", "十一月", "十二月"]

# Map English month names to Chinese and keep them in calendar order
full_booking_monthly_data["arrival_date_month"] = pd.Categorical(
    full_booking_monthly_data["arrival_date_month"].map(dict(zip(ordered_months, month_che))),
    categories=month_che, ordered=True)

# July and August occur three times in the data, every other month twice
summer = full_booking_monthly_data["arrival_date_month"].isin(["七月", "八月"])
full_booking_monthly_data["guests"] = full_booking_monthly_data["guests"].astype(float)
full_booking_monthly_data.loc[summer, "guests"] /= 3
full_booking_monthly_data.loc[~summer, "guests"] /= 2

plt.figure(figsize=(12, 8))
sns.lineplot(x="arrival_date_month", y="guests", hue="hotel", hue_order=["城市酒店", "度假酒店"],
             data=full_booking_monthly_data, size="hotel", sizes=(2.5, 2.5))
plt.title("不同月份平均旅客數", fontsize=16)
plt.xlabel("月份", fontsize=16)
plt.ylabel("旅客數", fontsize=16)
# plt.savefig("F:/文章/不同月份平均旅客數")

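Resulting figure:

Taken together, the two figures show that:

In spring and autumn the City Hotel's prices are high, yet its guest numbers do not drop at all; these are in fact its peak seasons;

The Resort Hotel's guest numbers are already low from June to September, yet its prices keep rising during this period and are far higher than in other months;

Neither the City Hotel nor the Resort Hotel does particularly good business in winter.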

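3.2 A simple guest profile

3.2.1 Guest distribution

First, a quick look at which countries the guests of these two hotels come from.

This time I used a map chart from plotly, which is somewhat interactive and easy to explore.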

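import plotly.express as px

# country_data is not defined in this article; it is expected to hold one row per
# country code with a "總游客數" (total guests) column.
# Renamed from `map` so the Python builtin is not shadowed.
guest_map = px.choropleth(country_data, locations="country", color="總游客數", hover_name="country",
                          color_continuous_scale=px.colors.sequential.Plasma,
                          title="游客分布")
guest_map.show()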

It is clear that the guests are mainly concentrated in Europe.
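
The article does not show how `country_data` was built. One plausible construction, offered only as an assumption, counts bookings per country code; the rows below are invented and `share_pct` is an extra illustrative column.

```python
import pandas as pd

# Invented country codes; on the real data this would be the non-canceled bookings.
toy = pd.DataFrame({"country": ["PRT", "PRT", "GBR", "FRA", "PRT", "GBR"]})

counts = toy["country"].value_counts()
country_data = pd.DataFrame({
    "country": counts.index,
    "總游客數": counts.values,  # total guests per country, as used by the map call
    "share_pct": (counts.values / counts.sum() * 100).round(2),
})
print(country_data)
```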

3.2.2 Meal choice

Now we can look at whether a guest's meal choice affects the decision to cancel a booking.

meal_data = hb_new[["hotel", "is_canceled", "meal"]]
# meal_data

# Meal counts for bookings that were kept and bookings that were canceled
kept_meals = meal_data.loc[meal_data["is_canceled"]==0, "meal"].value_counts()
canceled_meals = meal_data.loc[meal_data["is_canceled"]==1, "meal"].value_counts()

plt.figure(figsize=(12, 8))
plt.subplot(121)
plt.pie(kept_meals, labels=kept_meals.index, autopct="%.2f%%")
plt.title("未取消預訂旅客餐食選擇", fontsize=16)
plt.legend(loc="upper right")

plt.subplot(122)
plt.pie(canceled_meals, labels=canceled_meals.index, autopct="%.2f%%")
plt.title("取消預訂旅客餐食選擇", fontsize=16)
plt.legend(loc="upper right")

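Clearly, guests who canceled and guests who did not cancel made essentially the same meal choices.

We cannot say that a guest who chose bed & breakfast will certainly cancel and can therefore be ignored, nor that such a guest will certainly not cancel and is therefore especially important.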
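
The comparison of the two pie charts can also be made in one table. This sketch uses invented rows to show the idea: `pd.crosstab` with `normalize="index"` turns each row (canceled or not) into meal shares that sum to 1, so the two rows can be compared directly.

```python
import pandas as pd

# Invented bookings: four kept (0) and four canceled (1).
toy = pd.DataFrame({
    "is_canceled": [0, 0, 0, 0, 1, 1, 1, 1],
    "meal": ["BB", "BB", "BB", "HB", "BB", "BB", "BB", "SC"],
})

# Share of each meal type within each cancellation status
meal_share = pd.crosstab(toy["is_canceled"], toy["meal"], normalize="index")
print(meal_share)
```

If the two rows are nearly identical, as the charts suggest for the real data, meal choice carries little information about cancellation.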

3.2.3 Length of stay

So how many nights do guests usually stay at each hotel? A bar chart shows how the distributions differ.

First compute the total length of stay: total nights = weekend nights + weekday nights

full_data_guests["total_nights"] = full_data_guests["stays_in_weekend_nights"] + full_data_guests["stays_in_week_nights"]

因?yàn)榫幼r(shí)長(zhǎng)獨(dú)立值過(guò)多，所以我新建一個(gè)變量來(lái)將其變?yōu)榉诸愋蛿?shù)據(jù)：

# 新建字段：total_nights_bin——居住時(shí)長(zhǎng)區(qū)間 full_data_guests["total_nights_bin"] = "住1晚" full_data_guests.loc[(full_data_guests["total_nights"]>1)&(full_data_guests["total_nights"]<=5), "total_nights_bin"] = "2-5晚" full_data_guests.loc[(full_data_guests["total_nights"]>5)&(full_data_guests["total_nights"]<=10), "total_nights_bin"] = "6-10晚" full_data_guests.loc[(full_data_guests["total_nights"]>10), "total_nights_bin"] = "11晚以上"
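
The same bucketing can be expressed in a single call with `pd.cut`. This is an alternative sketch, not the author's code; the lower edge of -1 is chosen so that stays of 0 nights fall into the first bucket, matching the default label used above.

```python
import numpy as np
import pandas as pd

nights = pd.Series([0, 1, 2, 5, 6, 10, 11, 30])
labels = ["住1晚", "2-5晚", "6-10晚", "11晚以上"]

# Right-closed intervals: (-1, 1], (1, 5], (5, 10], (10, inf)
binned = pd.cut(nights, bins=[-1, 1, 5, 10, np.inf], labels=labels)
print(binned.tolist())
```

A side benefit is that the result is already an ordered categorical, so the buckets plot in the intended order.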

Now we can plot:

ch_nights_count = full_data_guests.loc[full_data_guests.hotel=="City Hotel", "total_nights_bin"].value_counts()
rh_nights_count = full_data_guests.loc[full_data_guests.hotel=="Resort Hotel", "total_nights_bin"].value_counts()

ch_nights_data = pd.DataFrame({"hotel": "城市酒店",
                               "nights": ch_nights_count.index,
                               "guests": ch_nights_count.values})
rh_nights_data = pd.DataFrame({"hotel": "度假酒店",
                               "nights": rh_nights_count.index,
                               "guests": rh_nights_count.values})

# Data for plotting
nights_data = pd.concat([ch_nights_data, rh_nights_data], ignore_index=True)
order = ["住1晚", "2-5晚", "6-10晚", "11晚以上"]
nights_data["nights"] = pd.Categorical(nights_data["nights"], categories=order, ordered=True)

plt.figure(figsize=(12, 8))
sns.barplot(x="nights", y="guests", hue="hotel", data=nights_data)
plt.title("旅客居住時長分布", fontsize=16)
plt.xlabel("居住時長", fontsize=16)
plt.ylabel("旅客數", fontsize=16)
plt.legend()

Output:

At both hotels most guests stay 1-5 nights, while guests at the Resort Hotel have one more common choice: 6-10 nights.

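3.2.4 Lead time

How far in advance a booking is made also has a large effect on whether the guest cancels. Because the values of lead_time are numerous and scattered, a scatter plot is the better fit, and a regression line can be drawn on top of it.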

lead_cancel_data = pd.DataFrame(hb_new.groupby("lead_time")["is_canceled"].describe())
# lead_cancel_data

# lead_time covers a wide range with uneven counts, so keep only the lead_time
# values that occur more than 10 times (rarer values are not representative)
lead_cancel_data_10 = lead_cancel_data[lead_cancel_data["count"]>10]

# Mean of the 0/1 is_canceled flag, expressed as a percentage
y = list(round(lead_cancel_data_10["mean"], 4) * 100)

plt.figure(figsize=(12, 8))
sns.regplot(x=list(lead_cancel_data_10.index), y=y)
plt.title("提前預定時長對取消的影響", fontsize=16)
plt.xlabel("提前預定時長", fontsize=16)
plt.ylabel("取消數 [%]", fontsize=16)
# plt.savefig("F:/文章/提前預定時長對取消的影響")

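Output:

The plot clearly shows that lead time does have an effect on whether a guest cancels;

In general, the closer to the arrival date a booking is made, the less likely it is to be canceled.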
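
The numbers behind such a scatter plot are just a grouped mean with a minimum-count filter. The sketch below uses invented bookings and a threshold of 4 instead of 10 so the toy data stays small; it is an illustration of the calculation, not a result from the dataset.

```python
import pandas as pd

# Invented bookings: four each at lead times 0, 30 and 300 days, one at 500 days.
toy = pd.DataFrame({
    "lead_time": [0, 0, 0, 0, 30, 30, 30, 30, 300, 300, 300, 300, 500],
    "is_canceled": [0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
})

stats = toy.groupby("lead_time")["is_canceled"].agg(["count", "mean"])
# Drop lead times with too few bookings to give a stable rate
reliable = stats[stats["count"] >= 4]
cancel_pct = (reliable["mean"] * 100).round(2)
print(cancel_pct)
```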

3.3 Market segments

Finally, we can look at how bookings are distributed across market segments.

import plotly.express as px

segment_count = list(full_data_guests["market_segment"].value_counts())
segment_ls = list(full_data_guests["market_segment"].value_counts().index)

# Distribution of bookings by market segment (plotly figure, so no plt.figure is needed)
fig = px.pie(values=segment_count, names=segment_ls,
             title="市場細分預訂分布（未取消預訂）", template="seaborn")
fig.update_traces(rotation=90, textposition="inside", textinfo="label+percent+value")

We can also check how prices at the two hotels differ depending on the market segment used for booking.

# Price per person per night by market segment
# (assumes the adr_pp column is present in hb_new)
plt.figure(figsize=(6, 8))
# plt.subplot(121)
# ci and errwidth are deprecated in newer seaborn releases (errorbar / err_kws)
sns.barplot(x="market_segment", y="adr_pp", hue="hotel", data=hb_new, ci="sd",
            errwidth=1, capsize=0.1, hue_order=["City Hotel", "Resort Hotel"])
plt.title("不同市場細分下人均每晚價格", fontsize=16)
plt.xlabel("市場細分", fontsize=16)
plt.ylabel("人均每晚價格", fontsize=16)
plt.xticks(rotation=45)
plt.legend(loc="upper left", facecolor="white", edgecolor="k", frameon=True)

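As the charts show, most people book through online travel agencies, both because they are convenient and because their prices are reasonable;

Prices for bookings made through airlines are very high, and this segment also has the fewest bookings.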
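
Both observations (how many bookings a segment brings and what it pays) can be put side by side in one table. The sketch below uses invented rows; the segment names are values that occur in the dataset, but the prices and counts are made up for illustration.

```python
import pandas as pd

# Invented bookings with a precomputed price per person per night.
toy = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Online TA", "Aviation", "Direct", "Direct"],
    "adr_pp": [50.0, 60.0, 70.0, 100.0, 55.0, 65.0],
})

segment_summary = (
    toy.groupby("market_segment")["adr_pp"]
    .agg(bookings="count", mean_price="mean")
    .sort_values("bookings", ascending=False)  # busiest segment first
)
print(segment_summary)
```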

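4. Conclusions

The analysis along these three dimensions shows that:

The Resort Hotel charges more in June-September, when its bookings are relatively low, than in any other month of the year, so lowering prices somewhat is advisable. The City Hotel's room prices are highest in spring and autumn, when its bookings also peak; in July and August its bookings drop sharply, and although room prices fall as well, the lower prices clearly did not attract guests;

Guests who book closer to their arrival date are more likely to actually stay, and guests from the countries surrounding the two hotels are also less likely to cancel;

Online travel agencies are the first choice for people traveling with family and friends.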

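The data analyzed above comes from Kaggle: Hotel booking demand;

If you have a better angle for the analysis, you are welcome to join the discussion.
(Some of the plotting code is adapted from https://www.kaggle.com/marcuswingen/eda-of-bookings-and-ml-to-predict-cancelations)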