Analyzing My Tinder Data
Originally published at https://www.linkedin.com on March 27, 2020 (data up to date as of March 20, 2020).
Day 3 of social distancing.
As I sat on my couch scrolling through my Instagram feed to see yet another drawing of an orange (apparently the latest Instagram challenge to pass the time), I was starting to get… bored. Who would've thought that an INTP like myself would succumb to boredom on day 3 of social distancing?
With no cool restaurants to explore, no plans to hang out with friends, and no gyms to go to anytime soon, why not start a project to pass the time?
But what project? I knew I wanted to do something that would give me insight into some aspect of my life, and what's more relevant to a 20-year-old's life than dating? In today's ultra-digital world, dating has become synonymous with Tinder. I mean, how else are we supposed to meet and connect with people nowadays? Through physical and social communities like friends, mutuals, school, and work, as has literally been the case for hundreds of generations prior? Nope, that's crazy.
Tinder allows us to connect with people within our communities whom we would never have met otherwise, for better or for worse. And as with many social media apps, Tinder allows you to request your own personal data.
And so I did.
Tinder Data
The requested Tinder data was in JSON format, and follows this simplified structure:
[Figure: Tinder data structure]
I read the data into Python with the following script:
import json

with open('data.json') as json_file:
    data = json.load(json_file)
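Since the structure figure did not survive, here is a tiny hypothetical example of the shape the scripts in this post rely on. Only the keys the code reads (Usage, app_opens, Messages, match_id, messages, message, sent_date) come from the post; the swipes_likes key, the dates, the counts, the match ID text, and the sent_date format are assumptions for illustration.

```python
import json

# Hypothetical, heavily trimmed export containing only the fields this post's code reads.
# The "swipes_likes" key, dates, counts, and sent_date format are illustrative assumptions.
sample = '''
{
  "Usage": {
    "app_opens":    {"2018-12-05": 3, "2018-12-06": 7},
    "swipes_likes": {"2018-12-05": 10, "2018-12-06": 4}
  },
  "Messages": [
    {"match_id": "Match #1",
     "messages": [{"message": "Hey!", "sent_date": "Fri, 04 Jan 2019 21:33:12 GMT"}]}
  ]
}
'''

data = json.loads(sample)

# Usage: feature -> {date -> daily count}
print(data['Usage']['app_opens']['2018-12-06'])        # 7
# Messages: one entry per match, each holding a list of sent messages
print(data['Messages'][0]['messages'][0]['message'])   # Hey!
```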
Now, the first problem was taking this data structure and converting it into one that I could easily traverse for analysis. Because each Usage feature is simply a daily count, it was natural to convert Usage into a standard tabular data structure, with dates as rows and the aforementioned features as columns.
import pandas as pd

df = pd.DataFrame({'date': list(data['Usage']['app_opens'].keys())})
for x in data['Usage']:  # assumes every feature lists the same dates in the same order
    df[x] = list(data['Usage'][x].values())
Here are the first five rows of the data frame, aka my first 5 days on Tinder:
[Figure: My first 5 days on Tinder]
With the Messages, however, I wanted to explore other alternatives. Since an individual message can be viewed as an object with two attributes, its text and its sent date, I defined a Message class and stored lists of these objects in a dictionary keyed by the unique match ID.
class Message:
    '''Fields:
    text (str): message body, lower-cased
    date (str): sent date with the last token and the seconds stripped'''

    def __init__(self, text, date):
        self.text = text
        self.date = date

    def __repr__(self):
        message_rep = "{}: {}"
        return message_rep.format(self.date, self.text)


message_dict = {}
for x in data['Messages']:
    match_id = x['match_id'].split()[-1]   # last token of the match ID
    sent = []
    for messages in x['messages']:
        # Drop the trailing timezone token, then the ":SS" seconds (assumes "... HH:MM:SS TZ")
        sent_date = " ".join(messages['sent_date'].split()[0:-1])[:-3]
        sent.append(Message(messages['message'].lower(), sent_date))
    message_dict[match_id] = sent
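The loop above keeps dates as trimmed strings. If a real datetime is preferred, strptime can parse the raw value directly. This is a sketch assuming sent_date looks like "Fri, 04 Jan 2019 21:33:12 GMT" (a hypothetical value; check your own export), and parse_sent_date is an illustrative helper, not part of the original code.

```python
from datetime import datetime

# Hypothetical sent_date value; the real export format is an assumption here
raw = "Fri, 04 Jan 2019 21:33:12 GMT"

# What the loop above stores: drop the last token, then the ":SS" seconds
trimmed = " ".join(raw.split()[0:-1])[:-3]

def parse_sent_date(value):
    """Parse 'Fri, 04 Jan 2019 21:33:12 GMT'-style strings into a naive datetime.

    strptime accepts 'GMT' for %Z but does not attach tzinfo to the result.
    """
    return datetime.strptime(value, "%a, %d %b %Y %H:%M:%S %Z")

sent = parse_sent_date(raw)
print(trimmed)                          # Fri, 04 Jan 2019 21:33
print(sent.strftime("%A"), sent.hour)   # Friday 21
```

With a datetime, weekday and hour buckets come from attributes instead of string slicing.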
Now, we need more Python to parse through the messages to derive basic insights. Here’s an excerpt containing the basic idea:
import string
from collections import Counter

# `emojis` is assumed to be a set of emoji characters defined elsewhere
day_count, time_count, emoji_count = Counter(), Counter(), Counter()
day_time, date_count, word_count = Counter(), Counter(), Counter()
strip_punct = str.maketrans(dict.fromkeys(string.punctuation))  # deletes punctuation

for match_id, messages in message_dict.items():
    for msg in messages:
        date = msg.date.split(" ")
        day = date[0][:-1]               # weekday, trailing comma dropped
        time = date[-1][:2] + ':00'      # hour bucket, e.g. '21:00'
        dt = "-".join(date[1:4])         # day-month-year
        words = msg.text.split(" ")

        day_count[day] += 1
        time_count[time] += 1
        day_time[(day, time)] += 1       # key on the (day, hour) pair, not on the dict itself
        date_count[dt] += 1

        for word in words:
            word = word.translate(strip_punct)
            if word:
                word_count[word] += 1    # per-word frequencies
            for char in word:
                if char in emojis:
                    emoji_count[char] += 1
Analysis & Insights
Quick stats as of March 20th 2020:
- 10,083 total app opens
- Swiped right on 3,331 profiles, with a daily max of 92 on January 4, 2019
- Swiped left on 38,132 profiles, with a daily max of 2,145 on January 4, 2019
- 349 matches, with a daily max of 12 on March 18, 2020
- 1,164 total messages sent
- 1,289 total messages received
- 125 unique conversations
- 32 social media/number exchanges
- 16 meet ups
- countless dollars spent on bubble tea
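Several rates used later can be recovered from these totals. This sketch simply divides the listed numbers; pairing 349 matches with 3,331 swipe rights, 125 conversations with 349 matches, and 16 meet ups with 125 conversations is my assumption about how the later figures were derived, although the results (a match rate of about 0.105, 64% of matches unmessaged, and a 0.128 meet-up rate) line up with the numbers quoted further down.

```python
# Totals copied from the quick stats above
swipes_right = 3331
matches = 349
conversations = 125
meet_ups = 16

match_rate = matches / swipes_right          # matches per swipe right
messaged_share = conversations / matches     # share of matches with a conversation
meet_up_rate = meet_ups / conversations      # meet ups per conversation

print(f"match rate:        {match_rate:.3f}")
print(f"unmessaged share:  {1 - messaged_share:.2f}")
print(f"meet-up rate:      {meet_up_rate:.3f}")
```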
Traversing my sent messages, we get the following word cloud:
[Figure: My word cloud of sent messages]
Looking at my top words:
“damn you’re cute wanna grab bubbletea ? haha”
Interesting. Seems fairly normal in the context of Tinder. Now, I’m curious as to why statistics is one of my top words…
Now, about 4% of the messages sent were emojis. Evidently, emojis are well integrated into digital messaging. Here are my top 5 sent emojis:
Moreover, the data indicates that 15% of my sent messages were exactly 6 words long, and 38% fell within the 5–7 word range.
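Word-count shares like these can be tallied with a Counter. This is a minimal sketch: splitting on single spaces mirrors the parsing loop above, the messages are invented, and length_shares is an illustrative helper name.

```python
from collections import Counter

def length_shares(texts):
    """Share of messages at each word count (splitting on single spaces, as in the loop above)."""
    lengths = Counter(len(text.split(" ")) for text in texts)
    total = sum(lengths.values())
    return {n: count / total for n, count in sorted(lengths.items())}

# Hypothetical sent messages
texts = [
    "hey how is your week going",
    "haha same",
    "wanna grab bubble tea sometime this week",
]
shares = length_shares(texts)
exactly_six = shares.get(6, 0)
five_to_seven = sum(shares.get(n, 0) for n in range(5, 8))
print(exactly_six, five_to_seven)
```

On the real data, texts would be [m.text for msgs in message_dict.values() for m in msgs].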
[Figure: Length of messages sent]
Looking at the distribution of conversation length measured in days, we see a right-skewed distribution, with 67% of conversations lasting less than one day.
[Figure: Conversation tenure in days]
Among these single-day conversations, the majority are dead ends: in other words, no messages were sent after my initial recorded message.
[Figure: Response rate of single-day conversations]
Now, before coming down hard on my one-liners, there is a slight caveat: because I only have data on my sent messages, I used my first and last message within a match as a proxy for conversation length. As such, it is unclear which participant actually ended the conversation, so these 'no responses' could have been messages that I didn't follow up on.
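The first-to-last-message proxy described above can be sketched as follows. The match IDs and timestamps are made up, and the timestamps are assumed to be datetime objects (for example from a strptime parse) rather than the trimmed strings; tenure_days is an illustrative helper.

```python
from datetime import datetime

def tenure_days(sent_times):
    """Proxy for conversation length: days between my first and last sent message."""
    return (max(sent_times) - min(sent_times)).total_seconds() / 86400

# Hypothetical sent-message timestamps per match
history = {
    "1": [datetime(2019, 1, 4, 21, 33), datetime(2019, 1, 4, 22, 5)],
    "2": [datetime(2019, 2, 1, 20, 0)],
    "3": [datetime(2019, 3, 1, 9, 0), datetime(2019, 3, 5, 9, 0)],
}

tenures = {match: tenure_days(times) for match, times in history.items()}
share_under_one_day = sum(t < 1 for t in tenures.values()) / len(tenures)
# Dead ends: no recorded message after the first one
dead_ends = [match for match, times in history.items() if len(times) == 1]
print(share_under_one_day, dead_ends)
```

Because only sent messages are available, a dead end here means I sent a single message, not necessarily that the other person stopped replying.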
In fact, comparing the count of messages sent versus received indicates that my messages are generally answered, at least when aggregated at the monthly level. So maybe my one-liners are somewhat effective. Sureeee.
[Figure: Monthly messages]
When are these messages actually sent out?
[Figure: Message activity]
Data indicates that peak messaging time occurs at 9 pm.
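With the hour buckets built in the parsing loop, the peak is simply the most common key. The counts below are hypothetical placeholders, not my real tallies.

```python
from collections import Counter

# Hypothetical counts keyed by the 'HH:00' buckets built in the parsing loop
time_count = Counter({"13:00": 12, "20:00": 31, "21:00": 54, "22:00": 40})

peak_hour, peak_messages = time_count.most_common(1)[0]
print(peak_hour, peak_messages)   # 21:00 54
```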
Cool — but these insights are only applicable once a match has actually occurred. We all know that 90% of Tinder consists of swiping.
[Figure: Monthly swiping activity]
It's interesting to see that 18% of total swipes were done in my first month on Tinder.
Defining match rate as the ratio of matches to swipe rights, we see that my match rate generally hovers around 12.5%, with the highest match rate of 45% in March 2019 despite that month's low number of matches.
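A monthly match rate can be computed by summing daily counts per month and dividing. This sketch assumes daily counts keyed by 'YYYY-MM-DD' strings (an assumption about the export), and the numbers are invented so that a month with few swipe rights shows how a small denominator can produce a spike like 45%.

```python
from collections import defaultdict

def monthly_match_rate(daily_likes, daily_matches):
    """Matches / swipe rights per month, from daily counts keyed by 'YYYY-MM-DD' strings."""
    likes, matches = defaultdict(int), defaultdict(int)
    for day, count in daily_likes.items():
        likes[day[:7]] += count          # 'YYYY-MM' month key
    for day, count in daily_matches.items():
        matches[day[:7]] += count
    return {m: matches[m] / likes[m] for m in sorted(likes) if likes[m] > 0}

# Hypothetical daily counts: a busy February and a quiet March (only 20 swipe rights)
swipe_rights = {"2019-02-01": 40, "2019-02-02": 40, "2019-03-01": 20}
new_matches = {"2019-02-01": 5, "2019-02-02": 5, "2019-03-01": 9}
print(monthly_match_rate(swipe_rights, new_matches))
```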
[Figure: Monthly matches]
Assuming swipes are independent and holding the probability of a match fixed, we can think of each swipe right as a Bernoulli trial, where a successful outcome is a match.
Mathematically, we have a random variable, Y, that follows a binomial distribution:
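The original formula was an image that did not survive; the standard statement of a binomial distribution with n trials and success probability p is:

```latex
Y \sim \operatorname{Binomial}(n, p), \qquad
P(Y = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n
```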
Or in our context:
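Mapping the symbols onto this post (a reconstruction, since the original image is missing): Y counts matches, n is the number of swipe rights given, and p is the per-swipe match probability.

```latex
Y = \#\text{matches}, \quad n = \#\text{swipe rights}, \qquad
P(Y = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n
```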
Given my Tinder data and assuming a fixed probability of success (p), the maximum likelihood estimate of p is simply the observed match rate: the number of matches divided by the number of swipe rights.
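Setting the derivative of the log-likelihood k log p + (n − k) log(1 − p) to zero gives p̂ = k/n. The sketch below checks that numerically with a grid search; using the overall totals from the quick stats (349 matches from 3,331 swipe rights) as inputs is my choice, since the post's monthly rates come from monthly counts.

```python
import math

def binomial_log_likelihood(p, k, n):
    """Log-likelihood of p given k matches out of n swipe rights (constant term dropped)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 349, 3331   # totals from the quick stats
p_hat = k / n      # closed-form MLE

# Brute-force check: no p on a fine grid beats the closed form by more than the grid step
grid = [i / 10000 for i in range(1, 10000)]
best = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))
print(round(p_hat, 4), best)
```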
Holding the number of my received swipe rights constant, we can construct the following cumulative binomial probability distributions:
[Figure: Cumulative probability of at least one match as a function of swipe rights]
The figure above shows the probability of at least one match given a fixed probability of success, p. We can see that the probability of at least one match increases with the number of swipe rights. In other words, a match becomes all but inevitable as you keep swiping right (holding, of course, the number of received swipe rights constant). This convergence to 1 follows because the probability of no matches at all, (1 − p)^n, shrinks toward zero as the number of swipe rights n grows.
Given my current swiping behaviour (p = 0.10), it would take about 30 swipe rights before getting at least one match becomes very likely (roughly a 96% chance at 30 swipe rights). Emphasis on at least: the number of matches could range from 1 up to the number of swipe rights, inclusive.
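A short sketch of the cumulative curve, assuming independent swipe rights and a fixed p. The 95% target in the example is my own choice, since the post does not state which probability its 30-swipe figure corresponds to; the helper names are illustrative.

```python
import math

def prob_at_least_one(p, n):
    """P(at least one match in n swipe rights) = 1 - (1 - p)**n, assuming independent swipes."""
    return 1 - (1 - p) ** n

def swipes_needed(p, target):
    """Smallest n with P(at least one match) >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

print(round(prob_at_least_one(0.10, 30), 3))
print(swipes_needed(0.10, 0.95))
```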
[Figure: Swiping behaviour]
Holding the number of my received swipe rights constant, a quick way to increase the probability of at least one match is to increase the number of swipe rights given. However, more doesn't necessarily mean better: the trade-off between quality and quantity is more nuanced, so I'll leave it at that.
A natural follow-up question is how many of these matches actually lead to coffee or bubble tea. Data indicates a 12.8% conversion rate among my engaged matches. A 95% confidence interval puts the lower bound at 7% and the upper bound at 19%; this margin of error of roughly 6 percentage points could hint at external factors, such as proximity, that affect one's interest in meeting up.
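The post doesn't say which interval it used. Assuming a Wald (normal-approximation) interval and taking the 16 meet ups and 125 unique conversations from the quick stats as the counts (16/125 = 12.8%), the bounds come out at roughly 6.9% and 18.7%, consistent with the 7% to 19% quoted above.

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p_hat = successes / trials
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin

low, high = wald_interval(16, 125)   # 16 meet ups out of 125 conversations (assumed pairing)
print(round(low, 3), round(high, 3))
```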
Now, assuming independence among engaged matches and that each person is equally open to meeting up, we can think of this as yet another Bernoulli trial, where a successful outcome is a meet up.
Given my Tinder data and assuming a fixed probability of success (p), the maximum likelihood estimate of p is simply the observed conversion rate.
With these assumptions, we can make inferences about future outcomes, such as the probability of getting exactly x meet ups: in other words, P(meet ups = x | p = 0.128).
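A direct translation of P(meet ups = x | p = 0.128) for a chosen number of engaged matches n, assuming independent Bernoulli trials. The n = 20 in the example is hypothetical, and meet_up_pmf is an illustrative helper.

```python
from math import comb

def meet_up_pmf(x, n, p=0.128):
    """P(exactly x meet ups out of n engaged matches), assuming independent Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example: 20 engaged matches (a hypothetical n)
probs = [meet_up_pmf(x, 20) for x in range(21)]
most_likely = max(range(21), key=lambda x: probs[x])
print(most_likely, round(probs[most_likely], 3))
```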
Pretty cool.
This is especially useful when it comes to budgeting for dates. Personally, first dates for me are in the $10 to $20 ballpark, though the variance on that is somewhat high. Assuming that I allocate $35 per month to dates and each date costs $15, we can run simulations with 100 engaged matches over 6 months:
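The simulation code isn't shown in the post. Below is a minimal Monte Carlo sketch under the stated assumptions ($35 a month for 6 months, so $210 in total, $15 per date, 100 engaged matches, p = 0.128); the function name, trial count, and seed are my own. A deficit requires 15 or more meet ups, and because this sketch draws whole numbers of meet ups, its deficit share lands near the exact binomial value of roughly 0.30, somewhat below the Gaussian figure reported later.

```python
import random

def simulate_remaining_budget(n_matches=100, p=0.128, monthly_budget=35,
                              months=6, cost_per_date=15, trials=10000, seed=0):
    """Monte Carlo sketch: remaining budget after 6 months, one binomial draw per trial."""
    rng = random.Random(seed)
    total_budget = monthly_budget * months          # $210 with the post's numbers
    results = []
    for _ in range(trials):
        # Each engaged match independently turns into a meet up with probability p
        meet_ups = sum(rng.random() < p for _ in range(n_matches))
        results.append(total_budget - cost_per_date * meet_ups)
    return results

remaining = simulate_remaining_budget()
deficit_share = sum(r < 0 for r in remaining) / len(remaining)
print(round(deficit_share, 3), sum(remaining) / len(remaining))
```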
[Figure: Simulation expenses over 6 months]
Since the number of engaged matches is large (n = 100), the binomial distribution can be approximated by a Gaussian probability density. This convergence in distribution is a consequence of the Central Limit Theorem.
The probability of a deficit can be calculated as the area under the curve to the left of the red dotted line. Hence we can calculate this probability easily using the Gaussian approximation:
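The formula image here was lost. A reconstruction from the stated setup (n = 100 engaged matches, p = 0.128), using the standard normal approximation to the binomial for the number of meet ups M, is:

```latex
M \sim \operatorname{Binomial}(n, p) \;\approx\; \mathcal{N}\big(np,\; np(1-p)\big)
  = \mathcal{N}(12.8,\; 11.1616)
```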
Since the remaining budget is a linear function of the number of meet ups, which we approximate with a Gaussian random variable, the remaining budget also follows a Gaussian distribution:
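Reconstructed under the same assumptions, with a total budget of $35 × 6 = $210 and $15 per date, the remaining budget B and the probability of a deficit are as follows; this gives about 0.36, close to the 0.3594 reported below.

```latex
B = 210 - 15M \;\approx\; \mathcal{N}\big(210 - 15np,\; 15^{2}\, np(1-p)\big)
  = \mathcal{N}(18,\; 2511.36)

P(B < 0) = \Phi\!\left(\frac{0 - 18}{\sqrt{2511.36}}\right) \approx \Phi(-0.359) \approx 0.36
```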
With these assumptions, the probability of a deficit is 0.3594. Yikes — this is somewhat concerning given that I’m on a student budget.
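The same Gaussian calculation in Python, using the standard library's statistics.NormalDist (Python 3.8+). The $210 total and $15 per date are the post's figures; the result comes out very close to the reported 0.3594.

```python
from statistics import NormalDist

n, p = 100, 0.128                  # engaged matches, conversion rate
total_budget, cost = 35 * 6, 15    # $35/month over 6 months, $15 per date

mean_remaining = total_budget - cost * n * p            # about 18
sd_remaining = cost * (n * p * (1 - p)) ** 0.5          # about 50.1
p_deficit = NormalDist(mean_remaining, sd_remaining).cdf(0)
print(round(p_deficit, 4))
```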
So it's probably not financially viable to message 100 matches over 6 months given my current conversion rate. To stay on budget, I can either decrease the number of messaged matches or decrease my conversion rate. Hmm, tough call; I'd have to go with the former.
Tweaking the parameters in the binomial simulation we get the following results:
[Figure: Tweaking parameters in budget simulation]
Now, the probability of going over budget when I reduce my engagement to 75 matches is 0.06 (green density), which is much better. Having said that, I also don't want to end up with a big surplus, since that would imply few or no dates (yellow density). Hence, I should engage with 75 to 85 matches over the course of 6 months to make full use of my budget.
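Assuming the same Gaussian setup ($210 total budget, $15 per date, p = 0.128), a sweep over the number of engaged matches shows the trade-off: the deficit probability is about 0.064 at 75 matches, in line with the 0.06 above, and climbs with more engagement, while the expected leftover budget shrinks. The helper names are my own.

```python
from statistics import NormalDist

def deficit_probability(n_matches, p=0.128, total_budget=210, cost_per_date=15):
    """Gaussian approximation of P(remaining budget < 0) after n engaged matches."""
    mean = total_budget - cost_per_date * n_matches * p
    sd = cost_per_date * (n_matches * p * (1 - p)) ** 0.5
    return NormalDist(mean, sd).cdf(0)

def expected_surplus(n_matches, p=0.128, total_budget=210, cost_per_date=15):
    """Mean remaining budget; a large value means few dates."""
    return total_budget - cost_per_date * n_matches * p

for n in (75, 80, 85, 100):
    print(n, round(deficit_probability(n), 3), round(expected_surplus(n), 1))
```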
Cool. Now I have some new insights about my current Tinder behaviour — however, by no means is this analysis exhaustive. If you happen to have Python installed and have your own personal Tinder data — or if you just want to look at the back-end logic of the Python functions used in this analysis — feel free to check out the code that I wrote for this project:
https://github.com/dionbanno/dion_creates
Recommendations for further projects
- Can we perform A/B testing on certain keywords and phrases to see if they increase the probability of a response or meet up?
- It would be cool to have a repository of individual Tinder data classified by user attributes such as location, gender, and age, and to run a regression analysis to see if certain user attributes affect success.
Final words
Data from this analysis indicates that 64% of matches go unmessaged. So shoot your shot. Go ignite those matches; who knows? It might be worthwhile.
Feel free to leave your comments, and connect with me on LinkedIn. I’d also be curious to know — what metrics would you have chosen to analyze, and how?
Source: https://medium.com/swlh/analyzing-my-tinder-data-3b4f05a4a34f