WeRateDogs: Analyzing Twitter Data

Published: 2023/12/3 · General Tutorials · 生活家

Data Gathering

Import the required libraries

In [60]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import json
import os

Open and assess twitter-archive-enhanced

In [61]:

twitter_archive_enhanced = pd.read_csv('twitter-archive-enhanced.csv')

In [62]:

twitter_archive_enhanced.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   tweet_id                    2356 non-null   int64  
 1   in_reply_to_status_id       78 non-null     float64
 2   in_reply_to_user_id         78 non-null     float64
 3   timestamp                   2356 non-null   object 
 4   source                      2356 non-null   object 
 5   text                        2356 non-null   object 
 6   retweeted_status_id         181 non-null    float64
 7   retweeted_status_user_id    181 non-null    float64
 8   retweeted_status_timestamp  181 non-null    object 
 9   expanded_urls               2297 non-null   object 
 10  rating_numerator            2356 non-null   int64  
 11  rating_denominator          2356 non-null   int64  
 12  name                        2356 non-null   object 
 13  doggo                       2356 non-null   object 
 14  floofer                     2356 non-null   object 
 15  pupper                      2356 non-null   object 
 16  puppo                       2356 non-null   object 
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB

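The info output above shows that tweet_id and timestamp have the wrong types, and that in_reply_to_status_id and in_reply_to_user_id have only 78 non-null values. expanded_urls contains nulls; those rows are tweets without photos, and the project requires removing them later.

In [63]:

twitter_archive_enhanced.retweeted_status_id.notnull().value_counts()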

Out[63]:

False    2175
True      181
Name: retweeted_status_id, dtype: int64

Rows with a non-null retweeted_status_id are retweets. There are 181 of them, and the project requires removing them later.

In [64]:

twitter_archive_enhanced.name.value_counts()

Out[64]:

None        745
a            55
Charlie      12
Oliver       11
Lucy         11
           ... 
Karll         1
Tiger         1
old           1
Meatball      1
Stormy        1
Name: name, Length: 957, dtype: int64

In [65]:

twitter_archive_enhanced.text[twitter_archive_enhanced.name=='a'].iloc[1]

Out[65]:

'Here is a perfect example of someone who has their priorities in order. 13/10 for both owner and Forrest https://t.co/LRyMrU7Wfq'

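* 55 dogs are named "a". Inspecting one of these tweets shows that "a" is clearly not the dog's name, so this is a quality issue.
* The text column contains links.

In [66]:

twitter_archive_enhanced.rating_denominator.value_counts()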

Out[66]:

10     2333
11        3
50        3
80        2
20        2
2         1
16        1
40        1
70        1
15        1
90        1
110       1
120       1
130       1
150       1
170       1
7         1
0         1
Name: rating_denominator, dtype: int64

As shown, rating_denominator is not always 10, and one row even has a denominator of 0.
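Before deciding how to treat these rows, it can help to read the tweets whose denominator is not 10. The following is a minimal sketch on a small made-up frame; the three rows are invented for illustration and are not taken from the archive.

```python
import pandas as pd

# Toy stand-in for twitter_archive_enhanced; the values are invented.
sample = pd.DataFrame({
    'text': ['good dog 13/10', 'a litter of pups 84/70', 'account started 24/7'],
    'rating_numerator': [13, 84, 24],
    'rating_denominator': [10, 70, 7],
})

# Keep only the rows whose denominator differs from the usual 10.
unusual = sample[sample['rating_denominator'] != 10]
print(unusual[['text', 'rating_numerator', 'rating_denominator']])
```

Reading the text of such rows usually shows whether the number is a rating for several dogs at once or not a rating at all.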

In [67]:

twitter_archive_enhanced.source.iloc[0]

Out[67]:

'<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'

source contains HTML markup.

This dataset also has a tidiness issue: the dog stage is a single variable, so doggo, floofer, pupper, and puppo should be one column.

Gather and assess image-predictions

In [68]:

folder_name = 'pred-image'
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

url = 'https://raw.githubusercontent.com/udacity/new-dand-advanced-china/master/%E6%95%B0%E6%8D%AE%E6%B8%85%E6%B4%97/WeRateDogs%E9%A1%B9%E7%9B%AE/image-predictions.tsv'
response = requests.get(url)
response

Out[68]:

<Response [200]>

In [69]:

with open(os.path.join(folder_name, url.split('/')[-1]), mode='wb') as file:
    file.write(response.content)

In [70]:

os.listdir(folder_name)

Out[70]:

['image-predictions.tsv']
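The folder creation and file write above can be wrapped in one small helper. This is only a sketch: `save_bytes` is a hypothetical name, and the demo writes made-up bytes to a temporary folder instead of downloading anything.

```python
import os
import tempfile

def save_bytes(content, url, folder_name):
    """Write bytes into folder_name, using the last URL segment as the file name."""
    os.makedirs(folder_name, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(folder_name, url.split('/')[-1])
    with open(path, mode='wb') as file:
        file.write(content)
    return path

# Demo with invented content and an example URL (no network access needed).
demo_dir = tempfile.mkdtemp()
saved = save_bytes(b'tweet_id\tjpg_url\n',
                   'https://example.com/data/image-predictions.tsv',
                   demo_dir)
print(os.path.basename(saved))
```

In the notebook this would be called as `save_bytes(response.content, url, folder_name)`, ideally after `response.raise_for_status()` so that a failed download is not saved as if it were data.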

In [71]:

image_predictions = pd.read_csv(os.path.join(folder_name, 'image-predictions.tsv'), sep='\t')

In [72]:

image_predictions.head()

Out[72]:

  tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 German_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 Rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True

In [73]:

image_predictions.jpg_url.duplicated().value_counts()

Out[73]:

False    2009
True       66
Name: jpg_url, dtype: int64

There are 66 duplicated image links.

tweet_id has the wrong type.

Open and assess tweet_json

In [74]:

tweet_json = pd.DataFrame()

In [75]:

# Each line of tweet_json.txt holds one tweet as a JSON object.
with open('tweet_json.txt', 'r') as file:
    for line in file:
        dic = json.loads(line)
        tweet_id = dic['id']
        retweet_count = dic['retweet_count']
        favorite_count = dic['favorite_count']
        tem_df = pd.DataFrame({'tweet_id': tweet_id,
                               'retweet_count': retweet_count,
                               'favorite_count': favorite_count}, index=[0])
        tweet_json = pd.concat([tweet_json, tem_df])

In [76]:

tweet_json

Out[76]:

  tweet_id retweet_count favorite_count
0 892420643555336193 8842 39492
0 892177421306343426 6480 33786
0 891815181378084864 4301 25445
0 891689557279858688 8925 42863
0 891327558926688256 9721 41016
... ... ... ...
0 666049248165822465 41 111
0 666044226329800704 147 309
0 666033412701032449 47 128
0 666029285002620928 48 132
0 666020888022790149 530 2528

2352 rows × 3 columns
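Every row in the output above carries the index label 0 because each one-row frame is concatenated separately. A possible alternative, sketched here on two invented JSON lines rather than the real tweet_json.txt, is to collect plain dicts and build the frame once:

```python
import io
import json
import pandas as pd

# Two made-up lines in the same one-JSON-object-per-line layout.
fake_file = io.StringIO(
    '{"id": 1, "retweet_count": 10, "favorite_count": 20}\n'
    '{"id": 2, "retweet_count": 30, "favorite_count": 40}\n'
)

rows = []
for line in fake_file:
    dic = json.loads(line)
    rows.append({'tweet_id': dic['id'],
                 'retweet_count': dic['retweet_count'],
                 'favorite_count': dic['favorite_count']})

# Build the frame in one step; the index runs 0..n-1 instead of repeating 0.
tweet_json_sketch = pd.DataFrame(rows)
print(tweet_json_sketch)
```

Building the frame once also avoids copying the growing frame on every loop iteration.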

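tweet_id has the wrong type.

To summarize:

Quality issues in the datasets:

  1. tweet_id and timestamp have the wrong types
  2. jpg_url has 66 duplicated links
  3. source contains HTML markup
  4. rating_denominator is not always 10, and one row has a denominator of 0
  5. 55 dogs are named "a", which is clearly not a dog's name
  6. text contains links
  7. Rows with a non-null retweeted_status_id are retweets; there are 181 of them, and the project requires removing them
  8. in_reply_to_status_id and in_reply_to_user_id have only 78 non-null values
  9. Tweets without photos must be removed per the project requirements

Tidiness issues:

  1. The dog stage is one variable; doggo, floofer, pupper, and puppo should be a single column
  2. The three datasets share one observational unit, keyed by tweet_id, and can be merged into one dataset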

Data Cleaning

In [77]:

twitter_archive_enhanced_clean = twitter_archive_enhanced.copy()
image_predictions_clean = image_predictions.copy()
tweet_json_clean = tweet_json.copy()

Issue: tweet_id has the wrong type

Define: convert tweet_id to str

Code:

In [78]:

twitter_archive_enhanced_clean['tweet_id'] = twitter_archive_enhanced_clean['tweet_id'].astype('str')

In [79]:

image_predictions_clean['tweet_id'] = image_predictions_clean['tweet_id'].astype('str')

In [80]:

tweet_json_clean['tweet_id'] = tweet_json_clean['tweet_id'].astype('str')

Test:

In [81]:

twitter_archive_enhanced_clean['tweet_id']

Out[81]:

0       892420643555336193
1       892177421306343426
2       891815181378084864
3       891689557279858688
4       891327558926688256
               ...        
2351    666049248165822465
2352    666044226329800704
2353    666033412701032449
2354    666029285002620928
2355    666020888022790149
Name: tweet_id, Length: 2356, dtype: object

In [82]:

image_predictions_clean['tweet_id']

Out[82]:

0       666020888022790149
1       666029285002620928
2       666033412701032449
3       666044226329800704
4       666049248165822465
               ...        
2070    891327558926688256
2071    891689557279858688
2072    891815181378084864
2073    892177421306343426
2074    892420643555336193
Name: tweet_id, Length: 2075, dtype: object

In [83]:

tweet_json_clean['tweet_id']

Out[83]:

0    892420643555336193
0    892177421306343426
0    891815181378084864
0    891689557279858688
0    891327558926688256
            ...        
0    666049248165822465
0    666044226329800704
0    666033412701032449
0    666029285002620928
0    666020888022790149
Name: tweet_id, Length: 2352, dtype: object
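One reason a string type is a safe choice for tweet_id: the IDs are labels, not quantities, and they are too large for a float64 to hold exactly. If an integer ID column ever picks up a missing value, pandas converts it to float64 by default, and digits can be lost. The sketch below shows the precision loss in plain Python, using the first ID from the output above.

```python
tweet_id = 892420643555336193

# A float64 cannot represent every integer of this size exactly,
# so converting to float and back changes the value.
round_tripped = int(float(tweet_id))
print(round_tripped == tweet_id)

# A string keeps every digit.
print(str(tweet_id) == '892420643555336193')
```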

Issue: timestamp has the wrong type

Define: convert it to datetime

Code:

In [84]:

twitter_archive_enhanced_clean['timestamp'] = pd.to_datetime(twitter_archive_enhanced_clean['timestamp'])

Test:

In [85]:

twitter_archive_enhanced_clean['timestamp']

Out[85]:

0      2017-08-01 16:23:56+00:00
1      2017-08-01 00:17:27+00:00
2      2017-07-31 00:18:03+00:00
3      2017-07-30 15:58:51+00:00
4      2017-07-29 16:00:24+00:00
                  ...           
2351   2015-11-16 00:24:50+00:00
2352   2015-11-16 00:04:52+00:00
2353   2015-11-15 23:21:54+00:00
2354   2015-11-15 23:05:30+00:00
2355   2015-11-15 22:32:08+00:00
Name: timestamp, Length: 2356, dtype: datetime64[ns, UTC]

Issue: 55 dogs are named "a"; inspecting one of these tweets shows that "a" is clearly not a dog's name

Define: replace "a" with a missing value (np.nan)

Code:

In [86]:

twitter_archive_enhanced_clean['name'] = twitter_archive_enhanced_clean['name'].replace('a', np.nan)

Test:

In [88]:

twitter_archive_enhanced_clean['name'].value_counts()

Out[88]:

None        745
Charlie      12
Lucy         11
Oliver       11
Cooper       11
           ... 
Karll         1
Tiger         1
old           1
Meatball      1
Stormy        1
Name: name, Length: 956, dtype: int64
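The counts above still contain "old", which looks like another ordinary word picked up as a name. A possible extension, under the assumption that a real dog name starts with an uppercase letter, is to blank out every lowercase entry at once. The sketch uses a toy Series; "a" and "old" come from the output above and the other values are examples.

```python
import numpy as np
import pandas as pd

# Toy names for illustration.
names = pd.Series(['Charlie', 'a', 'old', 'Lucy', 'None'])

# Assumption: a real dog name starts with an uppercase letter,
# so anything starting with a lowercase letter is treated as not a name.
looks_like_word = names.str.match(r'^[a-z]')
cleaned = names.mask(looks_like_word, np.nan)
print(cleaned.tolist())
```

Note that the string 'None' survives this step, because it starts with an uppercase letter; it would need its own replacement.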

Issue: the denominator is not always 10

Define: Drop the row whose rating_denominator is 0, create a new column rating = rating_numerator / rating_denominator, then drop rating_numerator and rating_denominator.

Code:

In [90]:

twitter_archive_enhanced_clean = twitter_archive_enhanced_clean[twitter_archive_enhanced_clean.rating_denominator != 0].copy()

In [91]:

twitter_archive_enhanced_clean['rating'] = twitter_archive_enhanced_clean.rating_numerator / twitter_archive_enhanced_clean.rating_denominator

In [92]:

twitter_archive_enhanced_clean = twitter_archive_enhanced_clean.drop(['rating_numerator', 'rating_denominator'], axis=1)

Test:

In [93]:

twitter_archive_enhanced_clean

Out[93]:

  tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls name doggo floofer pupper puppo rating
0 892420643555336193 NaN NaN 2017-08-01 16:23:56+00:00 <a href="http://twitter.com/download/iphone" r... This is Phineas. He's a mystical boy. Only eve... NaN NaN NaN https://twitter.com/dog_rates/status/892420643... Phineas None None None None 1.3
1 892177421306343426 NaN NaN 2017-08-01 00:17:27+00:00 <a href="http://twitter.com/download/iphone" r... This is Tilly. She's just checking pup on you.... NaN NaN NaN https://twitter.com/dog_rates/status/892177421... Tilly None None None None 1.3
2 891815181378084864 NaN NaN 2017-07-31 00:18:03+00:00 <a href="http://twitter.com/download/iphone" r... This is Archie. He is a rare Norwegian Pouncin... NaN NaN NaN https://twitter.com/dog_rates/status/891815181... Archie None None None None 1.2
3 891689557279858688 NaN NaN 2017-07-30 15:58:51+00:00 <a href="http://twitter.com/download/iphone" r... This is Darla. She commenced a snooze mid meal... NaN NaN NaN https://twitter.com/dog_rates/status/891689557... Darla None None None None 1.3
4 891327558926688256 NaN NaN 2017-07-29 16:00:24+00:00 <a href="http://twitter.com/download/iphone" r... This is Franklin. He would like you to stop ca... NaN NaN NaN https://twitter.com/dog_rates/status/891327558... Franklin None None None None 1.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2351 666049248165822465 NaN NaN 2015-11-16 00:24:50+00:00 <a href="http://twitter.com/download/iphone" r... Here we have a 1949 1st generation vulpix. Enj... NaN NaN NaN https://twitter.com/dog_rates/status/666049248... None None None None None 0.5
2352 666044226329800704 NaN NaN 2015-11-16 00:04:52+00:00 <a href="http://twitter.com/download/iphone" r... This is a purebred Piers Morgan. Loves to Netf... NaN NaN NaN https://twitter.com/dog_rates/status/666044226... NaN None None None None 0.6
2353 666033412701032449 NaN NaN 2015-11-15 23:21:54+00:00 <a href="http://twitter.com/download/iphone" r... Here is a very happy pup. Big fan of well-main... NaN NaN NaN https://twitter.com/dog_rates/status/666033412... NaN None None None None 0.9
2354 666029285002620928 NaN NaN 2015-11-15 23:05:30+00:00 <a href="http://twitter.com/download/iphone" r... This is a western brown Mitsubishi terrier. Up... NaN NaN NaN https://twitter.com/dog_rates/status/666029285... NaN None None None None 0.7
2355 666020888022790149 NaN NaN 2015-11-15 22:32:08+00:00 <a href="http://twitter.com/download/iphone" r... Here we have a Japanese Irish Setter. Lost eye... NaN NaN NaN https://twitter.com/dog_rates/status/666020888... None None None None None 0.8

2355 rows × 16 columns

Issue: duplicated jpg_url values

Define: delete the duplicates

Code:

In [94]:

image_predictions_clean = image_predictions_clean[~image_predictions_clean.jpg_url.duplicated()]

Test:

In [95]:

sum(image_predictions_clean.jpg_url.duplicated())

Out[95]:

0

Issue: in_reply_to_status_id and in_reply_to_user_id have only 78 non-null values

Define: drop both columns

Code:

In [96]:

twitter_archive_enhanced_clean = twitter_archive_enhanced_clean.drop(columns=['in_reply_to_status_id', 'in_reply_to_user_id'])

Test:

In [97]:

twitter_archive_enhanced_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2355 entries, 0 to 2355
Data columns (total 14 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    2355 non-null   object             
 1   timestamp                   2355 non-null   datetime64[ns, UTC]
 2   source                      2355 non-null   object             
 3   text                        2355 non-null   object             
 4   retweeted_status_id         181 non-null    float64            
 5   retweeted_status_user_id    181 non-null    float64            
 6   retweeted_status_timestamp  181 non-null    object             
 7   expanded_urls               2297 non-null   object             
 8   name                        2300 non-null   object             
 9   doggo                       2355 non-null   object             
 10  floofer                     2355 non-null   object             
 11  pupper                      2355 non-null   object             
 12  puppo                       2355 non-null   object             
 13  rating                      2355 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(3), object(10)
memory usage: 276.0+ KB

Issue: source contains HTML markup

Define: strip the HTML tag and keep only the link text

Code:

In [98]:

twitter_archive_enhanced_clean.source = twitter_archive_enhanced_clean.source.str.extract('>(.+)<', expand=True)

Test:

In [99]:

twitter_archive_enhanced_clean['source'].value_counts()

Out[99]:

Twitter for iPhone     2220
Vine - Make a Scene      91
Twitter Web Client       33
TweetDeck                11
Name: source, dtype: int64
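To see what the pattern '>(.+)<' captures, it can be applied to the single source value shown during assessment. This sketch uses the standard re module rather than pandas; the pattern is the same one used in the cleaning step.

```python
import re

source = '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'

# Capture what sits between the first '>' and the last '<'.
match = re.search(r'>(.+)<', source)
label = match.group(1)
print(label)
```

The capture starts after the first '>' (the end of the opening tag) and, because `.+` is greedy, runs up to the last '<' (the start of the closing tag).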

Issue: the text column contains URLs

Define: delete the URLs

Code:

In [100]:

twitter_archive_enhanced_clean['text'] = twitter_archive_enhanced_clean['text'].str.replace(r'https.*', '', regex=True)

Test:

In [101]:

twitter_archive_enhanced_clean.text

Out[101]:

0       This is Phineas. He's a mystical boy. Only eve...
1       This is Tilly. She's just checking pup on you....
2       This is Archie. He is a rare Norwegian Pouncin...
3       This is Darla. She commenced a snooze mid meal...
4       This is Franklin. He would like you to stop ca...
                              ...                        
2351    Here we have a 1949 1st generation vulpix. Enj...
2352    This is a purebred Piers Morgan. Loves to Netf...
2353    Here is a very happy pup. Big fan of well-main...
2354    This is a western brown Mitsubishi terrier. Up...
2355    Here we have a Japanese Irish Setter. Lost eye...
Name: text, Length: 2355, dtype: object
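The pattern 'https.*' deletes everything from the first link to the end of the string, including any words that happen to follow the link. A narrower pattern that removes only the links themselves is one possible alternative. The sketch below applies it with the standard re module to the tweet text shown during assessment.

```python
import re

text = ('Here is a perfect example of someone who has their priorities in order. '
        '13/10 for both owner and Forrest https://t.co/LRyMrU7Wfq')

# Remove each link together with the whitespace in front of it.
no_links = re.sub(r'\s*https?://\S+', '', text)
print(no_links)
```

The same pattern could be passed to `Series.str.replace(..., regex=True)` in the cleaning step.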

Issue: the data contains retweets

Define: delete the retweet rows

Code:

In [102]:

twitter_archive_enhanced_clean = twitter_archive_enhanced_clean[twitter_archive_enhanced_clean.retweeted_status_id.isnull()]
twitter_archive_enhanced_clean = twitter_archive_enhanced_clean.drop(['retweeted_status_id'], axis=1)

Test:

In [103]:

twitter_archive_enhanced_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2174 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    2174 non-null   object             
 1   timestamp                   2174 non-null   datetime64[ns, UTC]
 2   source                      2174 non-null   object             
 3   text                        2174 non-null   object             
 4   retweeted_status_user_id    0 non-null      float64            
 5   retweeted_status_timestamp  0 non-null      object             
 6   expanded_urls               2117 non-null   object             
 7   name                        2119 non-null   object             
 8   doggo                       2174 non-null   object             
 9   floofer                     2174 non-null   object             
 10  pupper                      2174 non-null   object             
 11  puppo                       2174 non-null   object             
 12  rating                      2174 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(2), object(10)
memory usage: 237.8+ KB

Issue: the dog stage is one variable and should be a single column

Define: extract the stage words from text into a single stage column, then drop the four original columns

Code:

In [104]:

twitter_archive_enhanced_clean['stage'] = twitter_archive_enhanced_clean.text.str.findall('(doggo|pupper|puppo|floofer)')
twitter_archive_enhanced_clean['stage'] = twitter_archive_enhanced_clean['stage'].apply(lambda x: ','.join(set(x)))

In [105]:

twitter_archive_enhanced_clean['stage'] = twitter_archive_enhanced_clean['stage'].replace('', np.nan)

In [106]:

twitter_archive_enhanced_clean = twitter_archive_enhanced_clean.drop(columns=['doggo', 'puppo', 'pupper', 'floofer'])

Test:

In [107]:

twitter_archive_enhanced_clean.stage.value_counts()

Out[107]:

pupper          242
doggo            78
puppo            30
pupper,doggo      8
floofer           4
puppo,doggo       2
Name: stage, dtype: int64
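The step above re-extracts the stage from text. An alternative, sketched here on a toy frame, is to combine the four existing columns directly. It assumes, as the archive preview suggests, that each stage column holds either its own name or the string 'None'; `combine` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Toy rows: each stage column holds its own name or the string 'None'.
toy = pd.DataFrame({
    'doggo':   ['doggo', 'None', 'doggo'],
    'floofer': ['None', 'None', 'None'],
    'pupper':  ['None', 'None', 'pupper'],
    'puppo':   ['None', 'None', 'None'],
})

stage_cols = ['doggo', 'floofer', 'pupper', 'puppo']

def combine(row):
    """Join the stage names present in one row; NaN when there are none."""
    found = [value for value in row if value != 'None']
    return ','.join(found) if found else np.nan

toy['stage'] = toy[stage_cols].apply(combine, axis=1)
print(toy['stage'].tolist())
```

Because the values are joined in column order rather than through a set, a tweet with two stages always gets the same label, which keeps the value counts from splitting one combination across differently ordered labels.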

Issue: the three datasets share one observational unit and can be merged into one dataset. Tweets without photos can also be removed.

Define: merge the three datasets and remove the tweets without photos

Code:

In [108]:

df1_clean = twitter_archive_enhanced_clean.merge(image_predictions_clean, how='inner', on='tweet_id')

In [109]:

df_clean = df1_clean.merge(tweet_json_clean, how='left', on='tweet_id')

Test:

In [110]:

df_clean

Out[110]:

  tweet_id timestamp source text retweeted_status_user_id retweeted_status_timestamp expanded_urls name rating stage ... p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog retweet_count favorite_count
0 892420643555336193 2017-08-01 16:23:56+00:00 Twitter for iPhone This is Phineas. He's a mystical boy. Only eve... NaN NaN https://twitter.com/dog_rates/status/892420643... Phineas 1.3 NaN ... 0.097049 False bagel 0.085851 False banana 0.076110 False 8842 39492
1 892177421306343426 2017-08-01 00:17:27+00:00 Twitter for iPhone This is Tilly. She's just checking pup on you.... NaN NaN https://twitter.com/dog_rates/status/892177421... Tilly 1.3 NaN ... 0.323581 True Pekinese 0.090647 True papillon 0.068957 True 6480 33786
2 891815181378084864 2017-07-31 00:18:03+00:00 Twitter for iPhone This is Archie. He is a rare Norwegian Pouncin... NaN NaN https://twitter.com/dog_rates/status/891815181... Archie 1.2 NaN ... 0.716012 True malamute 0.078253 True kelpie 0.031379 True 4301 25445
3 891689557279858688 2017-07-30 15:58:51+00:00 Twitter for iPhone This is Darla. She commenced a snooze mid meal... NaN NaN https://twitter.com/dog_rates/status/891689557... Darla 1.3 NaN ... 0.170278 False Labrador_retriever 0.168086 True spatula 0.040836 False 8925 42863
4 891327558926688256 2017-07-29 16:00:24+00:00 Twitter for iPhone This is Franklin. He would like you to stop ca... NaN NaN https://twitter.com/dog_rates/status/891327558... Franklin 1.2 NaN ... 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True 9721 41016
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1989 666049248165822465 2015-11-16 00:24:50+00:00 Twitter for iPhone Here we have a 1949 1st generation vulpix. Enj... NaN NaN https://twitter.com/dog_rates/status/666049248... None 0.5 NaN ... 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True 41 111
1990 666044226329800704 2015-11-16 00:04:52+00:00 Twitter for iPhone This is a purebred Piers Morgan. Loves to Netf... NaN NaN https://twitter.com/dog_rates/status/666044226... NaN 0.6 NaN ... 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True 147 309
1991 666033412701032449 2015-11-15 23:21:54+00:00 Twitter for iPhone Here is a very happy pup. Big fan of well-main... NaN NaN https://twitter.com/dog_rates/status/666033412... NaN 0.9 NaN ... 0.596461 True malinois 0.138584 True bloodhound 0.116197 True 47 128
1992 666029285002620928 2015-11-15 23:05:30+00:00 Twitter for iPhone This is a western brown Mitsubishi terrier. Up... NaN NaN https://twitter.com/dog_rates/status/666029285... NaN 0.7 NaN ... 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True 48 132
1993 666020888022790149 2015-11-15 22:32:08+00:00 Twitter for iPhone Here we have a Japanese Irish Setter. Lost eye... NaN NaN https://twitter.com/dog_rates/status/666020888... None 0.8 NaN ... 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True 530 2528

1994 rows × 23 columns
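The inner join is what removes tweets without photos: a tweet with no row in the image predictions has no match and is dropped. The left join then keeps every remaining tweet even if its counts are missing. A minimal sketch on three invented tweets (the frames and values are illustrative only):

```python
import pandas as pd

archive = pd.DataFrame({'tweet_id': ['1', '2', '3'], 'rating': [1.3, 1.2, 1.0]})
images = pd.DataFrame({'tweet_id': ['1', '3'], 'p1': ['pug', 'collie']})
counts = pd.DataFrame({'tweet_id': ['1'], 'retweet_count': [10]})

# Inner join: tweet '2' has no image prediction, so it is dropped.
with_images = archive.merge(images, how='inner', on='tweet_id')

# Left join: every remaining tweet is kept; missing counts become NaN.
# validate='one_to_one' raises an error if a tweet_id is repeated on either side.
merged = with_images.merge(counts, how='left', on='tweet_id', validate='one_to_one')
print(merged)
```

The `validate` argument is an optional safeguard; it is not used in the notebook above.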

Save the dataset

In [112]:

# save the file
save_file_name = 'twitter_archive_master.csv'
df_clean.to_csv(save_file_name, encoding='utf-8', index=False)

Analysis and Visualization

In [114]:

# data analysis
data = pd.read_csv('twitter_archive_master.csv', encoding='utf-8')

In [115]:

data.head(10)

Out[115]:

  tweet_id timestamp source text retweeted_status_user_id retweeted_status_timestamp expanded_urls name rating stage ... p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog retweet_count favorite_count
0 892420643555336193 2017-08-01 16:23:56+00:00 Twitter for iPhone This is Phineas. He's a mystical boy. Only eve... NaN NaN https://twitter.com/dog_rates/status/892420643... Phineas 1.3 NaN ... 0.097049 False bagel 0.085851 False banana 0.076110 False 8842 39492
1 892177421306343426 2017-08-01 00:17:27+00:00 Twitter for iPhone This is Tilly. She's just checking pup on you.... NaN NaN https://twitter.com/dog_rates/status/892177421... Tilly 1.3 NaN ... 0.323581 True Pekinese 0.090647 True papillon 0.068957 True 6480 33786
2 891815181378084864 2017-07-31 00:18:03+00:00 Twitter for iPhone This is Archie. He is a rare Norwegian Pouncin... NaN NaN https://twitter.com/dog_rates/status/891815181... Archie 1.2 NaN ... 0.716012 True malamute 0.078253 True kelpie 0.031379 True 4301 25445
3 891689557279858688 2017-07-30 15:58:51+00:00 Twitter for iPhone This is Darla. She commenced a snooze mid meal... NaN NaN https://twitter.com/dog_rates/status/891689557... Darla 1.3 NaN ... 0.170278 False Labrador_retriever 0.168086 True spatula 0.040836 False 8925 42863
4 891327558926688256 2017-07-29 16:00:24+00:00 Twitter for iPhone This is Franklin. He would like you to stop ca... NaN NaN https://twitter.com/dog_rates/status/891327558... Franklin 1.2 NaN ... 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True 9721 41016
5 891087950875897856 2017-07-29 00:08:17+00:00 Twitter for iPhone Here we have a majestic great white breaching ... NaN NaN https://twitter.com/dog_rates/status/891087950... None 1.3 NaN ... 0.425595 True Irish_terrier 0.116317 True Indian_elephant 0.076902 False 3240 20548
6 890971913173991426 2017-07-28 16:27:12+00:00 Twitter for iPhone Meet Jax. He enjoys ice cream so much he gets ... NaN NaN https://gofundme.com/ydvmve-surgery-for-jax,ht... Jax 1.3 NaN ... 0.341703 True Border_collie 0.199287 True ice_lolly 0.193548 False 2142 12053
7 890729181411237888 2017-07-28 00:22:40+00:00 Twitter for iPhone When you watch your owner call another dog a g... NaN NaN https://twitter.com/dog_rates/status/890729181... None 1.3 NaN ... 0.566142 True Eskimo_dog 0.178406 True Pembroke 0.076507 True 19548 66596
8 890609185150312448 2017-07-27 16:25:51+00:00 Twitter for iPhone This is Zoey. She doesn't want to be one of th... NaN NaN https://twitter.com/dog_rates/status/890609185... Zoey 1.3 NaN ... 0.487574 True Irish_setter 0.193054 True Chesapeake_Bay_retriever 0.118184 True 4403 28187
9 890240255349198849 2017-07-26 15:59:51+00:00 Twitter for iPhone This is Cassie. She is a college pup. Studying... NaN NaN https://twitter.com/dog_rates/status/890240255... Cassie 1.4 doggo ... 0.511319 True Cardigan 0.451038 True Chihuahua 0.029248 True 7684 32467

10 rows × 23 columns

In [116]:

data.favorite_count.describe()

Out[116]:

count      1994.000000
mean       8923.133400
std       12400.238808
min          81.000000
25%        1972.250000
50%        4117.000000
75%       11275.500000
max      132318.000000
Name: favorite_count, dtype: float64

In [117]:

data.retweet_count.describe()

Out[117]:

count     1994.000000
mean      2770.021063
std       4715.961325
min         15.000000
25%        622.250000
50%       1348.500000
75%       3202.750000
max      79116.000000
Name: retweet_count, dtype: float64

In [118]:

import matplotlib.pyplot as plt
%matplotlib inline

In [119]:

plt.bar(x=['favorite_count', 'retweet_count'], height=[data.favorite_count.sum(), data.retweet_count.sum()])
plt.title('Number of Favorite count VS Retweet Count')

Out[119]:

Text(0.5, 1.0, 'Number of Favorite count VS Retweet Count')

* The first conclusion: tweets receive more favorites than retweets, on average about 8,923 favorites versus about 2,770 retweets per tweet.

In [120]:

data[data.p1_conf > 0.5].p1.value_counts()

Out[120]:

golden_retriever       116
Pembroke                70
Labrador_retriever      65
Chihuahua               47
pug                     43
                      ... 
scorpion                 1
Appenzeller              1
flamingo                 1
axolotl                  1
Irish_water_spaniel      1
Name: p1, Length: 245, dtype: int64

The second conclusion: among predictions with p1_conf > 0.5, golden_retriever is the most common breed (116 tweets).

In [121]:

data['rating'].value_counts()

Out[121]:

1.200000      454
1.000000      421
1.100000      402
1.300000      261
0.900000      151
0.800000       95
0.700000       51
1.400000       35
0.500000       34
0.600000       32
0.300000       19
0.400000       15
0.200000       10
0.100000        4
0.000000        2
177.600000      1
2.600000        1
3.428571        1
0.636364        1
0.818182        1
42.000000       1
7.500000        1
2.700000        1
Name: rating, dtype: int64

The third conclusion: most ratings are above 1.0 (1,158 of 1,994 tweets, about 58%), meaning the numerator is larger than the denominator.
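The share of ratings above 1.0 can be checked directly from the value counts. The sketch below copies the counts from the output above into a plain dict (rating mapped to number of tweets) and sums them.

```python
# Counts copied from the value_counts() output above (rating -> number of tweets).
rating_counts = {
    1.2: 454, 1.0: 421, 1.1: 402, 1.3: 261, 0.9: 151, 0.8: 95, 0.7: 51,
    1.4: 35, 0.5: 34, 0.6: 32, 0.3: 19, 0.4: 15, 0.2: 10, 0.1: 4, 0.0: 2,
    177.6: 1, 2.6: 1, 3.428571: 1, 0.636364: 1, 0.818182: 1, 42.0: 1,
    7.5: 1, 2.7: 1,
}

total = sum(rating_counts.values())
above_one = sum(n for rating, n in rating_counts.items() if rating > 1.0)
print(total, above_one, round(above_one / total, 3))
```

With the DataFrame at hand, the same share is `(data['rating'] > 1.0).mean()`.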
