
No Data? No Problem! How to Collect Heavy Metal Data from Wikipedia and Spotify


For many data science students, collecting data is seen as a solved problem: it's just there on Kaggle or UCI. However, that's not how data is available day-to-day for working data scientists. Also, many of the datasets used for learning have been explored to exhaustion, so how innovative would a portfolio built on them really be? What about building your own dataset by combining different sources?


Let’s dive in.


Why Heavy Metal data?

Seen by many as a very rigid music genre (screaming vocals, fast drums, distorted guitars), heavy metal actually goes the other way round. Metal music is not as mainstream as most genres, but it has, without question, the largest umbrella of subgenres, with many distinct sounds. Therefore, exploring its differences through data could be a good idea, even for listeners not familiar with it.


Why Wikipedia? How?

Wikipedia is frequently updated and presents relevant information on almost every topic. It also gathers many useful links on every page. Heavy metal is no different: Wikipedia lists the most popular subgenres and their most relevant bands.


The following page presents not only a historical perspective of the style, but also hyperlinks to many other subgenres and fusion genres at the bottom. The idea here is to collect metal genres using only this starting point.


https://en.wikipedia.org/wiki/Heavy_metal_genres

Therefore, the genres' names in the table were scraped and compiled into a Python list.


from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import re

# Getting URLs
source = 'https://en.wikipedia.org/wiki/Heavy_metal_genres'
response = requests.get(source)
soup = BeautifulSoup(response.text, 'html.parser')
pages = soup.find(class_='navbox-list navbox-odd')
pages = pages.findAll('a')

links = []
for page in pages:
    links.append(('List_of_' + page.get('title').lower().replace(' ', '_') + '_bands').replace('_music', ''))
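To make the link-building rule above concrete, here is a small offline sketch of the same transformation applied to a sample navbox title (the title string is invented for illustration; real titles come from the scraped anchors):

```python
# Same rule as above: lowercase the navbox title, replace spaces with
# underscores, wrap in "List_of_..._bands", and drop any "_music" suffix.
def title_to_list_link(title):
    return ('List_of_' + title.lower().replace(' ', '_') + '_bands').replace('_music', '')

print(title_to_list_link('Heavy metal music'))  # List_of_heavy_metal_bands
print(title_to_list_link('Death metal'))        # List_of_death_metal_bands
```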

After inspecting some of the genres' pages, we discovered that the most relevant genres have pages listing their most important bands. These pages' URLs follow one or more of the following patterns:


“https://en.wikipedia.org/wiki/List_of_” + genre + “_bands”


“https://en.wikipedia.org/wiki/List_of_” + genre + “_bands,_!–K”


“https://en.wikipedia.org/wiki/List_of_” + genre + “_bands,_L–Z”


After inspecting the links, we detected that the band names were presented in varying forms from page to page: some in tables, some in alphabetical lists, and some in both. Each presentation form required a different scraping approach.


Data in tables, alphabetical lists and lists of tables.

Some band names were polluted with additional characters (mostly notes or references), so a function was developed to deal with these issues.


def string_ajustment(band):
    """Adjustment of the retrieved band name string"""

    end = band.find('[')  # Remove brackets from band name
    if end > -1:
        band = band[:end]

    end = band.find('(')  # Remove parentheses from band name
    if end > -1:
        band = band[:end]

    band = band.title().rstrip()  # Uppercase first letters; trailing space removal

    return band
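As a quick sanity check, the cleaning rules can be exercised on made-up strings. The helper is repeated here so the snippet runs on its own, and the inputs are invented examples of the footnote/annotation noise described above:

```python
# Self-contained copy of the cleaning helper above, for demonstration only.
def string_ajustment(band):
    end = band.find('[')          # Drop footnote markers, e.g. "death[2]"
    if end > -1:
        band = band[:end]
    end = band.find('(')          # Drop parenthesised notes, e.g. "kreator (band)"
    if end > -1:
        band = band[:end]
    return band.title().rstrip()  # Title-case and trim trailing spaces

print(string_ajustment('death[2]'))        # Death
print(string_ajustment('kreator (band)'))  # Kreator
```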

The scraping code gathered the data, which was later compiled into a Pandas dataframe object.


%%time

data = []
genres = []

for link in links:
    url = 'https://en.wikipedia.org/wiki/' + link
    genre = url[url.rfind('/') + 1:]

    list_from = ['List_of_', '_bands', ',_!–K', ',_L–Z', '_']
    list_to = ['', '', '', '', ' ']

    for idx, element in enumerate(list_from):
        genre = genre.replace(list_from[idx], list_to[idx])

    genre = genre.title()
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Table detection
    tables = soup.find_all('table', {'class': 'wikitable'})  # 1st attempt
    if len(tables) == 0:
        tables = soup.find_all('table', {'class': 'wikitable sortable'})  # 2nd attempt

    # Getting table data
    if len(tables) > 0:  # pages with tables
        genres.append(genre)
        for table in tables:
            table = table.tbody
            rows = table.find_all('tr')
            columns = [v.text.replace('\n', '') for v in rows[0].find_all('th')]

            for i in range(1, len(rows)):
                tds = rows[i].find_all('td')
                band = tds[0].text.replace('\n', '')
                band = string_ajustment(band)
                values = [band, genre]

                data.append(pd.Series(values))  # Append band
    else:
        # Getting data in lists
        groups = soup.find_all('div', class_='div-col columns column-width')  # Groups being lists of bands, 1st attempt
        if len(groups) == 0:
            groups = soup.find_all('table', {'class': 'multicol'})  # Groups being lists of bands, 2nd attempt

        for group in groups:
            genres.append(genre)
            array = group.text.split('\n')[1:len(group.text.split('\n')) - 1]

            for band in array:
                if band != '0-9':
                    band = string_ajustment(band)
                    if (band.find('Reference') > -1) or (band.find('See also') > -1):  # Remove text without band name
                        break
                    elif len(band) > 1:
                        values = [band, genre]
                        data.append(pd.Series(values))  # Append band

    if genre not in genres:  # Two possibilities: either data in multiple urls or no data available (non-relevant genre)
        additional_links = [link + ',_!–K', link + ',_L–Z']

        for additional_link in additional_links:
            url = 'https://en.wikipedia.org/wiki/' + additional_link
            response = requests.get(url)
            soup = BeautifulSoup(response.text, 'html.parser')
            groups = soup.find_all('table', {'class': 'multicol'})  # Groups being lists of bands

            for group in groups:
                genres.append(genre)
                array = group.text.split('\n')[1:len(group.text.split('\n')) - 1]

                for band in array:
                    if band != '0-9':
                        band = string_ajustment(band)

                        if (band.find('Reference') > -1) or (band.find('See also') > -1):  # Remove text without band name
                            break
                        elif len(band) > 1:
                            values = [band, genre]
                            data.append(pd.Series(values))  # Append band

Creating the Pandas dataframe object:


df_bands = pd.DataFrame(data)
df_bands.columns = ['Band', 'Genre']
df_bands.drop_duplicates(inplace=True)

df_bands

Adding a label to anything is hard, and music genres are no different. Some bands played different styles through the years, and others made crossovers between varying musical elements. If a band is listed on multiple Wikipedia pages, our Pandas dataframe presents it multiple times, each time with a different label/genre.


You might be asking how to deal with it.


Well, it depends on your intentions with the data.


If the intention is to create a genre classifier given the songs' attributes, the most relevant label could be kept. That information could be found, for example, by scraping the number of Google Search results for the band name plus each genre; the label with the most results would be kept. If the intention is to develop a multi output-multiclass classifier, there is no need to drop the labels.

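For the multi-label route, one possible sketch is to collapse each band's rows into a single list of genres. The toy frame below is invented for illustration and stands in for df_bands:

```python
import pandas as pd

# Toy frame mimicking df_bands: one row per (band, genre) pair scraped above.
df_bands = pd.DataFrame({
    'Band':  ['Opeth', 'Opeth', 'Slayer'],
    'Genre': ['Progressive Metal', 'Death Metal', 'Thrash Metal'],
})

# Collapse duplicate band rows into one multi-label row per band.
df_multi = df_bands.groupby('Band')['Genre'].apply(list).reset_index()
print(df_multi)
```

From here, a tool such as scikit-learn's MultiLabelBinarizer could turn the genre lists into indicator columns for a multi-label model.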

Why Spotify? How?

Differently from Wikipedia, Spotify provides an API for data collection. Based on this video by CodingEntrepreneurs, with minor changes, we were able to collect an artist's albums, tracks, and their features. The first step in accessing the API is registering your application on Spotify's developer page. You'll be able to find your client_id and client_secret after registering.


Before using this approach, I tried using Spotipy. However, the amount of data we were collecting required too many token refreshes (and in a pattern that was hard to understand). Thus, we changed our approach to match CodingEntrepreneurs', which proved much more reliable.


import base64
import requests
import datetime
from urllib.parse import urlencode

client_id = 'YOUR_CLIENT_ID'
client_secret = 'YOUR_CLIENT_SECRET'

class SpotifyAPI(object):
    access_token = None
    access_token_expires = datetime.datetime.now()
    access_token_did_expire = True
    client_id = None
    client_secret = None
    token_url = 'https://accounts.spotify.com/api/token'

    def __init__(self, client_id, client_secret, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.client_id = client_id
        self.client_secret = client_secret

    def get_client_credentials(self):
        """
        Returns a base64 encoded string
        """
        client_id = self.client_id
        client_secret = self.client_secret

        if (client_id == None) or (client_secret == None):
            raise Exception('You must set client_id and client_secret')
        client_creds = f'{client_id}:{client_secret}'
        client_creds_b64 = base64.b64encode(client_creds.encode())
        return client_creds_b64.decode()

    def get_token_headers(self):
        client_creds_b64 = self.get_client_credentials()
        return {
            'Authorization': f'Basic {client_creds_b64}'  # <base64 encoded client_id:client_secret>
        }

    def get_token_data(self):
        return {
            'grant_type': 'client_credentials'
        }

    def perform_auth(self):
        token_url = self.token_url
        token_data = self.get_token_data()
        token_headers = self.get_token_headers()
        r = requests.post(token_url, data=token_data, headers=token_headers)
        if r.status_code not in range(200, 299):
            raise Exception('Could not authenticate client.')
        data = r.json()
        now = datetime.datetime.now()
        access_token = data['access_token']
        expires_in = data['expires_in']  # seconds
        expires = now + datetime.timedelta(seconds=expires_in)
        self.access_token = access_token
        self.access_token_expires = expires
        self.access_token_did_expire = expires < now
        return True

    def get_access_token(self):
        token = self.access_token
        expires = self.access_token_expires
        now = datetime.datetime.now()
        if expires < now:
            self.perform_auth()
            return self.get_access_token()
        elif token == None:
            self.perform_auth()
            return self.get_access_token()
        return token

    def get_resource_header(self):
        access_token = self.get_access_token()
        headers = {
            'Authorization': f'Bearer {access_token}'
        }
        return headers

    def get_resource(self, lookup_id, resource_type='albums', version='v1'):
        if resource_type == 'tracks':
            endpoint = f'https://api.spotify.com/{version}/albums/{lookup_id}/{resource_type}'
        elif resource_type == 'features':
            endpoint = f'https://api.spotify.com/{version}/audio-features/{lookup_id}'
        elif resource_type == 'analysis':
            endpoint = f'https://api.spotify.com/{version}/audio-analysis/{lookup_id}'
        elif resource_type == 'popularity':
            endpoint = f'https://api.spotify.com/{version}/tracks/{lookup_id}'
        elif resource_type != 'albums':
            endpoint = f'https://api.spotify.com/{version}/{resource_type}/{lookup_id}'
        else:
            endpoint = f'https://api.spotify.com/{version}/artists/{lookup_id}/albums'  # Get an Artist's Albums

        headers = self.get_resource_header()
        r = requests.get(endpoint, headers=headers)
        if r.status_code not in range(200, 299):
            return {}
        return r.json()

    def get_artist(self, _id):
        return self.get_resource(_id, resource_type='artists')

    def get_albums(self, _id):
        return self.get_resource(_id, resource_type='albums')

    def get_album_tracks(self, _id):
        return self.get_resource(_id, resource_type='tracks')

    def get_track_features(self, _id):
        return self.get_resource(_id, resource_type='features')

    def get_track_analysis(self, _id):
        return self.get_resource(_id, resource_type='analysis')

    def get_track_popularity(self, _id):
        return self.get_resource(_id, resource_type='popularity')

    def get_next(self, result):
        """ Returns the next result given a paged result
        Parameters:
        - result - a previously returned paged result
        """
        if result['next']:
            return self.get_next_resource(result['next'])
        else:
            return None

    def get_next_resource(self, url):
        endpoint = url
        headers = self.get_resource_header()
        r = requests.get(endpoint, headers=headers)
        if r.status_code not in range(200, 299):
            return {}
        return r.json()

    def base_search(self, query_params):  # search_type = spotify's type
        headers = self.get_resource_header()
        endpoint = 'https://api.spotify.com/v1/search'
        lookup_url = f'{endpoint}?{query_params}'
        r = requests.get(lookup_url, headers=headers)
        if r.status_code not in range(200, 299):
            return {}
        return r.json()

    def search(self, query=None, operator=None, operator_query=None, search_type='artist'):
        if query == None:
            raise Exception('A query is required.')
        if isinstance(query, dict):
            query = ' '.join([f'{k}:{v}' for k, v in query.items()])
        if operator != None and operator_query != None:
            if (operator.lower() == 'or') or (operator.lower() == 'not'):  # Operators can only be OR or NOT
                operator = operator.upper()
                if isinstance(operator_query, str):
                    query = f'{query} {operator} {operator_query}'
        query_params = urlencode({'q': query, 'type': search_type.lower()})
        return self.base_search(query_params)
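Since the API returns paged results, the get_next helper is typically used in a loop until it returns None. Here is a sketch of that pattern with the network calls stubbed out by an invented FakeSpotify stand-in, so the looping logic itself can be shown offline:

```python
# FakeSpotify is a hypothetical stub mimicking the paged-result shape the
# class above returns; no real requests are made here.
class FakeSpotify:
    def __init__(self, pages):
        self._pages = pages

    def get_albums(self, _id):
        return self._pages[0]

    def get_next(self, result):
        nxt = result.get('next')  # index of the next page, or None when done
        return self._pages[nxt] if nxt is not None else None

pages = [
    {'items': ['Album A', 'Album B'], 'next': 1},
    {'items': ['Album C'], 'next': None},
]
client = FakeSpotify(pages)

# Walk all pages, accumulating items, until get_next returns None.
albums, result = [], client.get_albums('fake_band_id')
while result:
    albums.extend(result['items'])
    result = client.get_next(result)
print(albums)  # ['Album A', 'Album B', 'Album C']
```

The same while-loop works unchanged with the real SpotifyAPI class, where 'next' holds a full URL instead of an index.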

We implemented our own functions to retrieve more specific data: band_id given a band name, albums given a band_id, tracks given an album, features given a track, and popularity given a track. To have more control during the process, each of these steps was performed in a separate for-loop and aggregated into a dataframe. Different approaches to this part of the data collection are encouraged, especially ones aiming at performance gains.


Below, we show the code used to fetch band_id given a band name.


spotify = SpotifyAPI(client_id, client_secret)

%%time

bands_id = []
bands_popularity = []

for band in df_unique['Band']:
    id_found = False
    result = spotify.search(query=band, search_type='artist')
    items = result['artists']['items']

    if len(items) > 0:  # Loop to check whether more than one band is in items and retrieve desired band
        i = 0
        while i < len(items):
            artist = items[i]
            if band.lower() == artist['name'].lower():
                bands_id.append(artist['id'])
                bands_popularity.append(artist['popularity'])
                id_found = True
                break
            i = i + 1

    if (id_found == False) or (len(items) == 0):  # If band not found
        bands_id.append(np.nan)
        bands_popularity.append(np.nan)

df_unique['Band ID'] = bands_id
df_unique['Band Popularity'] = bands_popularity
df_unique = df_unique.dropna()  # Dropping bands with uri not found
df_unique.sort_values('Band')
df_unique

Finally, one could store the data in a SQL database, but we saved it into a .csv file. In the end, our final dataframe contained 498,576 songs. Not bad.
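The save step itself is one line with pandas. A minimal sketch follows, where the filename and the toy one-row frame are illustrative stand-ins for the real songs dataframe:

```python
import pandas as pd

# Toy frame standing in for the final songs dataframe; filename is illustrative.
df_songs = pd.DataFrame({'Band': ['Opeth'], 'Track': ['Ghost Of Perdition']})
df_songs.to_csv('metal_songs.csv', index=False)  # index=False keeps the row index out of the file

# Round-trip to confirm the file reads back with the same shape.
df_check = pd.read_csv('metal_songs.csv')
print(df_check.shape)  # (1, 2)
```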


Photo by Edward Xu on Unsplash

What's next?

After collecting all this data, there are many possibilities. As mentioned earlier, one could create genre classifiers given the audio features. Another possibility is to use the features to build playlists or recommender systems. Regression analysis could be applied to predict song/band popularity. Last but not least, an exploratory data analysis could show mathematically how the genres differ from one another. What would you like to see?


For the complete code, please visit the GitHub repo below. Also, I actively use LinkedIn. Come say hi!


Translated from: https://medium.com/swlh/no-data-no-problem-how-to-collect-heavy-metal-data-from-wikipedia-and-spotify-f879762046ff
