Build Your Basic Machine Learning Web App with Streamlit
Data scientists and ML practitioners often find it difficult to showcase their findings to others. Usually a PowerPoint deck or a web development tool is needed to explain the results. With the introduction of Streamlit, it has become easier to develop web apps in Python. We can also build multiple models and show the results based on the user's selection. This makes it a handy library, especially for analysts or anyone who wants to show a proof-of-concept (POC) solution to clients or other team members. This article does not go into detail on the machine learning algorithms used, as that is outside its scope.
Streamlit is an open-source Python framework that lets us create interactive web apps for machine learning and data science [1]. In this article, we will develop a web app for classification in which the user can select the algorithm used to build the model, set the model parameters, and visualize the corresponding results.
1. Data Set:
For demonstration purposes, I have taken a small diabetes dataset from Kaggle. The objective of the dataset is to predict whether a patient is diabetic or non-diabetic. Personally, I have only explored small and medium-sized datasets with Streamlit.
2. Installing Streamlit:
Let us begin by installing Streamlit using the command:
pip install streamlit
Run the following command to check that the installation works:
streamlit hello
To run a web app, use the command:
streamlit run <filename.py>
This command opens a browser tab in which the web app is displayed. If the source file changes, we can see the changes in the app by using the re-run option.
3. Streamlit Components:
This article discusses the following components and how they are used in our machine learning web app:
• Checkbox
• Title
• Sidebar
• Markdown
• Selectbox (drop-down)
• Multi-select
• Radio (radio buttons)
• Number input box
• Slider
• Caching
• Button
Let us now import Streamlit and the other libraries needed for our machine learning model:
# Note: the plot_* helpers imported below were removed in scikit-learn 1.2.
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve
from sklearn.metrics import precision_score, recall_score
Title, Sidebar and Markdown
To start with, let us add a title and a sidebar to our app as below:
st.title("Predicting Diabetes Web App")
st.sidebar.title("Model Selection Panel")
st.markdown("Affected by Diabetes or not ?")
st.sidebar.markdown("Choose your model and its parameters")
The title function adds a title to our web app, and the sidebar creates a side panel for the app. Streamlit also supports Markdown text through the markdown function.
Checkbox
Let us now learn how to add a checkbox to the app's sidebar. In this app, we use the checkbox to display the raw data loaded from our CSV file. The syntax for adding a checkbox is: checkbox(<label for our checkbox>, <True/False>).
def load_data():
    data = pd.read_csv("diabetes.csv")
    return data

df = load_data()

if st.sidebar.checkbox("Show raw data", False):
    st.subheader("Diabetes Raw Dataset")
    st.write(df)

Since the default value passed to the checkbox is False, the checkbox is unchecked when the web app loads. Once the app is running, the checkbox appears in the sidebar, and ticking it displays the raw dataset.
Caching
Streamlit provides a caching mechanism so that the data is not reloaded every time the app re-runs. Unless the data source changes, the data is served from the cache, which saves CPU cycles and loading time. In the words of the documentation, caching allows your app to stay performant even when loading data from the web, manipulating large datasets, or performing expensive computations [1]. This is done with the decorator @st.cache, which is placed before the function that needs caching. In our case, it is added above the data-loading function.
# @st.cache(allow_output_mutation=True)
@st.cache(persist=True)
def load_data():
    data = pd.read_csv("diabetes.csv")
    return data
Streamlit expects that the value returned by a cached function is not mutated, i.e. the cached data should not be changed during the course of the app. To bypass this check, st.cache provides the option allow_output_mutation=True, along with several other options that are described on the official site.
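The idea behind caching, and the reason mutation is discouraged, can be illustrated without Streamlit. The sketch below uses functools.lru_cache from the standard library as a stand-in; it is not how st.cache is implemented, and the load_rows function, the load_calls counter and the row values are invented for illustration.

```python
from functools import lru_cache

load_calls = 0  # hypothetical counter, only here to make cache hits visible


@lru_cache(maxsize=None)
def load_rows(path):
    """Stand-in for an expensive load such as pd.read_csv(path)."""
    global load_calls
    load_calls += 1
    return [[148, 33.6], [85, 26.6]]  # illustrative values, not real data


first = load_rows("diabetes.csv")
second = load_rows("diabetes.csv")  # same argument: served from the cache
print(load_calls)        # 1, the function body ran only once
print(first is second)   # True, both names point at the one cached object

# Mutating the cached object changes what every later caller receives,
# which is why cached return values are best treated as read-only.
first.append([183, 23.3])
print(len(load_rows("diabetes.csv")))  # 3
```

The last three lines show the hazard that the mutation check guards against: a change made in one place silently leaks into every later read of the cached data.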
Drop-down
The selectbox adds a drop-down box to the app. In our app, we use it to select the classifier that will be used to build the machine learning model. The syntax is: selectbox("<label for the selectbox>", ("<options to go into the drop-down>")).
st.sidebar.subheader("Select your Classifier")
classifier = st.sidebar.selectbox("Classifier", ("Decision Tree", "Support Vector Machine (SVM)", "Logistic Regression", "Random Forest"))
Multiselect
The multiselect widget allows the user to make multiple selections from a list. In our app, we use this widget to choose the metrics for evaluating our machine learning model: the confusion matrix, the ROC curve, and the precision-recall curve. Its syntax is very similar to that of the selectbox: multiselect("<label for the multiselect>", ("<options to be displayed>")). Now, let us include it in our app and re-run the app to see the changes.
metrics = st.sidebar.multiselect("Select your metrics : ", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
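The metrics offered in the multiselect all come from comparing predictions with the true labels. As a rough illustration in plain Python (not the scikit-learn implementation; the function names and the label lists are made up), the confusion-matrix counts and the precision and recall derived from them can be sketched as:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 1 = diabetic, 0 = non-diabetic
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))   # (3, 1, 1, 3)
print(precision_recall(y_true, y_pred))   # (0.75, 0.75)
```

In the app itself these numbers come from precision_score and recall_score; the sketch only shows what those scores count.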
Radio
This widget adds radio buttons to our app. The syntax is: radio(<label for the radio button>, <options to be displayed>, <index of the option pre-selected on first render>, <function to format the option labels>, key=<unique key for the widget, which can be used to refer to it in other parts of the app>). In our app, one place where we use radio buttons is in selecting the criterion used by the decision tree algorithm.
criterion = st.sidebar.radio("Criterion(measures the quality of split)", ("gini", "entropy"), key='criterion')
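The radio widget returns the selected option itself, so options shown as text arrive as strings. That is fine for "gini" and "entropy", but a True/False choice, such as the bootstrap option in the full script, needs converting before it reaches the model. A minimal sketch of one way to do that, where BOOTSTRAP_CHOICES and parse_choice are hypothetical names introduced only for this example:

```python
# Hypothetical lookup table: label shown to the user -> value passed to the model
BOOTSTRAP_CHOICES = {"True": True, "False": False}


def parse_choice(selected, choices):
    """Translate the label returned by a radio widget into its typed value."""
    if selected not in choices:
        raise ValueError(f"unexpected option: {selected!r}")
    return choices[selected]


# Any non-empty string is truthy, so using the label directly would
# silently turn "False" into an enabled option.
print(bool("False"))                             # True
print(parse_choice("False", BOOTSTRAP_CHOICES))  # False
print(parse_choice("True", BOOTSTRAP_CHOICES))   # True
```

For a two-option case, comparing the label with 'True' achieves the same thing in one line; the lookup table scales better when there are more options.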
Slider
Let us now see how to add a slider widget to our app. The syntax is as follows: slider(<label for the slider>, <minimum value of the slider>, <maximum value of the slider>, <value displayed on first render>, step=<interval by which the slider increases/decreases>, key=<unique key for the slider>). In our app, we use a slider to set the maximum number of iterations for logistic regression.
Number Input box
Displays a numeric input widget in which users can enter a number. The syntax is:
number_input(<label for the widget>, <minimum value for the widget>, <maximum value for the widget>, <default value when the widget is first rendered>, step=<interval by which the value increases/decreases>, format=<format in which the widget displays numbers>, key=<unique key for the widget>)
E.g.:
n_estimators = st.sidebar.number_input("The number of trees in the forest", 100, 5000, step=10, key='n_estimators')
Button
This displays a clickable button in our app, which can be added with the syntax: button(<description of the button's purpose>, key=<unique id for the button>).
st.sidebar.button("Classify", key='classify')
I have discussed only a few widgets; you can find more, such as the progress bar and text_input, in the official Streamlit documentation. The entire Python script for the app is below, and you can run it directly after installing Streamlit in your Python environment.
############ Import the required libraries ############
# Note: the plot_* helpers imported below were removed in scikit-learn 1.2.
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve
from sklearn.metrics import precision_score, recall_score


def main():
    st.title("Predicting Diabetes Web App")
    st.sidebar.title("Model Selection Panel")
    st.markdown("Affected by Diabetes or not ?")
    st.sidebar.markdown("Choose your model and its parameters")

    # @st.cache(allow_output_mutation=True)
    @st.cache(persist=True)
    def load_data():
        # Load the dataset; the result is cached between re-runs
        data = pd.read_csv("diabetes.csv")
        return data

    def split(df):
        # Split the data into train and test sets
        y = df.Outcome                     # target: 1 = diabetic, 0 = non-diabetic
        x = df.drop(columns=['Outcome'])   # every other column is a feature
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
        return x_train, x_test, y_train, y_test

    def plot_metrics(model, metrics_list):
        # Draw each metric the user selected in the sidebar
        if 'Confusion Matrix' in metrics_list:
            st.subheader("Confusion Matrix")
            plot_confusion_matrix(model, x_test, y_test, display_labels=class_names)
            st.pyplot()
        if 'ROC Curve' in metrics_list:
            st.subheader("ROC Curve")
            plot_roc_curve(model, x_test, y_test)
            st.pyplot()
        if 'Precision-Recall Curve' in metrics_list:
            st.subheader("Precision-Recall Curve")
            plot_precision_recall_curve(model, x_test, y_test)
            st.pyplot()

    df = load_data()
    class_names = ['Non-Diabetic', 'Diabetic']  # display order follows the labels 0, 1
    x_train, x_test, y_train, y_test = split(df)

    st.sidebar.subheader("Select your Classifier")
    classifier = st.sidebar.selectbox("Classifier", ("Decision Tree", "Logistic Regression", "Random Forest"))

    if classifier == 'Decision Tree':
        st.sidebar.subheader("Model parameters")
        # Choose parameters
        criterion = st.sidebar.radio("Criterion(measures the quality of split)", ("gini", "entropy"), key='criterion')
        splitter = st.sidebar.radio("Splitter (How to split at each node?)", ("best", "random"), key='splitter')
        metrics = st.sidebar.multiselect("Select your metrics : ", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
        if st.sidebar.button("Classify", key='classify'):
            st.subheader("Decision Tree Results")
            model = DecisionTreeClassifier(criterion=criterion, splitter=splitter)
            model.fit(x_train, y_train)
            accuracy = model.score(x_test, y_test)
            y_pred = model.predict(x_test)
            st.write("Accuracy: ", round(accuracy * 100, 2), "%")
            st.write("Precision: ", round(precision_score(y_test, y_pred), 2))
            st.write("Recall: ", round(recall_score(y_test, y_pred), 2))
            plot_metrics(model, metrics)

    if classifier == 'Logistic Regression':
        st.sidebar.subheader("Model Parameters")
        C = st.sidebar.number_input("C (Regularization parameter)", 0.01, 10.0, step=0.01, key='C_LR')
        max_iter = st.sidebar.slider("Maximum number of iterations", 100, 500, key='max_iter')
        metrics = st.sidebar.multiselect("Select your metrics?", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
        if st.sidebar.button("Classify", key='classify'):
            st.subheader("Logistic Regression Results")
            model = LogisticRegression(C=C, penalty='l2', max_iter=max_iter)
            model.fit(x_train, y_train)
            accuracy = model.score(x_test, y_test)
            y_pred = model.predict(x_test)
            st.write("Accuracy: ", round(accuracy * 100, 2), "%")
            st.write("Precision: ", round(precision_score(y_test, y_pred), 2))
            st.write("Recall: ", round(recall_score(y_test, y_pred), 2))
            plot_metrics(model, metrics)

    if classifier == 'Random Forest':
        st.sidebar.subheader("Model Hyperparameters")
        n_estimators = st.sidebar.number_input("The number of trees in the forest", 100, 5000, step=10, key='n_estimators')
        # Each widget needs its own key; reusing 'n_estimators' here raises a duplicate-key error
        max_depth = st.sidebar.number_input("The maximum depth of the tree", 1, 20, step=1, key='max_depth')
        # radio returns the label as a string, so convert it to a real boolean
        bootstrap = st.sidebar.radio("Bootstrap samples when building trees", ('True', 'False'), key='bootstrap') == 'True'
        metrics = st.sidebar.multiselect("What metrics to plot?", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
        if st.sidebar.button("Classify", key='classify'):
            st.subheader("Random Forest Results")
            model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, bootstrap=bootstrap, n_jobs=-1)
            model.fit(x_train, y_train)
            accuracy = model.score(x_test, y_test)
            y_pred = model.predict(x_test)
            st.write("Accuracy: ", round(accuracy * 100, 2), "%")
            st.write("Precision: ", round(precision_score(y_test, y_pred), 2))
            st.write("Recall: ", round(recall_score(y_test, y_pred), 2))
            plot_metrics(model, metrics)

    if st.sidebar.checkbox("Show raw data", False):
        st.subheader("Diabetes Raw Dataset")
        st.write(df)


if __name__ == '__main__':
    main()
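The split helper in the script holds out 30% of the rows for testing and fixes random_state so that the same rows are chosen on every re-run. The idea can be sketched in plain Python as below. This is only an illustration under simplifying assumptions, not scikit-learn's algorithm, so the particular rows selected will differ from those train_test_split picks; holdout_split and its parameters are names invented for the example.

```python
import math
import random


def holdout_split(rows, test_size=0.3, seed=0):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # same seed -> same order every run
    n_test = math.ceil(test_size * len(rows))  # number of held-out rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test


rows = list(range(10))  # stand-in for ten dataset rows
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))  # 7 3
```

Fixing the seed matters in a Streamlit app because the whole script runs again on every interaction; without it, each click could evaluate the model on a different test set.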
References
[1] https://docs.streamlit.io/en/stable/getting_started.html
[2] https://www.datacamp.com/community/tutorials/decision-tree-classification-python
[3] https://www.kaggle.com/uciml/pima-indians-diabetes-database?select=diabetes.csv
[4] https://www.coursera.org/projects/data-science-streamlit-python
[5] https://www.coursera.org/projects/machine-learning-streamlit-python
Original article: https://medium.com/analytics-vidhya/build-your-basic-machine-learning-web-app-with-streamlit-60e29e43f5f7