
10 Practical Lessons for Deploying Machine Learning Models with Python

Published: 2025/3/20


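The following article comes from AI公园 (AI Park), by ronghuaiyang.

Introduction

Some lessons learned from deploying ML projects with Python.

Sometimes, as data scientists, we forget what the company pays us to do. We are developers first, then researchers, and then perhaps mathematicians. Our first responsibility is to quickly develop bug-free solutions. Being able to build models does not make us gods, and it does not give us the freedom to write garbage code.

I have made a lot of mistakes since I started, and I want to share the skills that come up most often in ML engineering. In my view they are also the skills the industry currently lacks most. I call the people who lack them "software illiterate", because many of them are engineers who came through Coursera rather than a computer science program. I used to be one myself.

If I had to choose between hiring a great data scientist and a great ML engineer, I would pick the latter. Let's get started.

1. Learn to write abstract classes

Once you start writing abstract classes, you will see how much clarity they bring to your codebase. They enforce the same methods and method names. When many people work on the same project, everyone starts using different methods, and that creates unproductive chaos.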

import os
from abc import ABCMeta, abstractmethod


class DataProcessor(metaclass=ABCMeta):
    """Base processor to be used for all preparation."""

    def __init__(self, input_directory, output_directory):
        self.input_directory = input_directory
        self.output_directory = output_directory

    @abstractmethod
    def read(self):
        """Read raw data."""

    @abstractmethod
    def process(self):
        """Processes raw data. This step should create the raw dataframe with all the required features. Shouldn't implement statistical or text cleaning."""

    @abstractmethod
    def save(self):
        """Saves processed data."""


class Trainer(metaclass=ABCMeta):
    """Base trainer to be used for all models."""

    def __init__(self, directory):
        self.directory = directory
        self.model_directory = os.path.join(directory, 'models')

    @abstractmethod
    def preprocess(self):
        """This takes the preprocessed data and returns clean data. This is more about statistical or text cleaning."""

    @abstractmethod
    def set_model(self):
        """Define model here."""

    @abstractmethod
    def fit_model(self):
        """This takes the vectorised data and returns a trained model."""

    @abstractmethod
    def generate_metrics(self):
        """Generates metric with trained model and test data."""

    @abstractmethod
    def save_model(self, model_name):
        """This method saves the model in our required format."""


class Predict(metaclass=ABCMeta):
    """Base predictor to be used for all models."""

    def __init__(self, directory):
        self.directory = directory
        self.model_directory = os.path.join(directory, 'models')

    @abstractmethod
    def load_model(self):
        """Load model here."""

    @abstractmethod
    def preprocess(self):
        """This takes the raw data and returns clean data for prediction."""

    @abstractmethod
    def predict(self):
        """This is used for prediction."""


class BaseDB(metaclass=ABCMeta):
    """Base database class to be used for all DB connectors."""

    @abstractmethod
    def get_connection(self):
        """This creates a new DB connection."""

    @abstractmethod
    def close_connection(self):
        """This closes the DB connection."""

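To see what these base classes buy you, here is a minimal sketch of two subclasses. The trimmed-down DataProcessor repeated here, IncompleteProcessor, CsvLineProcessor and try_instantiate are all illustrative assumptions rather than part of the original code. The point is only that Python refuses to instantiate a subclass until every abstract method has been implemented.

```python
from abc import ABCMeta, abstractmethod


class DataProcessor(metaclass=ABCMeta):
    """Trimmed copy of the base processor, repeated so the sketch runs alone."""

    @abstractmethod
    def read(self):
        """Read raw data."""

    @abstractmethod
    def process(self):
        """Turn raw data into features."""


class IncompleteProcessor(DataProcessor):
    """Hypothetical subclass that forgets to implement process()."""

    def read(self):
        return ["1,2", "3,4"]


class CsvLineProcessor(DataProcessor):
    """Hypothetical subclass that implements the full interface."""

    def read(self):
        return ["1,2", "3,4"]

    def process(self):
        # Split every raw line into a list of integers.
        return [[int(v) for v in line.split(",")] for line in self.read()]


def try_instantiate(cls):
    """Return an instance, or the TypeError message if cls is still abstract."""
    try:
        return cls()
    except TypeError as exc:
        return str(exc)


print(try_instantiate(IncompleteProcessor))         # an error message, not an object
print(try_instantiate(CsvLineProcessor).process())  # the parsed rows
```

The mistake surfaces when the object is created, not hours later when a missing method is finally called in the middle of a training run.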
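2. Set your random seed up front

Reproducibility of experiments matters a great deal, and the seed is our enemy. Pin it down; otherwise you get different train/test splits and different weight initializations for the neural network, which leads to inconsistent results.

import random

import numpy as np
import torch


def set_seed(args):
    # args is expected to carry .seed and .n_gpu (e.g. an argparse namespace).
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)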
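The effect of a fixed seed on a train/test split can be checked with the standard library alone. The sketch below is an illustration, not the author's code: split_indices and its parameters are hypothetical names, and it uses a private random.Random instance rather than the global generators seeded above.

```python
import random


def split_indices(n_rows, test_fraction, seed):
    """Shuffle row indices with a fixed seed and cut them into train/test."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_a, test_a = split_indices(100, 0.2, seed=42)
train_b, test_b = split_indices(100, 0.2, seed=42)

# Two runs with the same seed produce the same split.
print(test_a == test_b)
```

Passing the seed explicitly, instead of relying on whatever the global state happens to be, also makes it easy to log the seed next to the metrics of each run.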

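3. Start with a few rows

If your data is very large and you are working on a later part of the code, such as cleaning or modeling, use nrows to avoid loading the whole dataset every time. Use it when you only want to test the code rather than run it end to end. It is especially handy when your local machine cannot load all the data but you still prefer to develop locally.

import pandas as pd

df_train = pd.read_csv('train.csv', nrows=1000)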

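4. Anticipate failure (the mark of a mature developer)

Always check your data for NA values, because they will cause problems later. Even if your current data has none, that does not mean none will appear in a future retraining cycle. So keep checking regardless.

print(len(df))
print(df.isna().sum())  # NA count per column
df = df.dropna()        # dropna() returns a new frame, so reassign it
print(len(df))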
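Printing the row count before and after is easy to overlook in a retraining job that nobody watches. One way to make the check fail loudly is sketched below in plain Python over a list of dict rows; it is a dependency-free analogue of the pandas calls above, not a replacement for them, and is_missing, drop_missing and max_dropped_fraction are hypothetical names.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows, max_dropped_fraction=0.1):
    """Drop rows with missing values; fail if too many rows would be lost."""
    kept = [row for row in rows if not any(is_missing(v) for v in row.values())]
    dropped = len(rows) - len(kept)
    if rows and dropped / len(rows) > max_dropped_fraction:
        raise ValueError(f"{dropped} of {len(rows)} rows have missing values")
    return kept


rows = [
    {"age": 31, "income": 52000.0},
    {"age": None, "income": 48000.0},
    {"age": 45, "income": float("nan")},
    {"age": 28, "income": 61000.0},
]

# Half of the rows are incomplete, so this only passes with a loose threshold.
clean = drop_missing(rows, max_dropped_fraction=0.5)
print(len(clean))
```

With the default threshold the same call raises ValueError, which is what you want when an upstream change suddenly fills a column with NAs.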

5. Show processing progress

When you are processing a lot of data, it is reassuring to know how long it will take and how far along you are.

Option 1: tqdm

from tqdm import tqdm
import time

# df is assumed to be an existing pandas DataFrame with a numeric column 'col'.
tqdm.pandas()
df['col'] = df['col'].progress_apply(lambda x: x**2)

text = ""
for char in tqdm(["a", "b", "c", "d"]):
    time.sleep(0.25)
    text = text + char

Option 2: fastprogress

from fastprogress.fastprogress import master_bar, progress_bar
from time import sleep

mb = master_bar(range(10))
for i in mb:
    for j in progress_bar(range(100), parent=mb):
        sleep(0.01)
        mb.child.comment = 'second bar stat'
    mb.first_bar.comment = 'first bar stat'
    mb.write(f'Finished loop {i}.')

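6. Pandas is slow

If you have worked with pandas, you know how slow it can be at times, especially groupby. Rather than racking your brain for a "great" way to speed it up, just switch to modin by changing one line of code.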

import modin.pandas as pd

7. Time your functions

Not all functions are created equal.

Even if all of your code works, that does not mean it is great code. Some software bugs actually make your code slower, so you need to find them. Use this decorator to log how long a function takes.

import time
from functools import wraps


def timing(f):
    """Decorator for timing functions

    Usage:

    @timing
    def function(a):
        pass
    """
    @wraps(f)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = f(*args, **kwargs)
        end = time.time()
        print('function:%r took: %2.2f sec' % (f.__name__, end - start))
        return result
    return wrapper
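A variant worth considering, sketched here under my own assumptions rather than taken from the article: collect the durations in a dictionary instead of printing them, and measure with time.perf_counter, which is a monotonic clock and therefore unaffected by system clock adjustments. The names timings, record_timing and slow_square are hypothetical.

```python
import time
from functools import wraps

timings = {}  # function name -> list of durations in seconds


def record_timing(f):
    """Store how long each call to f takes, even when f raises."""
    @wraps(f)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return f(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings.setdefault(f.__name__, []).append(elapsed)
    return wrapper


@record_timing
def slow_square(x):
    time.sleep(0.01)  # stand-in for real work
    return x * x


result = slow_square(3)
print(result, timings["slow_square"])
```

Keeping the numbers in a structure makes it possible to aggregate them per function at the end of a run, or to write them into the same report as the model metrics.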

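8. Don't burn money in the cloud

Nobody likes an engineer who wastes cloud resources.

Some of our experiments run for hours. It is hard to keep track of them and shut down the cloud instance when they finish. I have made this mistake myself, and I have seen people leave instances running for days. It typically happens on a Friday: you leave, and only realize on Monday.

Just call this function at the end of execution and your butt will never be on fire again!!

But wrap the main code in try and call this function in except as well, so that the server does not keep running if an error occurs. Yes, I have dealt with those cases too. Let's be a bit more responsible and not generate CO2.

import os


def run_command(cmd):
    return os.system(cmd)


def shutdown(seconds=0, os_name='linux'):
    """Shutdown system after seconds given. Useful for shutting EC2 to save costs."""
    # The parameter is named os_name so that it does not shadow the os module.
    if os_name == 'linux':
        # Linux shutdown takes its delay in minutes (+m), so round seconds up.
        minutes = -(-seconds // 60)
        run_command('sudo shutdown -h +%d' % minutes)
    elif os_name == 'windows':
        run_command('shutdown -s -t %s' % seconds)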
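The try/except advice can be sketched as a small wrapper. This version uses try/finally instead, which covers both the success path and the failure path with a single call; that is my substitution, not the author's wording. run_then_shutdown, failing_job and the fake shutdown callable are hypothetical, and the real shutdown function is deliberately not called here so the sketch is safe to run.

```python
def run_then_shutdown(job, shutdown_fn):
    """Run job(); call shutdown_fn() afterwards whether or not job() raised."""
    try:
        return job()
    finally:
        shutdown_fn()


calls = []


def fake_shutdown():
    # Stand-in for the real shutdown(); it only records that it was called.
    calls.append("shutdown")


def failing_job():
    raise RuntimeError("training crashed")


try:
    run_then_shutdown(failing_job, fake_shutdown)
except RuntimeError as exc:
    print("job failed:", exc)

print(calls)
```

In a real script you would pass the actual shutdown function, ideally with a delay long enough to copy logs and model artifacts off the instance first.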

9. Create and save reports

Past a certain point in modeling, all the great insights come only from error analysis and metric analysis. Make sure to create and save well-formatted reports for yourself and for your management. Management loves reports, right?

import json
import os

from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, fbeta_score)


def get_metrics(y, y_pred, beta=2, average_method='macro', y_encoder=None):
    if y_encoder:
        y = y_encoder.inverse_transform(y)
        y_pred = y_encoder.inverse_transform(y_pred)
    return {
        'accuracy': round(accuracy_score(y, y_pred), 4),
        'f1_score_macro': round(f1_score(y, y_pred, average=average_method), 4),
        # beta must be passed by keyword in current scikit-learn releases.
        'fbeta_score_macro': round(fbeta_score(y, y_pred, beta=beta, average=average_method), 4),
        'report': classification_report(y, y_pred, output_dict=True),
        'report_csv': classification_report(y, y_pred, output_dict=False).replace('\n', '\r\n')
    }


def save_metrics(metrics: dict, model_directory, file_name):
    path = os.path.join(model_directory, file_name + '_report.txt')
    # classification_report_to_csv is not defined in this article; it is
    # assumed to write the plain-text report to the given path.
    classification_report_to_csv(metrics['report_csv'], path)
    metrics.pop('report_csv')
    path = os.path.join(model_directory, file_name + '_metrics.json')
    with open(path, 'w') as f:
        json.dump(metrics, f, indent=4)
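The listing above calls classification_report_to_csv, which the article never defines. A minimal hypothetical stand-in, assuming all it needs to do is write the already formatted report text to the given path, could look like this. The behaviour is my guess at the intent, not the author's implementation.

```python
import os
import tempfile


def classification_report_to_csv(report_text, path):
    """Hypothetical stand-in: write the plain-text report to path unchanged."""
    # newline="" stops Python from translating the \r\n line endings that
    # the caller has already inserted.
    with open(path, "w", newline="") as handle:
        handle.write(report_text)


# Usage with a throwaway directory and a fake two-line report.
with tempfile.TemporaryDirectory() as directory:
    report_path = os.path.join(directory, "model_report.txt")
    classification_report_to_csv("precision recall\r\n0.90 0.80", report_path)
    with open(report_path, newline="") as handle:
        written = handle.read()

print(repr(written))
```

If a real CSV is wanted, the dictionary returned by classification_report with output_dict=True is a better starting point than the formatted text.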

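10. Write good APIs

All that ends bad is bad.

You can do great data cleaning and modeling and still create a huge mess at the end. My experience working with people tells me that many do not know how to write good APIs, documentation, and server setups. I will write another article about this soon, but let me get started here. Below is a good way to deploy classic ML and DL models under moderate load (say 1000 requests/min).

fastapi + uvicorn

Fastest: write the API with fastapi, because it is fast.

Documentation: fastapi generates the API documentation for us, so we do not have to worry about it.

Workers: deploy the API with uvicorn.

Deploy with 4 workers, then tune the number of workers through load testing.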
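A minimal sketch of what a 4-worker launch might look like. It assumes the FastAPI object is called app and lives in a module main.py, and that the service listens on 0.0.0.0:8000; all three are placeholders I introduced, as is the helper build_uvicorn_command. The --host, --port and --workers options are standard uvicorn command-line flags.

```python
import shlex


def build_uvicorn_command(app_path="main:app", host="0.0.0.0", port=8000, workers=4):
    """Assemble the uvicorn command line as an argument list."""
    return [
        "uvicorn", app_path,
        "--host", host,
        "--port", str(port),
        "--workers", str(workers),
    ]


command = build_uvicorn_command(workers=4)

# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The list can be handed to subprocess.run as-is. Keeping the worker count as a parameter makes it easy to try different values during load testing.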
