To train a model, follow two steps: first build the network with symbol, then call mx.model.FeedForward.create to create the model. The code below creates a two-layer neural network.
import mxnet as mx
# configure a two-layer neural network
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type='relu')
fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
softmax = mx.symbol.SoftmaxOutput(fc2, name='sm')
# create a model (data_set and num_epoch are defined elsewhere)
model = mx.model.FeedForward.create(softmax, X=data_set, num_epoch=num_epoch, learning_rate=0.01)
You can also construct and fit a model in the scikit-learn style:
# create a model using the sklearn-style two-step way
model = mx.model.FeedForward(softmax, num_epoch=num_epoch, learning_rate=0.01)
model.fit(X=data_set)
For more functionality, see the Model API Reference.
Saving a model
# save a model to mymodel-symbol.json and mymodel-0100.params
prefix = 'mymodel'
iteration = 100
model.save(prefix, iteration)
# load the model back
model_loaded = mx.model.FeedForward.load(prefix, iteration)
Training is usually done in one script, which saves the model under a prefix plus a sequence number, such as mymodel-0100.params. A second script then loads the model and runs prediction.
Model class of MXNet for training and predicting feedforward nets. This class is designed for a single-data, single-output supervised network.
Parameters:
symbol (Symbol) – The symbol configuration of the computation network.
ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi-GPU training, pass in a list of GPU contexts.
num_epoch (int, optional) – Training parameter, number of training epochs.
epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size).
optimizer (str or Optimizer, optional) – Training parameter, name or optimizer object for training.
initializer (initializer function, optional) – Training parameter, the initialization scheme used.
numpy_batch_size (int, optional) – The batch size of training data. Only needed when the input array is numpy.
arg_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of the net's weights.
aux_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of the net's auxiliary states.
allow_extra_params (boolean, optional) – Whether to allow extra parameters that are not needed by symbol to be passed by aux_params and arg_params. If this is True, no error will be thrown when aux_params and arg_params contain more parameters than needed.
begin_epoch (int, optional) – The beginning training epoch.
kwargs (dict) – The additional keyword arguments passed to the optimizer.
Class summary:
The MXNet model class for training and predicting with feedforward networks. The class is designed for supervised networks with a single output.
Parameters in brief:
symbol (Symbol) – The symbol definition of the computation network.
ctx (Context or list of Context, optional) – The device used for training and prediction. To use multiple GPUs, pass in a list of GPU contexts.
num_epoch (int, optional) – The number of training epochs.
epoch_size (int, optional) – The number of batches in one epoch. Defaults to ceil(num_train_examples / batch_size), that is, the number of training examples divided by the batch size, rounded up.
optimizer (str or Optimizer, optional) – Training parameter: the name of an optimizer, or an optimizer object, used for training.
initializer (initializer function, optional) – Training parameter: the initialization scheme.
numpy_batch_size (int, optional) – The batch size of the training set. Only needed when the input array is a numpy array.
arg_params (dict of str to NDArray, optional) – Model parameter: a dict of the network's weights.
aux_params (dict of str to NDArray, optional) – Model parameter: a dict of the network's auxiliary states.
allow_extra_params (boolean, optional) – Whether arg_params and aux_params may contain extra parameters that the symbol does not need. If True, no error is thrown when more parameters are supplied than needed.
begin_epoch (int, optional) – The epoch at which training starts, meaning that the epochs after it are trained again.
kwargs (dict) – Additional keyword arguments passed to the optimizer.
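As a small illustration of the epoch_size default described above, the sketch below computes ceil(num_train_examples / batch_size) in plain Python. The function name default_epoch_size is hypothetical and is not part of MXNet.

```python
import math


def default_epoch_size(num_train_examples, batch_size):
    """Number of batches needed to see every training example once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Divide as floats so the result is rounded up, not truncated.
    return int(math.ceil(num_train_examples / float(batch_size)))


# 1000 examples in batches of 128: 7 full batches plus 1 partial batch.
print(default_epoch_size(1000, 128))
```

The last batch may be only partly filled, which is why the division is rounded up.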
predict(X, num_batch=None, return_data=False, reset=True)
Run the prediction; this always uses only one device.
Parameters:
X (mxnet.DataIter)
num_batch (int or None) – The number of batches to run. Goes through all batches if None.
Returns:
y – The predicted value of the output.
Return type:
numpy.ndarray or a list of numpy.ndarray if the network has multiple outputs.
Run the model on X and calculate the score with eval_metric.
Parameters:
X (mxnet.DataIter)
eval_metric (metric.metric) – The metric for calculating the score.
num_batch (int or None) – The number of batches to run. Goes through all batches if None.
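The num_batch behaviour described for prediction and scoring (stop after a given number of batches, or run through all of them when it is None) can be sketched without MXNet. Everything below is hypothetical: score_batches, predict_fn and metric_fn are illustrative names, and averaging the metric per batch is a simplification rather than a statement about how MXNet accumulates scores.

```python
def score_batches(batches, predict_fn, metric_fn, num_batch=None):
    """Average metric_fn over at most num_batch batches (all if None)."""
    total, count = 0.0, 0
    for i, (data, label) in enumerate(batches):
        if num_batch is not None and i >= num_batch:
            break  # the requested number of batches has been processed
        total += metric_fn(label, predict_fn(data))
        count += 1
    return total / count if count else float("nan")


def identity(data):
    """Stand-in model that returns its input as the prediction."""
    return data


def exact_match(label, pred):
    """Fraction of positions where prediction and label agree."""
    hits = sum(1 for l, p in zip(label, pred) if l == p)
    return hits / float(len(label))


# Three (data, label) batches with per-batch scores 1.0, 0.5 and 0.0.
batches = [([1, 2], [1, 2]), ([3, 4], [3, 5]), ([5, 6], [0, 0])]
print(score_batches(batches, identity, exact_match))
print(score_batches(batches, identity, exact_match, num_batch=2))
```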
X (DataIter, or numpy.ndarray/NDArray) – Training data. If X is a DataIter, the name or, if not available, the position of its outputs should match the corresponding variable names defined in the symbolic graph.
y (numpy.ndarray/NDArray, optional) – Training set label. If X is numpy.ndarray/NDArray, y is required to be set. While y can be 1D or 2D (with 2nd dimension as 1), its 1st dimension must be the same as that of X, i.e. the number of data points and labels should be equal.
eval_data (DataIter or numpy.ndarray/list/NDArray pair) – If eval_data is a numpy.ndarray/list/NDArray pair, it should be (valid_data, valid_label).
eval_metric (metric.EvalMetric or str or callable) – The evaluation metric, the name of an evaluation metric, or a custom evaluation function that returns statistics based on a minibatch.
epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at the end of each epoch. This can be used to checkpoint the model each epoch.
batch_end_callback (callable(epoch)) – A callback that is invoked at the end of each batch, for printing purposes.
kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, ‘dist_async’. Defaults to ‘local’; there is often no need to change this on a single machine.
logger (logging logger, optional) – When not specified, the default logger will be used.
work_load_list (list of float or int, optional) – The list of work loads for the different devices, in the same order as ctx.
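The epoch_end_callback entry above gives the signature callable(epoch, symbol, arg_params, aux_states) and notes that it can be used for checkpointing. A minimal sketch of such a callback follows. The factory make_checkpoint_callback and the save_fn argument are hypothetical, and the code assumes that epoch is a zero-based counter, which the text above does not state.

```python
def make_checkpoint_callback(period, save_fn):
    """Return a callback that calls save_fn every `period` epochs."""
    def _callback(epoch, symbol, arg_params, aux_states):
        # Assumption: `epoch` is zero-based, so epoch + 1 epochs are finished.
        finished = epoch + 1
        if finished % period == 0:
            save_fn(finished, symbol, arg_params, aux_states)
    return _callback


# Stand-in for real saving: just record which epochs would be checkpointed.
saved = []
callback = make_checkpoint_callback(
    2, lambda n, sym, arg, aux: saved.append(n))

# Simulate five training epochs.
for epoch in range(5):
    callback(epoch, None, {}, {})
print(saved)  # [2, 4]
```

In real use, save_fn would write the symbol and parameters to disk instead of appending to a list.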
Checkpoint the model into a file. You can also use pickle to do the job if you only work in Python. The advantage of load/save is that the file is language agnostic: a file saved using save can be loaded by other language bindings of mxnet. You also get the benefit of being able to load from and save to cloud storage (S3, HDFS) directly.
Parameters:
prefix (str) – Prefix of model name.
Notes
prefix-symbol.json will be saved for the symbol.
prefix-epoch.params will be saved for the parameters.
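The naming scheme in the notes above can be sketched as a small helper. The function checkpoint_filenames is hypothetical, and the four-digit zero padding of the epoch number is inferred from the example name mymodel-0100.params used earlier rather than taken from the MXNet source.

```python
def checkpoint_filenames(prefix, epoch):
    """Return the (symbol file, parameter file) names for a checkpoint."""
    symbol_file = "%s-symbol.json" % prefix
    # Assumption: the epoch number is zero-padded to four digits.
    params_file = "%s-%04d.params" % (prefix, epoch)
    return symbol_file, params_file


print(checkpoint_filenames("mymodel", 100))
```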
static load(prefix, epoch, ctx=None, **kwargs)
Load a model checkpoint from file.
Parameters:
prefix (str) – Prefix of model name.
epoch (int) – Epoch number of the model we would like to load.
ctx (Context or list of Context, optional) – The device context of training and prediction.
kwargs (dict) – Other parameters for the model, including num_epoch, optimizer and numpy_batch_size.
Returns:
model – The loaded model that can be used for prediction.
Functional style to create a model. This function is more consistent with functional languages such as R, where mutation is not allowed.
Parameters:
symbol (Symbol) – The symbol configuration of the computation network.
X (DataIter) – Training data.
y (numpy.ndarray, optional) – If X is a numpy.ndarray, y is required to be set.
ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi-GPU training, pass in a list of GPU contexts.
num_epoch (int, optional) – Training parameter, number of training epochs.
epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size).
optimizer (str or Optimizer, optional) – Training parameter, name or optimizer object for training.
initializer (initializer function, optional) – Training parameter, the initialization scheme used.
eval_data (DataIter or numpy.ndarray pair) – If eval_data is a numpy.ndarray pair, it should be (valid_data, valid_label).
eval_metric (metric.EvalMetric or str or callable) – The evaluation metric, the name of an evaluation metric, or a custom evaluation function that returns statistics based on a minibatch.
epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at the end of each epoch. This can be used to checkpoint the model each epoch.
batch_end_callback (callable(epoch)) – A callback that is invoked at the end of each batch, for printing purposes.
kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, ‘dist_async’. Defaults to ‘local’; there is often no need to change this on a single machine.
logger (logging logger, optional) – When not specified, the default logger will be used.
work_load_list (list of float or int, optional) – The list of work loads for the different devices, in the same order as ctx.
allow_extra_outputs (bool) – If true, the prediction outputs can have extra outputs. This is useful in RNNs, where the states are also produced in the outputs for forwarding.
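To make the role of work_load_list more concrete, the sketch below splits one batch into per-device slices in proportion to the listed work loads. This is only an illustration of the idea under my own assumptions: split_batch is a hypothetical name, and the rounding rule is not claimed to match what MXNet does internally.

```python
def split_batch(batch_size, work_load_list):
    """Split range(batch_size) into one slice per device, by work load."""
    total = float(sum(work_load_list))
    slices = []
    start = 0
    for i, load in enumerate(work_load_list):
        if i == len(work_load_list) - 1:
            end = batch_size  # the last device takes whatever remains
        else:
            share = int(round(batch_size * load / total))
            end = min(batch_size, start + share)
        slices.append(slice(start, end))
        start = end
    return slices


# Two equally loaded devices, then a 2:1 split over a batch of 90.
print(split_batch(100, [1, 1]))
print(split_batch(90, [2, 1]))
```

The slices are ordered like work_load_list, which in turn follows the order of ctx.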
mxnet.metric.create(metric, **kwargs)
Create an evaluation metric.
Parameters:
metric (str or callable) – The name of the metric, or a function providing statistics given pred and label NDArrays.
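A function that provides statistics from predictions and labels might look like the plain-Python accuracy sketch below. It works on lists rather than NDArrays so that it runs without MXNet, and the argument order (label, pred) is an assumption that should be checked against the installed MXNet version before passing such a function to the library.

```python
def accuracy(label, pred):
    """Fraction of rows in pred whose highest score matches the label."""
    correct = 0
    for y, scores in zip(label, pred):
        # Index of the largest score is taken as the predicted class.
        predicted = max(range(len(scores)), key=lambda k: scores[k])
        if predicted == y:
            correct += 1
    return correct / float(len(label))


labels = [1, 0, 1]
scores = [[0.2, 0.8], [0.9, 0.1], [0.6, 0.4]]
print(accuracy(labels, scores))
```

Here the first two rows are classified correctly and the third is not.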
Optimizer API
Common optimization algorithms with regularization.
name (str) – Name of the required optimizer. Should be the name of a subclass of Optimizer. Case insensitive.
rescale_grad (float) – Rescaling factor on the gradient.
kwargs (dict) – Parameters for the optimizer.
Returns:
opt – The resulting optimizer.
Return type:
Optimizer
create_state(index, weight)
Create additional optimizer state such as momentum. Override in implementations.
update(index, weight, grad, state)
Update the parameters. Override in implementations.
set_lr_scale(args_lrscale)
set_lr_scale is deprecated. Use set_lr_mult instead.
set_lr_mult(args_lr_mult)
Set an individual learning rate multiplier for parameters.
Parameters:
args_lr_mult (dict of string/int to float) – Set the lr multiplier for name/index to float. Setting the multiplier by index is supported for backward compatibility, but we recommend using name and symbol.
set_wd_mult(args_wd_mult)
Set an individual weight decay multiplier for parameters. By default the wd multiplier is 0 for all params whose name doesn't end with _weight, if param_idx2name is provided.
Parameters:
args_wd_mult (dict of string/int to float) – Set the wd multiplier for name/index to float. Setting the multiplier by index is supported for backward compatibility, but we recommend using name and symbol.
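The default rule for weight decay multipliers described above (0 for every parameter whose name does not end with _weight) can be sketched as follows. Both function names are hypothetical, and treating a multiplier as a plain factor on the base learning rate or weight decay is my reading of the text, not a quotation of MXNet's code.

```python
def default_wd_mult(param_names):
    """Weight decay multiplier per name: 1.0 for *_weight, else 0.0."""
    return dict(
        (name, 1.0 if name.endswith("_weight") else 0.0)
        for name in param_names
    )


def effective_rates(name, lr, wd, lr_mult, wd_mult):
    """Scale the base lr and wd by the per-parameter multipliers."""
    # Parameters without an explicit multiplier keep the base value.
    return lr * lr_mult.get(name, 1.0), wd * wd_mult.get(name, 1.0)


names = ["fc1_weight", "fc1_bias"]
wd_mult = default_wd_mult(names)
lr_mult = {"fc1_bias": 2.0}  # hypothetical: train the bias twice as fast
print(wd_mult)
print(effective_rates("fc1_bias", 0.01, 1e-4, lr_mult, wd_mult))
```

With these settings the bias receives no weight decay and a doubled learning rate.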
mxnet.optimizer.register(klass)
Register an optimizer class with the optimizer factory.
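The combination of a register function and a case-insensitive lookup by name is a common factory pattern. The sketch below shows one way it can work in plain Python. The names _OPTIMIZERS, create and PlainSGD are hypothetical, and this is not MXNet's implementation.

```python
_OPTIMIZERS = {}  # maps lower-cased class names to classes


def register(klass):
    """Class decorator that adds klass to the factory registry."""
    _OPTIMIZERS[klass.__name__.lower()] = klass
    return klass


def create(name, **kwargs):
    """Instantiate a registered class by name, ignoring case."""
    key = name.lower()
    if key not in _OPTIMIZERS:
        raise ValueError("unknown optimizer: %s" % name)
    return _OPTIMIZERS[key](**kwargs)


@register
class PlainSGD(object):
    """Toy optimizer used only to demonstrate registration."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate


opt = create("plainsgd", learning_rate=0.1)
print(type(opt).__name__, opt.learning_rate)
```

Because register returns the class unchanged, it can be used as a decorator without affecting the class itself.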
class mxnet.optimizer.SGD(momentum=0.0, **kwargs)
A very simple SGD optimizer with momentum and weight regularization.
Parameters:
learning_rate (float, optional) – learning_rate of SGD
momentum (float, optional) – momentum value
wd (float, optional) – L2 regularization coefficient added to all the weights
rescale_grad (float, optional) – rescaling factor of the gradient
clip_gradient (float, optional) – clip the gradient to the range [-clip_gradient, clip_gradient]
param_idx2name (dict of string/int to float, optional) – special treatment of weight decay for parameters whose names end with bias, gamma, and beta
create_state(index, weight)
Create additional optimizer state such as momentum.
Parameters:
weight (NDArray) – The weight data
update(index, weight, grad, state)
Update the parameters.
Parameters:
index (int) – A unique integer key used to index the parameters
weight (NDArray) – weight ndarray
grad (NDArray) – grad ndarray
state (NDArray or other objects returned by init_state) – The auxiliary state used in optimization.
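To show how the SGD parameters listed above interact, here is a plain-Python sketch of one common momentum formulation with gradient rescaling, clipping and L2 weight decay. It operates on lists of floats instead of NDArrays. The function name sgd_momentum_step is hypothetical, and the exact order of operations is an assumption, not a line-by-line copy of MXNet's update.

```python
def sgd_momentum_step(weight, grad, state, lr, momentum=0.0, wd=0.0,
                      rescale_grad=1.0, clip_gradient=None):
    """Update weight and momentum state in place and return both."""
    for i in range(len(weight)):
        g = grad[i] * rescale_grad
        if clip_gradient is not None:
            # Clamp the gradient to [-clip_gradient, clip_gradient].
            g = max(-clip_gradient, min(clip_gradient, g))
        # The momentum buffer accumulates the decayed, scaled steps.
        state[i] = momentum * state[i] - lr * (g + wd * weight[i])
        weight[i] += state[i]
    return weight, state


w, s = [1.0], [0.0]
sgd_momentum_step(w, [0.5], s, lr=0.1, momentum=0.9)
print(w, s)
sgd_momentum_step(w, [0.5], s, lr=0.1, momentum=0.9)
print(w, s)
```

The second step moves further than the first because the momentum buffer carries over part of the previous step.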
class mxnet.optimizer.NAG(**kwargs)
SGD with Nesterov momentum. It is implemented according to https://github.com/torch/optim/blob/master/sgd.lua
update(index, weight, grad, state)
Update the parameters.
Parameters:
index (int) – A unique integer key used to index the parameters
weight (NDArray) – weight ndarray
grad (NDArray) – grad ndarray
state (NDArray or other objects returned by init_state) – The auxiliary state used in optimization.
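For comparison with plain momentum, the sketch below shows one common way to write a Nesterov-style step: the velocity accumulates gradients, and the weight moves along the gradient plus a momentum-scaled look-ahead on that velocity. This is my own simplified rendering on lists of floats. The name nag_step is hypothetical, and the code is not claimed to reproduce the linked sgd.lua or MXNet's NAG class exactly.

```python
def nag_step(weight, grad, state, lr, momentum=0.9, wd=0.0):
    """One Nesterov-style update, modifying weight and state in place."""
    for i in range(len(weight)):
        g = grad[i] + wd * weight[i]          # gradient with L2 penalty
        state[i] = momentum * state[i] + g    # velocity accumulates gradients
        lookahead = g + momentum * state[i]   # Nesterov correction term
        weight[i] -= lr * lookahead
    return weight, state


w, v = [1.0], [0.0]
nag_step(w, [0.5], v, lr=0.1, momentum=0.9)
print(w, v)
```

With momentum set to 0 the look-ahead term vanishes and the step reduces to plain gradient descent.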
class mxnet.optimizer.SGLD(**kwargs)
Stochastic Langevin Dynamics updater to sample from a distribution.
Parameters:
learning_rate (float, optional) – learning_rate of SGD
wd (float, optional) – L2 regularization coefficient added to all the weights
rescale_grad (float, optional) – rescaling factor of the gradient
clip_gradient (float, optional) – clip the gradient to the range [-clip_gradient, clip_gradient]
param_idx2name (dict of string/int to float, optional) – special treatment of weight decay for parameters whose names end with bias, gamma, and beta
create_state(index, weight)
Create additional optimizer state such as momentum.
Parameters:
weight (NDArray) – The weight data
update(index, weight, grad, state)
Update the parameters.
Parameters:
index (int) – A unique integer key used to index the parameters
weight (NDArray) – weight ndarray
grad (NDArray) – grad ndarray
state (NDArray or other objects returned by init_state) – The auxiliary state used in optimization.
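Stochastic gradient Langevin dynamics differs from SGD by adding Gaussian noise to every step, so that the iterates sample from a distribution instead of settling at a point. The sketch below uses the textbook form of the step: half the learning rate times the gradient, plus noise with standard deviation sqrt(learning rate). The name sgld_step and the injectable rng argument are hypothetical, and the formula is the standard one rather than a quotation of MXNet's code.

```python
import math
import random


def sgld_step(weight, grad, lr, wd=0.0, rng=random):
    """One Langevin step on a list of floats, modifying it in place."""
    std = math.sqrt(lr)  # noise scale tied to the step size
    for i in range(len(weight)):
        drift = -lr / 2.0 * (grad[i] + wd * weight[i])
        weight[i] += drift + rng.gauss(0.0, std)
    return weight


# A seeded generator makes the noisy update reproducible.
print(sgld_step([1.0], [0.5], lr=0.01, rng=random.Random(0)))
```

Passing a seeded random.Random instance, as in the usage line, is a convenient way to make experiments repeatable.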
This code follows the version in http://arxiv.org/pdf/1212.5701v1.pdf Eq(5) by Matthew D. Zeiler, 2012. AdaGrad helps the network converge faster in some cases.
Parameters:
learning_rate (float, optional) – Step size. Default value is set to 0.05.
wd (float, optional) – L2 regularization coefficient added to all the weights
rescale_grad (float, optional) – rescaling factor of the gradient
eps (float, optional) – A small float number that keeps the update numerically stable. Default value is set to 1e-7.
clip_gradient (float, optional) – clip the gradient to the range [-clip_gradient, clip_gradient]
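The AdaGrad idea behind the parameters above is to divide each gradient by the square root of the running sum of its squared past values, so frequently updated weights take smaller steps over time. A minimal sketch on lists of floats follows, using the default values quoted above. The name adagrad_step is hypothetical, and placing eps inside the square root is an assumption about where the stabilizing constant goes.

```python
import math


def adagrad_step(weight, grad, history, lr=0.05, wd=0.0, eps=1e-7):
    """One AdaGrad update, modifying weight and history in place."""
    for i in range(len(weight)):
        history[i] += grad[i] * grad[i]  # running sum of squared gradients
        scaled = grad[i] / math.sqrt(history[i] + eps)
        weight[i] -= lr * (scaled + wd * weight[i])
    return weight, history


w, h = [1.0], [0.0]
adagrad_step(w, [0.5], h)
first = w[0]
adagrad_step(w, [0.5], h)
print(first, w[0], h)
```

With a constant gradient, the second step is smaller than the first because the accumulated history has grown.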