Keras: A Deep Learning Library Based on Theano and TensorFlow

Published: 2025/3/15

Original link: https://www.cnblogs.com/littlehann/p/6442161.html

catalogue

Introduction

Basic concepts

The Sequential model

The functional model

Common layers

Convolutional layers

Pooling layers

Recurrent layers

Embedding layer

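1. Introduction

Keras is a high-level neural network library written in pure Python that runs on top of TensorFlow or Theano. It offers:

Easy and fast prototyping (Keras is highly modular, minimal, and extensible)
Support for CNNs and RNNs, or combinations of the two
Support for arbitrary connectivity schemes (including multi-input and multi-output training)
Seamless switching between CPU and GPU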

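0x1: Keras design principles

Modularity: a model is understood as a standalone sequence or graph of fully configurable modules that can be combined freely at minimal cost. Specifically, network layers, loss functions, optimizers, initialization schemes, activation functions, and regularization schemes are all independent modules that you can use to build your own models.

Minimalism: each module should be kept as simple as possible. Every piece of code should be clear on first reading. There is no black magic, because it gets in the way of iteration and innovation.

Easy extensibility: adding a new module is very easy; you only need to write a new class or function modeled on the existing ones. This convenience makes Keras well suited to advanced research.

Work with Python: Keras has no separate model configuration file format. Models are described in Python code, which makes them more compact, easier to debug, and easy to extend.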

0x2: Quick start

sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran
pip install scipy

The core data structure of Keras is the "model", a way of organizing network layers. The main model type in Keras is Sequential, a stack of layers arranged in order.

from keras.models import Sequential

model = Sequential()

Stacking layers with .add() builds up the model:

from keras.layers import Dense, Activation

model.add(Dense(output_dim=64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))

Once the model is assembled, compile it with the .compile() method:

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

Compiling requires a loss function and an optimizer; you can also define your own loss function if needed. A core idea of Keras is to stay simple and easy to use while leaving users in full control: you can customize models and layers as you need, or even modify the source code.

from keras.optimizers import SGD

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))

After compiling, train the network by iterating over the training data in batches for a given number of epochs:

model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)

You can also feed batches to the network manually, in which case use:

model.train_on_batch(X_batch, Y_batch)

Afterwards, one line of code evaluates the model so you can check whether its metrics meet your requirements:

loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=32)

Or you can use the model to make predictions on new data:

classes = model.predict_classes(X_test, batch_size=32)
proba = model.predict_proba(X_test, batch_size=32)

Relevant Link:

https://github.com/fchollet/keras
http://playground.tensorflow.org/#activation=tanh&regularization=L1&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0.001&noise=45&networkShape=4,5&seed=0.75320&showTestData=true&discretize=true&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification&initZero=false&hideText=false

2. Basic concepts

0x1: Symbolic computation

Keras uses Theano or TensorFlow as its underlying library; these are called Keras backends. Both are "symbolic" libraries.
As a result, programming with Keras differs somewhat from ordinary Python code. Roughly speaking, symbolic computation first defines variables and then builds a "computation graph" that specifies how those variables are related. The finished graph must be compiled to fix its internal details, but at that point it is still an "empty shell" holding no actual data. Only when you feed in the inputs does data flow through the model and produce output values.
Keras models are built this way: once a model is assembled it is an empty shell, and a real data flow only forms after a callable function is generated (K.function) and input data is passed to it.
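The "define first, run later" idea can be sketched in a few lines of plain Python. This is only a toy illustration, not how Theano or TensorFlow are implemented; the names Node, placeholder, evaluate, and function are hypothetical and exist only for this sketch.

```python
# A toy "symbolic" expression: building it performs no arithmetic.
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "placeholder", "add" or "mul"
        self.inputs = inputs  # upstream nodes
        self.name = name      # only used by placeholders

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def placeholder(name):
    """Declare a variable that has no value yet."""
    return Node("placeholder", name=name)


def evaluate(node, feed):
    """Walk the graph and compute a value from the fed inputs."""
    if node.op == "placeholder":
        return feed[node.name]
    left, right = (evaluate(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right


def function(output):
    """Hypothetical stand-in for the role K.function plays: graph -> callable."""
    return lambda **feed: evaluate(output, feed)


x = placeholder("x")
y = placeholder("y")
z = x * y + x          # graph only: an "empty shell" with no data
f = function(z)        # now callable
result = f(x=3, y=4)   # data flows through the graph here
print(result)
```

Until f is called, z is just a description of the computation, which matches the "empty shell" described above.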

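0x2: Tensors

The term is used for uniformity: a tensor can be seen as the natural generalization of vectors and matrices, and we use tensors to represent a wide range of data types.
The smallest tensor is the 0th-order tensor, a scalar, which is a single number.
Arranging numbers in order gives a 1st-order tensor, a vector.
Arranging a group of vectors in order gives a 2nd-order tensor, a matrix.
Stacking matrices gives a 3rd-order tensor, which we can call a cube; a color image with 3 color channels is such a cube.
The order of a tensor is sometimes also called its number of dimensions or axes. For example, the matrix [[1,2],[3,4]] is a 2nd-order tensor with two dimensions or axes. Along axis 0 (dimensions and axes are counted from 0 here, to match Python) you see the two vectors [1,2] and [3,4]; along axis 1 you see the two vectors [1,3] and [2,4].

import numpy as np

a = np.array([[1, 2], [3, 4]])
sum0 = np.sum(a, axis=0)
sum1 = np.sum(a, axis=1)

print(sum0)  # [4 6]
print(sum1)  # [3 7]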

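0x3: The functional model

Earlier versions of Keras had two kinds of model.

The first is Sequential, the sequential model: a single input and a single output, one path from start to finish, with connections only between adjacent layers and no cross-layer connections at all. This kind of model compiles quickly and is simple to work with.

The second was Graph, the graph model, which supported multiple inputs and outputs and arbitrary connections between layers, but compiled slowly. Sequential is really a special case of Graph.

In the current version of Keras the graph model has been removed and replaced by the "functional model API", which makes it even clearer that Sequential is a special case. The general model is simply called Model, and if you only need a simple sequential stack, Sequential remains available as a shortcut.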

Relevant Link:

http://keras-cn.readthedocs.io/en/latest/getting_started/concepts/

3. The Sequential model

A Sequential model is a linear stack of layers.
You can construct one by passing a list of layers to Sequential:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_dim=784),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

You can also add layers one at a time with the .add() method:

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

0x1: Specifying the input shape

The model needs to know the shape of its input, so the first layer of a Sequential model must receive an argument describing it. The following layers infer the shapes of intermediate data automatically and do not need this argument. There are several ways to specify the input shape for the first layer:

Pass an input_shape keyword argument to the first layer. input_shape is a tuple; an entry may be None, meaning that any positive integer is accepted in that position. The batch size is not included.

Pass a batch_input_shape keyword argument to the first layer, which does include the batch size. This is useful for fixed batch sizes, for example in stateful RNNs. Internally, Keras converts input_shape to batch_input_shape by prepending None.

Some 2D layers, such as Dense, let you specify the input shape implicitly through input_dim. Some 3D temporal layers accept input_dim and input_length.

The following three ways of specifying the input shape are strictly equivalent:

model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model = Sequential()
model.add(Dense(32, batch_input_shape=(None, 784)))
# note that batch dimension is "None" here,
# so the model will be able to process batches of any size.

model = Sequential()
model.add(Dense(32, input_dim=784))

The following three are also strictly equivalent:

from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))

model = Sequential()
model.add(LSTM(32, batch_input_shape=(None, 10, 64)))

model = Sequential()
model.add(LSTM(32, input_length=10, input_dim=64))
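The equivalence of these spellings follows from the rule that the batch axis is prepended as None. A minimal sketch of that rule is shown here; to_batch_input_shape is a hypothetical helper written for illustration and is not part of Keras.

```python
def to_batch_input_shape(input_shape=None, input_dim=None, input_length=None):
    """Hypothetical helper mirroring the shape rule described in the text."""
    if input_shape is None:
        if input_dim is None:
            raise ValueError("need input_shape or input_dim")
        # 2D layers: (input_dim,); temporal layers: (input_length, input_dim)
        if input_length is None:
            input_shape = (input_dim,)
        else:
            input_shape = (input_length, input_dim)
    # The batch axis is prepended as None, meaning "any batch size".
    return (None,) + tuple(input_shape)


dense_a = to_batch_input_shape(input_shape=(784,))
dense_b = to_batch_input_shape(input_dim=784)
lstm_a = to_batch_input_shape(input_shape=(10, 64))
lstm_b = to_batch_input_shape(input_length=10, input_dim=64)

print(dense_a, dense_b)
print(lstm_a, lstm_b)
```

Both Dense spellings resolve to the same batch shape, and so do both LSTM spellings, which is why the variants above behave identically.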

0x2: The Merge layer

Several Sequential models can be merged into a single output through a Merge layer. The output of a Merge layer is a layer object that can be added to a new Sequential model. The following example merges two Sequential models (the final activation produces the resulting output matrix):

from keras.layers import Merge

left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(10, activation='softmax'))

The Merge layer supports several predefined merge modes:

sum (default): element-wise sum
concat: tensor concatenation; the concat_axis keyword argument selects the axis to concatenate along
mul: element-wise multiplication
ave: tensor average
dot: tensor product; the dot_axes keyword argument specifies the axes to contract
cos: cosine proximity between the vectors in 2D tensors (i.e. matrices)
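To make the merge modes concrete, here is a NumPy sketch of what each mode computes on two small branch outputs. It only illustrates the arithmetic; it assumes that dot and cos act per sample (row by row), and the exact output shapes returned by the library may differ from the ones shown.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # output of branch 1, shape (2, 2)
b = np.array([[5.0, 6.0], [7.0, 8.0]])   # output of branch 2, shape (2, 2)

# Per-sample dot product and vector norms, kept as column vectors.
row_dot = np.sum(a * b, axis=1, keepdims=True)
norm_a = np.sqrt(np.sum(a * a, axis=1, keepdims=True))
norm_b = np.sqrt(np.sum(b * b, axis=1, keepdims=True))

merged = {
    "sum": a + b,                                # element-wise sum
    "mul": a * b,                                # element-wise product
    "ave": (a + b) / 2.0,                        # element-wise average
    "concat": np.concatenate([a, b], axis=-1),   # shape (2, 4)
    "dot": row_dot,                              # shape (2, 1)
    "cos": row_dot / (norm_a * norm_b),          # cosine of the angle per row
}

for mode, value in merged.items():
    print(mode, value.shape)
```

Note that sum, mul, and ave require both branches to produce the same shape, while concat only requires agreement on the axes that are not concatenated.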

This two-branch model can be trained with the following code:

final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
final_model.fit([input_data_1, input_data_2], targets)  # we pass one data array per model input

You can also pass a function as the mode keyword argument of Merge to implement an arbitrary transformation, for example:

merged = Merge([left_branch, right_branch], mode=lambda x: x[0] - x[1])

For complex models that cannot be built from Sequential and Merge, see the functional model API.

0x3: Compilation

Before training a model, configure the learning process with compile. compile takes three arguments:

optimizer: the name of a predefined optimizer, such as rmsprop or adagrad, or an instance of the Optimizer class

loss: the objective function the model tries to minimize; either the name of a predefined loss function, such as categorical_crossentropy or mse, or a loss function itself

metrics: for classification problems this list is usually set to metrics=['accuracy']. A metric can be the name of a predefined metric or a user-defined function. A metric function should return either a single tensor or a dict mapping metric_name -> metric_value.

The metrics list determines which evaluation results are reported:

# for a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# for a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# for a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# for custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

def false_rates(y_true, y_pred):
    false_neg = ...
    false_pos = ...
    return {
        'false_neg': false_neg,
        'false_pos': false_pos,
    }

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred, false_rates])
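To show what a loss and a metric actually compute, here is a NumPy sketch of categorical cross-entropy, categorical accuracy, and the mean_pred custom metric on a tiny hand-made batch. These are plain re-implementations for illustration under the usual textbook definitions, not the Keras backend functions, and the eps clipping value is an assumption.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    hits = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return float(np.mean(hits))

def mean_pred(y_true, y_pred):
    """NumPy analogue of the custom metric: the mean prediction."""
    return float(np.mean(y_pred))

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],   # wrong: predicts class 1, label is class 2
                   [0.2, 0.6, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
acc = categorical_accuracy(y_true, y_pred)
avg = mean_pred(y_true, y_pred)
print(loss, acc, avg)
```

The loss is what the optimizer minimizes, whereas the metrics are only reported and do not influence training.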

0x4: Training

Keras takes Numpy arrays as input data and labels. Models are usually trained with the fit function:

from keras.models import Sequential
from keras.layers import Dense

# for a single-input model with 2 classes (binary):
model = Sequential()
model.add(Dense(1, input_dim=784, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
data = np.random.random((1000, 784))
labels = np.random.randint(2, size=(1000, 1))

# train the model, iterating on the data in batches
# of 32 samples
model.fit(data, labels, nb_epoch=10, batch_size=32)

Another example:

from keras.models import Sequential
from keras.layers import Dense, Merge

# for a multi-input model with 10 classes:
left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

model = Sequential()
model.add(merged)
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
from keras.utils.np_utils import to_categorical
data_1 = np.random.random((1000, 784))
data_2 = np.random.random((1000, 784))

# these are integers between 0 and 9
labels = np.random.randint(10, size=(1000, 1))
# we convert the labels to a binary matrix of size (1000, 10)
# for use with categorical_crossentropy
labels = to_categorical(labels, 10)

# train the model
# note that we are passing a list of Numpy arrays as training data
# since the model has 2 inputs
model.fit([data_1, data_2], labels, nb_epoch=10, batch_size=32)
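The label conversion in the example above turns integer class ids into a binary (one-hot) matrix. A NumPy sketch of that conversion follows; one_hot is a hypothetical stand-in written for illustration and is not the library's to_categorical.

```python
import numpy as np

def one_hot(labels, nb_classes):
    """Turn integer class ids into rows with a single 1 (illustrative only)."""
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.shape[0], nb_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

labels = np.array([[0], [2], [9]])   # three samples, class ids in 0..9
binary = one_hot(labels, 10)         # shape (3, 10)
print(binary)
```

This layout is what categorical_crossentropy expects: one row per sample and one column per class.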

0x5: Some examples

1. Multilayer perceptron (MLP) for multi-class softmax classification

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, input_dim=20, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(10, init='uniform'))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(X_train, y_train,
          nb_epoch=20,
          batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)

2. An alternative implementation of a similar MLP

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

3. MLP for binary classification

model = Sequential()
model.add(Dense(64, input_dim=20, init='uniform', activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

4. VGG-like convolutional neural network

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD

model = Sequential()
# input: 100x100 images with 3 channels -> (3, 100, 100) tensors.
# this applies 32 convolution filters of size 3x3 each.
model.add(Convolution2D(32, 3, 3, border_mode='valid', input_shape=(3, 100, 100)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
# Note: Keras does automatic shape inference.
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(10))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

model.fit(X_train, Y_train, batch_size=32, nb_epoch=1)

5. Sequence classification with an LSTM

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM

model = Sequential()
model.add(Embedding(max_features, 256, input_length=maxlen))
model.add(LSTM(output_dim=128, activation='sigmoid', inner_activation='hard_sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=16, nb_epoch=10)
score = model.evaluate(X_test, Y_test, batch_size=16)

6. Stacked LSTM for sequence classification

In this model we stack three LSTM layers on top of each other so that the model can learn higher-level temporal representations.
The first two LSTMs return their full output sequences, while the third returns only the last step of its output sequence, which drops the temporal dimension (i.e. converts the input sequence into a single vector).

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
nb_classes = 10

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, nb_classes))

# generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, nb_classes))

model.fit(x_train, y_train,
          batch_size=64, nb_epoch=5,
          validation_data=(x_val, y_val))

7. The same model with stateful LSTMs

A stateful LSTM keeps the internal state (memory) it reaches after processing one batch and uses it as the initial state for the next batch. This makes it possible to process longer sequences at a manageable computational cost.

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
nb_classes = 10
batch_size = 32

# expected input batch shape: (batch_size, timesteps, data_dim)
# note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, nb_classes))

# generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, nb_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, nb_epoch=5,
          validation_data=(x_val, y_val))
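The effect of carrying state across batches can be checked with a small NumPy experiment. The sketch below swaps the LSTM for a plain tanh recurrent cell to stay short; run_rnn and its weights are hypothetical, and the point is only the state hand-off and the return_sequences output shape, not the LSTM equations.

```python
import numpy as np

rng = np.random.RandomState(0)
data_dim, units = 4, 3
W = rng.normal(scale=0.5, size=(data_dim, units))   # input weights
U = rng.normal(scale=0.5, size=(units, units))      # recurrent weights

def run_rnn(x, h0, return_sequences=False):
    """x: (batch, timesteps, data_dim); h0: (batch, units)."""
    h, outputs = h0, []
    for t in range(x.shape[1]):
        h = np.tanh(np.dot(x[:, t, :], W) + np.dot(h, U))
        outputs.append(h)
    seq = np.stack(outputs, axis=1)               # (batch, timesteps, units)
    return (seq if return_sequences else h), h    # output, final state

long_x = rng.normal(size=(2, 8, data_dim))   # 2 samples, 8 timesteps each
zeros = np.zeros((2, units))

# One pass over the full 8-step sequence.
full_out, _ = run_rnn(long_x, zeros)

# Stateful processing: two chunks of 4 steps, state carried across.
_, state = run_rnn(long_x[:, :4], zeros)
stateful_out, _ = run_rnn(long_x[:, 4:], state)

# Stateless processing resets the state before the second chunk.
stateless_out, _ = run_rnn(long_x[:, 4:], zeros)

all_steps, _ = run_rnn(long_x, zeros, return_sequences=True)
print(all_steps.shape, full_out.shape)
print(np.allclose(full_out, stateful_out), np.allclose(full_out, stateless_out))
```

With the state carried over, splitting a long sequence into consecutive batches gives the same final output as processing it in one pass, which is why sample i of batch k must continue sample i of batch k-1.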

8. Two merged LSTM encoders for classification over two parallel sequences

Two input sequences are encoded into feature vectors by two separate LSTMs.
The two feature vectors are concatenated and then passed through a fully-connected network to produce the result.

from keras.models import Sequential
from keras.layers import Merge, LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
nb_classes = 10

encoder_a = Sequential()
encoder_a.add(LSTM(32, input_shape=(timesteps, data_dim)))

encoder_b = Sequential()
encoder_b.add(LSTM(32, input_shape=(timesteps, data_dim)))

decoder = Sequential()
decoder.add(Merge([encoder_a, encoder_b], mode='concat'))
decoder.add(Dense(32, activation='relu'))
decoder.add(Dense(nb_classes, activation='softmax'))

decoder.compile(loss='categorical_crossentropy',
                optimizer='rmsprop',
                metrics=['accuracy'])

# generate dummy training data
x_train_a = np.random.random((1000, timesteps, data_dim))
x_train_b = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, nb_classes))

# generate dummy validation data
x_val_a = np.random.random((100, timesteps, data_dim))
x_val_b = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, nb_classes))

decoder.fit([x_train_a, x_train_b], y_train,
            batch_size=64, nb_epoch=5,
            validation_data=([x_val_a, x_val_b], y_val))

Relevant Link:

http://www.jianshu.com/p/9dc9f41f0b29
http://keras-cn.readthedocs.io/en/latest/getting_started/sequential_model/

4. The functional model

The Keras functional model API is the way to define complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.

A layer object takes a tensor as its argument and returns a tensor. Mathematically a tensor is just a generalization of familiar data structures: a 1st-order tensor is a vector, a 2nd-order tensor is a matrix, and a 3rd-order tensor is a cube. Here "tensor" is simply a general way of describing a data structure. For example, a color image is a 3rd-order tensor made of the pixel values of three stacked channels, and a dataset of 10000 color images is a 4th-order tensor.

A framework whose inputs are tensors and whose outputs are tensors is a model.

Such a model can be trained just like a Keras Sequential model.

For example, this fully-connected network:

from keras.layers import Input, Dense
from keras.models import Model

# this returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# this creates a model that includes
# the Input layer and three Dense layers
model = Model(input=inputs, output=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)  # starts training

0x1: All models are callable, just like layers

With the functional API it is easy to reuse trained models: you can treat a model as if it were a layer and call it on a tensor. Note that calling a model reuses not only its architecture but also its weights.

x = Input(shape=(784,))
# this works, and returns the 10-way softmax we defined above.
y = model(x)

This makes it possible to quickly create models that process sequences of inputs. You can turn an image classification model into a video classification model with a single line of code:

from keras.layers import TimeDistributed

# input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# this applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)
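One way to picture "apply the same model to every timestep" is to fold the time axis into the batch axis, run the per-sample function once, and unfold again. The NumPy sketch below does exactly that; softmax_classifier is a hypothetical stand-in for the trained model, and time_distributed is an illustration of the idea rather than the library's implementation.

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.normal(size=(784, 10))   # stand-in weights for a "trained" classifier

def softmax_classifier(x):
    """Stand-in for the model above: (n, 784) -> (n, 10) probabilities."""
    logits = np.dot(x, W)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

def time_distributed(fn, sequences):
    """Apply fn independently to every timestep of (batch, timesteps, ...)."""
    batch, timesteps = sequences.shape[:2]
    flat = sequences.reshape((batch * timesteps,) + sequences.shape[2:])
    out = fn(flat)
    return out.reshape((batch, timesteps) + out.shape[1:])

input_sequences = rng.random_sample((5, 20, 784))   # 5 sequences of 20 steps
processed = time_distributed(softmax_classifier, input_sequences)
print(processed.shape)   # (5, 20, 10)
```

Because the same weights are used at every timestep, the wrapped model adds no new parameters.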

0x2: Multi-input and multi-output models

A typical use case for the functional model is building models with multiple inputs and outputs.
Consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input is the headline itself, as a sequence of words, but we can also have an auxiliary input such as the date the headline was posted. The model's loss has two parts: the auxiliary loss evaluates predictions based on the headline alone, and the main loss evaluates predictions based on the headline plus the extra information. Even if the gradient from the main loss vanishes, the signal from the auxiliary loss can still train the Embedding and LSTM layers. Using the main loss function earlier in a model is a good regularization mechanism for deep networks. (The original post shows a block diagram of the model at this point.)

Let's implement it with the functional model.
The main input receives the headline itself as a sequence of integers (each integer encodes a word). The integers lie between 1 and 10,000 (a vocabulary of 10,000 words) and each sequence is 100 words long.

from keras.layers import Input, Embedding, LSTM, Dense, merge
from keras.models import Model

# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# a LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)

Next we insert the auxiliary loss, so that the LSTM and Embedding layers can train smoothly even when the main loss is high:

auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

Then we concatenate the LSTM output with the auxiliary input data and feed the result into the rest of the model:

auxiliary_input = Input(shape=(5,), name='aux_input')
x = merge([lstm_out, auxiliary_input], mode='concat')

# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

Finally, we define the complete model with 2 inputs and 2 outputs:

model = Model(input=[main_input, auxiliary_input], output=[main_output, auxiliary_output])

With the model defined, the next step is to compile it. We assign a weight of 0.2 to the auxiliary loss. The loss and loss_weights keyword arguments set a loss function or a weight per output, and both accept a Python list or dict. Here we pass a single loss function to loss, so it is applied to all outputs.

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
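The effect of loss_weights can be sketched in NumPy: each output gets its own loss value, and the quantity being minimized is their weighted sum. The predictions below are made-up numbers for illustration, and binary_crossentropy is a plain re-implementation under the usual definition rather than the backend function.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch (illustrative re-implementation)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))

labels = np.array([1.0, 0.0, 1.0, 1.0])
main_pred = np.array([0.9, 0.2, 0.8, 0.6])   # hypothetical main_output values
aux_pred = np.array([0.6, 0.4, 0.5, 0.5])    # hypothetical aux_output values

losses = [binary_crossentropy(labels, main_pred),
          binary_crossentropy(labels, aux_pred)]
loss_weights = [1.0, 0.2]

# The training objective is the weighted sum of the per-output losses.
total_loss = sum(w * l for w, l in zip(loss_weights, losses))
print(losses, total_loss)
```

With a weight of 0.2 the auxiliary output still contributes gradient to the shared layers, but the main output dominates the objective.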

After compiling, we train the model by passing the training data and the target values:

model.fit([headline_data, additional_data], [labels, labels],
          nb_epoch=50, batch_size=32)

Because the inputs and outputs are named (we passed a "name" argument when defining them), we can also compile and train the model like this:

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# and trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          nb_epoch=50, batch_size=32)

0x3: Shared layers

Another good use of the functional model is for models with shared layers.
Consider a dataset of tweets. We want to build a model that tells whether two tweets come from the same user; the same approach can also be used to compare the similarity of a user's tweets.
One way to do this is to build a model that encodes each of the two tweets into a feature vector, concatenates the vectors, and adds a logistic regression layer that outputs the probability that the two tweets share an author. The training data for this model consists of pairs of tweets.
Because the problem is symmetric, the model that processes the first tweet can be reused to process the second, so we use a shared LSTM layer for the encoding.
First, we turn each tweet into a matrix of shape (140, 256): a tweet has 140 characters, and each character is represented by a 256-dimensional vector in which an element is 1 if the corresponding character is present and 0 otherwise. This is a one-hot encoding.

from keras.layers import Input, LSTM, Dense, merge
from keras.models import Model

tweet_a = Input(shape=(140, 256))
tweet_b = Input(shape=(140, 256))

To share a layer across different inputs, instantiate the layer once and then call it as many times as needed:

# this layer can take as input a matrix and will return a vector of size 64
shared_lstm = LSTM(64)

# when we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# we can then concatenate the two vectors:
merged_vector = merge([encoded_a, encoded_b], mode='concat', concat_axis=-1)

# and add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)

# we define a trainable model linking the
# tweet inputs to the predictions
model = Model(input=[tweet_a, tweet_b], output=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, nb_epoch=10)

0x4: The concept of a layer "node"

Whenever you call a layer on some input, you create a new tensor (the output of the layer) and you also add a "(computation) node" to the layer. The node maps the input tensor to the output tensor. When you call the same layer several times, it owns several nodes, indexed 0, 1, 2, ...

0x5: More examples

1. Inception module

from keras.layers import merge, Convolution2D, MaxPooling2D, Input

input_img = Input(shape=(3, 256, 256))

tower_1 = Convolution2D(64, 1, 1, border_mode='same', activation='relu')(input_img)
tower_1 = Convolution2D(64, 3, 3, border_mode='same', activation='relu')(tower_1)

tower_2 = Convolution2D(64, 1, 1, border_mode='same', activation='relu')(input_img)
tower_2 = Convolution2D(64, 5, 5, border_mode='same', activation='relu')(tower_2)

tower_3 = MaxPooling2D((3, 3), strides=(1, 1), border_mode='same')(input_img)
tower_3 = Convolution2D(64, 1, 1, border_mode='same', activation='relu')(tower_3)

output = merge([tower_1, tower_2, tower_3], mode='concat', concat_axis=1)

2. Residual connection on a convolution layer (Residual Network)

from keras.layers import merge, Convolution2D, Input

# input tensor for a 3-channel 256x256 image
x = Input(shape=(3, 256, 256))
# 3x3 conv with 3 output channels (same as input channels)
y = Convolution2D(3, 3, 3, border_mode='same')(x)
# this returns x + y.
z = merge([x, y], mode='sum')

3. Shared vision model

This model reuses the same image-processing module on two inputs to decide whether two MNIST digits are the same digit.

from keras.layers import merge, Convolution2D, MaxPooling2D, Input, Dense, Flatten
from keras.models import Model

# first, define the vision modules
digit_input = Input(shape=(1, 27, 27))
x = Convolution2D(64, 3, 3)(digit_input)
x = Convolution2D(64, 3, 3)(x)
x = MaxPooling2D((2, 2))(x)
out = Flatten()(x)

vision_model = Model(digit_input, out)

# then define the tell-digits-apart model
digit_a = Input(shape=(1, 27, 27))
digit_b = Input(shape=(1, 27, 27))

# the vision model will be shared, weights and all
out_a = vision_model(digit_a)
out_b = vision_model(digit_b)

concatenated = merge([out_a, out_b], mode='concat')
out = Dense(1, activation='sigmoid')(concatenated)

classification_model = Model([digit_a, digit_b], out)

4. Visual question answering model (question-based image CAPTCHA)

Given a natural-language question about a picture, this model can return a one-word answer about that picture.
The model encodes the question and the image into feature vectors, concatenates the two, and trains a logistic regression layer on top that picks one answer from a set of candidates.

from keras.layers import Convolution2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense, merge
from keras.models import Model, Sequential

# first, let's define a vision model using a Sequential model.
# this model will encode an image into a vector.
vision_model = Sequential()
vision_model.add(Convolution2D(64, 3, 3, activation='relu', border_mode='same', input_shape=(3, 224, 224)))
vision_model.add(Convolution2D(64, 3, 3, activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Convolution2D(128, 3, 3, activation='relu', border_mode='same'))
vision_model.add(Convolution2D(128, 3, 3, activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Convolution2D(256, 3, 3, activation='relu', border_mode='same'))
vision_model.add(Convolution2D(256, 3, 3, activation='relu'))
vision_model.add(Convolution2D(256, 3, 3, activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())

# now let's get a tensor with the output of our vision model:
image_input = Input(shape=(3, 224, 224))
encoded_image = vision_model(image_input)

# next, let's define a language model to encode the question into a vector.
# each question will be at most 100 word long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)

# let's concatenate the question vector and the image vector:
merged = merge([encoded_question, encoded_image], mode='concat')

# and let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# this is our final model:
vqa_model = Model(input=[image_input, question_input], output=output)

# the next stage would be training this model on actual data.

5. Video question answering model

With the image question answering model in place, we can quickly turn it into a video question answering model. With appropriate training, you can show the model a short video (e.g. 100 frames) and ask it a question about the video, such as "what sport is the boy playing?" -> "football".

from keras.layers import TimeDistributed

video_input = Input(shape=(100, 3, 224, 224))
# this is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# this is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(input=question_input, output=encoded_question)

# let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# and this is our video question answering model:
merged = merge([encoded_video, encoded_video_question], mode='concat')
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(input=[video_input, video_question_input], output=output)

Relevant Link:

http://wiki.jikexueyuan.com/project/tensorflow-zh/resources/dims_types.html

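5. Common layers

0x1: Dense layer

Dense is the regular fully-connected layer.

keras.layers.core.Dense(output_dim,
                        init='glorot_uniform',
                        activation='linear',
                        weights=None,
                        W_regularizer=None,
                        b_regularizer=None,
                        activity_regularizer=None,
                        W_constraint=None,
                        b_constraint=None,
                        bias=True,
                        input_dim=None)

output_dim: integer greater than 0, the output dimension of the layer. For fully-connected layers that are not the first layer of a model, the input dimension is inferred automatically and does not need to be specified.

init: initialization method, either the name of a predefined initializer as a string or a Theano function used to initialize the weights. Only meaningful when the weights argument is not passed.

activation: activation function, either the name of a predefined activation function or an element-wise Theano function. If not specified, no activation is applied (i.e. the linear activation a(x)=x).

weights: the weights, as a list of numpy arrays. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).

W_regularizer: regularizer applied to the weights, a WeightRegularizer object

b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object

activity_regularizer: regularizer applied to the output, an ActivityRegularizer object

W_constraint: constraint applied to the weights, a Constraints object

b_constraint: constraint applied to the bias, a Constraints object

bias: boolean, whether to include a bias vector (i.e. whether the layer applies an affine rather than a purely linear transformation to its input)

input_dim: integer, the dimensionality of the input. When Dense is the first layer of a network, either this argument or input_shape must be specified.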

after the first layer, you don’t need to specify the size of the input anymore
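The computation a fully-connected layer performs can be sketched in NumPy as activation(dot(x, W) + b). The function dense_forward and the toy weights below are hypothetical and only illustrate the arithmetic and the shapes listed for the weights argument.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense_forward(x, W, b=None, activation=None):
    """x: (batch, input_dim), W: (input_dim, output_dim), b: (output_dim,)."""
    out = np.dot(x, W)
    if b is not None:          # bias=True adds the affine offset
        out = out + b
    return out if activation is None else activation(out)

input_dim, output_dim = 4, 3
W = np.arange(12).reshape((input_dim, output_dim)) / 10.0   # toy weight matrix
b = np.array([0.5, -5.0, 0.0])                              # toy bias vector
x = np.ones((2, input_dim))                                 # batch of 2 samples

y = dense_forward(x, W, b, relu)
print(y.shape)   # (2, 3)
print(y[0])
```

With no activation given, the function returns the affine output unchanged, matching the linear default a(x)=x described above.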

0x2: Activation layer

The Activation layer applies an activation function to the output of a layer.

keras.layers.core.Activation(activation)

activation: the activation function to use, either the name of a predefined activation function or a TensorFlow/Theano function

0x3: Dropout layer

Applies Dropout to the input. During training, Dropout randomly disconnects a fraction p of the input units at each parameter update, which helps prevent overfitting.

keras.layers.core.Dropout(p)

p: float between 0 and 1, the fraction of connections to drop
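A NumPy sketch of dropout is given below. It uses the common "inverted dropout" convention, in which the surviving units are scaled by 1/(1-p) during training so that nothing needs to change at test time; treating this as the convention in use is an assumption of the sketch, and the dropout function here is hypothetical.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Zero a fraction p of the inputs during training (inverted dropout)."""
    if not training or p == 0.0:
        return x                                   # identity at test time
    keep = rng.binomial(1, 1.0 - p, size=x.shape)  # 1 = keep, 0 = drop
    return x * keep / (1.0 - p)                    # rescale the survivors

rng = np.random.RandomState(0)
x = np.ones((1000, 100))

train_out = dropout(x, 0.5, rng)
test_out = dropout(x, 0.5, rng, training=False)

dropped_fraction = float(np.mean(train_out == 0.0))
print(dropped_fraction, train_out.mean())
```

About half of the units are zeroed on each call, and the rescaling keeps the expected activation unchanged between training and testing.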

0x4: Flatten layer

The Flatten layer "flattens" its input, turning a multi-dimensional input into a one-dimensional one. It is commonly used in the transition from convolutional layers to fully-connected layers. Flatten does not affect the batch size.

keras.layers.core.Flatten()

model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
# now: model.output_shape == (None, 64, 32, 32)

model.add(Flatten())
# now: model.output_shape == (None, 65536)

0x5: Reshape layer

The Reshape layer converts its input to a given shape.

keras.layers.core.Reshape(target_shape)

target_shape: the target shape, a tuple of integers that does not include the samples dimension (batch size)

# as first layer in a Sequential model
model = Sequential()
model.add(Reshape((3, 4), input_shape=(12,)))
# now: model.output_shape == (None, 3, 4)
# note: `None` is the batch dimension

# as intermediate layer in a Sequential model
model.add(Reshape((6, 2)))
# now: model.output_shape == (None, 6, 2)
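The shape bookkeeping of Flatten and Reshape can be reproduced with NumPy reshapes that leave the batch axis untouched. This is a sketch of the shapes only, with a concrete batch size of 2 standing in for None.

```python
import numpy as np

# Flatten: keep the batch axis, collapse everything else.
batch = np.arange(2 * 64 * 32 * 32).reshape((2, 64, 32, 32))
flattened = batch.reshape((batch.shape[0], -1))
print(flattened.shape)   # (2, 65536)

# Reshape: the target shape excludes the batch axis.
vectors = np.arange(2 * 12).reshape((2, 12))
reshaped = vectors.reshape((vectors.shape[0],) + (3, 4))
again = reshaped.reshape((reshaped.shape[0],) + (6, 2))
print(reshaped.shape, again.shape)   # (2, 3, 4) (2, 6, 2)
```

The number of elements per sample never changes: 64 * 32 * 32 = 65536 and 3 * 4 = 6 * 2 = 12.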

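0x6: Permute layer

The Permute layer rearranges the dimensions of its input according to a given pattern. It may be needed, for example, when connecting an RNN and a CNN.

    keras.layers.core.Permute(dims)

dims: tuple of integers giving the permutation pattern, not including the sample dimension. Indexing starts at 1. For example, (2, 1) moves the second dimension of the input to the first dimension of the output, and the first dimension of the input to the second.

    from keras.models import Sequential
    from keras.layers import Permute

    model = Sequential()
    model.add(Permute((2, 1), input_shape=(10, 64)))
    # now: model.output_shape == (None, 64, 10)
    # note: `None` is the batch dimension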
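To make the 1-based permutation pattern concrete, here is a plain-Python sketch. Both helpers are hypothetical illustrations: `permute_shape` computes the resulting per-sample shape, and `permute_2d` applies the (2, 1) pattern to one sample stored as a list of rows.

```python
def permute_shape(input_shape, dims):
    """Per-sample output shape; dims is 1-indexed and excludes the batch axis."""
    if sorted(dims) != list(range(1, len(input_shape) + 1)):
        raise ValueError("dims must be a permutation of 1..n")
    return tuple(input_shape[d - 1] for d in dims)

def permute_2d(sample):
    """Apply the (2, 1) pattern to one 2D sample, i.e. transpose it."""
    return [list(column) for column in zip(*sample)]

print(permute_shape((10, 64), (2, 1)))      # (64, 10)
print(permute_2d([[1, 2, 3], [4, 5, 6]]))   # [[1, 4], [2, 5], [3, 6]]
```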

0x7: RepeatVector layer

The RepeatVector layer repeats its input n times.

    keras.layers.core.RepeatVector(n)

n: integer, the number of repetitions

    from keras.models import Sequential
    from keras.layers import Dense, RepeatVector

    model = Sequential()
    model.add(Dense(32, input_dim=32))
    # now: model.output_shape == (None, 32)
    # note: `None` is the batch dimension
    model.add(RepeatVector(3))
    # now: model.output_shape == (None, 3, 32)

0x8: Merge layer

The Merge layer combines several tensors from a list into a single tensor, according to a given mode.

    keras.engine.topology.Merge(layers=None, mode='sum', concat_axis=-1, dot_axes=-1, output_shape=None, node_indices=None, tensor_indices=None, name=None)

layers: list of Keras tensors or list of Keras layer objects. The list must contain more than one element.

mode: the merge mode, either a string naming a predefined mode or a lambda/regular function. A function must accept a list of tensors as input and return a single tensor. A string must be one of:
"sum", "mul", "concat", "ave", "cos", "dot"

concat_axis: integer, the axis to concatenate along when mode=concat

dot_axes: integer or tuple of integers, the axes to contract when mode=dot

output_shape: tuple of integers, or a lambda/regular function (when mode is a function). If output_shape is a function, it takes a list of shapes, one per input tensor, and returns the shape of the output tensor.

node_indices: optional list of integers. If some layers have multiple output nodes, this argument selects the indices of the nodes to merge. If it is not provided, it defaults to all zeros, meaning that the output of node 0 of each input layer is merged.

tensor_indices: optional list of integers. If some layers return multiple output tensors, this argument selects which tensors to merge.

When merging, think carefully about which mode to use and which axis to merge along, because this choice strongly affects how the network trains.
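The difference between the merge modes is easiest to see on two small vectors. The following is a plain-Python sketch for one sample, not the Keras implementation; the helper `merge` is hypothetical and the "cos" mode is left out for brevity.

```python
def merge(tensors, mode="sum"):
    """Merge equal-length 1D vectors (one sample). Illustration only."""
    if len(tensors) < 2:
        raise ValueError("need at least two tensors")
    if mode == "concat":
        return [v for t in tensors for v in t]
    if mode == "dot":
        a, b = tensors  # dot is defined here for exactly two vectors
        return sum(x * y for x, y in zip(a, b))
    columns = list(zip(*tensors))
    if mode == "sum":
        return [sum(c) for c in columns]
    if mode == "ave":
        return [sum(c) / len(c) for c in columns]
    if mode == "mul":
        out = []
        for c in columns:
            product = 1.0
            for v in c:
                product *= v
            out.append(product)
        return out
    raise ValueError("unsupported mode: %r" % mode)

a, b = [1.0, 2.0], [3.0, 4.0]
for m in ("sum", "mul", "ave", "concat", "dot"):
    print(m, merge([a, b], m))
```

Note that "sum", "mul" and "ave" keep the vector length, "concat" grows it, and "dot" reduces the pair to a single number, which is why the choice changes the shape seen by the following layers.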

0x9: Lambda layer

This layer applies an arbitrary Theano/TensorFlow expression to the output of the previous layer.

    keras.layers.core.Lambda(function, output_shape=None, arguments={})

function: the function to apply. It takes a single argument, the output of the previous layer.

output_shape: the shape of the value the function returns, either a tuple or a function that computes the output shape from the input shape

arguments: optional dictionary of additional keyword arguments to pass to the function

0x10: ActivityRegularization layer

Data passes through this layer unchanged, but the loss value is updated based on the activations.

    keras.layers.core.ActivityRegularization(l1=0.0, l2=0.0)

l1: L1 regularization factor (positive float)
l2: L2 regularization factor (positive float)

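0x11: Masking layer

Masks an input sequence using a given value, in order to mark timesteps that should be skipped.
For each timestep of the input tensor (axis 1 of the tensor, counting from 0), if all values of the input at that timestep equal mask_value, the timestep is skipped (masked) in all subsequent layers of the model, as long as they support masking.
If a subsequent layer does not support masking but receives masked data, an exception is raised.

Consider input data x of shape (samples, timesteps, features) that is to be fed to an LSTM layer. Suppose the signal is missing at timesteps 3 and 5 and you want to mask them. You should then:

set x[:, 3, :] = 0. and x[:, 5, :] = 0.
insert a Masking layer with mask_value=0. before the LSTM layer

    from keras.models import Sequential
    from keras.layers import Masking, LSTM

    # timesteps and features must be defined to match the shape of x
    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
    model.add(LSTM(32))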
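The masking rule (a timestep is skipped only when every feature equals mask_value) can be checked on a tiny sample. This is a plain-Python sketch; the helper `compute_mask` is a hypothetical name used for illustration.

```python
def compute_mask(sample, mask_value=0.0):
    """Return one boolean per timestep: True = keep, False = skip.

    A timestep is skipped only if ALL of its features equal mask_value.
    """
    return [any(f != mask_value for f in step) for step in sample]

sample = [
    [0.5, 1.0],  # ordinary timestep
    [0.0, 0.0],  # every feature equals mask_value -> skipped
    [0.0, 2.0],  # only one feature is 0 -> still kept
]
mask = compute_mask(sample)
print(mask)  # [True, False, True]
```

The third row shows why real data that happens to contain a single zero is not masked by accident.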

0x12: Highway layer

The Highway layer builds a densely connected highway network, an adaptation of the LSTM idea to feedforward networks.

    keras.layers.core.Highway(init='glorot_uniform', transform_bias=-2, activation='linear', weights=None, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, input_dim=None)

init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights. This argument is only relevant when weights is not passed.

weights: list of numpy arrays used as the initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).

W_regularizer: regularizer applied to the weights, a WeightRegularizer object

b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object

activity_regularizer: regularizer applied to the output, an ActivityRegularizer object

W_constraint: constraint applied to the weights, a Constraints object

b_constraint: constraint applied to the bias, a Constraints object

bias: boolean, whether to include a bias vector (that is, whether the layer applies an affine transformation to its input or a purely linear one)

input_dim: integer, dimensionality of the input data. When this layer is the first layer of the network, either this argument or input_shape must be specified.

transform_bias: initial value of the bias of the transform gate, default -2 (see the referenced literature for the meaning of this parameter)
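To see what the default transform_bias of -2 does, the highway formulation from the literature, y = T(x) * H(x) + (1 - T(x)) * x with a sigmoid transform gate T, can be evaluated for a single scalar unit. This is a sketch under that assumption; the function names are hypothetical and the gate weights are taken to be zero so that only the bias acts.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_unit(x, h, gate_pre_activation):
    """Scalar highway unit: mix transformed value h with the raw input x."""
    t = sigmoid(gate_pre_activation)   # transform gate T in (0, 1)
    return t * h + (1.0 - t) * x       # carry gate is 1 - T

t0 = sigmoid(-2.0)                     # gate value when only the bias acts
print(round(t0, 4))                    # 0.1192
print(highway_unit(x=1.0, h=5.0, gate_pre_activation=-2.0))
```

With a gate of roughly 0.12, the unit initially passes most of its input through unchanged, which is the usual motivation given for a negative initial bias.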

0x13: MaxoutDense layer

A densely connected Maxout layer. The output of a MaxoutDense layer is the maximum of the outputs of nb_feature linear Dense(input_dim, output_dim) layers. MaxoutDense can learn a convex, piecewise-linear activation function over its input.
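A minimal plain-Python sketch of the "maximum over several linear layers" idea follows. The helper `maxout_dense` and the weight layout are assumptions made for illustration; with two linear pieces w=+1 and w=-1 the unit reproduces the absolute value, a simple convex piecewise-linear function.

```python
def maxout_dense(x, weights, biases):
    """Maxout over several linear maps.

    weights[k][i][j]: k-th linear map, input index i, output index j.
    biases[k][j]:     bias of the k-th map for output j.
    """
    n_out = len(biases[0])
    outputs = []
    for j in range(n_out):
        candidates = [
            sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
            for w, b in zip(weights, biases)
        ]
        outputs.append(max(candidates))  # element-wise maximum
    return outputs

weights = [[[1.0]], [[-1.0]]]   # two linear pieces: y = x and y = -x
biases = [[0.0], [0.0]]
print(maxout_dense([-3.0], weights, biases))  # [3.0]
print(maxout_dense([2.0], weights, biases))   # [2.0]
```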

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/core_layer/

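6. Convolutional layers

Input layer: preprocesses the data, for example by mean subtraction (centering every dimension of the input at 0, so that large offsets in the data do not hurt training), normalization (scaling all data to the same range), PCA/whitening, and so on
In the middle:
CONV: convolution layer, a linear multiply-and-sum (inner product)
RELU: activation layer (activation function), which turns the vector into a "magnitude" used to evaluate how well the current parameters classify
POOL: pooling layer, which in short takes the average or the maximum over a region
On the far right:
FC: fully connected layer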

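0x0: The CNN convolution layer

1. Core CNN concept: filtering

In communications, filtering (wave filtering) is the operation of removing particular frequency bands from a signal, and it is an important measure for suppressing and preventing interference. In CNN-based image recognition, it refers to taking the inner product (element-wise multiplication followed by summation) of the image (the data in different data windows) with a filter matrix (a fixed set of weights: because the weights of each neuron are fixed, the neuron can be regarded as a constant filter). This operation is the so-called "convolution", and it is where convolutional neural networks get their name.
Intuitively, it extracts the "important details" from a region (whose size is the size of the filter). The extraction works by defining "region weights" and using them to filter out the key details of the region.
Even more intuitively, for the car image above, the job of the filters is to pick out the tires, the side mirrors, the outline of the front end, and the shape of the A-pillar, so that an unstructured image is viewed in terms of its edge details.
The theoretical basis for this technique is the view in the research community that human visual recognition is also layered: what the eye first receives is the outline detail of an object, which is passed to the cerebral cortex, and an overall perception of the object is then built up by further abstraction on top of those outline details.

Loosely speaking, the part framed in red in the figure above can be understood as one filter, that is, a neuron with a fixed set of weights. Several filters stacked together form a convolution layer.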

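2. Convolution on images

In the computation shown in the figure below, the input is data from a region of a given size (width*height). Taking its inner product with a filter (a neuron with a fixed set of weights) yields new two-dimensional data.
Specifically, the image input is on the left and the filter (a neuron with a fixed set of weights) is in the middle. Different filters produce different output data, such as color intensity or outlines. In other words, to extract different features of an image, use different filters, each of which extracts a specific kind of information about the image: color intensity or outlines.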

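3. CNN filters

In a CNN, a filter (a neuron with a fixed set of weights) performs the convolution on local input data. After the local data in one data window has been processed, the window keeps sliding until all of the data has been covered.

In the figure:

There are two neurons, that is, depth=2, meaning there are two filters.
The data window moves two positions at a time and takes a 3*3 patch of local data, that is, stride=2.
zero-padding=1

The input array is then convolved with each of the two filters in turn, giving two different sets of results. This sliding-window filtering gradually extracts the individual details of the image (edges and outlines, color intensity). Two points are worth noting:

Local perception
The data on the left changes, and each time the filter convolves only one local data window. This is the local perception (local receptive field) mechanism of CNNs.
By analogy, a filter is like a pair of eyes. Human vision is limited: at a glance we see only a local part of the world. Seeing the whole world at once would be exhausting, and the brain could not take in all of that information at the same time. Even within a local view, the eyes have emphases and preferences. For example, when looking at a person, attention concentrates on a few particular features, so the weights of those inputs are relatively large.

Parameter (weight) sharing
As the data window slides, the data fed to the filter changes, but the weights of the filter in the middle, Filter w0 (the weights connecting each neuron to the data window), stay fixed. This invariance is the parameter (weight) sharing mechanism of CNNs.
By analogy again, when someone travels around the world, the information they see changes, but the eyes that collect it do not. A person's perception of scenery stays the same over a certain period of time. Note, however, that the weights are not fixed forever: as training proceeds, the weights of the network are continually adjusted according to the error of its output (this is the backpropagation, or BP, algorithm).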
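The sliding-window computation with stride 2 and zero-padding 1 can be sketched in plain Python for a single filter and a single channel. The 5x5 input and the all-ones 3x3 kernel are assumptions chosen to keep the numbers easy to check; they are not values from the text. The same kernel is reused at every window position, which is the weight sharing described above.

```python
def conv2d_single(image, kernel, stride=1, padding=0, bias=0.0):
    """Sliding-window inner product (no kernel flip), one channel, one filter."""
    h, w, k = len(image), len(image[0]), len(kernel)
    # Surround the image with `padding` rows/columns of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for a in range(k):          # same kernel at every window
                for b in range(k):      # position: weight sharing
                    acc += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        output.append(row)
    return output

image = [[1.0] * 5 for _ in range(5)]    # assumed 5x5 input
kernel = [[1.0] * 3 for _ in range(3)]   # assumed 3x3 filter
result = conv2d_single(image, kernel, stride=2, padding=1)
for row in result:
    print(row)
```

With depth=2, the function would simply be called once per filter, giving two output maps. The corner outputs are smaller than the center one because their windows overlap the zero padding.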

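4. CNN activation layer

Commonly used nonlinear activation functions include sigmoid, tanh, relu, and others. The first two, sigmoid and tanh, are more common in fully connected layers, while relu is common in convolution layers.

4.1 sigmoid

The sigmoid activation function is $g(z)=\frac{1}{1+\exp(-z)}$, where $z$ is a linear combination, for example $z = b + w_1 x_1 + w_2 x_2$.

The horizontal axis is the domain z and the vertical axis is the range g(z). The sigmoid function squashes a real number into the interval between 0 and 1. When z is a very large positive number, g(z) approaches 1; when z is a very large negative number, g(z) approaches 0.
The output of the activation function can therefore be read as a "classification probability": an output of 0.9, for example, can be interpreted as a 90% probability that the sample is positive.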
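The squashing behavior of the sigmoid is easy to verify numerically. The sketch below uses a numerically stable form for negative inputs; the helper names and the example weights are assumptions for illustration.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron(inputs, weights, bias):
    """Sigmoid applied to the linear combination z = b + sum(w_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)

print(sigmoid(0.0))            # 0.5
print(sigmoid(50.0))           # very close to 1
print(sigmoid(-50.0))          # very close to 0
print(sigmoid(math.log(9.0)))  # about 0.9, read as a 90% positive probability
print(neuron([1.0, 2.0], [0.5, -0.25], bias=0.0))  # z = 0, so 0.5
```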

4.2 ReLU activation layer

The advantages of ReLU are fast convergence and a gradient that is simple to compute.
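Why the gradient is simple can be seen directly from the definition relu(z) = max(0, z): the derivative is 1 for positive inputs and 0 for negative ones. A minimal sketch follows; treating the gradient at exactly z = 0 as 0 is a convention assumed here.

```python
def relu(z):
    """Rectified linear unit: pass positive values, clamp the rest to 0."""
    return max(0.0, z)

def relu_grad(z):
    """Derivative of relu; the value at z == 0 is set to 0 by convention."""
    return 1.0 if z > 0 else 0.0

for z in (-2.0, 0.0, 3.5):
    print(z, relu(z), relu_grad(z))
```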

5. CNN pooling layer

Pooling, in short, takes the average or the maximum over a region.
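Both variants can be sketched with one plain-Python helper that slides a non-overlapping window over a small matrix. The function name and the 4x4 example values are assumptions for illustration.

```python
def pool2d(image, size=2, mode="max"):
    """Non-overlapping size x size pooling (the stride equals the window size)."""
    output = []
    for r in range(0, len(image) - size + 1, size):
        row = []
        for c in range(0, len(image[0]) - size + 1, size):
            window = [image[r + a][c + b] for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            else:  # "avg"
                row.append(sum(window) / len(window))
        output.append(row)
    return output

image = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(pool2d(image, 2, "max"))  # [[6, 8], [3, 4]]
print(pool2d(image, 2, "avg"))  # [[3.75, 5.25], [2.0, 2.0]]
```

Each 2x2 window is reduced to one number, so the 4x4 input becomes 2x2 in both cases.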

Next, a real CNN is used to explain how a CNN is constructed.

Input layer of NxN pixels (N=32).

Convolutional layer (64 filter maps of size 11x11).

Max-pooling layer.

Densely-connected layer (4096 neurons)

Output layer. 9 neurons.

The input is a set of 32x32 images. The dimensions of the data at each layer are explained below.

input layer: 32x32 neurons

convolutional layer (64 filters, size 11x11): (32-11+1)*(32-11+1) = 22*22 = 484 for each feature map. As a result, the total output of the convolutional layer is 22*22*64 = 30976.

pooling layer (2x2 regions): reduced to 11*11*64 = 7744.

fully-connected layer: 4096 neurons

output layer

The number of learnable parameters P of this network is:

P = 1024*(11*11*64)+64+(11*11*64)*4096+4096+4096*9+9 = 39690313
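The layer sizes and the parameter count above can be reproduced with a few lines of arithmetic. The variable names are hypothetical. The last two lines are an additional comparison, not part of the formula above: they assume a single-channel input and one 11x11 kernel shared across all positions per filter, in which case the convolutional term would be 11*11*64 + 64 rather than 1024*(11*11*64) + 64.

```python
input_side, kernel, filters = 32, 11, 64
pool, dense, outputs = 2, 4096, 9

conv_side = input_side - kernel + 1        # 22
conv_out = conv_side ** 2 * filters        # 22*22*64 = 30976
pool_side = conv_side // pool              # 11
pool_out = pool_side ** 2 * filters        # 11*11*64 = 7744

# Parameter count exactly as written in the formula above.
p_text = (input_side ** 2 * (kernel * kernel * filters) + filters
          + pool_out * dense + dense
          + dense * outputs + outputs)
print(conv_out, pool_out, p_text)          # 30976 7744 39690313

# Assumption: single input channel, one shared kernel per filter.
conv_shared = kernel * kernel * filters + filters   # 7808
p_shared = conv_shared + pool_out * dense + dense + dense * outputs + outputs
print(conv_shared, p_shared)               # 7808 31768201
```

In both counts the fully connected layer after pooling (7744*4096 weights) dominates the total.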

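Note the second layer, the convolutional layer. It can be understood as producing several images of the same picture, each showing different details, obtained by looking at the picture with different points of focus (as the filter window moves).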

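0x1: Convolution1D layer

A one-dimensional convolution layer, used to filter neighborhoods of a one-dimensional input signal. When this layer is used as the first layer, the keyword argument input_dim or input_shape must be provided. For example, input_dim=128 means the input is a sequence of 128-dimensional vectors, and input_shape=(10, 128) means a sequence of 10 vectors with 128 dimensions each (for code-segment feature vectors built from byte frequencies, this would be input_shape=(15000, 256)).

    keras.layers.convolutional.Convolution1D(nb_filter, filter_length, init='uniform', activation='linear', weights=None, border_mode='valid', subsample_length=1, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True, input_dim=None, input_length=None)

1. nb_filter: number of convolution kernels (that is, the dimensionality of the output). Filters can be used to reduce the dimensionality of the CNN input layer and lower the amount of computation.
2. filter_length: spatial or temporal length of each convolution kernel
3. init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights. This argument is only relevant when weights is not passed.
4. activation: activation function, either the name of a predefined activation function (see the activation functions) or an element-wise Theano function. If this argument is not specified, no activation is applied (that is, the linear activation a(x)=x is used).
5. weights: list of numpy arrays used as the initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
6. border_mode: border mode, one of "valid", "same" or "full"; "full" requires the Theano backend
7. subsample_length: factor by which the output subsamples the input
8. W_regularizer: regularizer applied to the weights, a WeightRegularizer object
9. b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
10. activity_regularizer: regularizer applied to the output, an ActivityRegularizer object
11. W_constraint: constraint applied to the weights, a Constraints object
12. b_constraint: constraint applied to the bias, a Constraints object
13. bias: boolean, whether to include a bias vector (that is, whether the layer applies an affine transformation to its input or a purely linear one)
14. input_dim: integer, dimensionality of the input data. When this layer is the first layer of the network, either this argument or input_shape must be specified.
15. input_length: length of the input sequences when it is fixed. This argument is required if a Flatten layer and then a Dense layer are to be connected after this layer; otherwise the output shape of the dense layer cannot be computed.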

example

    from keras.models import Sequential
    from keras.layers import Convolution1D

    # apply a convolution 1d of length 3 to a sequence with 10 timesteps,
    # with 64 output filters
    model = Sequential()
    model.add(Convolution1D(64, 3, border_mode='same', input_shape=(10, 32)))
    # now model.output_shape == (None, 10, 64)

    # add a new conv1d on top
    model.add(Convolution1D(32, 3, border_mode='same'))
    # now model.output_shape == (None, 10, 32)

Convolution1D can be seen as a shortcut for Convolution2D: applying a 1D convolution to the (10, 32) signal in the example is equivalent to applying a 2D convolution with a kernel of size (filter_length, 32).
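How border_mode and subsample_length determine the number of output timesteps can be sketched with a small helper. The function below follows the usual convention for the three border modes; it is a hypothetical illustration, not a call into Keras.

```python
def conv1d_output_length(input_length, filter_length,
                         border_mode="valid", subsample_length=1):
    """Number of output timesteps of a 1D convolution (illustrative sketch)."""
    if border_mode == "same":
        length = input_length
    elif border_mode == "valid":
        length = input_length - filter_length + 1
    elif border_mode == "full":
        length = input_length + filter_length - 1
    else:
        raise ValueError("unknown border_mode: %r" % border_mode)
    # Subsampling keeps every subsample_length-th position (rounding up).
    return (length + subsample_length - 1) // subsample_length

print(conv1d_output_length(10, 3, "same"))      # 10, as in the example above
print(conv1d_output_length(10, 3, "valid"))     # 8
print(conv1d_output_length(10, 3, "valid", 2))  # 4
```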

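0x2: AtrousConvolution1D layer

The AtrousConvolution1D layer filters a 1D signal with a dilated convolution, also called a convolution with holes. When this layer is used as the first layer, the keyword argument input_dim or input_shape must be provided. For example, input_dim=128 means the input is a sequence of 128-dimensional vectors, and input_shape=(10, 128) means a sequence of 10 vectors with 128 dimensions each.

    keras.layers.convolutional.AtrousConvolution1D(nb_filter, filter_length, init='uniform', activation='linear', weights=None, border_mode='valid', subsample_length=1, atrous_rate=1, W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)

1. nb_filter: number of convolution kernels (that is, the dimensionality of the output)
2. filter_length: spatial or temporal length of each convolution kernel
3. init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights. This argument is only relevant when weights is not passed.
4. activation: activation function, either the name of a predefined activation function (see the activation functions) or an element-wise Theano function. If this argument is not specified, no activation is applied (that is, the linear activation a(x)=x is used).
5. weights: list of numpy arrays used as the initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
6. border_mode: border mode, one of "valid", "same" or "full"; "full" requires the Theano backend
7. subsample_length: factor by which the output subsamples the input
8. atrous_rate: dilation factor of the convolution kernel, called 'filter_dilation' elsewhere
9. W_regularizer: regularizer applied to the weights, a WeightRegularizer object
10. b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
11. activity_regularizer: regularizer applied to the output, an ActivityRegularizer object
12. W_constraint: constraint applied to the weights, a Constraints object
13. b_constraint: constraint applied to the bias, a Constraints object
14. bias: boolean, whether to include a bias vector (that is, whether the layer applies an affine transformation to its input or a purely linear one)
15. input_dim: integer, dimensionality of the input data. When this layer is the first layer of the network, either this argument or input_shape must be specified.
16. input_length: length of the input sequences when it is fixed. This argument is required if a Flatten layer and then a Dense layer are to be connected after this layer; otherwise the output shape of the dense layer cannot be computed.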

example

    from keras.models import Sequential
    from keras.layers import AtrousConvolution1D

    # apply an atrous convolution 1d with atrous rate 2 of length 3
    # to a sequence with 10 timesteps, with 64 output filters
    model = Sequential()
    model.add(AtrousConvolution1D(64, 3, atrous_rate=2, border_mode='same', input_shape=(10, 32)))
    # now model.output_shape == (None, 10, 64)

    # add a new atrous conv1d on top
    model.add(AtrousConvolution1D(32, 3, atrous_rate=2, border_mode='same'))
    # now model.output_shape == (None, 10, 32)
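What "dilation" means can be shown on a scalar signal: the kernel taps are spread atrous_rate positions apart, so a kernel of length 3 with rate 2 covers 5 input positions. The sketch below uses 'valid' borders and a single channel; both helper names are hypothetical.

```python
def dilated_filter_length(filter_length, atrous_rate):
    """Number of input positions covered by a dilated kernel."""
    return filter_length + (filter_length - 1) * (atrous_rate - 1)

def atrous_conv1d(signal, kernel, atrous_rate=1):
    """'valid' dilated 1D sliding inner product on a list of floats."""
    span = dilated_filter_length(len(kernel), atrous_rate)
    return [
        sum(kernel[k] * signal[i + k * atrous_rate] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]

signal = [float(v) for v in range(10)]  # 0.0, 1.0, ..., 9.0
kernel = [1.0, 0.0, -1.0]               # taps land on positions i, i+2, i+4
print(dilated_filter_length(3, 2))      # 5
print(atrous_conv1d(signal, kernel, atrous_rate=2))
```

Each output equals signal[i] - signal[i + 4], so the result is six values of -4.0; with atrous_rate=1 the same kernel would compare positions only two steps apart.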

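0x3: Convolution2D layer

A two-dimensional convolution layer, which applies a sliding-window convolution to a two-dimensional input. When this layer is used as the first layer, the input_shape argument should be provided. For example, input_shape = (3, 128, 128) represents 128*128 color RGB images.

    keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation='linear', weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='th', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)

1. nb_filter: number of convolution kernels
2. nb_row: number of rows in each convolution kernel
3. nb_col: number of columns in each convolution kernel
4. init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights. This argument is only relevant when weights is not passed.
5. activation: activation function, either the name of a predefined activation function (see the activation functions) or an element-wise Theano function. If this argument is not specified, no activation is applied (that is, the linear activation a(x)=x is used).
6. weights: list of numpy arrays used as the initial weights. The list should contain a weight matrix of shape (input_dim, output_dim) and a bias vector of shape (output_dim,).
7. border_mode: border mode, one of "valid", "same" or "full"; "full" requires the Theano backend
8. subsample: tuple of length 2, the factor by which the output subsamples the input, more commonly called "strides"
9. W_regularizer: regularizer applied to the weights, a WeightRegularizer object
10. b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
11. activity_regularizer: regularizer applied to the output, an ActivityRegularizer object
12. W_constraint: constraint applied to the weights, a Constraints object
13. b_constraint: constraint applied to the bias, a Constraints object
14. dim_ordering: 'th' or 'tf'. In 'th' mode the channel axis (for example the 3 channels of a color image) is at position 1 (counting from 0), while in 'tf' mode it is at position 3. For example, for 128*128 three-channel color images, input_shape should be written as (3, 128, 128) in 'th' mode and as (128, 128, 3) in 'tf' mode. Note that in the 'th' input_shape the 3 appears at position 0 because input_shape does not include the sample dimension; internally the shapes are (None, 3, 128, 128) and (None, 128, 128, 3). The default is the mode given by image_dim_ordering, which can be checked in ~/.keras/keras.json; if it has never been set, it is 'tf'.
15. bias: boolean, whether to include a bias vector (that is, whether the layer applies an affine transformation to its input or a purely linear one)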

example

    from keras.models import Sequential
    from keras.layers import Convolution2D

    # apply a 3x3 convolution with 64 output filters on a 256x256 image:
    model = Sequential()
    model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
    # now model.output_shape == (None, 64, 256, 256)

    # add a 3x3 convolution on top, with 32 output filters:
    model.add(Convolution2D(32, 3, 3, border_mode='same'))
    # now model.output_shape == (None, 32, 256, 256)

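0x4: AtrousConvolution2D layer

This layer applies an atrous convolution, also known as a dilated convolution or a convolution with holes, to a two-dimensional input. When this layer is used as the first layer, the input_shape argument should be provided. For example, input_shape = (3, 128, 128) represents 128*128 color RGB images.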

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/convolutional_layer/
http://baike.baidu.com/item/%E6%BB%A4%E6%B3%A2
http://blog.csdn.net/v_july_v/article/details/51812459
http://cs231n.github.io/convolutional-networks/#overview
http://blog.csdn.net/stdcoutzyx/article/details/41596663

7. Pooling layers

0x1: MaxPooling1D layer

Applies max pooling to a temporal 1D signal.

    keras.layers.convolutional.MaxPooling1D(pool_length=2, stride=None, border_mode='valid')

pool_length: downsampling factor; a value of 2 downsamples the input to half its length
stride: integer or None, the stride
border_mode: 'valid' or 'same'

0x2: MaxPooling2D layer

Applies max pooling to a spatial signal.

    keras.layers.convolutional.MaxPooling2D(pool_size=(2, 2), strides=None, border_mode='valid', dim_ordering='th')

1. pool_size: tuple of 2 integers, the downsampling factors in the two directions (vertical, horizontal); (2, 2) halves the image along both dimensions
2. strides: tuple of 2 integers, or None; the strides
3. border_mode: 'valid' or 'same'
4. dim_ordering: 'th' or 'tf'. In 'th' mode the channel axis (for example the 3 channels of a color image) is at position 1 (counting from 0), while in 'tf' mode it is at position 3. For example, for 128*128 three-channel color images, input_shape should be written as (3, 128, 128) in 'th' mode and as (128, 128, 3) in 'tf' mode. Note that in the 'th' input_shape the 3 appears at position 0 because input_shape does not include the sample dimension; internally the shapes are (None, 3, 128, 128) and (None, 128, 128, 3). The default is the mode given by image_dim_ordering, which can be checked in ~/.keras/keras.json; if it has never been set, it is 'tf'.

0x3: AveragePooling1D layer

Applies average pooling to a temporal 1D signal.

    keras.layers.convolutional.AveragePooling1D(pool_length=2, stride=None, border_mode='valid')

1. pool_length: downsampling factor; a value of 2 downsamples the input to half its length
2. stride: integer or None, the stride
3. border_mode: 'valid' or 'same'. Note that 'same' mode currently works only with the TensorFlow backend.

0x4: GlobalMaxPooling1D layer

Global max pooling for temporal signals.

keras.layers.pooling.GlobalMaxPooling1D()
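The difference between windowed 1D pooling and global pooling can be sketched on one short sequence. Both helpers are hypothetical plain-Python illustrations; letting a stride of None default to pool_length is an assumption made here.

```python
def max_pooling_1d(sequence, pool_length=2, stride=None):
    """Windowed max over time; sequence is a list of timesteps (feature lists)."""
    stride = stride or pool_length  # assumption: None means non-overlapping
    return [
        [max(column) for column in zip(*sequence[start:start + pool_length])]
        for start in range(0, len(sequence) - pool_length + 1, stride)
    ]

def global_max_pooling_1d(sequence):
    """Max over ALL timesteps, leaving one value per feature."""
    return [max(column) for column in zip(*sequence)]

sequence = [[1, 5], [3, 2], [2, 4], [0, 6]]  # 4 timesteps, 2 features
print(max_pooling_1d(sequence))              # [[3, 5], [2, 6]]
print(global_max_pooling_1d(sequence))       # [3, 6]
```

Windowed pooling halves the number of timesteps, while global pooling removes the time axis entirely.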

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/pooling_layer/

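8. Recurrent layers

0x1: Recurrent layer

This is the abstract base class for recurrent layers. Do not use it directly in a model (it is abstract and cannot be instantiated). Use one of its subclasses, such as LSTM or SimpleRNN, instead.
All recurrent layers (LSTM, GRU, SimpleRNN) follow the properties of this class and accept all of the keyword arguments it specifies.

    keras.layers.recurrent.Recurrent(weights=None, return_sequences=False, go_backwards=False, stateful=False, unroll=False, consume_less='cpu', input_dim=None, input_length=None)

1. weights: list of numpy arrays used to initialize the weights. The list has the form [(input_dim, output_dim), (output_dim, output_dim), (output_dim,)].
2. return_sequences: boolean, default False, controls the return type. If True, the full output sequence is returned; otherwise only the last output of the sequence is returned.
3. go_backwards: boolean, default False. If True, the input sequence is processed in reverse.
4. stateful: boolean, default False. If True, the final state of the sample at index i in a batch is used as the initial state of the sample at index i in the next batch.
5. unroll: boolean, default False. If True, the recurrent layer is unrolled; otherwise a symbolic loop is used. With the TensorFlow backend the recurrent network is always unrolled, so this argument has no effect. Unrolling uses more memory but speeds up the RNN, and it is only suitable for short sequences.
6. consume_less: either 'cpu' or 'mem'. With 'cpu', the RNN is implemented with fewer, larger matrix multiplications, which runs faster on a CPU but consumes more memory. With 'mem', the RNN is implemented with more, smaller matrix multiplications, which runs faster with GPU parallelism (but slower on a CPU) and uses less memory.
7. input_dim: dimensionality of the input. Specify this value (or, equivalently, input_shape) when this layer is the first layer of the model.
8. input_length: length of the input sequences when it is fixed. This argument is required if a Flatten layer and then a Dense layer are to be connected after this layer; otherwise the output shape of the dense layer cannot be computed. Note that if the recurrent layer is not the first layer of the network, the sequence length must be specified in the first layer, for example through input_shape.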

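0x2: SimpleRNN layer

A fully connected RNN in which the output is fed back to the input.

    keras.layers.recurrent.SimpleRNN(output_dim, init='glorot_uniform', inner_init='orthogonal', activation='tanh', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)

output_dim: dimensionality of the internal projections and of the output
init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights
inner_init: initialization method for the inner cells
activation: activation function, the name of a predefined activation function (see the activation functions)
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
U_regularizer: regularizer applied to the recurrent weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
dropout_W: float between 0 and 1, the fraction of input units to drop for the input gates
dropout_U: float between 0 and 1, the fraction of input units to drop for the recurrent connections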
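The feedback loop, and the effect of return_sequences and go_backwards, can be sketched with a scalar recurrence h_t = tanh(w*x_t + u*h_(t-1) + b). This is a plain-Python illustration with one unit and hand-picked weights (assumptions), not the Keras implementation.

```python
import math

def simple_rnn(inputs, w, u, b, return_sequences=False, go_backwards=False):
    """Scalar recurrent unit starting from the state h = 0."""
    if go_backwards:
        inputs = inputs[::-1]          # process the sequence in reverse
    h = 0.0
    outputs = []
    for x in inputs:
        h = math.tanh(w * x + u * h + b)  # previous output feeds back in
        outputs.append(h)
    return outputs if return_sequences else outputs[-1]

xs = [1.0, 0.5, -1.0]
full = simple_rnn(xs, w=0.5, u=0.8, b=0.0, return_sequences=True)
last = simple_rnn(xs, w=0.5, u=0.8, b=0.0)
print(full)   # one output per timestep
print(last)   # only the final output
```

With return_sequences=True the result has one entry per timestep, which is what a following recurrent layer would need; with the default, only the last entry is returned.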

0x3: GRU layer

Gated Recurrent Unit.

    keras.layers.recurrent.GRU(output_dim, init='glorot_uniform', inner_init='orthogonal', activation='tanh', inner_activation='hard_sigmoid', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)

output_dim: dimensionality of the internal projections and of the output
init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights
inner_init: initialization method for the inner cells
activation: activation function, the name of a predefined activation function (see the activation functions)
inner_activation: activation function for the inner cells
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
U_regularizer: regularizer applied to the recurrent weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
dropout_W: float between 0 and 1, the fraction of input units to drop for the input gates
dropout_U: float between 0 and 1, the fraction of input units to drop for the recurrent connections

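0x4: LSTM layer

The Keras Long Short-Term Memory model.

    keras.layers.recurrent.LSTM(output_dim, init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one', activation='tanh', inner_activation='hard_sigmoid', W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0)

output_dim: dimensionality of the internal projections and of the output
init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights
inner_init: initialization method for the inner cells
forget_bias_init: initialization function for the forget-gate bias; Jozefowicz et al. recommend initializing it to all ones
activation: activation function, the name of a predefined activation function (see the activation functions)
inner_activation: activation function for the inner cells
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
U_regularizer: regularizer applied to the recurrent weights, a WeightRegularizer object
b_regularizer: regularizer applied to the bias vector, a WeightRegularizer object
dropout_W: float between 0 and 1, the fraction of input units to drop for the input gates
dropout_U: float between 0 and 1, the fraction of input units to drop for the recurrent connections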

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/recurrent_layer/

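9. Embedding layer

0x1: Embedding layer

The Embedding layer turns positive integers (indices) into dense vectors of fixed size, for example [[4],[20]]->[[0.25,0.1],[0.6,-0.2]]. It is a way of encoding numbers as vectors. Using Embedding requires that the input feature vectors have spatial correlation.
The Embedding layer can only be used as the first layer of a model.

    keras.layers.embeddings.Embedding(input_dim, output_dim, init='uniform', input_length=None, W_regularizer=None, activity_regularizer=None, W_constraint=None, mask_zero=False, weights=None, dropout=0.0)

input_dim: integer greater than or equal to 0, the size of the vocabulary, that is, the largest index in the input data + 1
output_dim: integer greater than 0, the dimensionality of the dense embedding
init: initialization method, either the name of a predefined initializer or a Theano function used to initialize the weights. This argument is only relevant when weights is not passed.
weights: list of numpy arrays used as the initial weights. The list should contain only one weight matrix of shape (input_dim, output_dim).
W_regularizer: regularizer applied to the weights, a WeightRegularizer object
W_constraint: constraint applied to the weights, a Constraints object
mask_zero: boolean, whether the input value 0 should be treated as a "padding" value to be ignored. This is useful when recurrent layers process variable-length input. If set to True, all subsequent layers in the model must support masking, otherwise an exception is raised.
input_length: length of the input sequences when it is fixed. This argument is required if a Flatten layer and then a Dense layer are to be connected after this layer; otherwise the output dimension of the Dense layer cannot be inferred automatically.
dropout: float between 0 and 1, the fraction of the embeddings to drop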
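Conceptually, an embedding is a table lookup: each integer index selects one row of an (input_dim, output_dim) weight matrix. The plain-Python sketch below reuses the indices 4 and 20 from the example above; the helper `embed` and the table contents are assumptions for illustration, with every other row left at zero.

```python
def embed(batch, table):
    """Replace every integer index by the matching row of the table."""
    return [[table[index] for index in sequence] for sequence in batch]

input_dim, output_dim = 21, 2          # largest index 20, so 21 rows
table = [[0.0] * output_dim for _ in range(input_dim)]
table[4] = [0.25, 0.1]
table[20] = [0.6, -0.2]

batch = [[4], [20]]                    # two sequences of length 1
print(embed(batch, table))             # [[[0.25, 0.1]], [[0.6, -0.2]]]
```

In this sketch the result keeps one extra axis for the sequence position, so a batch of shape (2, 1) becomes (2, 1, 2). During training the rows of the table are the weights being learned.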

Relevant Link:

https://keras-cn.readthedocs.io/en/latest/layers/embedding_layer/
