Bayesian Hyper-Parameter Optimization: Neural Networks, TensorFlow, Facies Prediction Example
The purpose of this work is to optimize the neural network model hyper-parameters to estimate facies classes from well logs. I will include some code in this article, but for the full Jupyter notebook you can visit my GitHub.
Note: if you are new to TensorFlow, its installation is elaborated by Jeff Heaton.
In machine learning, model parameters can be divided into two main categories:

1- Trainable parameters: such as the weights in neural networks, learned by the training algorithm; the user does not interfere in the process.

2- Hyper-parameters: users can set them before the training operation, such as the learning rate or the number of dense layers in the model.

Selecting the best hyper-parameters is a tedious task if you try it by hand, and it is almost impossible to find the best ones if you are dealing with more than two parameters.

One way is to divide each parameter's valid range evenly and then simply ask the computer to loop over the combinations of parameters and calculate the results. This method is called Grid Search. Although it is done by machine, it is a time-consuming process. Suppose you have 3 hyper-parameters with 10 possible values each: in this approach, you would run 10³ = 1,000 neural network models (even with a reasonable training dataset size, this task is huge).

Another way is the random search approach. Instead of an organized parameter search, it goes through random combinations of parameters and looks for the optimized ones. You may estimate that the chance of success decreases toward zero for larger hyper-parameter tuning tasks.
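To make the grid-search cost concrete, here is a small sketch (the candidate values are hypothetical, not from this article) that counts the combinations for 3 hyper-parameters with 10 options each:

```python
from itertools import product

# Hypothetical candidate values: 3 hyper-parameters, 10 options each.
learning_rates = [10 ** -i for i in range(1, 11)]           # 10 values
layer_counts = list(range(1, 11))                           # 10 values
node_counts = [8, 16, 24, 32, 48, 64, 96, 128, 256, 512]    # 10 values

# Grid search enumerates every combination; each one is a full model training run.
grid = list(product(learning_rates, layer_counts, node_counts))
print(len(grid))  # 10 * 10 * 10 = 1000 models to train
```

A random search would instead sample, say, 40 of these 1,000 combinations at random, which is cheaper but offers no guarantee of landing near the optimum.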
Scikit-Optimize (skopt), which we will use here for the facies estimation task, is a simple and efficient library to minimize expensive, noisy black-box functions. Bayesian optimization constructs a surrogate model of the parameter search space; a Gaussian Process is one kind of these models. This generates an estimate of how model performance varies with hyper-parameter changes.
As we see in the picture, the true objective function (red dashed line) is surrounded by noise (red shade). The red points show how scikit-optimize sampled the search space for one hyper-parameter (one dimension). Scikit-optimize fills the area between sample points with the Gaussian process (green line) and estimates the true fitness value. In areas with few or no samples (like the left side of the picture between two red samples), there is great uncertainty: a big difference between the red and green lines, producing the wide green shaded region, such as a two-standard-deviation uncertainty band. In this process, we then ask for a new set of hyper-parameters to explore more of the search space. In the initial steps it proceeds with sparse accuracy, but in later iterations it focuses on regions where the sampled fitness values agree well with the true objective function (the trough area in the graph). For further study, you may refer to the Scikit-Optimize documentation.
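To illustrate the surrogate idea behind the picture, here is a self-contained NumPy sketch (with made-up sample points, not data from this article) of a Gaussian-process posterior: the predicted variance is near zero at sampled points and grows in the gaps between them, which is exactly the green uncertainty band described above.

```python
import numpy as np

# Minimal Gaussian-process posterior with an RBF kernel: a sketch of how the
# surrogate fills the gaps between sampled hyper-parameter points.
def rbf(a, b, length=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

X_s = np.array([-1.5, -0.5, 1.0, 1.8])   # hypothetical sampled points
y_s = np.sin(3 * X_s)                    # observed fitness values at those points
X_q = np.linspace(-2, 2, 5)              # query grid across the search dimension

K = rbf(X_s, X_s) + 1e-6 * np.eye(len(X_s))   # kernel matrix + jitter
K_inv = np.linalg.inv(K)
K_star = rbf(X_q, X_s)

mean = K_star @ K_inv @ y_s                           # posterior mean (the "green line")
var = 1.0 - np.sum((K_star @ K_inv) * K_star, axis=1) # posterior variance (the shaded band)

# Variance is ~0 at x = 1.0 (an exact sample) and larger in the unsampled gaps.
print(var.round(3))
```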
Data Review

The Council Grove gas reservoir is located in Kansas. From this carbonate reservoir, nine wells are available. Facies are studied from core samples at every half foot and matched with logging data at the well location. Feature variables include five wireline log measurements and two geologic constraining variables derived from geologic knowledge. For more detail, refer here. You may download the dataset from here. The seven variables are:
GR: this wireline logging tool measures gamma emission
ILD_log10: this is a resistivity measurement
PE: photoelectric effect log
DeltaPHI: Phi is a porosity index in petrophysics
PHIND: average of neutron and density log
NM_M: nonmarine-marine indicator
RELPOS: relative position
The nine discrete facies (classes of rocks) are:
(SS) Nonmarine sandstone
(CSiS) Nonmarine coarse siltstone
(FSiS) Nonmarine fine siltstone
(SiSh) Marine siltstone and shale
(MS) Mudstone (limestone)
(WS) Wackestone (limestone)
(D) Dolomite
(PS) Packstone-grainstone (limestone)
(BS) Phylloid-algal bafflestone (limestone)
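The facies in the CSV are coded 1 to 9 in the order listed above (hence the "+1" offset in the prediction step later). A small sketch of that numeric-code-to-label mapping (the variable names are illustrative):

```python
# Facies codes 1-9 map to the label abbreviations in list order.
facies_labels = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS', 'WS', 'D', 'PS', 'BS']
facies_map = {code: label for code, label in enumerate(facies_labels, start=1)}
print(facies_map[1], facies_map[9])  # SS BS
```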
After reading the dataset into Python, we can keep one well's data as a blind set for future model performance examination. We also need to convert the facies numbers into strings in the dataset. Refer to the full notebook.
df = pd.read_csv('training_data.csv')
blind = df[df['Well Name'] == 'SHANKLE']
training_data = df[df['Well Name'] != 'SHANKLE']
Feature Engineering

Facies classes should be converted to dummy variables in order to be used in the neural network:
dummies = pd.get_dummies(training_data['FaciesLabels'])
Facies_cat = dummies.columns
labels = dummies.values  # target matrix

# select predictors
features = training_data.drop(['Facies', 'Formation', 'Well Name', 'Depth', 'FaciesLabels'], axis=1)
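A toy illustration (with a made-up mini-DataFrame, not the article's data) of what get_dummies produces: one binary column per facies label, with columns sorted alphabetically.

```python
import pandas as pd

# Made-up mini example with three distinct facies labels.
toy = pd.DataFrame({'FaciesLabels': ['SS', 'CSiS', 'SS', 'D']})
dummies = pd.get_dummies(toy['FaciesLabels'])
print(dummies.columns.tolist())  # ['CSiS', 'D', 'SS'] (alphabetical)
print(dummies.values.shape)      # (4, 3): one row per sample, one column per class
```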
Preprocessing (Standardization)
As we are dealing with various ranges of data, let's normalize it to make the network efficient.
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(features)
scaled_features = scaler.transform(features)

# data split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, labels, test_size=0.2, random_state=42)
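A quick toy check (with invented values, not the well-log data) of what the StandardScaler does: each feature column ends up with zero mean and unit standard deviation, so logs with very different ranges, such as GR versus RELPOS, contribute comparably to the network.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented 4-sample, 2-feature matrix with very different column scales.
X = np.array([[80.0, 0.1], [120.0, 0.5], [60.0, 0.9], [100.0, 0.3]])
scaler = StandardScaler().fit(X)
X_std = scaler.transform(X)
print(X_std.mean(axis=0).round(6))  # ~[0, 0]
print(X_std.std(axis=0).round(6))   # ~[1, 1]
```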
Hyper-Parameters
In this work, we will predict facies from well logs using deep learning in TensorFlow. There are several hyper-parameters that we may adjust for deep learning. I will try to find the optimized values for:
Learning rate
Number of dense layers
Number of nodes for each layer
Which activation function: 'relu' or 'sigmoid'
To define these search dimensions, we will use the scikit-optimize (skopt) library. From skopt, the Real function defines our preferred range for the learning rate (lower bound = 1e-6, upper bound = 1e-1) with a logarithmic prior. The search dimensions for the number of layers (between 1 and 10) and each layer's node count (between 5 and 512) can be implemented with skopt's Integer function.
dim_learning_rate = Real(low=1e-6, high=1e-1, prior='log-uniform',
                         name='learning_rate')
dim_num_dense_layers = Integer(low=1, high=10, name='num_dense_layers')
dim_num_dense_nodes = Integer(low=5, high=512, name='num_dense_nodes')
For the activation function, we should use the Categorical function for optimization.
dim_activation = Categorical(categories=['relu', 'sigmoid'],
                             name='activation')
Bring all search-dimensions into a single list:
dimensions = [dim_learning_rate,
              dim_num_dense_layers,
              dim_num_dense_nodes,
              dim_activation]
If you have already worked with deep learning on a specific project and tuned its hyper-parameters by hand, you know how hard it is to optimize. You may also use your own guess (like mine, as the default) to compare the results with the Bayesian tuning approach.
default_parameters = [1e-5, 1, 16, 'relu']

Hyper-Parameter Optimization
Create Model
Like some examples developed by TensorFlow, we also need to define a model function first. After defining the type of model (Sequential here), we need to introduce the data dimension (data shape) in the first layer. The number of layers and the activation type are the two hyper-parameters we are looking to optimize. Softmax activation should be used for classification problems. Another hyper-parameter is the learning rate, which is passed to the Adam optimizer. The model should be compiled with the loss function 'categorical_crossentropy', as we are dealing with a classification problem (facies prediction).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense
from tensorflow.keras.optimizers import Adam

def create_model(learning_rate, num_dense_layers,
                 num_dense_nodes, activation):
    model = Sequential()
    model.add(InputLayer(input_shape=(scaled_features.shape[1],)))
    for i in range(num_dense_layers):
        name = 'layer_dense_{0}'.format(i+1)
        # add dense layer
        model.add(Dense(num_dense_nodes,
                        activation=activation,
                        name=name))
    # use softmax activation for classification.
    model.add(Dense(labels.shape[1], activation='softmax'))
    # Use the Adam method for training the network.
    optimizer = Adam(lr=learning_rate)
    # compile the model so it can be trained.
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
Train and Evaluate the Model
This function creates and trains a network with the given hyper-parameters and then evaluates model performance on the validation dataset. It returns the fitness value: the negative classification accuracy on that dataset. It is negative because skopt performs minimization rather than maximization.
best_accuracy = 0.0

@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_dense_layers,
            num_dense_nodes, activation):
    """
    Hyper-parameters:
    learning_rate:    Learning rate for the optimizer.
    num_dense_layers: Number of dense layers.
    num_dense_nodes:  Number of nodes in each dense layer.
    activation:       Activation function for all layers.
    """
    # Print the hyper-parameters.
    print('learning rate: {0:.1e}'.format(learning_rate))
    print('num_dense_layers:', num_dense_layers)
    print('num_dense_nodes:', num_dense_nodes)
    print('activation:', activation)
    print()

    # Create the neural network with these hyper-parameters.
    model = create_model(learning_rate=learning_rate,
                         num_dense_layers=num_dense_layers,
                         num_dense_nodes=num_dense_nodes,
                         activation=activation)

    # Dir-name for the TensorBoard log-files.
    log_dir = log_dir_name(learning_rate, num_dense_layers,
                           num_dense_nodes, activation)

    # Create a callback-function for Keras which will be
    # run after each epoch has ended during training.
    # This saves the log-files for TensorBoard.
    # Note that there are complications when histogram_freq=1.
    # It might give strange errors and it also does not properly
    # support Keras data-generators for the validation-set.
    callback_log = TensorBoard(
        log_dir=log_dir,
        histogram_freq=0,
        write_graph=True,
        write_grads=False,
        write_images=False)

    # Use Keras to train the model.
    history = model.fit(x=X_train,
                        y=y_train,
                        epochs=3,
                        batch_size=128,
                        validation_data=validation_data,
                        callbacks=[callback_log])

    # Get the classification accuracy on the validation-set
    # after the last training-epoch.
    accuracy = history.history['val_accuracy'][-1]

    # Print the classification accuracy.
    print()
    print("Accuracy: {0:.2%}".format(accuracy))
    print()

    # Save the model if it improves on the best-found performance.
    # We use the global keyword so we update the variable outside
    # of this function.
    global best_accuracy

    # If the classification accuracy of the saved model is improved...
    if accuracy > best_accuracy:
        # Save the new model to hard disk.
        model.save(path_best_model)
        # Update the classification accuracy.
        best_accuracy = accuracy

    # Delete the Keras model with these hyper-parameters from memory.
    del model

    # Clear the Keras session, otherwise it will keep adding new
    # models to the same TensorFlow graph each time we create
    # a model with a different set of hyper-parameters.
    K.clear_session()

    # NOTE: Scikit-optimize does minimization, so it tries to
    # find a set of hyper-parameters with the LOWEST fitness value.
    # Because we are interested in the HIGHEST classification
    # accuracy, we negate this number so it can be minimized.
    return -accuracy

# This function comes from: Hvass-Labs, TensorFlow-Tutorials
Run this:
fitness(x=default_parameters)

Run Hyper-Parameter Optimization
We have already checked the default hyper-parameter performance. Now we can examine Bayesian optimization from the scikit-optimize library. Here we use 40 calls of the fitness function; it is an expensive operation and needs to be used carefully with large datasets.
search_result = gp_minimize(func=fitness,
                            dimensions=dimensions,
                            acq_func='EI',  # Expected Improvement
                            n_calls=40,
                            x0=default_parameters)
Just some of the last runs are shown below:
Progress Visualization
Using the plot_convergence function of skopt, we can see the optimization progress and the best fitness value found on the y-axis.
plot_convergence(search_result)  # plt.savefig("Converge.png", dpi=400)

Optimal Hyper-Parameters
Using search_result.x, we can see the best hyper-parameters that the Bayesian optimizer generated.
search_result.x

The optimized hyper-parameters are, in order: learning rate, number of dense layers, number of nodes in each layer, and the best activation function.
We can see all results for 40 calls with corresponding hyper-parameters and fitness values.
sorted(zip(search_result.func_vals, search_result.x_iters))

An interesting point is that the 'relu' activation function is almost dominant.
Plots
First, let's look at a 2D plot of two optimized parameters: a landscape plot of estimated fitness values for the learning rate and the number of nodes in each layer. The Bayesian optimizer builds a surrogate model of the search space and searches inside this model rather than the real search space, which is why it is faster. In the plot, yellow regions are better and blue regions are worse. Black dots are the optimizer's sampling locations, and the red star is the best parameter set found.
from skopt.plots import plot_objective_2D
fig = plot_objective_2D(result=search_result,
                        dimension_identifier1='learning_rate',
                        dimension_identifier2='num_dense_nodes',
                        levels=50)
# plt.savefig("Lr_numnods.png", dpi=400)
Some points:
dim_names = ['learning_rate', 'num_dense_layers', 'num_dense_nodes', 'activation']
fig, ax = plot_objective(result=search_result, dimensions=dim_names)
plt.savefig("all_dimen.png", dpi=400)
In these plots, we can see how the optimization happened. The Bayesian approach fits the surrogate model using prior information at the points with a higher sampling density. Bringing all four parameters into the scikit-optimize approach, the best results in this run were a learning rate of about 0.003, 6 dense layers, about 327 nodes in each layer, and the 'relu' activation function.
Evaluate the Model with Optimized Hyper-Parameters on Blind Data
The same data preparation steps are required here as well; we skip repeating them. Now we can build a model with the optimized parameters to see the prediction.
opt_par = search_result.x

# use hyper-parameters from optimization
learning_rate = opt_par[0]
num_layers = opt_par[1]
num_nodes = opt_par[2]
activation = opt_par[3]
Create the model:
import numpy as np
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, InputLayer
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(InputLayer(input_shape=(scaled_features.shape[1],)))
model.add(Dense(num_nodes, activation=activation, kernel_initializer='random_normal'))
model.add(Dense(labels.shape[1], activation='softmax', kernel_initializer='random_normal'))
optimizer = Adam(lr=learning_rate)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=20,
                        verbose=1, mode='auto', restore_best_weights=True)
histories = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                      callbacks=[monitor], verbose=2, epochs=100)
Let's see the model accuracy development:
plt.plot(histories.history['accuracy'], 'bo')
plt.plot(histories.history['val_accuracy'], 'b')
plt.title('Training and validation accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.savefig("accu.png", dpi=400)
plt.show()
The training and validation accuracy plot shows that after roughly 80% accuracy (around epoch 10), the model starts to overfit, because we see no further improvement in test-data prediction accuracy.
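The EarlyStopping callback used above (with restore_best_weights=True) automates the judgment we just made by eye. A toy sketch of the underlying idea, using hypothetical validation-accuracy values (Keras actually monitors val_loss here; accuracy is used below only for readability):

```python
# Hypothetical per-epoch validation accuracies: improvement stalls after epoch 5.
val_acc = [0.55, 0.65, 0.72, 0.78, 0.80, 0.81, 0.81, 0.80, 0.81, 0.80]

# "Restore best weights" amounts to keeping the epoch with the best validation score.
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
print(best_epoch, val_acc[best_epoch])  # -> 5 0.81
```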
Let's evaluate model performance with a dataset the model has not seen yet (the blind well). We generally expect Machine Learning models to predict blind data with lower accuracy than in the training process if the dataset is small or the features are not rich enough to cover the full complexity of the data.
result = model.evaluate(scaled_features_blind, labels_blind)
print("{0}: {1:.2%}".format(model.metrics_names[1], result[1]))
Predict Blind Well Data and Plot
y_pred = model.predict(scaled_features_blind)  # result is a probability array
y_pred_idx = np.argmax(y_pred, axis=1) + 1  # +1 because facies start from 1, not 0 like the index
blind['Pred_Facies'] = y_pred_idx

Function to plot:
def compare_facies_plot(logs, compadre, facies_colors):
    # make sure logs are sorted by depth
    logs = logs.sort_values(by='Depth')
    cmap_facies = colors.ListedColormap(
        facies_colors[0:len(facies_colors)], 'indexed')
    ztop = logs.Depth.min(); zbot = logs.Depth.max()
    cluster1 = np.repeat(np.expand_dims(logs['Facies'].values, 1), 100, 1)
    cluster2 = np.repeat(np.expand_dims(logs[compadre].values, 1), 100, 1)
    f, ax = plt.subplots(nrows=1, ncols=7, figsize=(12, 6))
    ax[0].plot(logs.GR, logs.Depth, '-g', alpha=0.8, lw=0.9)
    ax[1].plot(logs.ILD_log10, logs.Depth, '-b', alpha=0.8, lw=0.9)
    ax[2].plot(logs.DeltaPHI, logs.Depth, '-k', alpha=0.8, lw=0.9)
    ax[3].plot(logs.PHIND, logs.Depth, '-r', alpha=0.8, lw=0.9)
    ax[4].plot(logs.PE, logs.Depth, '-c', alpha=0.8, lw=0.9)
    im1 = ax[5].imshow(cluster1, interpolation='none', aspect='auto',
                       cmap=cmap_facies, vmin=1, vmax=9)
    im2 = ax[6].imshow(cluster2, interpolation='none', aspect='auto',
                       cmap=cmap_facies, vmin=1, vmax=9)
    divider = make_axes_locatable(ax[6])
    cax = divider.append_axes("right", size="20%", pad=0.05)
    cbar = plt.colorbar(im2, cax=cax)
    cbar.set_label((5*' ').join([' SS ', 'CSiS', 'FSiS',
                                 'SiSh', ' MS ', ' WS ', ' D ',
                                 ' PS ', ' BS ']))
    cbar.set_ticks(range(0, 1)); cbar.set_ticklabels('')
    for i in range(len(ax)-2):
        ax[i].set_ylim(ztop, zbot)
        ax[i].invert_yaxis()
        ax[i].grid()
        ax[i].locator_params(axis='x', nbins=3)
    ax[0].set_xlabel("GR")
    ax[0].set_xlim(logs.GR.min(), logs.GR.max())
    ax[1].set_xlabel("ILD_log10")
    ax[1].set_xlim(logs.ILD_log10.min(), logs.ILD_log10.max())
    ax[2].set_xlabel("DeltaPHI")
    ax[2].set_xlim(logs.DeltaPHI.min(), logs.DeltaPHI.max())
    ax[3].set_xlabel("PHIND")
    ax[3].set_xlim(logs.PHIND.min(), logs.PHIND.max())
    ax[4].set_xlabel("PE")
    ax[4].set_xlim(logs.PE.min(), logs.PE.max())
    ax[5].set_xlabel('Facies')
    ax[6].set_xlabel(compadre)
    ax[1].set_yticklabels([]); ax[2].set_yticklabels([]); ax[3].set_yticklabels([])
    ax[4].set_yticklabels([]); ax[5].set_yticklabels([]); ax[6].set_yticklabels([])
    ax[5].set_xticklabels([])
    ax[6].set_xticklabels([])
    f.suptitle('Well: %s' % logs.iloc[0]['Well Name'], fontsize=14, y=0.94)
Run:
compare_facies_plot(blind, 'Pred_Facies', facies_colors)
plt.savefig("Compo.png", dpi=400)
Conclusion
In this work, we optimized hyper-parameters using a Bayesian approach with the scikit-optimize (skopt) library. This approach is superior to random search and grid search, especially for complex datasets. Using this method, we can get rid of hand-tuning hyper-parameters for neural networks, although each run will produce a new set of parameters.
Translated from: https://towardsdatascience.com/bayesian-hyper-parameter-optimization-neural-networks-tensorflow-facies-prediction-example-f9c48d21f795