How Do Machine Learning and Signal Processing Blend?

Published: 2023/12/15

As a design engineer, I am increasingly exposed to more complex engineering challenges — some of which require a blend of multidisciplinary expertise. The Advanced Machine Learning and Signal Processing course provided me with the window to understand how machine learning and signal processing can be integrated and applied together.

Although the title of the course sounded daunting at first, it is not difficult to follow. However, prior knowledge of statistics and calculus will come in handy during the later parts of the course. This course is delivered by IBM Chief Data Scientist Romeo Kienzler and Nickolay Manchev and is offered by Coursera as part of a 4-course IBM Advanced Data Science Specialization.

The course is structured into 4 weeks: Week 1 to Week 3 cover machine learning concepts and algorithms, while Week 4 covers signal processing and shows how to integrate signal processing with machine learning.

Week 1

This week sets the stage for the upcoming weeks, and we started off by exploring the linear algebra used in machine learning. We were first introduced to different data objects (scalar, vector, matrix, and tensor) as well as the mathematical operations that can be applied to them (dot product, vector-matrix multiplication).

Data objects

A scalar is a single number with no dimensions of its own, e.g.:

1, 5, -18

A vector is an ordered group of scalars and can only contain one datatype, e.g.:

(0,11,135)

A tuple is similar to a vector, but it can contain multiple datatypes, e.g.:

(0,1,7.8) #notice the integer and float datatypes

A matrix is a list of equal-sized vectors, and a tensor generalizes a matrix to 3 or more dimensions. Tensors are useful for image processing (for example, one dimension is used for width, another for height, and another for color).
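
As a rough illustration of these data objects and the two operations mentioned above, here is a minimal NumPy sketch. NumPy is an assumption on my part (the course itself works in Apache Spark), and the 4x4 image is a made-up placeholder.

```python
import numpy as np

scalar = np.float64(5.0)                 # a single number, 0 dimensions
vector = np.array([0, 11, 135])          # 1 dimension, one datatype
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])           # 2 dimensions: two equal-sized row vectors
# A hypothetical 4x4 RGB image: height x width x colour channel
tensor = np.zeros((4, 4, 3))

# Dot product of two vectors: the sum of the element-wise products
dot = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))

# Vector-matrix multiplication: shape (3,) times shape (3, 2) gives shape (2,)
vec_mat = np.array([1, 0, 2]) @ matrix.T

print(np.ndim(scalar), vector.ndim, matrix.ndim, tensor.ndim)
print(dot, vec_mat)
```

The number of dimensions grows by one at each step, which is the only real difference between these objects.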

We then learned about higher-dimensional vector spaces and the intuition to picture and split them.

To split a 1-dimensional space, we use a point.
To split a 2-dimensional space, we use a line.
To split a 3-dimensional space, we use a plane.
To split a space with more than 3 dimensions, we use a hyperplane.
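
One way to picture the split is through the sign of w·x + b, which says on which side of the hyperplane a point lies. The sketch below assumes NumPy and uses a hypothetical helper name (`side_of_hyperplane`) and made-up points; the same function works unchanged in any number of dimensions.

```python
import numpy as np

def side_of_hyperplane(points, w, b):
    """Return +1 or -1 for each point depending on the sign of w.x + b."""
    scores = points @ w + b
    return np.where(scores >= 0, 1, -1)

# 2-D case: the line x1 + x2 - 1 = 0 splits the plane into two halves
w = np.array([1.0, 1.0])
b = -1.0
points = np.array([[0.0, 0.0], [2.0, 2.0], [0.2, 0.3], [1.0, 1.5]])
labels = side_of_hyperplane(points, w, b)
print(labels)
```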

There are 2 types of machine learning:

1. Supervised Learning: Trained on datasets with output labels, i.e. each data point has a label assigned to it. The algorithm aims to learn the function f in Y = f(x), so that it can predict what Y will be for a new x. Supervised learning is broken down into:

Classification: predicts a discrete value, e.g. “True/False”
Regression: predicts a continuous value, e.g. “Price of mobile phones in the future”

2. Unsupervised Learning: Trained on datasets without output labels. The algorithm attempts to discover patterns in the input data by grouping similar data together. An example of unsupervised learning is clustering, where data are segregated into clusters (more on this in Week 3).

Photo by Maria Shanina on Unsplash

PySpark allows us to modularize data pipelines. By using Spark ML feature transformers such as StringIndexer, OneHotEncoder, VectorAssembler, and Normalizer, we can ensure all the pre-processing is aligned and taken care of before the ML model is trained. The idea is to condense these pre-processing parts into “Pipelines” and take advantage of this flexibility to quickly switch between different training parameters while searching for the best machine learning model.

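An ML pipeline example:

from pyspark.ml import Pipeline

# string_indexer, encoder, vectorAssembler and normaliser are the
# pre-processing stages created earlier; df is the input DataFrame
pipeline = Pipeline(stages=[string_indexer, encoder, vectorAssembler, normaliser])
model = pipeline.fit(df)
prediction = model.transform(df)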

All the coding assignments are completed in IBM Watson Studio and this week’s hands-on approach builds the foundation for the steps required to set up and run a machine learning project.

Week 2

Common machine learning models and concepts are presented this week, and the lectures are a mixture of explaining theory and demonstrating how to use an algorithm with Apache SparkML. The course presenters did a great job explaining the following concepts, which are crucial for a solid machine learning foundation.

Linear Regression and Batch Gradient Descent

A Linear Regression model is a model based on supervised learning and can be described by the following function:

y = w₀ + w₁x₁ + w₂x₂ + w₃x₃ + ... + wₙxₙ, whereby each w is a feature weight

Its main function is to predict a dependent variable (y) based on the independent variables (x) that are supplied. Its counterpart, Logistic Regression, is used to predict a discrete value. The feature weights are determined by Gradient Descent, and Batch Gradient Descent is one variant of this optimization strategy.

Figure: Batch Gradient Descent finding a global minimum

For a convex cost function, such as the squared error of a linear regression, Batch Gradient Descent is guaranteed to find the global minimum, given enough time and a learning rate that is not too high (imagine trying to reach the bottom of a valley). It is slow on a large dataset because the model is only updated after the error has been evaluated for every sample in the dataset.
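
To make the "one update per full pass over the data" idea concrete, here is a minimal sketch of Batch Gradient Descent for linear regression. It assumes NumPy, a mean-squared-error cost, and synthetic noise-free data; the learning rate and iteration count are illustrative choices, not values from the course.

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.1, n_iterations=2000):
    """Fit y ~ w0 + w1*x1 + ... + wn*xn by minimising the mean squared error."""
    n_samples = X.shape[0]
    Xb = np.hstack([np.ones((n_samples, 1)), X])   # column of ones for w0
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iterations):
        error = Xb @ w - y                          # error for every sample
        gradient = (2.0 / n_samples) * (Xb.T @ error)
        w -= learning_rate * gradient               # one update per full pass
    return w

# Synthetic data generated from y = 1 + 2*x (an assumption for this demo)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 1.0 + 2.0 * X[:, 0]
weights = batch_gradient_descent(X, y)
print(weights)   # should end up close to [1, 2]
```

Because every update uses the whole dataset, the cost per step grows with the number of samples, which is the slowness described above.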

We then briefly learned about splitting our dataset into training and validation datasets, as well as the method to evaluate if the prediction results are underfitting or overfitting.

The Naive Bayes Theorem

Even though this topic is challenging, Nickolay managed to summarize it in a digestible yet in-depth manner. We were shown the intuitions and mathematics behind the Naive Bayes theorem, such as:

Sum Rule and Product Rule: allow us to solve most probability problems
The Gaussian Distribution, more commonly known as the Normal Distribution: data that is symmetrically distributed around the mean
The Central Limit Theorem: the means of sufficiently large samples are approximately Gaussian-distributed
Bayesian Inference: the probability of a hypothesis is updated as new evidence becomes available
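
The sum rule, the product rule, and the idea of updating a hypothesis fit together in a few lines. The sketch below is plain Python; the function name and the sensor numbers are hypothetical, chosen only to show the mechanics.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H) and the likelihoods of evidence E."""
    # Product rule: P(E, H) = P(E | H) * P(H)
    joint_true = likelihood_if_true * prior
    joint_false = likelihood_if_false * (1.0 - prior)
    # Sum rule: P(E) = P(E, H) + P(E, not H)
    evidence = joint_true + joint_false
    return joint_true / evidence

# Hypothetical numbers: 1% of parts are faulty; a sensor flags 90% of the
# faulty parts and 5% of the good parts.
posterior_one_alarm = bayes_update(0.01, 0.90, 0.05)
# A second, independent alarm uses the first posterior as the new prior.
posterior_two_alarms = bayes_update(posterior_one_alarm, 0.90, 0.05)
print(posterior_one_alarm, posterior_two_alarms)
```

Each new piece of evidence raises the probability of the hypothesis, which is the updating behaviour the last bullet describes.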

Up until this point, there is already a lot of information for Week 2, but we’re not quite done yet!

Support Vector Machines

We moved on to a quick introduction to Support Vector Machines (SVMs) and learned how they lost ground to Gradient Boosting in terms of popularity. SVMs are linear classifiers, and to turn them into non-linear classifiers, the “kernel” method can be used. This method, or “trick” as Romeo calls it, transforms the training data into a “feature” space in which a hyperplane can cleanly separate the data points. The examples in this part of the course were intuitive, beginner-friendly, and easy to remember.
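
The intuition can be sketched with an explicit feature map. Note that this is a simplification: a real SVM uses a kernel function to work in the feature space implicitly, and it learns the hyperplane rather than having it hand-picked as below. NumPy, the data, and the map x -> (x, x^2) are all assumptions for illustration.

```python
import numpy as np

# 1-D points: the inner class sits between the two groups of the outer class,
# so no single threshold (a "point" hyperplane in 1-D) separates them.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
labels = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1])

def feature_map(x):
    """Map each 1-D point x to the 2-D feature vector (x, x^2)."""
    return np.column_stack([x, x ** 2])

features = feature_map(x)

# In feature space the horizontal line x^2 = 2 separates the two classes.
w = np.array([0.0, 1.0])
b = -2.0
predicted = np.where(features @ w + b >= 0, 1, -1)
print((predicted == labels).all())
```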

Figure: The kernel method in SVM

This week was nicely wrapped up with Decision Trees, Random Forests, and Gradient Boosted Trees. We were shown the sampling methods and the pros and cons of each model. There is also a coding exercise that ties together what we learned in Week 1 and Week 2 by demonstrating the application and prediction accuracy of a GBTClassifier using SparkML.

Week 3

I felt stretched the most this week with the introduction of more advanced topics such as Clustering and Principal Component Analysis (PCA). We learned about unsupervised machine learning, where we try to fathom the distances between points in point clouds. There are various ways to measure distance: simple subtraction between 2 points in one dimension, Euclidean distance (based on Pythagoras’ theorem), and Manhattan distance.
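
The two named distance measures are short enough to write out. This is a plain-Python sketch with hypothetical function names; the points are chosen so the result is the familiar 3-4-5 triangle.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance: Pythagoras' theorem generalised to n dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan_distance(p, q):
    """Sum of the absolute differences along each axis (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
d_euclid = euclidean_distance(p, q)
d_manhattan = manhattan_distance(p, q)
print(d_euclid, d_manhattan)
```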

We learned about clustering using the K-means algorithm (the ‘Hello World’ of clustering algorithms) and the Hierarchical algorithm. With K-means, we specify the number of expected clusters; the algorithm then places a center for each cluster (picture a hypersphere around each center in the feature space) and groups each data point from the point cloud into the cluster with the nearest center.
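
A bare-bones version of the assign-then-update loop might look like the sketch below. It assumes NumPy, Euclidean distance, random initial centers picked from the data, and a fixed number of iterations; the course uses SparkML's implementation, which this does not reproduce.

```python
import numpy as np

def k_means(points, k, n_iterations=20, seed=0):
    """Plain k-means: assign to the nearest centre, then move centres to cluster means."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iterations):
        # Distance from every point to every centre, shape (n_points, k)
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        for j in range(k):
            members = points[assignment == j]
            if len(members) > 0:          # keep the old centre if a cluster is empty
                centres[j] = members.mean(axis=0)
    return centres, assignment

# Two synthetic, well-separated blobs (made-up data for the demo)
data_rng = np.random.default_rng(42)
blob_a = data_rng.normal(loc=(0.0, 0.0), scale=0.3, size=(30, 2))
blob_b = data_rng.normal(loc=(5.0, 5.0), scale=0.3, size=(30, 2))
points = np.vstack([blob_a, blob_b])
centres, assignment = k_means(points, k=2)
print(centres)
```

On data this cleanly separated the centers settle on the two blobs; on real data the result depends on the initial centers and on choosing a sensible k.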

Figure: K-means clustering

The Curse of Dimensionality

We often have to trade off faster computing time against having more dimensions (features) while maintaining a certain level of accuracy. As we add more dimensions, the samples become increasingly sparse and the distances between the data points lose meaning. To address this problem, PCA is used to reduce the number of dimensions while preserving the distances between the original data points as much as possible. These distances are crucial in classification, as they allow us to assign data points to different classes.
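
As a hedged illustration of the dimensionality reduction step, here is PCA written with NumPy's SVD. The function name, the synthetic 3-D data, and the choice of one component are assumptions for the demo, not the course's SparkML code.

```python
import numpy as np

def pca(data, n_components):
    """Project data onto its top principal components using the SVD."""
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    explained = (singular_values ** 2) / np.sum(singular_values ** 2)
    return centred @ components.T, explained[:n_components]

# Hypothetical 3-D data that really lives close to a 1-D line
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.05, size=(200, 3))
reduced, explained_ratio = pca(data, n_components=1)
print(reduced.shape, explained_ratio)
```

Here one dimension retains almost all of the variance, so distances between points are nearly unchanged after the projection.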

There are only a couple of quizzes this week, but they are sufficient to test my understanding of the material. As in the previous weeks, there is a coding exercise to consolidate the material learned so far.

Week 4

I was looking forward to this week the most as we finally cover the signal processing part — a topic that I am very keen to explore.

Signals

There are 3 variables to describe a signal:

1. Frequency: how many times the signal repeats per unit of time
2. Amplitude: the strength of the signal
3. Phase shift: how far the signal is shifted horizontally from its original position

Figure: Variables that describe continuous-time signals

A basic (sinusoidal) signal can be generated or described using the following formula:

y(t) = A sin(2πft + φ), whereby A = amplitude, f = frequency, t = time, φ = phase shift
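
The formula translates directly into code. This sketch assumes NumPy and an arbitrary sample rate of 1000 samples per second; the amplitude, frequency, and phase values are illustrative.

```python
import numpy as np

def sinusoid(t, amplitude, frequency, phase):
    """y(t) = A * sin(2*pi*f*t + phi)"""
    return amplitude * np.sin(2 * np.pi * frequency * t + phase)

sample_rate = 1000                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # one second of time stamps
y = sinusoid(t, amplitude=2.0, frequency=5.0, phase=np.pi / 2)
print(y[0], y.max(), y.min())
```

With a phase shift of π/2 the sine starts at its peak, so the first sample equals the amplitude.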

Fourier transform

The Fourier Transform allows us to decompose a complex signal, be it music, speech, or an image, into its constituent signals, expressed as a series of sinusoids.

In terms of images, we can decompose an image’s colors into its RGB constituents.

Figure: RGB decomposition of an image (photo by Andriyko Podilnyk on Unsplash)

Nickolay used the example of a piano chord, using the Fourier transform to identify its individual notes. Sound waves are generated when vibrations travel across a medium. Different levels of vibration produce different waveform amplitudes. The individual waveforms can be summed, or superposed, to obtain the net response.

We can study waveforms in two domains: the time domain and the frequency domain. The Fourier Transform and the Inverse Fourier Transform translate between these domains (the former from the time domain to the frequency domain, the latter from the frequency domain back to the time domain).
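
The piano chord example can be imitated with three summed sine waves. The sketch below assumes NumPy's FFT routines, a sample rate of 8000 samples per second, and note pitches rounded to whole hertz; a real piano recording would contain harmonics and noise that this ignores.

```python
import numpy as np

sample_rate = 8000                                   # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate             # exactly one second
# Approximate pitches of a C major chord (C4, E4, G4), rounded to whole hertz
# so that each one falls exactly on an FFT bin for this one-second window.
note_frequencies = [262, 330, 392]
chord = sum(np.sin(2 * np.pi * f * t) for f in note_frequencies)

# Time domain -> frequency domain
spectrum = np.fft.rfft(chord)
frequencies = np.fft.rfftfreq(len(chord), d=1 / sample_rate)
magnitudes = np.abs(spectrum)

# The three largest peaks are the notes that make up the chord
strongest = np.argsort(magnitudes)[-3:]
detected = sorted(int(round(f)) for f in frequencies[strongest])

# Frequency domain -> time domain
reconstructed = np.fft.irfft(spectrum, n=len(chord))
print(detected)
```

The forward transform recovers the notes, and the inverse transform returns the original waveform, which is the round trip between the two domains.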

The chapter concludes by linking signal processing and machine learning through the topic of wavelets. The Fourier Transform works well on stationary signals, but in real life we constantly work with non-stationary signals, and the Fourier Transform cannot tell us when specific frequencies occur in these signals. The wavelet transform overcomes this limitation; its output can be visualized as a scalogram, which shows which frequencies are present at which times. To help us fully understand the wavelet and Fourier transforms, Nickolay explained the maths behind them.
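
The limitation is easy to reproduce. In the sketch below (NumPy assumed, signals made up), two signals contain the same two bursts in opposite order, and their whole-signal magnitude spectra come out the same. The second half of the example recovers the timing by analysing each half separately; that is a simple windowed Fourier analysis used here as a stand-in, not the wavelet transform taught in the course.

```python
import numpy as np

sample_rate = 1000
t = np.arange(sample_rate // 2) / sample_rate        # half a second
low = np.sin(2 * np.pi * 10 * t)                     # 10 Hz burst
high = np.sin(2 * np.pi * 50 * t)                    # 50 Hz burst

signal_a = np.concatenate([low, high])               # low first, then high
signal_b = np.concatenate([high, low])               # high first, then low

# Whole-signal magnitude spectra match even though the timing differs
spectrum_a = np.abs(np.fft.rfft(signal_a))
spectrum_b = np.abs(np.fft.rfft(signal_b))
same_spectrum = np.allclose(spectrum_a, spectrum_b)

def dominant_frequency(segment, sample_rate):
    """Frequency (Hz) of the largest peak in a segment's spectrum."""
    magnitudes = np.abs(np.fft.rfft(segment))
    frequencies = np.fft.rfftfreq(len(segment), d=1 / sample_rate)
    return frequencies[np.argmax(magnitudes)]

# Analysing each half separately recovers when each frequency occurs
first_half_a = dominant_frequency(signal_a[:500], sample_rate)
second_half_a = dominant_frequency(signal_a[500:], sample_rate)
print(same_spectrum, first_half_a, second_half_a)
```

A wavelet transform goes further than fixed windows by adapting the window length to the frequency being analysed.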

The final programming assignment is an exercise on how to classify a signal using machine learning. For someone looking into how sensor data and machine learning work together, the topics and exercises for this week are truly fascinating.

Conclusion

I find the products I am working on are getting smarter and fitted with more sensors. Often, having multi-disciplinary skills is advantageous to deliver a project. This course “kills 2 birds” for me by providing some of the important topics of signal processing and machine learning.

The delivery pace is good, and the explanations are decent enough for a professional or a curious casual learner. For some of the topics that are taught, especially Bayes’ theorem, PCA, and the wavelet transform, further study and research are needed to gain a deeper understanding.

There are other courses that cover machine learning and signal processing in more depth, but what I truly enjoyed about this course was the connection and ‘blending’ of these 2 closely interrelated topics.

With IBM Watson Studio being used in the assignments of this course, there is the added advantage of being exposed to IBM’s cloud tools and services. The course seems to let learners choose their own level of difficulty for the assignments. It is not challenging to pass the assignments, as there is a lot of hand-holding and guidance. But there is always the option to really understand the code that is shown, and that takes time for beginner programmers.

I am looking forward to applying what I learned from this course and I hope by writing this summary, I solidify my learning and you have learned something as well.

Source: https://towardsdatascience.com/how-do-machine-learning-and-signal-processing-blend-4f48afbb6dce
