
5. Bayesian Algorithm and a Spelling-Correction Example

Published: 2024/9/27 | Programming Q&A


5. Bayesian Algorithm
5.1. Spelling-Correction Example

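5. Bayesian Algorithm

Introduction to Bayes

Thomas Bayes (c. 1701-1761) was an English mathematician.
The Bayesian method grew out of an essay he wrote to solve an "inverse probability" problem.
He was ahead of his time: his work was recognized only after his death.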

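The problem Bayes set out to solve:

Forward probability: suppose a bag contains N white balls and M black balls. If you reach in and draw one ball, what is the probability that it is black?
Inverse probability: suppose we do not know the ratio of black to white balls in advance. We draw one ball (or several) with our eyes closed and observe the colors. What can we then infer about the ratio of black to white balls in the bag?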
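The two questions can be contrasted in a few lines of Python. This is a minimal sketch, not part of the original article: the function names are illustrative, and the inverse calculation assumes a uniform prior over a coarse grid of possible black-ball fractions and draws made with replacement.

```python
def forward_probability(n_white, m_black):
    """Forward question: the bag's contents are known, predict one draw."""
    return m_black / (n_white + m_black)


def inverse_posterior(black_seen, draws, grid_size=10):
    """Inverse question: the contents are unknown, infer them from draws.

    Assumptions (not from the article): the black fraction is one of
    0/10, 1/10, ..., 10/10, every value is equally likely beforehand,
    and each ball is put back after it is drawn.
    """
    fractions = [i / grid_size for i in range(grid_size + 1)]
    white_seen = draws - black_seen
    # Likelihood of the observed colors under each candidate fraction.
    weights = [f ** black_seen * (1 - f) ** white_seen for f in fractions]
    total = sum(weights)
    return {f: w / total for f, w in zip(fractions, weights)}


print(forward_probability(n_white=3, m_black=1))  # 0.25

posterior = inverse_posterior(black_seen=8, draws=10)
most_likely = max(posterior, key=posterior.get)
print(most_likely)  # 0.8
```

Seeing 8 black balls in 10 draws makes a black fraction of 0.8 the most plausible value on the grid, but the neighboring values keep some probability as well. That remaining uncertainty is exactly what the inverse question is about.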

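Why Bayes?
- The real world is inherently uncertain, and human powers of observation are limited.
- What we observe day to day is only the surface outcome of things, so we have to make a guess about what lies beneath.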

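Example: in a school, 60% of the students are boys and 40% are girls. Boys always wear pants; half of the girls wear pants and the other half wear skirts.
Forward probability: pick a student at random. What is the probability that the student wears pants, and what is the probability of a skirt?
Inverse probability: a student in pants walks toward you. You can see only that the student wears pants, not the student's gender. Can you infer the probability that the student is a girl?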
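The forward question can be answered by summing over the two groups. A small sketch follows, using the 60% share of boys given in the text and assuming every student is either a boy or a girl, so the girls' share is 40%. The variable names are illustrative.

```python
from fractions import Fraction

p_boy = Fraction(60, 100)            # share of boys, as stated in the text
p_girl = 1 - p_boy                   # assumption: only two groups
p_pants_given_boy = Fraction(1)      # boys always wear pants
p_pants_given_girl = Fraction(1, 2)  # half of the girls wear pants

# Law of total probability: add up the pants wearers from each group.
p_pants = p_boy * p_pants_given_boy + p_girl * p_pants_given_girl
p_skirt = 1 - p_pants

print(p_pants, p_skirt)  # 4/5 1/5
```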

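Suppose the school has U students in total.
Boys wearing pants: U * P(Boy) * P(Pants|Boy)

P(Boy) is the probability that a student is a boy = 60%
P(Pants|Boy) is a conditional probability: the probability of wearing pants given that the student is a boy = 100%
Girls wearing pants: U * P(Girl) * P(Pants|Girl)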

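Does the answer depend on the total number of students?

The share of girls among the students wearing pants is
U * P(Girl) * P(Pants|Girl) / [U * P(Boy) * P(Pants|Boy) + U * P(Girl) * P(Pants|Girl)]
The total U appears in every term, so it cancels and the result does not depend on the size of the school:
P(Girl|Pants) = P(Girl) * P(Pants|Girl) / [P(Boy) * P(Pants|Boy) + P(Girl) * P(Pants|Girl)]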
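The cancellation of U can be checked numerically by counting students for several school sizes. A hedged sketch: the function name and the chosen sizes are illustrative, and the 40% share of girls is inferred from the 60% share of boys.

```python
from fractions import Fraction


def girl_share_among_pants(total_students):
    """Count pants wearers by group, then take the girls' share."""
    boys_in_pants = total_students * Fraction(3, 5) * Fraction(1)
    girls_in_pants = total_students * Fraction(2, 5) * Fraction(1, 2)
    return girls_in_pants / (boys_in_pants + girls_in_pants)


for u in (100, 5000, 123457):
    print(u, girl_share_among_pants(u))  # the share is 1/4 every time
```

Whatever value U takes, a student in pants is a girl with probability 1/4.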

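Simplifying:

P(Girl|Pants) = P(Girl) * P(Pants|Girl) / [P(Boy) * P(Pants|Boy) + P(Girl) * P(Pants|Girl)]
The denominator is simply P(Pants).
The numerator is simply P(Pants, Girl).
That is:
P(Girl|Pants) = P(Pants, Girl) / P(Pants)

This leads to Bayes' formula:
P(A|B) = P(B|A) * P(A) / P(B)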
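Bayes' formula, P(A|B) = P(B|A) * P(A) / P(B), can be written as one small function that handles any number of competing hypotheses. This is an illustrative sketch: bayes_posterior is a hypothetical helper, not something defined in the article, and it assumes the hypotheses are mutually exclusive and cover all cases.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return P(hypothesis | evidence) for every hypothesis.

    priors[h] is P(h); likelihoods[h] is P(evidence | h).
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}  # P(evidence, h)
    evidence = sum(joint.values())                           # P(evidence)
    return {h: joint[h] / evidence for h in joint}


posterior = bayes_posterior(
    priors={"Boy": Fraction(3, 5), "Girl": Fraction(2, 5)},
    likelihoods={"Boy": Fraction(1), "Girl": Fraction(1, 2)},
)
for hypothesis, probability in posterior.items():
    print(hypothesis, probability)
# Boy 3/4
# Girl 1/4
```

The denominator is the same for every hypothesis, so when only the most probable hypothesis is needed it is enough to compare the products P(evidence | h) * P(h).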

5.1. Spelling-Correction Example

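import re
import collections


# Extract every word from the corpus: lowercase the text and keep only
# runs of the letters a-z, dropping any other characters.
def words(text):
    return re.findall('[a-z]+', text.lower())


# What if we meet a word that is not in the corpus?
# A word may be spelled correctly and still be missing from the corpus,
# so it never shows up in the training data. Returning a probability of 0
# for it would say the event is impossible. In our probability model we
# would rather represent this case with a very small probability, which is
# why every count starts from 1 (lambda: 1).
def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model


with open('./data/big.txt') as corpus:
    NWORDS = train(words(corpus.read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'


# Edit distance: the number of single-letter operations needed to turn one
# word into another. An operation is a deletion (remove one letter), a
# transposition (swap two adjacent letters), an alteration (change one
# letter into another) or an insertion (add one letter).

# Return the set of all strings at edit distance 1 from word.
def edits1(word):
    n = len(word)
    return set(
        [word[0:i] + word[i + 1:] for i in range(n)] +  # deletion
        [word[0:i] + word[i + 1] + word[i] + word[i + 2:] for i in range(n - 1)] +  # transposition
        [word[0:i] + c + word[i + 1:] for i in range(n) for c in alphabet] +  # alteration
        [word[0:i] + c + word[i:] for i in range(n + 1) for c in alphabet]  # insertion
    )


# Return the set of known words at edit distance 2 from word.
# Of all the strings two edits away, only real words (those present in
# NWORDS) are kept as candidates.
def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)


# Keep only the strings that occur in the corpus.
def known(words):
    return set(w for w in words if w in NWORDS)


# Prefer the word itself if it is known, then known words one edit away,
# then known words two edits away; otherwise return the input unchanged.
# Among the candidates, pick the one seen most often in the corpus.
def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])


def main():
    while True:
        word = input("input word:")
        if word == 'break':  # type "break" to quit
            return
        print(correct(word))


if __name__ == '__main__':
    main()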
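To get a feel for how many candidates a single edit produces, the sketch below groups the strings one edit away from a short word by edit type. The helper name edits_by_kind and the sample word "cat" are illustrative. The four list expressions follow the same slicing logic as the listing above.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits_by_kind(word):
    """Group every string one edit away from `word` by the kind of edit."""
    n = len(word)
    return {
        "deletion": [word[:i] + word[i + 1:] for i in range(n)],
        "transposition": [word[:i] + word[i + 1] + word[i] + word[i + 2:]
                          for i in range(n - 1)],
        "alteration": [word[:i] + c + word[i + 1:]
                       for i in range(n) for c in ALPHABET],
        "insertion": [word[:i] + c + word[i:]
                      for i in range(n + 1) for c in ALPHABET],
    }


kinds = edits_by_kind("cat")
for kind, strings in kinds.items():
    print(kind, len(strings))

# Before de-duplication: n + (n - 1) + 26n + 26(n + 1) = 54n + 25 strings.
total = sum(len(strings) for strings in kinds.values())
print(total)  # 187 for n = 3
```

The set built in the listing is somewhat smaller than this total, because some edits coincide (altering a letter to itself, for instance, gives back the original word). Applying the function twice multiplies the pool, which is why only strings found in the corpus are kept at distance 2.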

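Run the script and type a word at the "input word:" prompt. The program prints its best correction for each word; typing break exits.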
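One way to read the corrector in Bayesian terms: for a typed word w we look for the candidate c that maximizes P(c|w), which is proportional to P(w|c) * P(c). The program estimates P(c) from word counts and stands in for the error model P(w|c) with a fixed preference order: the word itself, then words one edit away, then words two edits away. The sketch below shows the same idea without console input. It is a minimal version under stated assumptions: make_corrector and SAMPLE are hypothetical names, the tiny corpus is one sentence taken from big.txt, and a plain Counter replaces the smoothed defaultdict.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
SAMPLE = ("Spring Cloud provides tools for developers to quickly build "
          "some of the common patterns in distributed systems")


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one deletion, transposition, alteration or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def make_corrector(corpus_text):
    """Build a correct(word) function backed by counts from corpus_text."""
    counts = Counter(words(corpus_text))

    def known(candidates):
        return {w for w in candidates if w in counts}

    def correct(word):
        candidates = (known([word])
                      or known(edits1(word))
                      or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                      or [word])
        # The most frequent candidate plays the role of the largest P(c).
        return max(candidates, key=lambda w: counts[w])

    return correct


correct = make_corrector(SAMPLE)
for typed in ("clodu", "sprng", "tools", "zzzzz"):
    print(typed, "->", correct(typed))
# clodu -> cloud
# sprng -> spring
# tools -> tools
# zzzzz -> zzzzz
```

With such a small corpus most misspellings have no known neighbor and come back unchanged, so a larger corpus is needed for useful corrections.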

The contents of ./data/big.txt are as follows:

Spring Cloud provides tools for developers to quickly build some of the common patterns in distributed systems (e.g. configuration management, service discovery, circuit breakers, intelligent routing, micro-proxy, control bus, one-time tokens, global locks, leadership election, distributed sessions, cluster state). Coordination of distributed systems leads to boiler plate patterns, and using Spring Cloud developers can quickly stand up services and applications that implement those patterns. They will work well in any distributed environment, including the developer’s own laptop, bare metal data centres, and managed platforms such as Cloud Foundry. Spring Cloud focuses on providing good out of box experience for typical use cases and extensibility mechanism to cover others. Distributed/versioned configuration Service registration and discovery Routing Service-to-service calls Load balancing Circuit Breakers Global locks Leadership election and cluster state Distributed messaging Getting Started Generating A New Spring Cloud Project The easiest way to get started is visit start.spring.io, select your Spring Boot version and the Spring Cloud projects you want to use. This will add the corresponding Spring Cloud BOM version to your Maven/Gradle file when you generate the project. Adding Spring Cloud To An Existing Spring Boot Application If you an existing Spring Boot app you want to add Spring Cloud to that app, the first step is to determine the version of Spring Cloud you should use. The version you use in your app will depend on the version of Spring Boot you are using. The table below outlines which version of Spring Cloud maps to which version of Spring Boot. The server certificate is a public entity. It is sent to every client that connects to the server. The private key is a secure entity and should be stored in a file with restricted access, however, it must be readable by nginx’s master process. 
The private key may alternately be stored in the same file as the certificate: in which case the file access rights should also be restricted. Although the certificate and the key are stored in one file, only the certificate is sent to a client. SSL operations consume extra CPU resources. On multi-processor systems several worker processes should be run, no less than the number of available CPU cores. The most CPU-intensive operation is the SSL handshake. There are two ways to minimize the number of these operations per client: the first is by enabling keepalive connections to send several requests via one connection and the second is to reuse SSL session parameters to avoid SSL handshakes for parallel and subsequent connections. The sessions are stored in an SSL session cache shared between workers and configured by the ssl_session_cache directive. One megabyte of the cache contains about 4000 sessions. The default cache timeout is 5 minutes. It can be increased by using the ssl_session_timeout directive. Here is a sample configuration optimized for a multi-core system with 10 megabyte shared session cache: Some browsers may complain about a certificate signed by a well-known certificate authority, while other browsers may accept the certificate without issues. This occurs because the issuing authority has signed the server certificate using an intermediate certificate that is not present in the certificate base of well-known trusted certificate authorities which is distributed with a particular browser. In this case the authority provides a bundle of chained certificates which should be concatenated to the signed server certificate. The server certificate must appear before the chained certificates in the combined file: Most machine learning workflows involve working with data, creating models, optimizing model parameters, and saving the trained models. 
This tutorial introduces you to a complete ML workflow implemented in PyTorch, with links to learn more about each of these concepts. We’ll use the FashionMNIST dataset to train a neural network that predicts if an input image belongs to one of the following classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, or Ankle boot. This tutorial assumes a basic familiarity with Python and Deep Learning concepts. To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU if available. We will use a problem of fitting y=sin(x) with a third order polynomial as our running example. The network will have four parameters, and will be trained with gradient descent to fit random data by minimizing the Euclidean distance between the network output and the true output. Numpy provides an n-dimensional array object, and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it does not know anything about computation graphs, or deep learning, or gradients. However we can easily use numpy to fit a third order polynomial to sine function by manually implementing the forward and backward passes through the network using numpy operations: Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy won’t be enough for modern deep learning. Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. 
Behind the scenes, Tensors can keep track of a computational graph and gradients, but they’re also useful as a generic tool for scientific computing. Also unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on GPU, you simply need to specify the correct device. Here we use PyTorch Tensors to fit a third order polynomial to sine function. Like the numpy example above we need to manually implement the forward and backward passes through the network:
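To see how text like this is turned into counts, the sketch below applies the same words and train logic to two sentences copied from the corpus. The sample strings are taken from the text above. Printing intermediate values is an addition for illustration.

```python
import re
import collections


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def train(features):
    model = collections.defaultdict(lambda: 1)  # every word starts at 1
    for f in features:
        model[f] += 1
    return model


sample = ("The server certificate is a public entity. "
          "It is sent to every client that connects to the server.")
model = train(words(sample))
print(model["the"])  # 3: the starting value 1 plus two occurrences

# Anything outside a-z splits a token, including underscores.
print(words("ssl_session_cache directive"))
# ['ssl', 'session', 'cache', 'directive']

print("nginx" in model)  # False: never seen in the sample
print(model["nginx"])    # 1: the small default count for unseen words
print("nginx" in model)  # True: the lookup itself stored the key
```

The last three lines show a side effect of defaultdict: indexing an unseen word inserts it. Membership should therefore be tested with "in" before indexing, as the known function in the listing does.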
