Introduction to Machine Learning 02 - Naive Bayes: Principles and a Java Implementation
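Naive Bayes is a machine learning method based on probability and statistics. It counts how often the prior data (the observed condition values) and the labels occur in the sample data, and it predicts the label that has the highest probability for the given prior data.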
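The formula behind naive Bayes, Bayes' theorem, is usually written as P(A|B) = P(A) * P(B|A) / P(B). To predict the probability of A when the prior condition B has occurred, multiply the probability of A by the probability of B given A, then divide by the probability of B. In the computation, P(A) * P(B|A) is equal to the probability that A and B occur together, P(A∩B).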
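As a quick worked example of the formula, take made-up numbers chosen only for illustration (they are not from the article's data): P(A) = 0.3, P(B|A) = 0.5 and P(B) = 0.25.

```latex
\begin{align*}
P(A \cap B) &= P(A)\,P(B \mid A) = 0.3 \times 0.5 = 0.15 \\
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} = \frac{0.15}{0.25} = 0.6
\end{align*}
```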
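In other words, training a naive Bayes model mainly means counting the probability of the condition, P(B) (called the "condition probability" in this article and its code), and the joint probability P(A∩B). Because naive Bayes only counts how often conditions occur, it places no particular requirement on the format of the condition data. It therefore suits prediction over text data and can be applied to tasks such as email filtering and news classification.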
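The counting described above can be sketched in a few lines of Java. This is a minimal illustration, not the article's implementation: the class name `CountingSketch`, its methods, and the four rows are all hypothetical. Like the article's code, it builds each key from the whole combination of condition values rather than factoring the probability per dimension.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class name, method names and rows are hypothetical.
class CountingSketch {

    // Relative frequency (count / N) of each key built from row[from..end).
    // from = 0 keeps the label (joint probability); from = 1 drops it (condition probability).
    static Map<String, Double> frequencies(List<String[]> rows, int from) {
        Map<String, Double> freq = new HashMap<>();
        for (String[] row : rows) {
            String key = String.join("", Arrays.copyOfRange(row, from, row.length));
            freq.merge(key, 1.0 / rows.size(), Double::sum);
        }
        return freq;
    }

    // P(label | condition) = P(label and condition) / P(condition)
    static double posterior(List<String[]> rows, String label, String condition) {
        double joint = frequencies(rows, 0).getOrDefault(label + condition, 0.0);
        double cond = frequencies(rows, 1).getOrDefault(condition, 0.0);
        return cond == 0.0 ? 0.0 : joint / cond;
    }

    public static void main(String[] args) {
        // Invented rows: label, first dimension, second dimension
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "A", "a"},
                new String[] {"1", "A", "a"},
                new String[] {"2", "A", "a"},
                new String[] {"3", "D", "a"});
        System.out.println(posterior(rows, "1", "Aa"));
    }
}
```

With these rows, three of the four have the conditions "Aa" and two of those three carry label 1, so the printed value is approximately 0.667.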
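The training data consists of labels and prior conditions. Assume a training set in which the result label is 1, 2, or 3 and the prior conditions have two dimensions: the first dimension takes the values A, B, C, D, and the second takes the values a, b, c, d.
As shown in the figure: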
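The first step is to read the training data. The training data is in CSV format; when reading it, each line is split on commas into the label and the values of the different dimensions.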
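To make the row format concrete, here is a small sketch that splits a few sample lines the way just described. The class name `TrainRowSketch` and the three sample lines are hypothetical, invented for illustration; they are not the author's data set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: the class name and the sample rows are hypothetical.
class TrainRowSketch {

    // Splits one CSV row into [label, dimension 1, dimension 2].
    static List<String> parseRow(String csvLine) {
        return new ArrayList<>(Arrays.asList(csvLine.split(",")));
    }

    public static void main(String[] args) {
        String[] sampleLines = {"1,A,a", "2,B,c", "3,D,a"};
        for (String csvLine : sampleLines) {
            List<String> row = parseRow(csvLine);
            // Index 0 is the label; the remaining entries are the prior conditions
            System.out.println("label=" + row.get(0) + " conditions=" + row.subList(1, row.size()));
        }
    }
}
```

The first sample line prints `label=1 conditions=[A, a]`.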
Reading the training data:
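    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Reads the CSV training file; each row becomes [label, dimension 1, dimension 2, ...].
    public List<List<String>> readTrainFile(File trainFile) throws Exception {
        List<List<String>> resultList = new ArrayList<List<String>>();
        if (trainFile.exists()) {
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader reader = new BufferedReader(new FileReader(trainFile))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Split on commas and copy the fields into a mutable list
                    resultList.add(new ArrayList<String>(Arrays.asList(line.split(","))));
                }
            }
        }
        return resultList;
    }

Step 2: compute the joint probabilities: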
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Estimates the joint probability of each "label + conditions" combination.
    // Every row contributes 1/N, so each key ends up with count/N.
    private Map<String, Double> caculateUnionProbability(List<List<String>> trainData) {
        Map<String, Double> result = new HashMap<String, Double>();
        int dataSize = trainData.size();
        double singleProbability = 1 / (double) dataSize;
        for (List<String> line : trainData) {
            if (line != null) {
                // Key is the label followed by every condition value, e.g. "1Da"
                String key = String.join("", line);
                result.merge(key, singleProbability, Double::sum);
            }
        }
        return result;
    }

Step 3: compute the condition probabilities:
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Estimates the probability of each combination of condition values (label excluded).
    private Map<String, Double> caculateConditionProbability(List<List<String>> trainData) {
        Map<String, Double> result = new HashMap<String, Double>();
        int dataSize = trainData.size();
        double singleProbability = 1 / (double) dataSize;
        for (List<String> line : trainData) {
            if (line != null && !line.isEmpty()) {
                // Skip the label at index 0 without modifying the training row itself
                String key = String.join("", line.subList(1, line.size()));
                result.merge(key, singleProbability, Double::sum);
            }
        }
        return result;
    }

Step 4: write the training results to the model file:
    import java.io.File;
    import java.io.FileWriter;
    import java.util.List;
    import java.util.Map;

    // Writes the model file: one line of labels, then one "key-value" line per probability.
    private void writeTrainResult(List<String> tags, Map<String, Double> unionProbability,
            Map<String, Double> conditionProbability, File resultFile) throws Exception {
        resultFile.createNewFile();
        try (FileWriter writer = new FileWriter(resultFile)) {
            // First line: all labels, e.g. "tags-1,2,3"
            writer.write("tags-" + String.join(",", tags));
            writer.write("\r\n");
            // Write the joint probabilities
            for (Map.Entry<String, Double> entry : unionProbability.entrySet()) {
                writer.write(entry.getKey() + "-" + entry.getValue());
                writer.write("\r\n");
            }
            // Write the condition probabilities
            for (Map.Entry<String, Double> entry : conditionProbability.entrySet()) {
                writer.write(entry.getKey() + "-" + entry.getValue());
                writer.write("\r\n");
            }
        }
    }

Step 5: predict:
    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    public String predict(File resultFile, String conditionB) throws Exception {
        String result = "";
        Map<String, Double> results = new HashMap<String, Double>();
        // getAllCondition and predictProbability belong to the same class and are not listed in this article
        String[] conditionAll = getAllCondition(resultFile);
        // Compute the probability for each class
        for (String condition : conditionAll) {
            String unionCondition = condition + conditionB;
            double res = predictProbability(resultFile, conditionB, unionCondition);
            results.put(condition, res);
        }
        // Take the class with the highest probability as the result;
        // if no class has a probability above 0, the result stays empty
        double max = 0;
        for (String condition : conditionAll) {
            double res = results.get(condition);
            if (res > max) {
                max = res;
                result = condition;
            }
        }
        return result;
    }

Test:

    // Test
    public static void main(String[] args) throws Exception {
        NaiveBaysian naiveBaysian = new NaiveBaysian();
        naiveBaysian.train(new File("C:/Users/admin/Desktop/1.txt"), new File("C:/Users/admin/Desktop/2.bys"));
        String result = naiveBaysian.predict(new File("C:/Users/admin/Desktop/2.bys"), "Da");
        System.out.println(result);
    }

The test runs training and prediction together. In practice, you only need to read the trained model file and run the prediction code to complete the naive Bayes calculation.
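Since the article does not list the helpers that read the model file, the following is only a guess at how prediction from a saved model might look. It assumes the file format written in step 4 (a `tags-...` line followed by `key-value` lines) and that keys contain no `-`. The class `ModelSketch`, its methods, and the model lines are hypothetical, and the sketch takes the lines in memory rather than from a file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the article's getAllCondition / predictProbability.
class ModelSketch {

    // Parses "key-value" lines, splitting at the first '-' so that values
    // such as 1.0E-4 stay intact. Assumes keys themselves contain no '-'.
    static Map<String, String> parseLines(List<String> modelLines) {
        Map<String, String> entries = new HashMap<>();
        for (String line : modelLines) {
            int dash = line.indexOf('-');
            if (dash > 0) {
                entries.put(line.substring(0, dash), line.substring(dash + 1).trim());
            }
        }
        return entries;
    }

    // Returns the label with the largest P(label and B) / P(B), or "" if none is above 0.
    static String predict(List<String> modelLines, String conditionB) {
        Map<String, String> entries = parseLines(modelLines);
        double pB = Double.parseDouble(entries.getOrDefault(conditionB, "0"));
        String best = "";
        double max = 0;
        for (String tag : entries.getOrDefault("tags", "").split(",")) {
            double joint = Double.parseDouble(entries.getOrDefault(tag + conditionB, "0"));
            double posterior = pB == 0 ? 0 : joint / pB;
            if (posterior > max) {
                max = posterior;
                best = tag;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented model contents in the format produced by step 4
        List<String> modelLines = Arrays.asList(
                "tags-1,2,3", "1Aa-0.4", "2Da-0.2", "3Da-0.4", "Aa-0.4", "Da-0.6");
        System.out.println(predict(modelLines, "Da"));
    }
}
```

For this invented model, label 3 has the largest ratio for the condition "Da" (0.4 / 0.6), so the program prints 3. In a real program the lines could come from `java.nio.file.Files.readAllLines` applied to the model file path.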