Sensitive Word Detection Algorithm

Published: 2023/12/10

Approach: the DFA algorithm

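A deterministic finite automaton (DFA) is the machinery behind regular-expression matching. Here the automaton is a trie of the sensitive words stored as nested maps, and the text is scanned with leftmost-longest matching.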
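
To make the idea concrete, here is a minimal sketch of a word trie used as a DFA, with a leftmost-longest lookup from one start position. The class `DfaSketch`, its `State` type, and the method names are hypothetical and chosen for illustration; the article's own code uses untyped nested maps instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the original code.
class DfaSketch {
    // One trie node is one DFA state.
    static final class State {
        final Map<Character, State> next = new HashMap<>(); // outgoing transitions
        boolean accepting;                                   // true if a word ends here
    }

    // Add one word to the automaton, creating states as needed.
    static void add(State root, String word) {
        State s = root;
        for (char c : word.toCharArray()) {
            s = s.next.computeIfAbsent(c, k -> new State());
        }
        s.accepting = true;
    }

    // Length of the longest word that starts exactly at `start`, or 0 if none.
    static int longestMatchAt(State root, String text, int start) {
        State s = root;
        int best = 0;
        for (int i = start; i < text.length(); i++) {
            s = s.next.get(text.charAt(i));
            if (s == null) {
                break;                 // no transition: the walk is over
            }
            if (s.accepting) {
                best = i - start + 1;  // remember the longest accepted prefix
            }
        }
        return best;
    }
}
```

Each character of the text triggers at most one map lookup, so a scan from a given position costs time proportional to the length of the longest word, regardless of how many words are in the dictionary.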

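// Requires: java.util.HashSet, java.util.Set
/**
 * Detect sensitive words.
 *
 * @param scriptText the text to scan
 * @param matchType  1: minimal match, 2: full match
 * @return the set of sensitive words found in the text
 */
public static Set<String> checkSensitiveWord(String scriptText, int matchType) {
    Set<String> sensitiveWordSet = new HashSet<>();
    for (int i = 0; i < scriptText.length(); i++) {
        // Length of the sensitive word starting at i, or 0 if there is none.
        int length = testSensitiveWord(scriptText, i, matchType, sensitiveWordMap);
        if (length > 0) {
            sensitiveWordSet.add(scriptText.substring(i, i + length));
            // Jump to the last character of the match; the loop's i++ moves past it.
            i = i + length - 1;
        }
    }
    return sensitiveWordSet;
}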
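
The outer loop has two properties worth spelling out: matches never overlap, because the index jumps past each match, and repeated hits collapse into one entry, because the result is a set. The sketch below isolates just that loop. `ScanLoopSketch`, `collect`, and the `matcher` parameter are hypothetical names; the matcher stands in for `testSensitiveWord` and is assumed to return the match length at a position, or 0.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.ToIntBiFunction;

// Hypothetical illustration class; not part of the original code.
class ScanLoopSketch {
    // Collect the distinct, non-overlapping matches reported by `matcher`.
    static Set<String> collect(String text, ToIntBiFunction<String, Integer> matcher) {
        Set<String> found = new LinkedHashSet<>(); // keeps first-seen order
        int i = 0;
        while (i < text.length()) {
            int length = matcher.applyAsInt(text, i);
            if (length > 0) {
                found.add(text.substring(i, i + length));
                i += length;   // resume right after the match
            } else {
                i++;           // no match here: try the next character
            }
        }
        return found;
    }
}
```

For example, with a matcher that recognizes only "abab", scanning "ababab" reports one match at index 0 and then resumes at index 4, so the overlapping occurrence at index 2 is never examined.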

Building the sensitive word map

// Requires: java.util.HashMap, java.util.HashSet, java.util.List, java.util.Map,
//           java.util.Set, java.util.stream.Collectors
// sensitiveWordMap is the static field that checkSensitiveWord reads.
public static void initSensitiveWordMap(List<WordSenstive> wordSenstives) {
    log.info("Start initializing the sensitive word map");
    List<String> collect = wordSenstives.stream()
            .map(a -> a.getSenstiveWord())
            .collect(Collectors.toList());
    Set<String> keyWordSet = new HashSet<String>(collect); // removes duplicates
    sensitiveWordMap = new HashMap(keyWordSet.size());
    for (String key : keyWordSet) {
        if (key == null) {
            continue;
        }
        Map nowMap = sensitiveWordMap; // start every word at the root
        for (int i = 0; i < key.length(); i++) {
            char keyChar = key.charAt(i);
            Object wordMap = nowMap.get(keyChar);
            if (wordMap != null) {
                // The prefix already exists: follow it.
                nowMap = (Map) wordMap;
            } else {
                // New node. Keys are either a Character (a transition) or the
                // strings "isEnd" / "deepCount" (node metadata).
                Map<Object, Object> newWorMap = new HashMap<>();
                newWorMap.put("isEnd", "0");
                nowMap.put(keyChar, newWorMap);
                nowMap = newWorMap;
            }
            if (i == key.length() - 1) {
                // Last character: record the word length and mark the node as an end.
                nowMap.put("deepCount", i + 1 + "");
                nowMap.put("isEnd", "1");
            }
        }
    }
    log.info("Sensitive word map built");
}
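
To see what structure this produces, the following sketch builds the same kind of nested map from plain strings, using the same `"isEnd"` and `"deepCount"` keys as the code above. `WordMapSketch` and `build` are hypothetical names, and the input is a varargs list of words rather than `WordSenstive` objects, which is an assumption made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the original code.
class WordMapSketch {
    // Build a nested map where Character keys are transitions and the
    // String keys "isEnd" / "deepCount" describe the node itself.
    @SuppressWarnings("unchecked")
    static Map<Object, Object> build(String... words) {
        Map<Object, Object> root = new HashMap<>();
        for (String word : words) {
            if (word == null || word.isEmpty()) {
                continue;
            }
            Map<Object, Object> node = root;
            for (int i = 0; i < word.length(); i++) {
                // Follow the existing child or create a fresh non-end node.
                node = (Map<Object, Object>) node.computeIfAbsent(word.charAt(i), k -> {
                    Map<Object, Object> child = new HashMap<>();
                    child.put("isEnd", "0");
                    return child;
                });
            }
            // The node reached by the last character marks the end of a word.
            node.put("isEnd", "1");
            node.put("deepCount", String.valueOf(word.length()));
        }
        return root;
    }
}
```

For the words "ab" and "ac", the root holds a single entry for 'a'. That node has `isEnd` set to "0" and two children, 'b' and 'c', each with `isEnd` set to "1" and `deepCount` set to "2". Shared prefixes are therefore stored once.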

Matching sensitive words

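private static int testSensitiveWord(String scriptText, int index, int matchType, Map sensitiveWordMap) {
    int matchFlag = 0;    // number of characters matched so far along the trie
    int matchLength = 0;  // length of the last accepted word, 0 if none
    Map nowMap = sensitiveWordMap;
    for (int i = index; i < scriptText.length(); i++) {
        char word = scriptText.charAt(i);
        nowMap = (Map) nowMap.get(word);
        if (nowMap == null) {
            break; // no transition for this character
        }
        matchFlag++; // found the key: increment the match counter
        if ("1".equals(nowMap.get("isEnd"))) {
            int deepCount = Integer.parseInt((String) nowMap.get("deepCount"));
            if (isWord(scriptText, i, deepCount)) {
                // Record the length at the end node. The original kept the raw
                // counter and broke out as soon as isWord passed, so match
                // type 2 never reached a longer word.
                matchLength = matchFlag;
            }
            if (1 == matchType) { // 1: minimal match, 2: full match
                break;
            }
        }
    }
    // As in the original, matches shorter than two characters are rejected.
    return matchLength < 2 ? 0 : matchLength;
}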
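
The difference between the two `matchType` values is easiest to see on a dictionary where one word is a prefix of another. The sketch below deliberately swaps the trie walk for a naive `String.startsWith` check over a word list, so it shows only the intended semantics of minimal versus full matching, not the article's algorithm. `MatchTypeSketch`, `matchLength`, and the two constants are hypothetical names.

```java
import java.util.List;

// Hypothetical illustration class; a naive prefix check, not the trie walk.
class MatchTypeSketch {
    static final int MIN_MATCH = 1;   // stop at the shortest word
    static final int FULL_MATCH = 2;  // prefer the longest word

    // Length of the chosen word starting at `start`, or 0 if no word starts there.
    static int matchLength(List<String> words, String text, int start, int matchType) {
        int best = 0;
        for (String w : words) {
            if (w.isEmpty() || !text.startsWith(w, start)) {
                continue;
            }
            int len = w.length();
            boolean better = (matchType == MIN_MATCH) ? len < best : len > best;
            if (best == 0 || better) {
                best = len;
            }
        }
        return best;
    }
}
```

With the words "bad" and "badword" and the text "badwords", minimal matching yields length 3 and full matching yields length 7. Minimal matching is enough to decide whether a text contains any sensitive word; full matching matters when the matched span itself is reported.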

Checking whether the match is a word

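private static boolean isWord(String scriptText, int i, int deepCount) {
    // i is the index of the match's last character and deepCount is its length,
    // so i - deepCount is the index of the character just before the match.
    int before = i - deepCount;
    // A lowercase ASCII letter right before the match means the match begins
    // inside a longer English word, so it is not counted as a word.
    boolean insideWord = before >= 0
            && scriptText.charAt(before) >= 'a'
            && scriptText.charAt(before) <= 'z';
    return !insideWord;
}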
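
A few concrete inputs show what this check accepts and rejects. The sketch restates the same test in terms of the match's start index instead of its end index and length. `WordBoundarySketch` and `startsAtBoundary` are hypothetical names used only for this illustration.

```java
// Hypothetical illustration class; not part of the original code.
class WordBoundarySketch {
    // True unless the character right before `start` is a lowercase ASCII letter.
    static boolean startsAtBoundary(String text, int start) {
        if (start == 0) {
            return true; // nothing precedes the match
        }
        char before = text.charAt(start - 1);
        return before < 'a' || before > 'z';
    }
}
```

A match of "ass" inside "class" starts at index 2 and is rejected because 'l' precedes it, while the same match in "an ass" starts at index 3 and is accepted because a space precedes it. The check looks only at the leading side and only at lowercase letters, so a match preceded by an uppercase letter, or followed by more letters, still passes.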
