Lucene 4.3.1 Spell Checking with SpellChecker

Published: 2025/3/21

org.apache.lucene.search.spell
Class SpellChecker

java.lang.Object
  org.apache.lucene.search.spell.SpellChecker

SpellChecker is Lucene's spell-checking class.

Usage example:

SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);

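SpellChecker has three constructors. Each one creates a SpellChecker object from a given Directory instance, which holds the spell-check index used by all later operations. Note that the complete example further down calls indexDictionary with an IndexWriterConfig and a boolean merge flag in addition to the dictionary.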

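PlainTextDictionary implements the Dictionary interface and provides three constructors, which take a File, an InputStream, or a Reader respectively.

The example above builds a PlainTextDictionary from a text file. The file must contain exactly one word per line, for example: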

word1
word2
word3
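The one-word-per-line format is simple enough to read with the standard library alone. The sketch below is not Lucene code: `WordListReader` and `readWords` are hypothetical names, and trimming whitespace and skipping blank lines are assumptions of the sketch rather than documented PlainTextDictionary behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Lucene): reads a one-word-per-line dictionary.
class WordListReader {

    /** Returns the trimmed, non-empty lines of the input, one word per entry. */
    static List<String> readWords(Reader in) {
        try (BufferedReader reader = new BufferedReader(in)) {
            return reader.lines()
                    .map(String::trim)                 // assumption: ignore surrounding whitespace
                    .filter(word -> !word.isEmpty())   // assumption: skip blank lines
                    .collect(Collectors.toList());
        } catch (IOException e) {
            // Only close() can raise a checked IOException here; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A StringReader stands in for the dictionary file.
        List<String> words = readWords(new StringReader("word1\nword2\nword3\n"));
        System.out.println(words);
    }
}
```

Accepting a Reader mirrors one of the three PlainTextDictionary constructor parameter types, so the same helper works for files, streams wrapped in an InputStreamReader, or in-memory strings.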

Other Dictionary implementations: FileDictionary, HighFrequencyDictionary, LuceneDictionary

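SpellChecker methods:

String[] suggestSimilar(String word, int numSug)

Parameters:

word - the word to check

numSug - the number of suggestions to return

The other String[] suggestSimilar(...) overloads let you control further criteria such as accuracy; see the official documentation for details.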
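To make the roles of numSug and an accuracy threshold concrete, here is a minimal standalone sketch that ranks dictionary words by normalized Levenshtein similarity. It is an illustration only, not Lucene's implementation (the real SpellChecker searches an n-gram index before scoring candidates). `SuggestSketch` and its methods are hypothetical names, and dropping the input word itself from the results is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a brute-force stand-in for the idea behind suggestSimilar.
class SuggestSketch {

    /** Levenshtein edit distance: insert, delete, and substitute each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1 means the two strings are identical. */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    /** Returns at most numSug dictionary words whose similarity to word is >= accuracy. */
    static List<String> suggest(String word, List<String> dictionary, int numSug, float accuracy) {
        List<String> candidates = new ArrayList<>();
        for (String entry : dictionary) {
            // Assumption of this sketch: never suggest the word itself.
            if (!entry.equals(word) && similarity(word, entry) >= accuracy) {
                candidates.add(entry);
            }
        }
        // Most similar first; the sort is stable, so ties keep dictionary order.
        candidates.sort((x, y) -> Float.compare(similarity(word, y), similarity(word, x)));
        return new ArrayList<>(candidates.subList(0, Math.min(numSug, candidates.size())));
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("mistake", "spelt", "misspelled", "misspell");
        System.out.println(suggest("misspelt", dictionary, 5, 0.5f));
    }
}
```

With a threshold of 0.5, "mistake" is filtered out because five edits over eight characters gives a similarity of 0.375, while the remaining three words are returned closest first. Raising the threshold or lowering numSug shortens the list further.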

Complete code example:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SpellCheckerTest {
    // Path to the dictionary file (one word per line)
    private static String filepath = "C:\\Users\\Mr_Tank_\\Desktop\\BaseTest\\dictionaryfile.txt";
    private Document document;
    private Directory directory;
    private IndexWriter indexWriter;
    private SpellChecker spellchecker;
    private IndexReader indexReader;
    private IndexSearcher indexSearcher;

    private IndexWriterConfig getConfig() {
        return new IndexWriterConfig(Version.LUCENE_43, new IKAnalyzer(true));
    }

    // Note: every call replaces `directory` with a fresh in-memory RAMDirectory
    private IndexWriter getIndexWriter() {
        directory = new RAMDirectory();
        try {
            return new IndexWriter(directory, getConfig());
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    /**
     * Create index for test
     *
     * @param content
     */
    public void createIndex(String content) {
        indexWriter = getIndexWriter();
        document = new Document();
        document.add(new TextField("content", content, Field.Store.YES));
        try {
            indexWriter.addDocument(document);
            indexWriter.commit();
            indexWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Runs a normal query against the test index and returns up to 1000 hits
    public ScoreDoc[] gethits(String content) {
        try {
            indexReader = DirectoryReader.open(directory);
            indexSearcher = new IndexSearcher(indexReader);
            QueryParser parser = new QueryParser(Version.LUCENE_43, "content", new IKAnalyzer(true));
            Query query = parser.parse(content);
            TopDocs td = indexSearcher.search(query, 1000);
            return td.scoreDocs;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    /**
     * @param scoreDocs
     * @return the matching documents, or null if there are no hits
     * @throws IOException
     */
    public List<Document> getDocumentList(ScoreDoc[] scoreDocs) throws IOException {
        List<Document> documentList = null;
        if (scoreDocs.length >= 1) {
            documentList = new ArrayList<Document>();
            for (int i = 0; i < scoreDocs.length; i++) {
                documentList.add(indexSearcher.doc(scoreDocs[i].doc));
            }
        }
        return documentList;
    }

    // Builds the spell-check index from the dictionary file, then asks for suggestions
    public String[] search(String word, int numSug) {
        directory = new RAMDirectory();
        try {
            spellchecker = new SpellChecker(directory);
            spellchecker.indexDictionary(new PlainTextDictionary(new File(filepath)), getConfig(), true);
            return getSuggestions(spellchecker, word, numSug);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    private String[] getSuggestions(SpellChecker spellchecker, String word, int numSug) throws IOException {
        return spellchecker.suggestSimilar(word, numSug);
    }

    public static void main(String[] args) throws IOException {
        SpellCheckerTest spellCheckerTest = new SpellCheckerTest();
        spellCheckerTest.createIndex("開源中國-找到您想要的開源項目,分享和交流");
        spellCheckerTest.createIndex("CSDN-全球最大中文IT社區");
        String word = "開園中國";

        /*
        ScoreDoc[] scoreDocs = spellCheckerTest.gethits(word);
        List<Document> documentList = spellCheckerTest.getDocumentList(scoreDocs);
        if (documentList.size() >= 1) {
            for (Document d : documentList) {
                System.out.println("Search result: " + d.get("content"));
            }
        }
        */

        String[] suggest = spellCheckerTest.search(word, 5);
        if (suggest != null && suggest.length >= 1) {
            for (String s : suggest) {
                System.out.println("Did you mean: " + s);
            }
        } else {
            System.out.println("Spelling is correct");
        }
    }
}

dictionaryfile.txt:

中華人民共和國
開源中國
開源社區
Lucene
拼寫檢查
Lucene4.3.1


Reposted from: https://my.oschina.net/tanweijie/blog/194046
