
Lucene in Practice with Java

Published: 2024/6/21 (java, by 豆豆)


1. Add the required JARs

2. Creating the index

2.1 First, define an analyzer (the component that splits text into terms).

Analyzer analyzer = new IKAnalyzer();
// Or the officially recommended built-in analyzer:
// Analyzer analyzer = new StandardAnalyzer();

2.2 Second, decide where the index files will be stored. Lucene offers two options:

2.2.1 Local file storage

Directory directory = FSDirectory.open(new File("D:\\JavaWeb\\Lucene"));

2.2.2 In-memory storage

Directory directory = new RAMDirectory();

2.3 Third, create an IndexWriter, which writes the index files.

IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
IndexWriter indexWriter = new IndexWriter(directory, config);

2.4 Fourth, extract the content and add it to the index.

Document doc = new Document(); // a Document is roughly one row of a database table
String text = "This is the text to be indexed."; // the string to index
Field fileNameField = new TextField("fileName", text, Store.YES);
doc.add(fileNameField); // add the field (Store.YES keeps the original value)
indexWriter.addDocument(doc); // add the document to the index
indexWriter.close(); // close the IndexWriter, committing the changes
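Conceptually, addDocument with an analyzed TextField turns the text into terms and records, for each term, which documents contain it (an inverted index). The JDK-only toy below illustrates that idea under simplifying assumptions: ToyInvertedIndex and its analyze method are hypothetical stand-ins written for illustration, not Lucene classes, and the "analyzer" just lowercases and splits on anything that is not a letter or digit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index (JDK only): a conceptual model, not Lucene's on-disk format. */
final class ToyInvertedIndex {
    // Like Store.YES: the original text is kept so it can be returned with a hit.
    private final List<String> storedValues = new ArrayList<>();
    // The inverted part: term -> ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Hypothetical stand-in for an Analyzer: lowercase, then split on non-letters/digits. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Adds one "document" and returns its sequential toy id. */
    int addDocument(String text) {
        int docId = storedValues.size();
        storedValues.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Exact, un-analyzed term lookup: roughly what a TermQuery does against the postings. */
    SortedSet<Integer> docsWithTerm(String term) {
        SortedSet<Integer> docs = postings.get(term);
        return docs == null ? Collections.emptySortedSet() : Collections.unmodifiableSortedSet(docs);
    }

    /** Returns the stored original text of a document. */
    String stored(int docId) {
        return storedValues.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("This is the text to be indexed.");
        index.addDocument("Lucene indexes text.");
        System.out.println(index.docsWithTerm("text"));   // [0, 1]
        System.out.println(index.docsWithTerm("Lucene")); // [] because terms were lowercased when indexed
        System.out.println(index.docsWithTerm("lucene")); // [1]
    }
}
```

Because the toy lowercases at index time, a lookup for "Lucene" finds nothing while "lucene" matches. A real TermQuery behaves the same way in this respect: its term is not run through the analyzer, so it must match an indexed term exactly.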

Common Lucene Field types

IntField: for int values. To sort on an IntField, compare with SortField.Type.INT; for range queries or filtering, use NumericRangeQuery.newIntRange().
LongField: for long values. Sort with SortField.Type.LONG; for range queries or filtering, use NumericRangeQuery.newLongRange(). LongField is often used for timestamps, e.g. values from System.currentTimeMillis().
FloatField: for float values. Sort with SortField.Type.FLOAT; range queries use NumericRangeQuery.newFloatRange().
BinaryDocValuesField: stores a per-document binary value without sharing identical values; if values should be shared, use SortedDocValuesField.
NumericDocValuesField: used to sort on numeric fields (the values are prepared at index time); add a NumericDocValuesField with the same name after the field you want to sort on.
SortedDocValuesField: used to sort on String fields; add a SortedDocValuesField with the same name after the StringField.
StringField: for String values; a StringField is indexed but not tokenized (the whole value becomes one term).
TextField: for String values; unlike StringField, a TextField is both indexed and tokenized.
StoredField: only stores the value; retrieve it with IndexSearcher.doc or IndexReader.document.
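The StringField/TextField distinction above is easiest to see by comparing the terms each one produces and how an exact TermQuery then behaves. This is a JDK-only model, not Lucene code: the class ToyFieldIndexing and its lowercase-and-split tokenizer are assumptions standing in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy comparison of StringField-style and TextField-style indexing (not Lucene's implementation). */
final class ToyFieldIndexing {
    /** StringField-style: the whole value becomes one term, with no lowercasing or splitting. */
    static List<String> stringFieldTerms(String value) {
        return Collections.singletonList(value);
    }

    /** TextField-style: the value goes through a (toy) analyzer that lowercases and splits it. */
    static List<String> textFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** A TermQuery matches only when its term equals one of the indexed terms exactly. */
    static boolean termQueryMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String value = "Spring Framework";
        List<String> asString = stringFieldTerms(value);
        List<String> asText = textFieldTerms(value);
        System.out.println("StringField terms: " + asString);             // [Spring Framework]
        System.out.println("TextField terms:   " + asText);               // [spring, framework]
        System.out.println(termQueryMatches(asString, "spring"));         // false
        System.out.println(termQueryMatches(asText, "spring"));           // true
        System.out.println(termQueryMatches(asString, "Spring Framework")); // true
    }
}
```

This is why identifiers that must be matched exactly (an ISBN, for instance) are usually a better fit for StringField, while free text that should be searchable word by word belongs in a TextField.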

Hands-on code

@Test
public void testIndex() throws Exception {
    ApplicationContext ac = new ClassPathXmlApplicationContext("applicationContext.xml");
    BooksMapper booksMapper = ac.getBean(BooksMapper.class);
    /*
    Books book = booksMapper.selectByPrimaryKey(4939);
    System.out.println(book.getTitle());
    */
    List<Books> listBooks = booksMapper.selectBookList();
    System.out.println(listBooks.size());
    Analyzer analyzer = new IKAnalyzer();
    Directory directory = FSDirectory.open(new File("D:\\JavaWeb\\Lucene"));
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    IndexWriter indexWriter = new IndexWriter(directory, config);
    for (int i = 0; i < listBooks.size(); i++) {
        Document doc = new Document(); // one Document is roughly one row of a database table
        String title = listBooks.get(i).getTitle();
        Field fieldTitle = new TextField("title", title, Store.YES);
        doc.add(fieldTitle);
        String isbn = listBooks.get(i).getIsbn();
        Field fieldIsbn = new TextField("isbn", isbn, Store.YES);
        doc.add(fieldIsbn);
        int wordsCount = listBooks.get(i).getWordscount();
        Field fieldWordsCount = new LongField("WordsCount", wordsCount, Store.YES); // int is widened to long
        doc.add(fieldWordsCount);
        indexWriter.addDocument(doc); // add the document to the index
    }
    indexWriter.close(); // close the IndexWriter, committing the changes
}

Inspect the resulting index with luke-5.0: change to the directory that contains luke-5.0.jar and run java -jar luke-5.0.jar.

3. Querying the index

@Test
public void testSearch() throws Exception {
    // Step 1: create a Directory object, i.e. where the index is stored (on disk).
    // It must point at the directory the index was written to (section 2 used D:\JavaWeb\Lucene).
    Directory directory = FSDirectory.open(new File("D:\\temp\\index"));
    // Step 2: create an IndexReader for that Directory.
    IndexReader indexReader = DirectoryReader.open(directory);
    // Step 3: create an IndexSearcher on top of the IndexReader.
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Step 4: create a TermQuery with the field to search and the keyword.
    Query query = new TermQuery(new Term("fileName", "lucene"));
    // Step 5: run the query, keeping the top 10 hits.
    TopDocs topDocs = indexSearcher.search(query, 10);
    // Step 6: iterate over the hits and print them.
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    for (ScoreDoc scoreDoc : scoreDocs) {
        int doc = scoreDoc.doc;
        Document document = indexSearcher.doc(doc);
        // get() returns null for fields that were not stored in this document.
        // File name
        String fileName = document.get("fileName");
        System.out.println(fileName);
        // File content
        String fileContent = document.get("fileContent");
        System.out.println(fileContent);
        // File size
        String fileSize = document.get("fileSize");
        System.out.println(fileSize);
        // File path
        String filePath = document.get("filePath");
        System.out.println(filePath);
        System.out.println("------------");
    }
    // Step 7: close the IndexReader.
    indexReader.close();
}
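indexSearcher.search(query, 10) returns at most the 10 best-scoring hits. A minimal sketch of that top-N idea, assuming a deliberately naive score (raw term frequency over whitespace-separated tokens) instead of Lucene's real similarity; ToyTopDocs and Hit are hypothetical names, and ties are broken here by the lower document id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N collection for a single-term query; the score is NOT Lucene's scoring formula. */
final class ToyTopDocs {
    static final class Hit {
        final int doc;
        final int score;
        Hit(int doc, int score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return "doc=" + doc + " score=" + score; }
    }

    /** Counts exact, case-sensitive occurrences of term among whitespace-separated tokens. */
    static int termFreq(String text, String term) {
        int freq = 0;
        for (String token : text.split("\\s+")) {
            if (token.equals(term)) freq++;
        }
        return freq;
    }

    /** Keeps at most n hits in a min-heap, then returns them best first. */
    static List<Hit> search(List<String> docs, String term, int n) {
        // Heap head = weakest hit kept so far (lowest score; on a tie, the higher doc id).
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) ->
                a.score != b.score ? Integer.compare(a.score, b.score) : Integer.compare(b.doc, a.doc));
        for (int doc = 0; doc < docs.size(); doc++) {
            int score = termFreq(docs.get(doc), term);
            if (score == 0) continue;          // non-matching documents are never collected
            heap.add(new Hit(doc, score));
            if (heap.size() > n) heap.poll();  // evict the weakest once more than n are held
        }
        List<Hit> hits = new ArrayList<>(heap);
        // Best first: higher score, then lower doc id.
        hits.sort((a, b) -> a.score != b.score ? Integer.compare(b.score, a.score) : Integer.compare(a.doc, b.doc));
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("lucene in action", "lucene lucene search", "java", "lucene");
        System.out.println(search(docs, "lucene", 2)); // [doc=1 score=2, doc=0 score=1]
    }
}
```

A bounded heap keeps memory proportional to n rather than to the number of matching documents, which is the reason a search API asks for the number of hits up front.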

4. Checking how the standard analyzer (and others) tokenize text

@Test
public void testTokenStream() throws Exception {
    // Create an analyzer (uncomment one of the others to compare their output)
    //Analyzer analyzer = new StandardAnalyzer();
    //Analyzer analyzer = new CJKAnalyzer();
    //Analyzer analyzer = new SmartChineseAnalyzer();
    Analyzer analyzer = new IKAnalyzer();
    // Get a TokenStream.
    // First argument: the field name (any name will do here)
    // Second argument: the text to analyze
    //TokenStream tokenStream = analyzer.tokenStream("test",
    //        "The Spring Framework provides a comprehensive programming and configuration model.");
    TokenStream tokenStream = analyzer.tokenStream("test", "高富帥可以用二維表結構來邏輯表達實現的數據");
    // Attribute that gives access to each term's text
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    // Attribute that records each term's start and end offsets
    OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
    // Reset the stream to its beginning (required before consuming it)
    tokenStream.reset();
    // Walk the tokens; incrementToken() returns false at the end of the stream
    while (tokenStream.incrementToken()) {
        // Start offset of the term
        System.out.println("start->" + offsetAttribute.startOffset());
        // The term itself
        System.out.println(charTermAttribute);
        // End offset of the term
        System.out.println("end->" + offsetAttribute.endOffset());
    }
    tokenStream.end(); // finish consuming the stream before closing it
    tokenStream.close();
}
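For comparison with the reset/incrementToken/end/close loop above, here is a JDK-only toy tokenizer that yields the same three pieces of information per token (term, start offset, end offset). It is an assumption-laden stand-in, not StandardAnalyzer: it treats each run of letters or digits as one lowercased token, and its end offset is exclusive, which is how OffsetAttribute.endOffset() reports it. A run of Chinese characters counts as a single "word" in this toy, which is exactly why a dictionary-based analyzer such as IK is needed for Chinese text. ToyTokenStream and Token are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy tokenizer (NOT StandardAnalyzer): runs of letters/digits become lowercased tokens. */
final class ToyTokenStream {
    static final class Token {
        final String term;
        final int start; // offset of the first char, like OffsetAttribute.startOffset()
        final int end;   // one past the last char, like OffsetAttribute.endOffset()
        Token(String term, int start, int end) { this.term = term; this.start = start; this.end = end; }
    }

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) { i++; continue; } // skip separators
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Spring Framework provides a comprehensive programming and configuration model.";
        for (Token token : tokenize(text)) {
            // Same three lines per token that testTokenStream prints
            System.out.println("start->" + token.start);
            System.out.println(token.term);
            System.out.println("end->" + token.end);
        }
    }
}
```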

5. Word segmentation with IKAnalyzer

5.1IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionary -->
    <entry key="ext_dict">ext.dic;</entry>
    <!-- extension stop-word dictionary -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
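IKAnalyzer.cfg.xml uses the standard java.util.Properties XML format (hence the properties.dtd DOCTYPE), so one way to sanity-check the file is to parse it with Properties.loadFromXML. This is only a checking sketch, not how IK itself reads its configuration; the class name IkConfigCheck and the splitting of entries on ';' are illustrative assumptions, and the sample's comment text is shortened to ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Parses an IK-style config with java.util.Properties to check its entries (illustration only). */
final class IkConfigCheck {
    // Same layout as the IKAnalyzer.cfg.xml in section 5.1.
    static final String SAMPLE_CFG =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "  <comment>IK Analyzer extension configuration</comment>\n"
            + "  <entry key=\"ext_dict\">ext.dic;</entry>\n"
            + "  <entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
            + "</properties>\n";

    /** Loads the XML; a malformed file surfaces as an UncheckedIOException. */
    static Properties load(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a ';'-separated list of dictionary files, dropping empty pieces such as the trailing one. */
    static List<String> dictFiles(Properties props, String key) {
        List<String> files = new ArrayList<>();
        for (String part : props.getProperty(key, "").split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) files.add(trimmed);
        }
        return files;
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE_CFG);
        System.out.println("ext_dict      -> " + dictFiles(props, "ext_dict"));      // [ext.dic]
        System.out.println("ext_stopwords -> " + dictFiles(props, "ext_stopwords")); // [stopword.dic]
    }
}
```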

5.2 Extension dictionary ext.dic

高富帥
二維表

5.3 Stop-word dictionary stopword.dic

我是用的二維表來(lái) a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with
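IK segments Chinese text with dictionaries, which is why adding 高富帥 and 二維表 to ext.dic changes its output and why entries in stopword.dic disappear from it. The sketch shows the general idea with forward maximum matching, a classic dictionary-based technique; it is a greatly simplified stand-in, not IK's actual algorithm, and the dictionary and stop-word sets (and the class name ToyMaxMatchSegmenter) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy forward-maximum-matching segmenter with stop-word removal (not IK's algorithm). */
final class ToyMaxMatchSegmenter {
    private final Set<String> dictionary;
    private final Set<String> stopWords;
    private final int maxWordLength;

    ToyMaxMatchSegmenter(Set<String> dictionary, Set<String> stopWords) {
        this.dictionary = dictionary;
        this.stopWords = stopWords;
        int max = 1;
        for (String word : dictionary) max = Math.max(max, word.length());
        this.maxWordLength = max;
    }

    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxWordLength, text.length() - i);
            // Shrink the window until it is a dictionary word; a single character is always accepted.
            while (len > 1 && !dictionary.contains(text.substring(i, i + len))) len--;
            String word = text.substring(i, i + len);
            if (!stopWords.contains(word)) tokens.add(word); // drop stop words
            i += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("高富帥", "二維表", "可以", "結構", "邏輯", "表達", "實現", "數據"));
        Set<String> stop = new HashSet<>(Arrays.asList("的", "來", "用"));
        ToyMaxMatchSegmenter seg = new ToyMaxMatchSegmenter(dict, stop);
        System.out.println(seg.segment("高富帥可以用二維表結構來邏輯表達實現的數據"));
        // [高富帥, 可以, 二維表, 結構, 邏輯, 表達, 實現, 數據]
    }
}
```

Removing 高富帥 from the toy dictionary makes it fall back to single characters (高, 富, 帥), which mirrors why the extension dictionary matters for new or slang words.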

6. Wrapping it all in a LuceneHelper class

package com.mf.lucene;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class LuceneHelper {

    public IndexWriter getIndexWriter() throws Exception {
        Directory directory = FSDirectory.open(new File("D:\\JavaWeb\\Lucene"));
        // Directory directory = new RAMDirectory(); // keep the index in memory instead (in-memory index)
        Analyzer analyzer = new IKAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
        return new IndexWriter(directory, config);
    }

    // Delete all documents
    @Test
    public void testAllDelete() throws Exception {
        IndexWriter indexWriter = getIndexWriter();
        indexWriter.deleteAll();
        indexWriter.close();
    }

    // Delete the documents matching a query
    @Test
    public void testDelete() throws Exception {
        IndexWriter indexWriter = getIndexWriter();
        Query query = new TermQuery(new Term("title", "c#"));
        indexWriter.deleteDocuments(query);
        indexWriter.close();
    }

    // Update: deletes the documents containing the term, then adds the new document
    @Test
    public void testUpdate() throws Exception {
        IndexWriter indexWriter = getIndexWriter();
        Document doc = new Document();
        doc.add(new TextField("fileN", "測試文件名", Store.YES));   // "test file name"
        doc.add(new TextField("fileC", "測試文件內容", Store.YES)); // "test file content"
        indexWriter.updateDocument(new Term("isbn", "9787115155108"), doc, new IKAnalyzer());
        indexWriter.close();
    }

    // IndexReader / IndexSearcher
    public IndexSearcher getIndexSearcher() throws Exception {
        // Step 1: create a Directory object, i.e. where the index is stored (on disk).
        Directory directory = FSDirectory.open(new File("D:\\JavaWeb\\Lucene"));
        // Step 2: create an IndexReader for that Directory.
        IndexReader indexReader = DirectoryReader.open(directory);
        // Step 3: create an IndexSearcher on top of the IndexReader.
        return new IndexSearcher(indexReader);
    }

    // Run a query and print its results
    public void printResult(IndexSearcher indexSearcher, Query query) throws Exception {
        // Step 5: run the query, keeping the top 10 hits.
        TopDocs topDocs = indexSearcher.search(query, 10);
        // Step 6: iterate over the hits and print them.
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            int doc = scoreDoc.doc;
            Document document = indexSearcher.doc(doc);
            // Title
            String title = document.get("title");
            System.out.println(title);
            // WordsCount
            String wordsCount = document.get("WordsCount");
            System.out.println(wordsCount);
            System.out.println("------------");
        }
    }

    // Match all documents
    @Test
    public void testMatchAllDocsQuery() throws Exception {
        IndexSearcher indexSearcher = getIndexSearcher();
        Query query = new MatchAllDocsQuery();
        System.out.println(query);
        printResult(indexSearcher, query);
        // Release resources
        indexSearcher.getIndexReader().close();
    }

    // Numeric range query (both bounds inclusive here)
    @Test
    public void testNumericRangeQuery() throws Exception {
        IndexSearcher indexSearcher = getIndexSearcher();
        Query query = NumericRangeQuery.newLongRange("WordsCount", 0L, 10000L, true, true);
        System.out.println(query);
        printResult(indexSearcher, query);
        // Release resources
        indexSearcher.getIndexReader().close();
    }

    // Combine several query conditions
    @Test
    public void testBooleanQuery() throws Exception {
        IndexSearcher indexSearcher = getIndexSearcher();
        BooleanQuery booleanQuery = new BooleanQuery();
        Query query1 = new TermQuery(new Term("title", "c#"));
        // WordsCount is a LongField, so it is indexed as encoded numeric terms and a TermQuery
        // on the plain string "660000" would not match; use a single-value range instead.
        Query query2 = NumericRangeQuery.newLongRange("WordsCount", 660000L, 660000L, true, true);
        // Two MUST clauses behave like: select * from books where title = 'c#' and WordsCount = 660000
        booleanQuery.add(query1, Occur.MUST);
        booleanQuery.add(query2, Occur.MUST);
        System.out.println(booleanQuery);
        printResult(indexSearcher, booleanQuery);
        // Release resources
        indexSearcher.getIndexReader().close();
    }

    // Query built by parsing a query expression
    @Test
    public void testQueryParser() throws Exception {
        IndexSearcher indexSearcher = getIndexSearcher();
        // Argument 1: the default field to search
        // Argument 2: the analyzer to use
        QueryParser queryParser = new QueryParser("title", new IKAnalyzer());
        // Syntax is field:value; *:* matches all documents
        Query query = queryParser.parse("title:c#");
        printResult(indexSearcher, query);
        // Release resources
        indexSearcher.getIndexReader().close();
    }

    // Query built by parsing an expression, with several default fields
    @Test
    public void testMultiFieldQueryParser() throws Exception {
        IndexSearcher indexSearcher = getIndexSearcher();
        String[] fields = {"title", "isbn"};
        // Argument 1: the default fields to search
        // Argument 2: the analyzer to use
        MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields, new IKAnalyzer());
        // Syntax is field:value; *:* matches all documents
        Query query = queryParser.parse("c#");
        printResult(indexSearcher, query);
        // Release resources
        indexSearcher.getIndexReader().close();
    }
}
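The BooleanQuery and NumericRangeQuery tests above rely on Occur semantics and on inclusive or exclusive range bounds. The following JDK-only model spells those matching rules out over a few made-up books; ToyBooleanQuery, Book, titleTerm and wordsCountRange are hypothetical helpers, and only matching (not scoring) is modeled. The SHOULD rule encoded here (optional when a MUST clause exists, otherwise at least one SHOULD must match) reflects Lucene's default behavior when no minimum-should-match is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** Toy model of BooleanQuery matching rules plus a numeric range filter (not Lucene code). */
final class ToyBooleanQuery {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Book {
        final String title;
        final long wordsCount;
        Book(String title, long wordsCount) { this.title = title; this.wordsCount = wordsCount; }
    }

    private final List<Predicate<Book>> queries = new ArrayList<>();
    private final List<Occur> occurs = new ArrayList<>();

    ToyBooleanQuery add(Predicate<Book> query, Occur occur) {
        queries.add(query);
        occurs.add(occur);
        return this;
    }

    /** MUST: required; MUST_NOT: excluded; SHOULD: optional unless there is no MUST clause. */
    boolean matches(Book book) {
        boolean hasMust = false;
        boolean anyShouldMatched = false;
        for (int i = 0; i < queries.size(); i++) {
            boolean hit = queries.get(i).test(book);
            switch (occurs.get(i)) {
                case MUST:
                    if (!hit) return false;
                    hasMust = true;
                    break;
                case MUST_NOT:
                    if (hit) return false;
                    break;
                case SHOULD:
                    anyShouldMatched |= hit;
                    break;
            }
        }
        return hasMust || anyShouldMatched;
    }

    List<String> search(List<Book> books) {
        List<String> titles = new ArrayList<>();
        for (Book b : books) if (matches(b)) titles.add(b.title);
        return titles;
    }

    /** Toy term match: the lowercased title, split on whitespace, contains the term. */
    static Predicate<Book> titleTerm(String term) {
        return b -> Arrays.asList(b.title.toLowerCase(Locale.ROOT).split("\\s+")).contains(term);
    }

    /** Same argument order as NumericRangeQuery.newLongRange(field, min, max, minInclusive, maxInclusive). */
    static Predicate<Book> wordsCountRange(long min, long max, boolean minInclusive, boolean maxInclusive) {
        return b -> (minInclusive ? b.wordsCount >= min : b.wordsCount > min)
                && (maxInclusive ? b.wordsCount <= max : b.wordsCount < max);
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("C# in Depth", 660000L),
                new Book("Java in Action", 500000L),
                new Book("C# Basics", 8000L));
        // Like title:c# MUST + WordsCount in [0, 10000] MUST
        System.out.println(new ToyBooleanQuery()
                .add(titleTerm("c#"), Occur.MUST)
                .add(wordsCountRange(0L, 10000L, true, true), Occur.MUST)
                .search(books)); // [C# Basics]
        // SHOULD-only clauses behave like OR
        System.out.println(new ToyBooleanQuery()
                .add(titleTerm("java"), Occur.SHOULD)
                .add(titleTerm("c#"), Occur.SHOULD)
                .search(books)); // [C# in Depth, Java in Action, C# Basics]
    }
}
```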

Reposted from: https://www.cnblogs.com/cnki/p/6746527.html
