Lucene: Parsing Query Strings with the QueryParser Class

Published: 2025/3/20


http://blog.csdn.net/hongfu_/article/details/1933366

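The second step of the search process is to build a Query. This section introduces Query and how to construct it.

When a user enters a keyword, the search engine does not hand it straight to the back end for retrieval. It first analyzes and processes the keyword into a form the back end can understand, which makes retrieval more efficient and the results more relevant. In Lucene, this processing amounts to building a Query object.

Query itself is just an abstract class in Lucene's search package. It has many subclasses, each representing a different kind of search. The common TermQuery, for example, wraps a single keyword, and BooleanQuery represents a Boolean search.

The search method of IndexSearcher always takes a Query object (or an instance of a Query subclass). This section introduces the various Query classes.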

11.4.1 Searching by Term: TermQuery

TermQuery is the simplest and most commonly used Query. It can be thought of as a "term search": the most basic search in a search engine is looking up a term in the index, and TermQuery does exactly that.

In Lucene the term is the basic unit of search. A term is essentially a name/value pair, where the name is a field name and the value is a keyword contained in that field.

To search with TermQuery, first construct a Term object:

Term aTerm = new Term("contents", "java");

Then pass aTerm to the TermQuery constructor:

Query query = new TermQuery(aTerm);

A search with this TermQuery returns every document whose "contents" field contains "java".

Listing 11.4 shows TermQuery in use.

Listing 11.4 TermQueryTest.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class TermQueryTest
{
  public static void main(String[] args) throws Exception
  {
    // Create a Document
    Document doc1 = new Document();
    // Add the "name" field
    doc1.add(Field.Text("name", "word1 word2 word3"));
    // Add the "title" field
    doc1.add(Field.Keyword("title", "doc1"));
    // Create the index writer
    IndexWriter writer = new IndexWriter("c://index", new StandardAnalyzer(), true);

    // Add the document to the index
    writer.addDocument(doc1);
    // Close the index
    writer.close();

    // The query object
    Query query = null;

    // Holds the search results
    Hits hits = null;

    // Create the searcher
    IndexSearcher searcher = new IndexSearcher("c://index");

    // Construct a TermQuery
    query = new TermQuery(new Term("name", "word1"));
    // Run the search and store the results in hits
    hits = searcher.search(query);
    // Print information about the results
    printResult(hits, "word1");

    // Construct another TermQuery, this time on the "title" field
    query = new TermQuery(new Term("title", "doc1"));
    // Run the second search and store the results in hits
    hits = searcher.search(query);
    // Print information about the results
    printResult(hits, "doc1");
  }

  public static void printResult(Hits hits, String key) throws Exception
  {
    System.out.println("Search for \"" + key + "\":");
    if (hits != null)
    {
      if (hits.length() == 0)
      {
        System.out.println("No results found");
      }
      else
      {
        System.out.println("Found " + hits.length() + " result(s)");
        for (int i = 0; i < hits.length(); i++)
        {
          Document d = hits.doc(i);
          String dname = d.get("title");
          System.out.print(dname + "   ");
        }
        System.out.println();
      }
    }
  }
}

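Figure 11-8 shows the result of running the TermQuery searches in Listing 11.4.

Note: Field values are case-sensitive, so the case of the query term must match the indexed term.

As Figure 11-8 shows, Listing 11.4 searches for "word1" and then "doc1", and each search returns exactly one result.

Listing 11.4 builds two TermQuery objects. They differ only in the field searched: the first looks in the "name" field and the second in the "title" field.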
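The idea of a term as a field-name/keyword pair can be pictured with a toy inverted index. The sketch below is plain Java and does not use Lucene; the class TermIndexSketch and its methods are hypothetical names made up for illustration, and lowercasing in addAnalyzed only imitates what an analyzer might do.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: each (field, text) pair maps to the titles of the
// documents that contain it. Illustration only, not Lucene's data structure.
final class TermIndexSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Hypothetical helper: builds the lookup key for a field/text pair.
    private static String key(String field, String text) {
        return field + ":" + text;
    }

    // Splits the value on whitespace and lowercases each token,
    // imitating an analyzed field.
    void addAnalyzed(String docTitle, String field, String value) {
        for (String token : value.trim().split("\\s+")) {
            String term = key(field, token.toLowerCase(Locale.ROOT));
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docTitle);
        }
    }

    // Stores the whole value verbatim as one term, like a keyword field.
    void addKeyword(String docTitle, String field, String value) {
        postings.computeIfAbsent(key(field, value), k -> new TreeSet<>()).add(docTitle);
    }

    // Exact lookup: the query text is not analyzed, so case must match.
    Set<String> termQuery(String field, String text) {
        return postings.getOrDefault(key(field, text), new TreeSet<>());
    }

    public static void main(String[] args) {
        TermIndexSketch index = new TermIndexSketch();
        index.addAnalyzed("doc1", "name", "word1 word2 word3");
        index.addKeyword("doc1", "title", "doc1");
        System.out.println(index.termQuery("name", "word1")); // [doc1]
        System.out.println(index.termQuery("name", "Word1")); // []
    }
}
```

The second lookup comes back empty because the lookup is an exact match on the stored term, which is the case-sensitivity point made in the note above.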

11.4.2 AND/OR Searches: BooleanQuery

BooleanQuery is also widely used in practice. It is a composite Query: you add other Query objects to it and state the logical relationship between them. Every query type discussed in this section can be combined through BooleanQuery. BooleanQuery is a container of Boolean clauses and provides a dedicated API method for adding a clause and marking how it relates to the others:

public void add(Query query, boolean required, boolean prohibited);

Note: BooleanQuery can be nested. One BooleanQuery can serve as a clause of another.

Listing 11.5 demonstrates a Boolean AND query.

Listing 11.5 BooleanQueryTest1.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class BooleanQueryTest1
{
  public static void main(String[] args) throws Exception
  {
    // Create the Document objects
    Document doc1 = new Document();
    doc1.add(Field.Text("name", "word1 word2 word3"));
    doc1.add(Field.Keyword("title", "doc1"));

    Document doc2 = new Document();
    doc2.add(Field.Text("name", "word1 word4 word5"));
    doc2.add(Field.Keyword("title", "doc2"));

    Document doc3 = new Document();
    doc3.add(Field.Text("name", "word1 word2 word6"));
    doc3.add(Field.Keyword("title", "doc3"));

    // Create the index writer
    IndexWriter writer = new IndexWriter("c://index", new StandardAnalyzer(), true);
    // Add the documents to the index
    writer.addDocument(doc1);
    writer.addDocument(doc2);
    writer.addDocument(doc3);
    writer.close();

    Query query1 = null;
    Query query2 = null;
    BooleanQuery query = null;
    Hits hits = null;

    // Create the IndexSearcher
    IndexSearcher searcher = new IndexSearcher("c://index");

    query1 = new TermQuery(new Term("name", "word1"));
    query2 = new TermQuery(new Term("name", "word2"));

    // Construct a Boolean query
    query = new BooleanQuery();

    // Add the two sub-queries, both required
    query.add(query1, true, false);
    query.add(query2, true, false);

    hits = searcher.search(query);
    printResult(hits, "word1 and word2");
  }

  public static void printResult(Hits hits, String key) throws Exception
  {
    System.out.println("Search for \"" + key + "\":");
    if (hits != null)
    {
      if (hits.length() == 0)
      {
        System.out.println("No results found");
      }
      else
      {
        System.out.println("Found " + hits.length() + " result(s)");
        for (int i = 0; i < hits.length(); i++)
        {
          Document d = hits.doc(i);
          String dname = d.get("title");
          System.out.print(dname + "   ");
        }
        System.out.println();
      }
    }
  }
}

Listing 11.5 first constructs two TermQuery objects, then constructs a BooleanQuery and adds the two TermQuery objects to it as clauses.

Look again at the add method of BooleanQuery. Besides the Query argument, it takes two boolean parameters. The first (required) states whether the clause must be satisfied; the second (prohibited) states whether the clause must not be satisfied. The two flags give four combinations:

    true & false: the clause must be satisfied.
    false & true: the clause must not be satisfied.
    false & false: the clause is optional.
    true & true: invalid.

In the example above, both clauses use the true & false combination, so both must be satisfied, which amounts to an AND relationship. Figure 11-9 shows the result.

For an OR search, build the clauses as follows:

query.add(query1, false, false);

query.add(query2, false, false);

Figure 11-10 shows the result.

Figure 11-9 BooleanQuery test 1        Figure 11-10 BooleanQuery test 2

Because Boolean queries can be nested, they can express combinations of many conditions. Too many clauses, however, can slow the search down, so Lucene imposes a default limit: a BooleanQuery may hold at most 1024 clauses.
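The four flag combinations can be modeled in a few lines of plain Java. This is a rough sketch of the matching rule only, under the assumption that a query with no required clause needs at least one optional clause to match; BooleanClauseSketch and its members are hypothetical names, and scoring and nesting are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of required / prohibited / optional clauses; not Lucene's code.
final class BooleanClauseSketch {
    private static final class Clause {
        final String term;
        final boolean required;
        final boolean prohibited;

        Clause(String term, boolean required, boolean prohibited) {
            this.term = term;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    // Mirrors the shape of add(query, required, prohibited) from the text.
    void add(String term, boolean required, boolean prohibited) {
        if (required && prohibited) {
            throw new IllegalArgumentException("a clause cannot be both required and prohibited");
        }
        clauses.add(new Clause(term, required, prohibited));
    }

    boolean matches(Set<String> docTerms) {
        boolean hasRequired = false;
        boolean optionalHit = false;
        for (Clause c : clauses) {
            boolean present = docTerms.contains(c.term);
            if (c.prohibited) {
                if (present) return false;   // false & true: must not appear
            } else if (c.required) {
                hasRequired = true;
                if (!present) return false;  // true & false: must appear
            } else if (present) {
                optionalHit = true;          // false & false: optional
            }
        }
        // Assumption: without a required clause, one optional clause must match.
        return hasRequired || optionalHit;
    }

    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.split(" ")));
    }

    public static void main(String[] args) {
        BooleanClauseSketch andQuery = new BooleanClauseSketch();
        andQuery.add("word1", true, false);
        andQuery.add("word2", true, false);
        System.out.println(andQuery.matches(terms("word1 word2 word3"))); // true
        System.out.println(andQuery.matches(terms("word1 word4 word5"))); // false
    }
}
```

Run against the three documents of Listing 11.5, the AND form accepts doc1 and doc3 and rejects doc2. Switching both clauses to false, false accepts all three.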

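11.4.3 Searching Within a Range: RangeQuery

Sometimes a user needs to find documents within a range, such as all documents from a given time period. Lucene provides the RangeQuery class for this.

RangeQuery expresses a search condition over a range, from a begin term to an end term. The begin and end terms themselves can be included in or excluded from the range. It is used as follows:

RangeQuery query = new RangeQuery(begin, end, included);

The final boolean argument states whether the boundaries are included. When it is true, the boundary values are included, written as "[begin TO end]"; when it is false, they are excluded, written as "{begin TO end}".

Listing 11.6 shows how to use RangeQuery.

Listing 11.6 RangeQueryTest.java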

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RangeQuery;

public class RangeQueryTest {
    public static void main(String[] args) throws Exception {
        // Create a Document (same for the ones below)
        Document doc1 = new Document();
        // Add the "time" field (same below)
        doc1.add(Field.Text("time", "200001"));
        // Add the "title" field (same below)
        doc1.add(Field.Keyword("title", "doc1"));

        Document doc2 = new Document();
        doc2.add(Field.Text("time", "200002"));
        doc2.add(Field.Keyword("title", "doc2"));

        Document doc3 = new Document();
        doc3.add(Field.Text("time", "200003"));
        doc3.add(Field.Keyword("title", "doc3"));

        Document doc4 = new Document();
        doc4.add(Field.Text("time", "200004"));
        doc4.add(Field.Keyword("title", "doc4"));

        Document doc5 = new Document();
        doc5.add(Field.Text("time", "200005"));
        doc5.add(Field.Keyword("title", "doc5"));

        // Create the index writer
        IndexWriter writer = new IndexWriter("c://index", new StandardAnalyzer(), true);
        // Use the compound index file format
        writer.setUseCompoundFile(true);

        // Add the documents to the index
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        writer.addDocument(doc5);

        // Close the index
        writer.close();

        // Create the index searcher
        IndexSearcher searcher = new IndexSearcher("c://index");

        // Construct the boundary terms
        Term beginTime = new Term("time", "200001");
        Term endTime = new Term("time", "200005");

        // Holds the search results
        Hits hits = null;
        // The RangeQuery, initially null
        RangeQuery query = null;

        // Construct a RangeQuery that excludes the boundary values
        query = new RangeQuery(beginTime, endTime, false);
        // Run the search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "documents from 200001 to 200005, excluding 200001 and 200005");

        // Construct a RangeQuery that includes the boundary values
        query = new RangeQuery(beginTime, endTime, true);
        // Run the second search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "documents from 200001 to 200005, including 200001 and 200005");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Search for \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
            } else {
                System.out.print("Found: ");
                for (int i = 0; i < hits.length(); i++) {
                    Document d = hits.doc(i);
                    String dname = d.get("title");
                    System.out.print(dname + "   ");
                }
                System.out.println();
            }
        }
    }
}

The code above first constructs two Term objects and then a RangeQuery, passing the two terms to the RangeQuery constructor. As described earlier, these two constructor arguments are the begin term and the end term, and the query finds every Document that falls between them.

The "time" field of every Document lies between 200001 and 200005. Figure 11-11 shows the search results.

Figure 11-11 RangeQuery test results

As Figure 11-11 shows, Listing 11.6 runs two RangeQuery searches: the first excludes the boundary values and the second includes them.

The first RangeQuery, constructed with false, excludes the two boundary values and so returns only 3 Documents. The second, constructed with true, includes them and returns all 5.
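The inclusive and exclusive cases reduce to a pair of comparisons. The sketch below is plain Java with hypothetical names (TermRangeSketch, inRange, count) and assumes terms are compared as strings in lexicographic order, which is why fixed-width values such as 200001 behave like numbers here.

```java
// Toy range check over string terms; not Lucene's implementation.
final class TermRangeSketch {
    // Assumption: terms are ordered lexicographically, as String.compareTo does.
    static boolean inRange(String term, String begin, String end, boolean inclusive) {
        int fromBegin = term.compareTo(begin);
        int fromEnd = term.compareTo(end);
        return inclusive
                ? (fromBegin >= 0 && fromEnd <= 0)   // [begin TO end]
                : (fromBegin > 0 && fromEnd < 0);    // {begin TO end}
    }

    // Counts how many of the given terms fall inside the range.
    static int count(String[] terms, String begin, String end, boolean inclusive) {
        int n = 0;
        for (String t : terms) {
            if (inRange(t, begin, end, inclusive)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] times = {"200001", "200002", "200003", "200004", "200005"};
        System.out.println(count(times, "200001", "200005", false)); // 3
        System.out.println(count(times, "200001", "200005", true));  // 5
    }
}
```

Under the same assumption, values of different widths would not sort numerically ("9" sorts after "10"), so padding values to a fixed width is a common precaution.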

11.4.4 Searching by Prefix: PrefixQuery

PrefixQuery searches by prefix. Typically you define a Term that holds the field name and the keyword prefix, then construct a PrefixQuery from that term.

Listing 11.7 shows a PrefixQuery search in action.

Listing 11.7 PrefixQueryTest.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;

public class PrefixQueryTest {
    public static void main(String[] args) throws Exception {
        // Create a Document (same for the ones below)
        Document doc1 = new Document();
        // Add the "name" field (same below)
        doc1.add(Field.Text("name", "David"));
        // Add the "title" field (same below)
        doc1.add(Field.Keyword("title", "doc1"));

        Document doc2 = new Document();
        doc2.add(Field.Text("name", "Darwen"));
        doc2.add(Field.Keyword("title", "doc2"));

        Document doc3 = new Document();
        doc3.add(Field.Text("name", "Smith"));
        doc3.add(Field.Keyword("title", "doc3"));

        Document doc4 = new Document();
        doc4.add(Field.Text("name", "Smart"));
        doc4.add(Field.Keyword("title", "doc4"));

        // Create the index writer
        IndexWriter writer = new IndexWriter("c://index",
                new StandardAnalyzer(), true);
        // Use the compound index file format
        writer.setUseCompoundFile(true);
        // Add the documents to the index in turn
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        // Close the index writer
        writer.close();

        // Create the index searcher
        IndexSearcher searcher = new IndexSearcher("c://index");

        // Construct the prefix terms
        Term pre1 = new Term("name", "Da");
        Term pre2 = new Term("name", "da");
        Term pre3 = new Term("name", "sm");

        // Holds the search results
        Hits hits = null;
        // The PrefixQuery, initially null
        PrefixQuery query = null;

        query = new PrefixQuery(pre1);
        // Run the first search
        hits = searcher.search(query);
        // Print the results
        printResult(hits, "documents with prefix 'Da'");

        query = new PrefixQuery(pre2);
        // Run the second search
        hits = searcher.search(query);
        // Print the results
        printResult(hits, "documents with prefix 'da'");

        query = new PrefixQuery(pre3);
        // Run the third search
        hits = searcher.search(query);
        // Print the results
        printResult(hits, "documents with prefix 'sm'");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Search for \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
                System.out.println();
            } else {
                System.out.print("Found: ");
                for (int i = 0; i < hits.length(); i++) {
                    // Get the document
                    Document d = hits.doc(i);
                    // Get the "title" field
                    String dname = d.get("title");
                    System.out.print(dname + "   ");
                }
                System.out.println();
            }
        }
    }
}

The code above constructs 4 Documents, each with a "name" field that stores a person's name. It then builds 3 prefix terms: "Da", "da", and "sm". Apart from letter case, each is a prefix of names stored in the "name" field.

Figure 11-12 shows the result.

As Figure 11-12 shows, the program runs 3 PrefixQuery searches with the prefixes "Da", "da", and "sm". The "da" and "sm" prefixes find their documents, but "Da" returns nothing, even though it matches the case of the names as they were written in the documents.

The reason is that Lucene's standard analyzer converts every keyword to lowercase during tokenization and filtering, so the index holds only lowercase terms. Developers need to keep this in mind.
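The lowercase effect is easy to reproduce without Lucene. In this plain-Java sketch the names PrefixSketch and prefixMatches are hypothetical, and the call to toLowerCase stands in for the lowercasing that the text attributes to the standard analyzer. The prefix itself is compared as typed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy prefix search over lowercased terms; not Lucene's implementation.
final class PrefixSketch {
    static List<String> prefixMatches(List<String> rawValues, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String raw : rawValues) {
            // Index-time step: the stored term is the lowercased value.
            String indexedTerm = raw.toLowerCase(Locale.ROOT);
            // Query-time step: the prefix is NOT analyzed, so case matters.
            if (indexedTerm.startsWith(prefix)) {
                hits.add(raw);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("David", "Darwen", "Smith", "Smart");
        System.out.println(prefixMatches(names, "Da")); // []
        System.out.println(prefixMatches(names, "da")); // [David, Darwen]
        System.out.println(prefixMatches(names, "sm")); // [Smith, Smart]
    }
}
```

One way to avoid the surprise, under the same assumption, is to lowercase the prefix before building the query term.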

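11.4.5 Searching for Multiple Keywords: PhraseQuery

Beyond the plain TermQuery, Lucene also offers phrase queries. Users often search not for a single word but for several keywords. These keywords may be adjacent, forming an exact phrase, or may have unrelated words between them. Either way, the user wants the matching documents found. In terms of scoring, though, documents whose phrase has unrelated words between the keywords generally receive a lower score.

PhraseQuery is the Query that Lucene provides for this. Its add method appends keywords, and its setSlop() method then sets a value called the "slop", which determines whether unrelated words are allowed between the keywords and how many.

Listing 11.8 demonstrates PhraseQuery.

Listing 11.8 PhraseQueryTest.java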

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;

public class PhraseQueryTest {
    public static void main(String[] args) throws Exception {
        // Create a Document
        Document doc1 = new Document();
        // Add the "content" field
        doc1.add(Field.Text("content", "david mary smith robert"));
        // Add the "title" field
        doc1.add(Field.Keyword("title", "doc1"));

        // Create the index writer
        IndexWriter writer = new IndexWriter("c://index",
                new StandardAnalyzer(), true);
        // Use the compound index file format
        writer.setUseCompoundFile(true);
        // Add the document to the index
        writer.addDocument(doc1);
        // Close the index
        writer.close();

        // Create the index searcher
        IndexSearcher searcher = new IndexSearcher("c://index");

        // Construct the terms
        Term word1 = new Term("content", "david");
        Term word2 = new Term("content", "mary");
        Term word3 = new Term("content", "smith");
        Term word4 = new Term("content", "robert");

        // Holds the search results
        Hits hits = null;
        // The PhraseQuery, initially null
        PhraseQuery query = null;

        // Case 1: the two words are adjacent; try slop 0, then slop 2
        query = new PhraseQuery();
        query.add(word1);
        query.add(word2);
        // Set the slop
        query.setSlop(0);
        // Run the search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "documents where 'david' and 'mary' are adjacent");

        // Set the slop again
        query.setSlop(2);
        // Run the second search
        hits = searcher.search(query);
        // Print the results
        printResult(hits, "phrases with up to two words between 'david' and 'mary'");

        // Case 2: the two words are two words apart; try slop 0, then slop 2
        query = new PhraseQuery();
        query.add(word1);
        query.add(word4);
        // Set the slop
        query.setSlop(0);
        // Run the third search
        hits = searcher.search(query);
        // Print the results
        printResult(hits, "documents where 'david' and 'robert' are adjacent");

        // Set the slop
        query.setSlop(2);
        // Run the fourth search
        hits = searcher.search(query);
        // Print the results
        printResult(hits, "phrases with up to two words between 'david' and 'robert'");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Search for \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
                System.out.println();
            } else {
                System.out.print("Found: ");
                for (int i = 0; i < hits.length(); i++) {
                    // Get the document
                    Document d = hits.doc(i);
                    // Get the "title" field
                    String dname = d.get("title");
                    // Print it
                    System.out.print(dname + "   ");
                }
                System.out.println();
            }
        }
    }
}

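The code above creates one Document whose "content" field holds 4 keywords. It then creates a PhraseQuery, first adding the two adjacent keywords and searching with a slop of 0 and then 2. Next it builds a query from the first and last keywords and again searches with a slop of 0 and 2.

Figure 11-13 shows the result of running Listing 11.8.

As Figure 11-13 shows, Listing 11.8 runs 4 searches in two groups so the results can be compared.

For two adjacent keywords, Lucene finds the document whatever the slop. For two non-adjacent keywords, the document is not found when the slop is smaller than the number of unrelated words between them. Whenever the number of unrelated words between the two keywords is less than or equal to the slop, the document is found.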
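The rule in the last paragraph can be checked with a small plain-Java model. It is a deliberate simplification: it handles only two keywords that appear in order, and counts the words between them. The names SlopSketch and matches are hypothetical, and Lucene's own slop calculation covers more cases (such as reordered terms) that are not modeled here.

```java
import java.util.Arrays;
import java.util.List;

// Simplified two-keyword slop check; not Lucene's algorithm.
final class SlopSketch {
    // True when 'second' follows 'first' with at most 'slop' words in between.
    static boolean matches(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                int wordsBetween = j - i - 1;
                if (tokens.get(j).equals(second) && wordsBetween <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> content = Arrays.asList("david", "mary", "smith", "robert");
        System.out.println(matches(content, "david", "mary", 0));   // true
        System.out.println(matches(content, "david", "mary", 2));   // true
        System.out.println(matches(content, "david", "robert", 0)); // false
        System.out.println(matches(content, "david", "robert", 2)); // true
    }
}
```

The four calls correspond to the four searches in Listing 11.8, in the same order.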

11.4.6 Searching by Phrase Prefix: PhrasePrefixQuery

PhrasePrefixQuery is similar to PhraseQuery. Suppose a user wants to find both the phrase "david robert" and the phrase "mary robert". With PhraseQuery alone, the user has to build two PhraseQuery objects and join them as OR clauses of a BooleanQuery. PhrasePrefixQuery makes this much more convenient.

Listing 11.9 shows how to do this with PhrasePrefixQuery.

Listing 11.9 PhrasePrefixQueryTest.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhrasePrefixQuery;

public class PhrasePrefixQueryTest {
    public static void main(String[] args) throws Exception {
        // Create a Document
        Document doc1 = new Document();
        // Add the "content" field
        doc1.add(Field.Text("content", "david mary smith robert"));
        // Add the "title" field
        doc1.add(Field.Keyword("title", "doc1"));

        // Create the index writer
        IndexWriter writer = new IndexWriter("c://index",
                new StandardAnalyzer(), true);
        // Add the document to the index
        writer.addDocument(doc1);
        // Close the index writer
        writer.close();

        // Create the index searcher
        IndexSearcher searcher = new IndexSearcher("c://index");

        // Construct the terms
        Term word1 = new Term("content", "david");
        Term word2 = new Term("content", "mary");
        Term word3 = new Term("content", "smith");
        Term word4 = new Term("content", "robert");

        // Holds the search results
        Hits hits = null;
        // The PhrasePrefixQuery, initially null
        PhrasePrefixQuery query = null;

        query = new PhrasePrefixQuery();
        // Add all the candidate words for the uncertain position
        query.add(new Term[]{word1, word2});
        // Add the word that is certain
        query.add(word4);
        // Set the slop
        query.setSlop(2);
        // Run the search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "documents containing the phrase 'david robert' or 'mary robert'");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Search for \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
                System.out.println();
            } else {
                System.out.print("Found: ");
                for (int i = 0; i < hits.length(); i++) {
                    // Get the document
                    Document d = hits.doc(i);
                    // Get the "title" field
                    String dname = d.get("title");
                    System.out.print(dname + "   ");
                }
                System.out.println();
            }
        }
    }
}

The code above builds one Document whose "content" field contains 4 keywords. It then builds a PhrasePrefixQuery and calls add(Term[]) to set the first word of the phrase. Because the parameter is an array of Term, it can hold several terms, and the first word of the phrase may be any one of them. The add(Term) method then sets the word that follows. Figure 11-14 shows the result.

Figure 11-14 PhrasePrefixQuery test results

As Figure 11-14 shows, PhrasePrefixQuery makes it easy to search for related phrases.
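The "any one of several first words" idea can be sketched in plain Java as a phrase check whose first position accepts a set of alternatives. As with the earlier slop model, this is a simplified in-order check and not Lucene's algorithm; MultiPhraseSketch and matches are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy phrase check with alternatives in the first position; illustration only.
final class MultiPhraseSketch {
    // True when some word from 'firstChoices' is followed by 'last'
    // with at most 'slop' words in between.
    static boolean matches(List<String> tokens, Set<String> firstChoices, String last, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!firstChoices.contains(tokens.get(i))) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                if (tokens.get(j).equals(last) && (j - i - 1) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> content = Arrays.asList("david", "mary", "smith", "robert");
        Set<String> first = new HashSet<>(Arrays.asList("david", "mary"));
        System.out.println(matches(content, first, "robert", 2)); // true
        System.out.println(matches(content, first, "robert", 0)); // false
    }
}
```

This gives the same answer as running one two-word check per alternative and combining the results with OR, which is the longer route described at the start of this section.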

11.4.7 Searching for Similar Words: FuzzyQuery

FuzzyQuery is a fuzzy query that recognizes words similar to the query term. Listing 11.10 demonstrates it.

Listing 11.10 FuzzyQueryTest.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

public class FuzzyQueryTest {
    public static void main(String[] args) throws Exception {
        // Create a Document
        Document doc1 = new Document();
        // Add the "content" field
        doc1.add(Field.Text("content", "david"));
        // Add the "title" field
        doc1.add(Field.Keyword("title", "doc1"));

        Document doc2 = new Document();
        doc2.add(Field.Text("content", "sdavid"));
        doc2.add(Field.Keyword("title", "doc2"));

        Document doc3 = new Document();
        doc3.add(Field.Text("content", "davie"));
        doc3.add(Field.Keyword("title", "doc3"));

        // Create the index writer
        IndexWriter writer = new IndexWriter("c://index",
                new StandardAnalyzer(), true);
        // Add the documents to the index
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        // Close the index writer
        writer.close();

        // Create the index searcher
        IndexSearcher searcher = new IndexSearcher("c://index");

        Term word1 = new Term("content", "david");

        // Holds the search results
        Hits hits = null;
        // The FuzzyQuery, initially null
        FuzzyQuery query = null;

        query = new FuzzyQuery(word1);
        // Run the search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "words similar to 'david'");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Search for \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
                System.out.println();
            } else {
                System.out.print("Found: ");
                for (int i = 0; i < hits.length(); i++) {
                    // Get the document
                    Document d = hits.doc(i);
                    // Get the "title" field
                    String dname = d.get("title");
                    System.out.print(dname + "   ");
                }
                System.out.println();
            }
        }
    }
}

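The code above builds 3 Documents whose "content" fields each hold a keyword similar to "david" (the first is "david" itself), then searches them with FuzzyQuery. Figure 11-15 shows the result.

As Figure 11-15 shows, FuzzyQuery retrieves every document in the index that contains a word close to "david".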
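The text does not say how "similar" is measured. A common measure for this kind of fuzzy matching is Levenshtein edit distance, and the plain-Java sketch below computes it for the three indexed words. Treat it as one possible illustration of similarity, not as a description of how FuzzyQuery decides a match; the class and method names are hypothetical.

```java
// Levenshtein edit distance with two rolling rows; illustration only.
final class EditDistanceSketch {
    // Minimum number of single-character insertions, deletions,
    // or substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("david", "david"));  // 0
        System.out.println(distance("david", "sdavid")); // 1
        System.out.println(distance("david", "davie"));  // 1
    }
}
```

All three indexed words are within one edit of "david", which is consistent with all three documents being returned.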

11.4.8 Searching with Wildcards: WildcardQuery

Lucene also supports wildcard queries through WildcardQuery. Listing 11.11 demonstrates it.

Listing 11.11 WildcardQueryTest.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.WildcardQuery;

public class WildcardQueryTest {
    public static void main(String[] args) throws Exception {
        // Create a Document (same for the ones below)
        Document doc1 = new Document();
        // Add the "content" field (same below)
        doc1.add(Field.Text("content", "whatever"));
        // Add the "title" field (same below)
        doc1.add(Field.Keyword("title", "doc1"));

        Document doc2 = new Document();
        doc2.add(Field.Text("content", "whoever"));
        doc2.add(Field.Keyword("title", "doc2"));

        Document doc3 = new Document();
        doc3.add(Field.Text("content", "however"));
        doc3.add(Field.Keyword("title", "doc3"));

        Document doc4 = new Document();
        doc4.add(Field.Text("content", "everest"));
        doc4.add(Field.Keyword("title", "doc4"));

        // Create the index writer
        IndexWriter writer = new IndexWriter("c://index",
                new StandardAnalyzer(), true);
        // Add the documents to the index
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        // Close the index writer
        writer.close();

        // Create the index searcher
        IndexSearcher searcher = new IndexSearcher("c://index");

        // Construct the wildcard terms
        Term word1 = new Term("content", "*ever");
        Term word2 = new Term("content", "wh?ever");
        Term word3 = new Term("content", "h??ever");
        Term word4 = new Term("content", "ever*");

        // The WildcardQuery, initially null
        WildcardQuery query = null;
        // Holds the search results
        Hits hits = null;

        query = new WildcardQuery(word1);
        // Run the first search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "*ever");

        query = new WildcardQuery(word2);
        // Run the second search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "wh?ever");

        query = new WildcardQuery(word3);
        // Run the third search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "h??ever");

        query = new WildcardQuery(word4);
        // Run the fourth search
        hits = searcher.search(query);
        // Print information about the results
        printResult(hits, "ever*");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Search for \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
                System.out.println();
            } else {
                System.out.print("Found: ");
                for (int i = 0; i < hits.length(); i++) {
                    // Get the document
                    Document d = hits.doc(i);
                    // Get the "title" field
                    String dname = d.get("title");
                    System.out.print(dname + "   ");
                }
                System.out.println();
                System.out.println();
            }
        }
    }
}

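Figure 11-16 shows the result of running Listing 11.11.

As the code shows, the wildcard "?" stands for exactly one character and "*" stands for zero or more characters. Because both wildcard searches and the FuzzyQuery above have to match strings against the keywords in the field, they cost some search performance.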
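The two wildcard rules can be written as a short recursive matcher. This plain-Java sketch (hypothetical names WildcardSketch and matches) only illustrates the semantics of "?" and "*" stated above; it is not how Lucene evaluates a WildcardQuery against its index.

```java
// Recursive wildcard matcher: '?' = exactly one char, '*' = zero or more.
final class WildcardSketch {
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length(); // pattern used up: text must be too
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Let '*' absorb 0, 1, 2, ... characters of the remaining text.
            for (int k = ti; k <= t.length(); k++) {
                if (matches(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti < t.length() && (c == '?' || c == t.charAt(ti))) {
            return matches(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("*ever", "whatever"));  // true
        System.out.println(matches("wh?ever", "whoever")); // true
        System.out.println(matches("wh?ever", "whatever")); // false
        System.out.println(matches("ever*", "everest"));   // true
    }
}
```

Every candidate term has to be tested against the pattern, which gives a feel for why the text warns about the performance cost of wildcard searches.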

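With search engines such as Google and Baidu, the user usually just types the query into an input box and clicks "Search". The engine handles everything else and displays the results. So how does a search engine process the string the user typed?

In Lucene, this job belongs to the QueryParser class. It converts the user's input string into an internal Query or a group of Query objects. Lucene's API lets you build all kinds of Query objects directly, but it also lets QueryParser generate the various Query subclass objects for you. This makes Lucene's query facilities more flexible and powerful.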

11.5.1 Basic Use of QueryParser

QueryParser is a tool for parsing user input: it scans the string the user typed and produces a Query object. For example:

Query query = null;

query = QueryParser.parse(keywords, fieldName, new StandardAnalyzer());

As the code shows, building a Query with QueryParser requires not only the keyword text the user typed but also the default field in which QueryParser should look for it. This does not restrict the user to that field: the input can name a field explicitly, as in "content:david", which searches the "content" field. When the input gives no field, QueryParser searches the default field.
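The default-field rule can be pictured as a tiny resolution step. The plain-Java sketch below handles only a single "field:text" or bare "text" clause and ignores everything else in the query syntax; DefaultFieldSketch and resolve are hypothetical names and have nothing to do with QueryParser's real grammar.

```java
// Toy resolution of "field:text" versus bare "text"; illustration only.
final class DefaultFieldSketch {
    // Returns a two-element array: {field, text}.
    static String[] resolve(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        // A usable field prefix needs text on both sides of the colon.
        if (colon > 0 && colon < clause.length() - 1) {
            return new String[] {clause.substring(0, colon), clause.substring(colon + 1)};
        }
        return new String[] {defaultField, clause};
    }

    public static void main(String[] args) {
        String[] explicit = resolve("content:david", "title");
        System.out.println(explicit[0] + " / " + explicit[1]); // content / david

        String[] fallback = resolve("david", "title");
        System.out.println(fallback[0] + " / " + fallback[1]); // title / david
    }
}
```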

Table 11-2 lists the input formats and how QueryParser interprets them.

Table 11-2 Input formats and how QueryParser interprets them

Format	Meaning
David	Search the default field for the keyword David
content:David	Search the "content" field for the keyword David
David Mary, or David OR Mary	Search the default field for the keywords David and Mary, joined by OR
+David +Mary, or David AND Mary	Search the default field for the keywords David and Mary, joined by AND
content:David -title:Manager, or content:David AND NOT title:Manager	The content field contains the keyword David but the title field does not contain the keyword Manager
(David OR Mary) AND Robert	The default field contains David or Mary, and must contain Robert
Davi*	Search the default field for keywords with the prefix Davi
content:"David is a manager"	The "content" field contains the phrase "David is a manager"

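One more important point: QueryParser needs an analyzer to scan the user's input. Analyzers are covered in a later chapter. For now, note that the analyzer used on the user's keywords should be the same one used to build the index; otherwise the query terms may not match the indexed terms.

11.5.2 AND and OR in QueryParser

As Table 11-2 shows, when the user enters two keywords, QueryParser treats them as an OR relationship by default. To change this, use the following:

QueryParser parser = new QueryParser(fieldName, new StandardAnalyzer());

parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);

A QueryParser configured this way treats keywords separated by spaces as an AND relationship when it scans the user's input. In effect it builds a Boolean query whose clauses are joined by AND.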
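The effect of the default operator on space-separated keywords can be modeled in plain Java. The sketch assumes the input holds only bare keywords (no operators, fields, or quotes) and lowercases them to imitate an analyzer; DefaultOperatorSketch and matches are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of the default operator for space-separated keywords.
final class DefaultOperatorSketch {
    // defaultAnd == true: every keyword must appear; false: any keyword may.
    static boolean matches(String userInput, Set<String> docTerms, boolean defaultAnd) {
        String[] words = userInput.trim().toLowerCase(Locale.ROOT).split("\\s+");
        boolean any = false;
        boolean all = true;
        for (String word : words) {
            boolean hit = docTerms.contains(word);
            any |= hit;
            all &= hit;
        }
        return defaultAnd ? all : any;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("david", "mary"));
        System.out.println(matches("David Robert", doc, false)); // true  (OR)
        System.out.println(matches("David Robert", doc, true));  // false (AND)
        System.out.println(matches("David Mary", doc, true));    // true  (AND)
    }
}
```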
