

Lucene source code analysis (2): a read-path example


1. The official demo code

Analyzer analyzer = new StandardAnalyzer();

// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
//Directory directory = FSDirectory.open(Paths.get("/tmp/testindex"));

IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();
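
Since this installment is nominally about the read path, it is worth showing the other half of the official demo as well: opening a reader over the directory we just populated and running a query. This is a minimal sketch following the snippet in Lucene's core package Javadoc; it assumes the classic QueryParser from the queryparser module, and exact signatures may vary slightly across Lucene versions.

// Now search the index we just built:
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser("fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
// Iterate through the results:
for (ScoreDoc hit : hits) {
    Document hitDoc = isearcher.doc(hit.doc);
    System.out.println(hitDoc.get("fieldname")); // "This is the text to be indexed."
}
ireader.close();
directory.close();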

2. The classes involved and their relationships

2.1 TokenStream

/**
 * A <code>TokenStream</code> enumerates the sequence of tokens, either from
 * {@link Field}s of a {@link Document} or from query text.
 * <p>
 * This is an abstract class; concrete subclasses are:
 * <ul>
 * <li>{@link Tokenizer}, a <code>TokenStream</code> whose input is a Reader; and
 * <li>{@link TokenFilter}, a <code>TokenStream</code> whose input is another
 * <code>TokenStream</code>.
 * </ul>
 * A new <code>TokenStream</code> API has been introduced with Lucene 2.9. This API
 * has moved from being {@link Token}-based to {@link Attribute}-based. While
 * {@link Token} still exists in 2.9 as a convenience class, the preferred way
 * to store the information of a {@link Token} is to use {@link AttributeImpl}s.
 * <p>
 * <code>TokenStream</code> now extends {@link AttributeSource}, which provides
 * access to all of the token {@link Attribute}s for the <code>TokenStream</code>.
 * Note that only one instance per {@link AttributeImpl} is created and reused
 * for every token. This approach reduces object creation and allows local
 * caching of references to the {@link AttributeImpl}s. See
 * {@link #incrementToken()} for further details.
 * <p>
 * <b>The workflow of the new <code>TokenStream</code> API is as follows:</b>
 * <ol>
 * <li>Instantiation of <code>TokenStream</code>/{@link TokenFilter}s which add/get
 * attributes to/from the {@link AttributeSource}.
 * <li>The consumer calls {@link TokenStream#reset()}.
 * <li>The consumer retrieves attributes from the stream and stores local
 * references to all attributes it wants to access.
 * <li>The consumer calls {@link #incrementToken()} until it returns false
 * consuming the attributes after each call.
 * <li>The consumer calls {@link #end()} so that any end-of-stream operations
 * can be performed.
 * <li>The consumer calls {@link #close()} to release any resource when finished
 * using the <code>TokenStream</code>.
 * </ol>
 * To make sure that filters and consumers know which attributes are available,
 * the attributes must be added during instantiation. Filters and consumers are
 * not required to check for availability of attributes in
 * {@link #incrementToken()}.
 * <p>
 * You can find some example code for the new API in the analysis package level
 * Javadoc.
 * <p>
 * Sometimes it is desirable to capture a current state of a <code>TokenStream</code>,
 * e.g., for buffering purposes (see {@link CachingTokenFilter},
 * TeeSinkTokenFilter). For this usecase
 * {@link AttributeSource#captureState} and {@link AttributeSource#restoreState}
 * can be used.
 * <p>
 * The {@code TokenStream}-API in Lucene is based on the decorator pattern.
 * Therefore all non-abstract subclasses must be final or have at least a final
 * implementation of {@link #incrementToken}! This is checked when Java
 * assertions are enabled.
 */
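
The six-step consumer workflow above is easier to see in code. Here is a minimal sketch, assuming a StandardAnalyzer and a hypothetical field name "fieldname"; the step numbers in the comments refer to the numbered list in the Javadoc.

Analyzer analyzer = new StandardAnalyzer();
// Step 1: the analyzer instantiates the TokenStream and registers its attributes.
try (TokenStream stream = analyzer.tokenStream("fieldname", "This is the text to be indexed.")) {
    stream.reset();                                // step 2: reset before consuming
    // Step 3: store a local reference to the attribute we want to read.
    CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
    while (stream.incrementToken()) {              // step 4: advance token by token
        System.out.println(termAtt.toString());    // the reused attribute instance holds the current term
    }
    stream.end();                                  // step 5: end-of-stream operations
}                                                  // step 6: close() via try-with-resources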

2.2 Analyzer

/**
 * An Analyzer builds TokenStreams, which analyze text. It thus represents a
 * policy for extracting index terms from text.
 * <p>
 * In order to define what analysis is done, subclasses must define their
 * {@link TokenStreamComponents TokenStreamComponents} in {@link #createComponents(String)}.
 * The components are then reused in each call to {@link #tokenStream(String, Reader)}.
 * <p>
 * Simple example:
 * <pre class="prettyprint">
 * Analyzer analyzer = new Analyzer() {
 *   {@literal @Override}
 *   protected TokenStreamComponents createComponents(String fieldName) {
 *     Tokenizer source = new FooTokenizer(reader);
 *     TokenStream filter = new FooFilter(source);
 *     filter = new BarFilter(filter);
 *     return new TokenStreamComponents(source, filter);
 *   }
 *   {@literal @Override}
 *   protected TokenStream normalize(TokenStream in) {
 *     // Assuming FooFilter is about normalization and BarFilter is about
 *     // stemming, only FooFilter should be applied
 *     return new FooFilter(in);
 *   }
 * };
 * </pre>
 * For more examples, see the {@link org.apache.lucene.analysis Analysis package documentation}.
 * <p>
 * For some concrete implementations bundled with Lucene, look in the analysis modules:
 * <ul>
 *   <li><a href="{@docRoot}/../analyzers-common/overview-summary.html">Common</a>:
 *       Analyzers for indexing content in different languages and domains.
 *   <li><a href="{@docRoot}/../analyzers-icu/overview-summary.html">ICU</a>:
 *       Exposes functionality from ICU to Apache Lucene.
 *   <li><a href="{@docRoot}/../analyzers-kuromoji/overview-summary.html">Kuromoji</a>:
 *       Morphological analyzer for Japanese text.
 *   <li><a href="{@docRoot}/../analyzers-morfologik/overview-summary.html">Morfologik</a>:
 *       Dictionary-driven lemmatization for the Polish language.
 *   <li><a href="{@docRoot}/../analyzers-phonetic/overview-summary.html">Phonetic</a>:
 *       Analysis for indexing phonetic signatures (for sounds-alike search).
 *   <li><a href="{@docRoot}/../analyzers-smartcn/overview-summary.html">Smart Chinese</a>:
 *       Analyzer for Simplified Chinese, which indexes words.
 *   <li><a href="{@docRoot}/../analyzers-stempel/overview-summary.html">Stempel</a>:
 *       Algorithmic Stemmer for the Polish Language.
 *   <li><a href="{@docRoot}/../analyzers-uima/overview-summary.html">UIMA</a>:
 *       Analysis integration with Apache UIMA.
 * </ul>
 */
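
FooTokenizer, FooFilter and BarFilter in the Javadoc are placeholders. A minimal concrete version of the same pattern is sketched below, using WhitespaceTokenizer and LowerCaseFilter; their exact packages differ across Lucene versions, so treat this as illustrative rather than version-exact.

// A whitespace-split, lowercased analysis chain, built the way the Javadoc describes.
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();      // the single Tokenizer at the head of the chain
        TokenStream filter = new LowerCaseFilter(source);  // a TokenFilter decorating the tokenizer
        return new TokenStreamComponents(source, filter);
    }
};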

2.3 Directory

/**
 * A Directory is a flat list of files. Files may be written once, when they
 * are created. Once a file is created it may only be opened for read, or
 * deleted. Random access is permitted both when reading and writing.
 *
 * <p>Java's i/o APIs are not used directly; rather, all i/o goes
 * through this API. This permits things such as:
 * <ul>
 * <li> implementation of RAM-based indices;
 * <li> implementation of indices stored in a database, via JDBC;
 * <li> implementation of an index as a single file;
 * </ul>
 *
 * Directory locking is implemented by an instance of {@link
 * LockFactory}.
 */
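
To make the "all i/o goes through this API" point concrete, here is a small sketch that writes and reads a raw file through a Directory instead of java.io. The file name hello.bin is made up for illustration:

Directory dir = FSDirectory.open(Paths.get("/tmp/testindex"));
// Write once: after the output is closed, the file may only be read or deleted.
try (IndexOutput out = dir.createOutput("hello.bin", IOContext.DEFAULT)) {
    out.writeString("hello, directory");
}
// Read it back through the same abstraction.
try (IndexInput in = dir.openInput("hello.bin", IOContext.DEFAULT)) {
    System.out.println(in.readString());
}
dir.close();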

2.4 IndexWriter

/**
 * An <code>IndexWriter</code> creates and maintains an index.
 *
 * <p>The {@link OpenMode} option on
 * {@link IndexWriterConfig#setOpenMode(OpenMode)} determines
 * whether a new index is created, or whether an existing index is
 * opened. Note that you can open an index with {@link OpenMode#CREATE}
 * even while readers are using the index. The old readers will
 * continue to search the "point in time" snapshot they had opened,
 * and won't see the newly created index until they re-open. If
 * {@link OpenMode#CREATE_OR_APPEND} is used IndexWriter will create a
 * new index if there is not already an index at the provided path
 * and otherwise open the existing index.</p>
 *
 * <p>In either case, documents are added with {@link #addDocument(Iterable)
 * addDocument} and removed with {@link #deleteDocuments(Term...)} or {@link
 * #deleteDocuments(Query...)}. A document can be updated with {@link
 * #updateDocument(Term, Iterable) updateDocument} (which just deletes
 * and then adds the entire document). When finished adding, deleting
 * and updating documents, {@link #close() close} should be called.</p>
 *
 * <a name="sequence_numbers"></a>
 * <p>Each method that changes the index returns a {@code long} sequence number, which
 * expresses the effective order in which each change was applied.
 * {@link #commit} also returns a sequence number, describing which
 * changes are in the commit point and which are not. Sequence numbers
 * are transient (not saved into the index in any way) and only valid
 * within a single {@code IndexWriter} instance.</p>
 *
 * <a name="flush"></a>
 * <p>These changes are buffered in memory and periodically
 * flushed to the {@link Directory} (during the above method
 * calls). A flush is triggered when there are enough added documents
 * since the last flush. Flushing is triggered either by RAM usage of the
 * documents (see {@link IndexWriterConfig#setRAMBufferSizeMB}) or the
 * number of added documents (see {@link IndexWriterConfig#setMaxBufferedDocs(int)}).
 * The default is to flush when RAM usage hits
 * {@link IndexWriterConfig#DEFAULT_RAM_BUFFER_SIZE_MB} MB. For
 * best indexing speed you should flush by RAM usage with a
 * large RAM buffer. Additionally, if IndexWriter reaches the configured number of
 * buffered deletes (see {@link IndexWriterConfig#setMaxBufferedDeleteTerms})
 * the deleted terms and queries are flushed and applied to existing segments.
 * In contrast to the other flush options {@link IndexWriterConfig#setRAMBufferSizeMB} and
 * {@link IndexWriterConfig#setMaxBufferedDocs(int)}, deleted terms
 * won't trigger a segment flush. Note that flushing just moves the
 * internal buffered state in IndexWriter into the index, but
 * these changes are not visible to IndexReader until either
 * {@link #commit()} or {@link #close} is called. A flush may
 * also trigger one or more segment merges which by default
 * run with a background thread so as not to block the
 * addDocument calls (see <a href="#mergePolicy">below</a>
 * for changing the {@link MergeScheduler}).</p>
 *
 * <p>Opening an <code>IndexWriter</code> creates a lock file for the directory in use.
 * Trying to open another <code>IndexWriter</code> on the same directory will lead to a
 * {@link LockObtainFailedException}.</p>
 *
 * <a name="deletionPolicy"></a>
 * <p>Expert: <code>IndexWriter</code> allows an optional
 * {@link IndexDeletionPolicy} implementation to be specified. You
 * can use this to control when prior commits are deleted from
 * the index. The default policy is {@link KeepOnlyLastCommitDeletionPolicy}
 * which removes all prior commits as soon as a new commit is
 * done. Creating your own policy can allow you to explicitly
 * keep previous "point in time" commits alive in the index for
 * some time, either because this is useful for your application,
 * or to give readers enough time to refresh to the new commit
 * without having the old commit deleted out from under them.
 * The latter is necessary when multiple computers take turns opening
 * their own {@code IndexWriter} and {@code IndexReader}s
 * against a single shared index mounted via remote filesystems
 * like NFS which do not support "delete on last close" semantics.
 * A single computer accessing an index via NFS is fine with the
 * default deletion policy since NFS clients emulate "delete on
 * last close" locally. That said, accessing an index via NFS
 * will likely result in poor performance compared to a local IO
 * device.</p>
 *
 * <a name="mergePolicy"></a>
 * <p>Expert: <code>IndexWriter</code> allows you to separately change
 * the {@link MergePolicy} and the {@link MergeScheduler}.
 * The {@link MergePolicy} is invoked whenever there are
 * changes to the segments in the index. Its role is to
 * select which merges to do, if any, and return a {@link
 * MergePolicy.MergeSpecification} describing the merges.
 * The default is {@link LogByteSizeMergePolicy}. Then, the {@link
 * MergeScheduler} is invoked with the requested merges and
 * it decides when and how to run the merges. The default is
 * {@link ConcurrentMergeScheduler}.</p>
 *
 * <a name="OOME"></a>
 * <p><b>NOTE</b>: if you hit a
 * VirtualMachineError, or disaster strikes during a checkpoint
 * then IndexWriter will close itself. This is a
 * defensive measure in case any internal state (buffered
 * documents, deletions, reference counts) were corrupted.
 * Any subsequent calls will throw an AlreadyClosedException.</p>
 *
 * <a name="thread-safety"></a>
 * <p><b>NOTE</b>: {@link IndexWriter} instances are completely thread
 * safe, meaning multiple threads can call any of its
 * methods, concurrently. If your application requires
 * external synchronization, you should <b>not</b>
 * synchronize on the <code>IndexWriter</code> instance as
 * this may cause deadlock; use your own (non-Lucene) objects
 * instead.</p>
 *
 * <p><b>NOTE</b>: If you call
 * <code>Thread.interrupt()</code> on a thread that's within
 * IndexWriter, IndexWriter will try to catch this (eg, if
 * it's in a wait() or Thread.sleep()), and will then throw
 * the unchecked exception {@link ThreadInterruptedException}
 * and <b>clear</b> the interrupt status on the thread.</p>
 */

/*
 * Clarification: Check Points (and commits)
 * IndexWriter writes new index files to the directory without writing a new segments_N
 * file which references these new files. It also means that the state of
 * the in memory SegmentInfos object is different than the most recent
 * segments_N file written to the directory.
 *
 * Each time the SegmentInfos is changed, and matches the (possibly
 * modified) directory files, we have a new "check point".
 * If the modified/new SegmentInfos is written to disk - as a new
 * (generation of) segments_N file - this check point is also an
 * IndexCommit.
 *
 * A new checkpoint always replaces the previous checkpoint and
 * becomes the new "front" of the index. This allows the IndexFileDeleter
 * to delete files that are referenced only by stale checkpoints
 * (files that were created since the last commit, but are no longer
 * referenced by the "front" of the index). For this, IndexFileDeleter
 * keeps track of the last non commit checkpoint.
 */
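
Tying the Javadoc back to the demo in section 1: the open mode, the flush thresholds, and the per-change sequence numbers it describes are all reachable through IndexWriterConfig. A minimal sketch follows; the 64 MB buffer is an arbitrary illustrative value, and addDocument/commit return sequence numbers only in Lucene 6 and later.

IndexWriterConfig config = new IndexWriterConfig(analyzer);
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND); // create, or open an existing index
config.setRAMBufferSizeMB(64.0);      // flush by RAM usage, as the Javadoc recommends for indexing speed
IndexWriter writer = new IndexWriter(directory, config);

long seqNo = writer.addDocument(doc); // every change returns a transient sequence number
long commitSeqNo = writer.commit();   // commit makes buffered changes visible to new readers
writer.close();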


Reprinted from: https://www.cnblogs.com/davidwang456/p/9935786.html

