[development][PCRE] old PCRE
Introduction (man page):
Text version: http://www.pcre.org/original/pcre.txt
HTML version: http://www.pcre.org/original/doc/html/pcre.html
In addition to the Perl-compatible matching function, PCRE contains an alternative function that matches the same compiled patterns in a different way. In certain circumstances, the alternative function has some advantages. For a discussion of the two matching algorithms, see the pcrematching page.
pcrematching:
http://www.pcre.org/original/doc/html/pcrematching.html
Summary:
0. Does "alternative" mean batch processing here, i.e. one pattern applied to many subjects?
The set of strings that are matched by a regular expression can be represented as a tree structure.
1. Jeffrey Friedl's book "Mastering Regular Expressions"
Chinese edition (精通正則表達式): https://book.douban.com/subject/2154713/
English edition (PDF): https://doc.lagout.org/Others/O%27Reilly%20-%20Mastering%20Regular%20Expressions.pdf
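2. PCRE has two matching interfaces: the standard one (the pcre_exec(), pcre16_exec() and pcre32_exec() functions) and the alternative one (the pcre_dfa_exec(), pcre16_dfa_exec() and pcre32_dfa_exec() functions).
The former returns a single match per call, while the latter can return several matches that start at the same position in the subject.
The match returned by the standard interface may be the longest possible string, the shortest, or something in between, depending on whether the quantifiers are greedy or ungreedy.
The standard interface is the NFA algorithm: a depth-first search of the tree, with greedy and ungreedy quantifiers controlling which branch is tried first.
The alternative interface searches the tree breadth-first and is the DFA algorithm (In Friedl's terminology, this is a kind of "DFA algorithm", though it is not implemented as a traditional finite state machine (it keeps multiple states active simultaneously).). The subject string is scanned until the end of the string is reached or there are no more paths to follow. All paths that have reached a terminating state represent the complete set of matches, and they are returned in order of decreasing length. An option (PCRE_DFA_SHORTEST) stops at the first match found, which is necessarily the shortest one.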
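3. Advantages of the alternative (DFA) method:
a. It finds all matches, and in particular the longest one.
b. Very long subjects can be matched in several chunks, one segment at a time.
Disadvantages of the alternative method:
a. It is slower than the standard method.
b. Captured substrings are not supported.
c. Although atomic groups are supported, their use does not provide the performance advantage that it does for the standard algorithm.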
pcrejit:
http://www.pcre.org/original/doc/html/pcrejit.html
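Summary:
JIT compilation is a heavyweight optimization: it adds an extra processing step in exchange for much faster matching, so it pays off when a pattern is compiled once and then matched many times.
1. Only the standard PCRE interface is supported; DFA matching is not.
2. JIT is not enabled by default; PCRE must be built with the --enable-jit configure option.
3. Only certain hardware platforms are supported: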
  ARM v5, v7, and Thumb2
  Intel x86 32-bit and 64-bit
  MIPS 32-bit
  Power PC 32-bit and 64-bit
  SPARC 32-bit (experimental)
4. The pcre_jit_exec() function was not available at all before 8.32.
5. The JIT compiler generates different optimized code for each of the three modes (normal, soft partial, hard partial). When pcre_exec() is called, the appropriate code is run if it is available. Otherwise, the pattern is matched using interpretive code.
6. There are some pcre_exec() options that are not supported for JIT execution. There are also some pattern items that JIT cannot handle. Details are given below. In both cases, execution automatically falls back to the interpretive code.
7. Once a pattern has been studied, with or without JIT, it can be used as many times as you like for matching different subject strings.
8. The code that is generated by the JIT compiler is architecture-specific, and is also position dependent. For those reasons it cannot be saved (in a file or database) and restored later like the bytecode and other data of a compiled pattern.
More info: http://www.pcre.org/original/doc/html/pcreprecompile.html
9. Sometimes the JIT machine code cannot be generated, yet pcre_exec() still works normally: it silently falls back to the interpreter. In performance-critical code where falling back to the interpreter is not acceptable, use the pcre_jit_exec() API instead:
Because the API described above falls back to interpreted execution when JIT is not available, it is convenient for programs that are written for general use in many environments. However, calling JIT via pcre_exec() does have a performance impact. Programs that are written for use where JIT is known to be available, and which need the best possible performance, can instead use a "fast path" API to call JIT execution directly instead of calling pcre_exec() (obviously only for patterns that have been successfully studied by JIT).
10. pcre_exec() validates its arguments. For speed, pcre_jit_exec() skips these checks, so if the arguments are invalid, the result is unpredictable.
API:
http://www.pcre.org/original/doc/html/pcreapi.html
Summary:
1. The functions pcre_compile(), pcre_compile2(), pcre_study(), and pcre_exec() are used for compiling and matching regular expressions in a Perl-compatible manner.
2. Compiling a pattern
http://www.pcre.org/original/doc/html/pcreapi.html#SEC11
3. Studying a pattern
Studying a pattern does two things: first, a lower bound for the length of subject string that is needed to match the pattern is computed. This does not mean that there are any strings of that length that match, but it does guarantee that no shorter strings match. The value is used to avoid wasting time by trying to match strings that are shorter than the lower bound. Studying a pattern is also useful for non-anchored patterns that do not have a single fixed starting character. A bitmap of possible starting bytes is created. This speeds up finding a position in the subject at which to start matching.
4. Matching a pattern
However, it is possible to save compiled patterns and study data, and then use them later in different processes, possibly even on different hosts. For a discussion about this, see the pcreprecompile documentation.
Compare with PCRE2:
[development][PCRE] PCRE
trie:
https://zh.wikipedia.org/zh-hans/Trie
-------------------
Blog by 黑哥: http://www.cnblogs.com/zzqcn/p/3525636.html
A very good article comparing PCRE, PCRE-JIT and Hyperscan: https://mp.weixin.qq.com/s?__biz=MzI3NDA4ODY4MA==&mid=2653334341&idx=1&sn=bf10ca6d8ca1452723b84a62f7fc436d&chksm=f0cb5cc2c7bcd5d4f423af8d78aeb58dd6d9494c1562b1e775579321df3b9f59a951656100d0&scene=21#wechat_redirect
Reposted from: https://www.cnblogs.com/hugetong/p/8619196.html