
Using nlpbamboo's chinesecfg for Chinese word segmentation in PG (PostgreSQL)

Published: 2023/12/18 · Programming Q&A · 39 · 豆豆

This article, collected and edited by 生活随笔, describes how to use nlpbamboo's chinesecfg for Chinese word segmentation in PG. It is shared here for reference.

1. Environment:
CentOS 6.5 64bit
PostgreSQL 9.4.4
nlpbamboo-1.1.2
cmake-3.3.1
CRF++-0.57 (note: the commands in step 3 actually download and build 0.54)

2. Install cmake
[root@prod /opt]# wget http://www.cmake.org/files/v3.3/cmake-3.3.1.tar.gz
[root@prod /opt]# tar -zxvf cmake-3.3.1.tar.gz
[root@prod /opt]# cd cmake-3.3.1
[root@prod /opt/cmake-3.3.1]# ./bootstrap --prefix=/opt/cmake3.3.1
[root@prod /opt/cmake-3.3.1]# gmake
[root@prod /opt/cmake-3.3.1]# gmake install
Add the new cmake to PATH: open ~/.bash_profile (vi ~/.bash_profile), append the line
export PATH=/opt/cmake3.3.1/bin:$PATH
and reload the profile:
. ~/.bash_profile

3. Install CRF++
[root@prod /opt]# wget http://ncu.dl.sourceforge.net/project/crfpp/crfpp/0.54/CRF%2B%2B-0.54.tar.gz
[root@prod /opt]# tar -zxvf CRF++-0.54.tar.gz
[root@prod /opt]# cd CRF++-0.54
[root@prod /opt/CRF++-0.54]# ./configure
[root@prod /opt/CRF++-0.54]# gmake
[root@prod /opt/CRF++-0.54]# gmake install
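Re-sourcing ~/.bash_profile with the plain export line from step 2 prepends /opt/cmake3.3.1/bin again on every run, so PATH keeps growing. A minimal sketch of a guarded version that could replace that export line; `path_prepend` is a hypothetical helper name, not something provided by cmake or bash:

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is
# already somewhere in PATH, so sourcing ~/.bash_profile twice is harmless.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present: leave PATH alone
        *) PATH="$1${PATH:+:$PATH}" ;;    # prepend (no stray ':' if PATH was empty)
    esac
    export PATH
}

# The cmake install prefix used in step 2.
path_prepend /opt/cmake3.3.1/bin
```

After reloading the profile, `command -v cmake` should report /opt/cmake3.3.1/bin/cmake rather than the older system cmake.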
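4. Install nlpbamboo
[root@prod /opt]# wget http://nlpbamboo.googlecode.com/files/nlpbamboo-1.1.2.tar.bz2
This URL seemed to require getting past the firewall, so the file is provided as an attachment. (Google Code has since been shut down, so the link may no longer resolve at all.)
[root@prod /opt]# tar -jxvf nlpbamboo-1.1.2.tar.bz2
[root@prod /opt]# cd nlpbamboo-1.1.2
[root@prod /opt/nlpbamboo-1.1.2]# mkdir -p build
[root@prod /opt/nlpbamboo-1.1.2]# cd build/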
[root@prod /opt/nlpbamboo-1.1.2/build]# cmake .. -DCMAKE_BUILD_TYPE=release
[root@prod /opt/nlpbamboo-1.1.2/build]# gmake all
[root@prod /opt/nlpbamboo-1.1.2/build]# gmake install
5. Add the library directories to the dynamic linker's search path

[root@prod /opt/nlpbamboo-1.1.2/build]# echo "/usr/lib" >>/etc/ld.so.conf

[root@prod /opt/nlpbamboo-1.1.2/build]# echo "/usr/local/lib" >>/etc/ld.so.conf

[root@prod /opt/nlpbamboo-1.1.2/build]# ldconfig -f /etc/ld.so.conf
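The two echo commands append to /etc/ld.so.conf unconditionally, so running this step twice leaves duplicate entries. A minimal sketch of an idempotent variant; `append_once` is a hypothetical helper name, and it assumes the target file is writable by the current user (root here):

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the install does not duplicate it.
append_once() {
    line=$1
    file=$2
    # -x: whole-line match, -F: fixed string (no regex), -q: no output.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For this step you would then run `append_once /usr/lib /etc/ld.so.conf` and `append_once /usr/local/lib /etc/ld.so.conf`, followed by `ldconfig` as before.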
Check that the libraries are now known to the linker:

[root@prod /opt/nlpbamboo-1.1.2/build]# ldconfig -p|grep bambo

        libbamboo.so.2 (libc6,x86-64) => /usr/lib/libbamboo.so.2
        libbamboo.so (libc6,x86-64) => /usr/lib/libbamboo.so

[root@prod /opt/nlpbamboo-1.1.2/build]# ldconfig -p|grep crf

        libcrfpp.so.0 (libc6,x86-64) => /usr/local/lib/libcrfpp.so.0
        libcrfpp.so (libc6,x86-64) => /usr/local/lib/libcrfpp.so
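The grep checks above match any line that merely contains the text, so `grep crf` would also be satisfied by unrelated libraries. A stricter sketch that looks for an exact library name in the `ldconfig -p` listing; `lib_registered` is a hypothetical helper that reads the listing on standard input and relies on the library name being the first whitespace-separated field, as in the output above:

```shell
# Hypothetical helper: succeed if the `ldconfig -p` listing on stdin
# contains an entry whose name is exactly $1 (e.g. libbamboo.so.2).
lib_registered() {
    # The first field of each cache line is the library name; exit status 0
    # if a matching line was seen, 1 otherwise.
    awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}
```

Usage: `ldconfig -p | lib_registered libbamboo.so.2 && echo ok`, and likewise for libcrfpp.so.0.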

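6. Add the index (dictionary data):
[root@prod /opt/nlpbamboo-1.1.2/build]# cd /opt/bamboo
[root@prod /opt/bamboo]# wget http://nlpbamboo.googlecode.com/files/index.tar.bz2
Unfortunately this download failed here too; someone else downloaded the file for me, and it is included as an attachment.
[root@prod /opt/bamboo]# tar -jxvf index.tar.bz2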
7. Compile the PG support modules
[postgres@prod /opt/pgsql/share/contrib]$ cd /opt/bamboo/exts/postgres/chinese_parser
[postgres@prod /opt/bamboo/exts/postgres/chinese_parser]$ make
[postgres@prod /opt/bamboo/exts/postgres/chinese_parser]$ make install

Create an empty stop-word file ($PGHOME is the PostgreSQL installation directory, /opt/pgsql here):
[root@prod /opt/bamboo/exts/postgres/chinese_parser]# touch $PGHOME/share/tsearch_data/chinese_utf8.stop

[root@prod /opt/bamboo/exts/postgres/chinese_parser]# cd /opt/bamboo/exts/postgres/pg_tokenize

[root@prod /opt/bamboo/exts/postgres/pg_tokenize]# make
[root@prod /opt/bamboo/exts/postgres/pg_tokenize]# make install
8. Install the PG support modules into the database

[root@prod /opt/bamboo/exts/postgres/pg_tokenize]# su postgres

[postgres@prod /opt/bamboo/exts/postgres/pg_tokenize]$ cd /opt/pgsql/share/contrib/

[postgres@prod /opt/pgsql/share/contrib]$ psql -h 127.0.0.1 postgres postgres -f chinese_parser.sql

[postgres@prod /opt/pgsql/share/contrib]$ psql -h 127.0.0.1 postgres postgres -f pg_tokenize.sql
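By default `psql -f` carries on after a failed statement, so an error early in chinese_parser.sql can scroll past unnoticed. A sketch of a wrapper that loads the scripts in order and stops at the first failure using psql's `ON_ERROR_STOP` variable; `load_sql` is a hypothetical name, and the host, database and user are the ones from the commands above:

```shell
# Hypothetical wrapper: load SQL scripts into database "postgres" on
# 127.0.0.1 as user "postgres", stopping at the first script that fails.
load_sql() {
    for f in "$@"; do
        # ON_ERROR_STOP=1: psql aborts the script at the first error and
        # exits non-zero instead of continuing with the next statement.
        psql -h 127.0.0.1 -d postgres -U postgres -v ON_ERROR_STOP=1 -f "$f" || {
            echo "failed while loading $f" >&2
            return 1
        }
    done
}
```

Run from /opt/pgsql/share/contrib as the postgres user: `load_sql chinese_parser.sql pg_tokenize.sql`.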
Along the way you may run into this error:

postgres=# select * from tokenize('你好我是中國人');

ERROR:  invalid byte sequence for encoding "UTF8": 0xc4 0xe3

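The cause is that the client character set is not set correctly: the terminal sends GBK-encoded text, while PostgreSQL assumes the client is sending UTF8.
Solutions:
Method 1: set PostgreSQL's client encoding to GBK. PostgreSQL then knows the input is GBK-encoded and converts it to UTF8 automatically. In psql, enter "\encoding GBK", or set the environment variable before starting psql:
[postgres@dsc ~]$ export PGCLIENTENCODING=GBK
Method 2: set the terminal's own character encoding to UTF8, so the input is UTF8 to begin with rather than GBK.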
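The bytes in the error message can be checked directly: 0xc4 0xe3 is the GBK encoding of 你, the first character of the query, whereas its UTF-8 encoding is three bytes. A small sketch using `od` and `iconv`, both commonly available on Linux; `hex_bytes` is a hypothetical helper name:

```shell
# Hypothetical helper: print the bytes read from stdin as space-separated hex.
hex_bytes() {
    od -An -tx1 | xargs
}

# 你 written as its UTF-8 bytes (octal escapes), so the script itself
# does not depend on the terminal's encoding.
ni_utf8='\344\275\240'

printf "$ni_utf8" | hex_bytes                          # prints: e4 bd a0
printf "$ni_utf8" | iconv -f UTF-8 -t GBK | hex_bytes  # prints: c4 e3
```

The GBK bytes match the ones in the error, which is why telling PostgreSQL the client encoding is GBK (method 1) resolves it.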

9. Check that chinesecfg now appears among the full-text search configurations

postgres=# select * from pg_ts_config;

  cfgname   | cfgnamespace | cfgowner | cfgparser
------------+--------------+----------+-----------
 simple     |           11 |       10 |      3722
 danish     |           11 |       10 |      3722
 dutch      |           11 |       10 |      3722
 english    |           11 |       10 |      3722
 finnish    |           11 |       10 |      3722
 french     |           11 |       10 |      3722
 german     |           11 |       10 |      3722
 hungarian  |           11 |       10 |      3722
 italian    |           11 |       10 |      3722
 norwegian  |           11 |       10 |      3722
 portuguese |           11 |       10 |      3722
 romanian   |           11 |       10 |      3722
 russian    |           11 |       10 |      3722
 spanish    |           11 |       10 |      3722
 swedish    |           11 |       10 |      3722
 turkish    |           11 |       10 |      3722
 chinesecfg |           11 |       10 |     16453
(17 rows)

Test:
postgres=# select * from to_tsvector('chinesecfg','你好,我是中國人. 目前在杭州斯凱做數據庫相關的工作.');
                                              to_tsvector
-------------------------------------------------------------------------------------------------------
 ',':2 '.':7,17 '中國':5 '人':6 '你好':1 '做':12 '在':9 '工作':16 '我':3 '數據庫':13 '斯凱':11 '是':4 '杭州':10 '的':15 '目前':8 '相關':14
(1 row)
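Each tsvector entry has the form 'lexeme':positions and the entries are listed by lexeme, not by where they occurred, so the segmentation is easier to read once the lexemes are put back in position order. A sketch of a hypothetical helper `tsvector_order` that does this for the text form of a tsvector; it assumes no lexeme contains a single quote:

```shell
# Hypothetical helper: read the text form of a tsvector on stdin and print
# its lexemes in position order on one line, separated by spaces.
tsvector_order() {
    # 1. Pull out each 'lexeme':pos1,pos2 entry.
    grep -o "'[^']*':[0-9,]*" |
        # 2. Emit one "position<TAB>lexeme" line per position.
        awk -F"':" '{
            lexeme = substr($1, 2)          # drop the opening quote
            n = split($2, pos, ",")         # a lexeme may occur several times
            for (i = 1; i <= n; i++) print pos[i] "\t" lexeme
        }' |
        # 3. Sort numerically by position.
        sort -n -k1,1 |
        # 4. Join the lexemes back into one line.
        awk 'BEGIN { FS = "\t" } { printf "%s%s", (NR > 1 ? " " : ""), $2 } END { print "" }'
}
```

Usage: `psql -At -c "select to_tsvector('chinesecfg', '...')" | tsvector_order`. For the output above it prints 你好 , 我 是 中國 人 . 目前 在 杭州 斯凱 做 數據庫 相關 的 工作 . which shows how the parser split the sentence.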

From the ITPUB blog, link: http://blog.itpub.net/25954236/viewspace-1809299/. If you repost this article, please credit the source; otherwise legal action may be taken.
