A quick trial of debian-7.11.0-amd64 + Plone 5.1.2: full-text search and preview of Chinese Word and PDF documents
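A word of warning first: if you have never touched Plone, you'd probably better stop reading here, haha. Two articles by Lao Pan of 易度 (Yidu) explain thoroughly how painful Zope/Plone can be:
(on Zhihu) https://www.zhihu.com/question/19649024
(on Douban) https://www.douban.com/group/topic/11400495/
My own experience with Zope/Plone: few people in China use it and Chinese-language material is extremely scarce; the Zope/Plone knowledge you acquire is basically useless in other projects; even tiny problems take a great deal of effort to solve; and third-party add-ons often fail to keep up with core version upgrades. In short, Zope/Plone creates more problems than it solves.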
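In 2007 I chose the wrong direction and, in a moment of confusion, built an electronic document management system for my own use on Windows with Plone-2.1.3, together with CJKSplitter and ZopeChinaPak by Lao Pan (http://old.zope.org/Members/panjunyong) and PloneExFile, AttachmentField and FileSystemStorage by ingeniweb (http://ingeniweb.sourceforge.net/). Binary files were stored in the file system instead of the ZODB. It supported batch upload over FTP and Chinese full-text search and preview of Word 2003 and text-based PDF files, and in folder listings and search results it showed the first few dozen to a hundred or so characters of each document under its entry.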
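Over the years I upgraded it twice: once to 2.5.5 and once to 3.3.5 (ZODB 3.8 supports blobs, and I used wc.pageturner to add preview of image-based PDFs). Rather than saying the system used Plone, it would be more accurate to say it used PloneExFile. Since the PloneExFile project has stopped (it does not support Plone 4), and the ODBCDA I had been using also supports only Plone 3 (Python 2.4), I really did not want to keep upgrading.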
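The system has served me well, but I could not share it with the other colleagues in my department, because PloneExFile does not support Office 2007 and later formats. Products.OpenXml can add Chinese full-text search for Office 2007+ files, but preview and the excerpt feature cannot be achieved that way. wc.pageturner also had a small flaw: PDFs uploaded in batch over FTP often crashed the ZODB while being converted to SWF, and fsrecover.py was needed to repair it, whereas PDFs uploaded through the web page were fine.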
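Some time ago, on https://stackoverflow.com/questions/12420334/how-to-use-wc-pageturner-in-plone-4-1 I saw this reply from vangheem, the author of wc.pageturner:
If you want a pdf viewer, use collective.documentviewer. I am no longer updating wc.pageturner-- collective.documentviewer is a much better viewer and implementation. – vangheem 2012-09-14
On his own website he also writes: It is recommended that you do not use this method anymore. Please use collective.documentviewer now which should cover all the use cases.
https://www.nathanvangheem.com/posts/2011/04/14/using-plone-as-a-document-repository.html
I also wanted to see how far Plone has come and how convenient it now is to build a document management system with it, so I tried collective.documentviewer on the latest Plone release. I chose Debian because collective.documentviewer does not support Windows, and debian-7.11.0 because I happened to have the complete set of 10 DVDs at hand; for a prototype test that is good enough. Note that collective.documentviewer uses docsplit, and docsplit in turn relies on LibreOffice or OpenOffice, so when installing Debian you must select "desktop" and "development" support as well as Chinese language support. There also seems to be a headless LibreOffice service that needs no graphical interface and offers conversion over a network port (Alfresco uses it), but I did not try it.
As for installing debian-7.11.0 over the network: if the intranet APT repository is served over FTP from the debian-7.11.0-amd64 DVDs, only DVD1 is used, whereas installing from DVD1 directly asks you to swap in three DVDs, which suggests the DVD installation is more complete. Indeed, I found that only the DVD installation came with a Chinese input method.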
Main references
http://documentcloud.github.io/docsplit
https://www.documentcloud.org/opensource
https://www.nathanvangheem.com/posts/2012/04/29/document-viewer-integration-in-plone.html
https://www.dangtrinh.com/2013/07/plone-review-documents-in-plone-with.html
http://tunmer.me/how-tos/installing-plone-on-ubuntu.html
1. Install the prerequisite packages: some are needed to install Plone, the others to run Plone and its add-ons
apt-get -y --force-yes install build-essential
apt-get -y --force-yes install gcc g++ sudo git
apt-get -y --force-yes install libxml2 libxml2-dev libxslt1-dev
apt-get -y --force-yes install zlibc zlib1g-dev libbz2-dev libssl-dev p7zip-full unzip
apt-get -y --force-yes install unace unp bzip2 gzip patch
apt-get -y --force-yes install python-dev libjpeg-dev
apt-get -y --force-yes install libsqlite3-dev
apt-get -y --force-yes install libreadline-dev
apt-get -y --force-yes install rubygems
apt-get -y --force-yes install graphicsmagick
apt-get -y --force-yes install poppler-utils poppler-data
apt-get -y --force-yes install ghostscript
apt-get -y --force-yes install tesseract-ocr
apt-get -y --force-yes install pdftk
2. Download Plone 5.1.2, extract it and do a basic installation
https://launchpad.net/plone/5.1/5.1.2/+download/Plone-5.1.2-UnifiedInstaller.tgz
Extract it and check the installer options:
tar zxvf Plone-5.1.2-UnifiedInstaller.tgz
cd Plone-5.1.2-UnifiedInstaller
./install.sh --help
debian-7.11.0-amd64 ships Python 2.7.3 (for comparison, stretch currently has python 2.7.13-2 and sid python 2.7.14-8), which does not satisfy Plone 5.1.2's requirement "Python version must be 2.7.9+", so --build-python must be specified.
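Why --build-python is needed comes down to a simple comparison with the 2.7.9 minimum. The sketch below is a standalone illustration, not part of the Unified Installer; it assumes you feed it the line printed by `python -V` (Python 2 writes that line to stderr, so capture it with `2>&1`).

```python
import re

# Minimum version quoted from the Plone 5.1.2 installer message
REQUIRED = (2, 7, 9)


def parse_version(text):
    """Extract (major, minor, micro) from text such as 'Python 2.7.3'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def needs_build_python(version_output, required=REQUIRED):
    """True when the system Python is older than the required version."""
    return parse_version(version_output) < required


if __name__ == "__main__":
    # wheezy's 2.7.3 is too old; the 2.7.14 the installer builds is fine
    for sample in ("Python 2.7.3", "Python 2.7.14"):
        verdict = "use --build-python" if needs_build_python(sample) else "system Python is fine"
        print(sample, "->", verdict)
```

Comparing tuples rather than strings matters here: as strings, "2.7.14" would sort before "2.7.9".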
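The install command is therefore:
./install.sh --build-python --target=/opt/plone zeo
During installation Python-2.7.14.tgz is downloaded into Plone-5.1.2-UnifiedInstaller/packages and compiled to build the virtualenv environment. If you did not apt-get install libreadline-dev beforehand, you will see the following warning that the compiled Python lacks readline support; the installation can still complete:
Warning: This Python does not have readline support. It may still be usable for Zope, but interacting directly with Python will be painful.
If the installation fails, check the install log. Even a successful installation leaves messages such as "chmod: changing permissions of '***----***': Operation not permitted" in it. The log is at: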
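Plone-5.1.2-UnifiedInstaller/install.log
Then wait for the long installation to finish; the quality of your network connection decides whether it completes and how fast. The second half of the installation runs buildout, and the files buildout downloads (including some extracted from components bundled with the installer) are stored in the following directory:
/opt/plone/buildout-cache/downloads/dist/
At the end the installer shows the admin user name and password. If you did not note them down, they are also recorded in a file: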
cat /opt/plone/zeocluster/adminPassword.txt
admin jkMq3sadkxJm
The message also says that the installation created a group, plone_group, and two users. This is important and will be needed later:
ZEO & Client Daemons: plone_daemon
Code Resources & buildout: plone_buildout
Setting /opt/plone ownership to plone_buildout:plone_group
3. Create the first site
Start the service on Debian as root:
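cd /opt/plone/zeocluster
bin/plonectl start
From a browser on a client machine, connect to port 8080:
http://10.16.97.205:8080
Choose to create a site; you need to log in as admin. If you keep the default site name Plone, the site URL will be:
http://10.16.97.205:8080/Plone
4. Install docsplit with RubyGems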
collective.documentviewer 5.0.1 depends on docsplit. collective.documentviewer is the Plone integration of the NY Times' Document Viewer, one of the DocumentCloud projects. docsplit is another DocumentCloud project, and its conversions rely on LibreOffice, which is why the server's Debian installation must include "desktop" and "development" support. The docsplit page is:
https://rubygems.org/gems/docsplit/versions/0.7.6
collective.documentviewer supports only Plone 4 and Plone 5 and does not support Windows (docsplit apparently does not either); see:
https://stackoverflow.com/questions/14543419/can-collective-documentviewer-work-on-windows-2003-server-plone4-2
Install docsplit with gem:
gem install docsplit --version=0.7.6
If network problems prevent this, download the gem file (https://rubygems.org/downloads/docsplit-0.7.6.gem) and install it manually:
gem install docsplit-0.7.6.gem
On Debian, the original gem files of installed gems are kept under /var/lib/gems/1.8/cache/, for example:
/var/lib/gems/1.8/cache/docsplit-0.7.6.gem
5. Download the tika.cfg that matches Tika 1.11
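Tika is an Apache Java project that started as a subproject of Apache Lucene. It detects the format and encoding of binary files and extracts their text content (as well as metadata), and it reportedly also runs on Windows. ftw.tika is the Plone integration of Tika, and full-text search depends on it.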
The article that first recommended ftw.tika to me: https://stackoverflow.com/questions/23151319/plone-full-text-indexing-excel-files
Project page: https://github.com/4teamwork/ftw.tika
Download the zip of the master branch (currently the version for Tika 1.11) and extract it. Copy only tika.cfg into /opt/plone/zeocluster/, the same directory as buildout.cfg.
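6. Install Oracle's official JDK 1.8
Tika 1.11 requires at least Java 1.7. To run a Python project I have installed not only Ruby but also Java; will pure Pythonistas look down on me?
Installation steps omitted.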
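Java's version string changed format over time, so the "at least 1.7" check is slightly less obvious than it looks: older releases report "1.8.0_181" for Java 8, newer ones report "11.0.2". A small sketch, assuming the input is the first line of `java -version` output (the JVM prints it to stderr); the function names are illustrative only.

```python
import re


def java_major_version(version_output):
    """Return the Java major version from `java -version` output.

    Legacy releases use '1.x' (e.g. "1.8.0_181" is Java 8);
    Java 9 and later put the major number first (e.g. "11.0.2").
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("unrecognised java -version output")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def meets_tika_minimum(version_output, minimum=7):
    # Tika 1.11 needs at least Java 1.7, i.e. major version 7
    return java_major_version(version_output) >= minimum
```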
java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
7. Edit buildout.cfg
Back up the file before editing. This is a habit you must develop, and I will not repeat it below:
cp /opt/plone/zeocluster/buildout.cfg /opt/plone/zeocluster/buildout.cfg.bk
This part follows the page below. It may not open directly, but it does exist, for reasons ~!@#¥%&*()+
http://blog.abdullahsolutions.com/2016/08/installing-ftwtika-in-plone.html
Its main content is quoted here, with collective.documentviewer added:
I love being able to search in all the documents uploaded into plone. I keep on forgetting that this was an add-on and not natively provided. The latest add-on I tried to enable that feature was ftw.tika. To install it, first download the tika.cfg file from their github page at https://github.com/4teamwork/ftw.tika. Once that has been downloaded, modify your buildout.cfg with:
############
[buildout]
extends =
    ...
    tika.cfg
eggs =
    ...
    ftw.tika
    collective.documentviewer
zcml =
    ...
    ftw.tika
    ftw.tika-meta
parts =
    ...
    tika-server-download
    tika-server

[client1]
...
zcml-additional += ${tika:zcml}
eggs += ftw.tika

[client2]
...
zcml-additional += ${tika:zcml}
eggs += ftw.tika

[versions]
collective.documentviewer = 5.0.1
############
Once that is done, run buildout. Then you can start the tika server with "bin/tika-server". Then you can start your plone instance. After that make sure you login and enable the tika add-on in your "site-setup", "add-ons" page.
To generate a patch: diff -uN buildout.cfg buildout.cfg.ok > buildout512.cfg.diff. When reinstalling later, run patch -p0 < buildout512.cfg.diff in the directory that contains buildout.cfg:
--- buildout.cfg 2018-08-11 17:16:24.934227831 +0800
+++ buildout.cfg.ok 2018-08-11 17:16:09.310229089 +0800
@@ -38,6 +38,7 @@
 extends =
     base.cfg
     versions.cfg
+    tika.cfg
 #    http://dist.plone.org/release/5.1.2/versions.cfg
 
 # If you change your Plone version, you'll also need to update
@@ -71,6 +72,8 @@
 eggs =
     Plone
     Pillow
+    ftw.tika
+    collective.documentviewer
 
 ############################################
 # ZCML Slugs
@@ -79,7 +82,8 @@
 # use them. This is increasingly rare.
 zcml =
 #    plone.reload
-
+    ftw.tika
+    ftw.tika-meta
 ############################################
 # Development Eggs
 # ----------------
@@ -149,7 +153,8 @@
     unifiedinstaller
     precompiler
     setpermissions
-
+    tika-server-download
+    tika-server
 ############################################
 # Major Parts
 # ----------------------
@@ -167,12 +172,17 @@
 recipe = plone.recipe.zope2instance
 zeo-address = ${zeoserver:zeo-address}
 http-address = 8080
+ftp-address = 8021
+zcml-additional += ${tika:zcml}
+eggs += ftw.tika
 
 [client2]
 <= client_base
 recipe = plone.recipe.zope2instance
 zeo-address = ${zeoserver:zeo-address}
 http-address = 8081
+zcml-additional += ${tika:zcml}
+eggs += ftw.tika
 
 ############################################
 # Versions Specification
@@ -197,3 +207,14 @@
 plone.recipe.unifiedinstaller = 4.3.2
 plone.recipe.command = 1.1
 plone.recipe.precompiler = 0.6
+
+certifi = 2017.11.5
+chardet = 3.0.4
+collective.recipe.scriptgen = 0.2
+ftw.tika = 2.9.0
+hexagonit.recipe.download = 1.7.1
+idna = 2.6
+requests = 2.18.4
+urllib3 = 1.22
+
+collective.documentviewer = 5.0.1
8. Run buildout
Always stop the service before running buildout:
cd /opt/plone/zeocluster
bin/plonectl stop
Plone does not allow buildout to be run as root; you must use a regular account and sudo to run it as the plone_buildout user:
Buildout should not be run while superuser. Doing so allows untrusted code to be run as root. Instead, you probably wish to do something like:
sudo -u plone_buildout bin/buildout
If you have a good reason to bypass this restriction, remove the buildout.sanitycheck extension from your buildout.
On a freshly installed wheezy a regular user may not be allowed to run sudo yet. Create a configuration file that allows it, assuming the regular account is hero:
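joe /etc/sudoers.d/hero
It contains a single line:
hero ALL=(ALL:ALL) ALL
Normally you should first point buildout at a PyPI mirror or even a local PyPI repository; if every buildout in the world uses the official site, it cannot be fast. All it takes is one line, index=http://mirrors.163.com/pypi/simple/, in the [buildout] section of base.cfg: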
joe /opt/plone/zeocluster/base.cfg
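[buildout]
...
...
index=http://mirrors.163.com/pypi/simple/
If you ever want to reinstall from scratch, back up Plone-5.1.2-UnifiedInstaller/packages/Python-2.7.14.tgz from the first installation, the modified buildout.cfg, and the contents of /opt/plone/buildout-cache/downloads/dist/. Copying them back into the same directories of the new installation at the corresponding points saves time (mind the file ownership and permissions, covered below). Start buildout with the following command; the nightmare begins now...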
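hero@mydebian205:/opt/plone/zeocluster$ sudo -u plone_buildout bin/buildout -vvv
Buildout can run into all kinds of trouble: it aborts unexpectedly; it hangs, neither downloading nor compiling; it seems to finish but the service will not start; the service starts but the site URL cannot be reached; or the site is reachable but the add-ons do not appear in the control panel or do not take effect. Any of these unpredictable things may happen.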
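For the case where the service seems to start but the site cannot be reached, a first step is to check whether anything is listening on the port at all. A minimal standard-library sketch; the host and ports to try are the ones used in this article (8080 for client1, 9998 for tika-server) and `port_open` is just an illustrative name.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, `port_open("10.16.97.205", 8080)`. If it returns True but the browser still fails, the problem is above the TCP level (wrong site path, add-on errors in event.log) rather than a service that is not running.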
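The most common problem is a download that fails. The workaround is to press Ctrl+C to stop the process, note the file and version it needs, download that file directly from https://pypi.org, save it into the following directory, and set its owner and permissions:
ls /opt/plone/buildout-cache/downloads/dist/
chown -R plone_buildout:plone_group /opt/plone/buildout-cache/downloads/dist/
find /opt/plone/buildout-cache/downloads/dist/ -type f -exec chmod 664 {} +
(Mode 664 should go on the files only: a recursive chmod -R 664 would also remove the execute bit from the dist directory itself, and plone_buildout could then no longer reach the files inside.)
Then run buildout again: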
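hero@mydebian205:/opt/plone/zeocluster$ sudo -u plone_buildout bin/buildout -vvv
Again and again, over and over, until every time you type the buildout command you put your palms together before pressing Enter and pray to the heavens; only then have you truly gotten into the role. Since this article adds only two add-ons, ftw.tika and collective.documentviewer, the pit is not too deep, and there is still hope of success before you go mad. When buildout completes successfully, it lists the picked add-on names and version numbers.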
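Fixing the owner and mode of manually downloaded files can also be scripted. A sketch, assuming it runs as root on the buildout cache. It changes ownership of everything, but applies mode 664 to regular files only, so directories keep their execute bits and stay traversable. `fix_dist_permissions` is a hypothetical helper, and the user and group names are the ones created by the installer.

```python
import grp
import os
import pwd


def fix_dist_permissions(directory, file_mode=0o664, owner=None, group=None):
    """chown everything under directory; chmod regular files only.

    Directories keep their execute bits, so plone_buildout can still
    enter them. Changing ownership requires root.
    """
    uid = pwd.getpwnam(owner).pw_uid if owner else -1   # -1 = leave unchanged
    gid = grp.getgrnam(group).gr_gid if group else -1
    changed = []
    for root, dirs, files in os.walk(directory):
        if uid != -1 or gid != -1:
            os.chown(root, uid, gid)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # do not follow symlinks out of the cache
            if uid != -1 or gid != -1:
                os.chown(path, uid, gid)
            os.chmod(path, file_mode)
            changed.append(path)
    return changed
```

Usage as root: `fix_dist_permissions("/opt/plone/buildout-cache/downloads/dist/", owner="plone_buildout", group="plone_group")`.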
To speed things up I edited tika.cfg. I downloaded the two largest and slowest files with Xunlei (Thunder) on Windows, uploaded them to /opt/share, and changed http:// to file:///:
#url = http://repo1.maven.org/maven2/org/apache/tika/tika-app/1.11/tika-app-1.11.jar
url = file:///opt/share/tika-app-1.11.jar
#url = http://repo1.maven.org/maven2/org/apache/tika/tika-server/1.11/tika-server-1.11.jar
url = file:///opt/share/tika-server-1.11.jar
9. Install and configure the add-ons in the site
Start the service as root, open the site in a browser on the client machine and log in as admin (a site cannot be changed without logging in; I will not repeat this below).
admin > Site Setup > Add-ons
| Available add-ons |
| Document Viewer Installs the collective.documentviewer package – (collective.documentviewer 5.0.1) Warning: this add-on cannot be uninstalled! |
| ftw.tika Apache Tika integration for Plone – (ftw.tika 2.9.0) |
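Install only ftw.tika first, then stop the service:
cd /opt/plone/zeocluster
bin/plonectl stop
Open two terminals on Debian and, with root privileges, run tika-server in one and the Plone service in the other. The tika-server terminal keeps scrolling (on the server, http://localhost:9998 shows a Tika page), while the Plone one returns to the command prompt:
cd /opt/plone/zeocluster
bin/tika-server
cd /opt/plone/zeocluster
bin/plonectl start
Then add a new item via Add new > File.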
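Upload a few doc, docx and PDF files (text-based, not image-based) with Chinese file names and Chinese content, and test whether full-text search works. This means searching file contents; searching file names is built into Plone and does not need Tika. In fact, if wv is installed on Debian (apt-get install wv), Plone supports Chinese full-text search of doc files even without Tika, but full-text search of docx is what Tika contributes. (Tika is not the only add-on that provides full-text search; it just seems to have the best prospects.) Full-text search only locates a file; it does not preview the file's actual content.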
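To check text extraction independently of Plone, a document can be sent straight to the running tika-server. This Python 3 diagnostic sketch assumes tika-server listens on the default http://localhost:9998 mentioned above and that its /tika endpoint accepts a PUT of the raw file and returns plain text, as the Tika server REST interface describes. It is not how ftw.tika itself talks to the server, and the helper names are illustrative.

```python
import urllib.request

TIKA_URL = "http://localhost:9998/tika"  # default tika-server address used in the text


def build_tika_request(data, content_type=None, url=TIKA_URL):
    """Prepare a PUT of raw file bytes to tika-server, asking for plain text."""
    headers = {"Accept": "text/plain"}
    if content_type:
        headers["Content-Type"] = content_type  # optional hint; Tika can also detect the type
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def extract_text(path, content_type=None, timeout=60):
    """Send the file to tika-server and return the extracted text."""
    with open(path, "rb") as handle:
        request = build_tika_request(handle.read(), content_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", "replace")
```

If `extract_text("測試.docx")` returns readable Chinese text but Plone search still finds nothing, the problem is on the Plone/ftw.tika side rather than in Tika.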
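Once full-text search works, turn to document preview: without an add-on, files in Plone can only be downloaded, not previewed.
admin > Site Setup > Add-ons
Install Document Viewer.
Unlike tika, this add-on needs its own configuration:
admin > Site Setup > Add-on Configuration > Document Viewer settings > Auto layout by file type
Only PDF is checked; also check Word Document and save.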
10. Fix the collective.documentviewer bug
Upload a docx file with Chinese content: collective.documentviewer does not take effect, although full-text search works. The page shows an error message, but clicking the Show Document Viewer Conversion Error link does nothing:
Info There was an error trying to convert the document. Maybe the document is encrypted, corrupt or malformed? Check log for details. 測試.docx Show Document Viewer Conversion Error
Open /opt/plone/zeocluster/var/client1/event.log in a text editor and look at the end:
Traceback (most recent call last):
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 598, in __call__
    pages = self.run_conversion()
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 428, in run_conversion
    return docsplit.convert(self.storage_dir, **args)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 324, in convert
    self.convert_to_pdf(path, filename, output_dir)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 280, in convert_to_pdf
    self._run_command(cmd)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 126, in _run_command
    raise Exception(error)
Exception: Command /usr/local/bin/docsplit pdf /tmp/tmpdfnKDQ/dump.docx --output /tmp/tmpdfnKDQ finished with return code 1 and output:
terminate called after throwing an instance of 'com::sun::star::uno::RuntimeException'
Aborted
/var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:33:in `libre_office?': undefined method `match' for nil:NilClass (NoMethodError)
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:128:in `extract'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:120:in `each'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:120:in `extract'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit.rb:65:in `extract_pdf'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/../lib/docsplit/command_line.rb:47:in `run'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/../lib/docsplit/command_line.rb:37:in `initialize'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/docsplit:5:in `new'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/docsplit:5
    from /usr/local/bin/docsplit:23:in `load'
    from /usr/local/bin/docsplit:23
------
2018-08-10T01:42:44 INFO ftw.tika Converting document with tika JAXRS server: 測試.docx
Searching online, I found two articles that seemed to offer a solution.
The first,
https://github.com/collective/collective.documentviewer/issues/11
suggests modifying docsplit's /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb.
The second,
https://pypi.org/project/collective.documentviewer/
suggests changing the permissions of /tmp and /var/tmp to add the sticky bit.
In my tests neither solved the problem. Back to the error log: the failing command is
/usr/local/bin/docsplit pdf /tmp/tmpdfnKDQ/dump.docx --output /tmp/tmpdfnKDQ
Look at the corresponding directory:
# ls -l /tmp/tmpdfnKDQ
-rw------- 1 plone_daemon plone_group 41139 Aug 10 01:42 dump.docx
Run the command from the error log as root:
# /usr/local/bin/docsplit pdf /tmp/tmpdfnKDQ/dump.docx --output /tmp/tmpdfnKDQ
Surprisingly, there is no error.
Check the directory again as root:
# ls -l /tmp/tmpdfnKDQ
-rw------- 1 plone_daemon plone_group 41139 Aug 10 01:42 dump.docx
-rw-r--r-- 1 root root 143352 Aug 10 01:51 dump.pdf
drwxr-xr-x 3 root root 4096 Aug 10 01:51 libreoffice
The PDF conversion succeeded, and the PDF opens fine in a viewer. Since root can do it and plone_daemon cannot, it must be a permission problem. That rules out pdf_extractor.rb: the article about it deals with docsplit failing to recognize LibreOffice, in which case the command would fail for root and every other user alike. Adding the sticky bit does address permissions, but it is easy to show that it does not solve this problem either.
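The diagnosis comes down to ownership: whatever docsplit creates while running as root cannot later be removed by plone_daemon. A sketch for spotting such entries in a conversion directory, assuming you run it as the Plone user (for example via sudo -u plone_daemon). `foreign_owned` is a hypothetical helper name. Note that os.walk silently skips directories it is not allowed to list, but those directories still appear in their parent's listing and are therefore reported.

```python
import os
import pwd


def foreign_owned(top, expected_uid=None):
    """Return (path, owner) for entries under top not owned by expected_uid.

    expected_uid defaults to the caller's effective uid, so running this
    as plone_daemon lists what plone_daemon did not create itself.
    """
    if expected_uid is None:
        expected_uid = os.geteuid()
    found = []
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            path = os.path.join(root, name)
            uid = os.lstat(path).st_uid
            if uid != expected_uid:
                try:
                    owner = pwd.getpwuid(uid).pw_name
                except KeyError:
                    owner = str(uid)  # uid with no passwd entry
                found.append((path, owner))
    return found
```

Run against /tmp/tmpdfnKDQ after the root conversion above, it would be expected to report dump.pdf and the libreoffice directory as owned by root.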
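Since running it as root works, I decided to have the code call sudo docsplit instead. The catch is that sudo prompts for a password, which only works interactively, so the code needs something more. It turns out sudoers can be configured to require no password. This naturally has security implications, but this article is only a prototype test: get the feature working first and look for the best solution later. Create a configuration file:
joe /etc/sudoers.d/plone_daemon
It has only one line (why /bin/rm is included is explained later):
plone_daemon ALL = NOPASSWD:/usr/local/bin/docsplit,/bin/rm
Then open convert.py:
joe /opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py
Go to line 271: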
def convert_to_pdf(self, filepath, filename, output_dir):
    # get ext from filename
    ext = os.path.splitext(os.path.normcase(filename))[1][1:]
    inputfilepath = os.path.join(output_dir, 'dump.%s' % ext)
    shutil.move(filepath, inputfilepath)
    orig_files = set(os.listdir(output_dir))
    cmd = [
        self.binary, 'pdf', inputfilepath,
        '--output', output_dir]
    self._run_command(cmd)
Add '/usr/bin/sudo', in front of self.binary:
def convert_to_pdf(self, filepath, filename, output_dir):
    # get ext from filename
    ext = os.path.splitext(os.path.normcase(filename))[1][1:]
    inputfilepath = os.path.join(output_dir, 'dump.%s' % ext)
    shutil.move(filepath, inputfilepath)
    orig_files = set(os.listdir(output_dir))
    cmd = [
        '/usr/bin/sudo', self.binary, 'pdf', inputfilepath,
        '--output', output_dir]
    self._run_command(cmd)
Restart the service and test again. It still fails, and the message in the UI is no help, so back to the log. The end of /opt/plone/zeocluster/var/client1/event.log has changed, which is a good sign: it shows that the "sudo docsplit pdf" step now completes, and the failure is in what follows:
------
2018-08-10T22:02:54 INFO collective.documentviewer Running command /usr/bin/sudo /usr/local/bin/docsplit pdf /tmp/tmpBsifva/dump.docx --output /tmp/tmpBsifva
------
2018-08-10T22:03:07 INFO collective.documentviewer Finished Running Command /usr/bin/sudo /usr/local/bin/docsplit pdf /tmp/tmpBsifva/dump.docx --output /tmp/tmpBsifva
------
2018-08-10T22:03:07 ERROR collective.documentviewer Error converting PDF:
Traceback (most recent call last):
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 598, in __call__
    pages = self.run_conversion()
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 428, in run_conversion
    return docsplit.convert(self.storage_dir, **args)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 324, in convert
    self.convert_to_pdf(path, filename, output_dir)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 289, in convert_to_pdf
    shutil.rmtree(libreOfficePath)
  File "/opt/plone/Python-2.7/lib/python2.7/shutil.py", line 261, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/opt/plone/Python-2.7/lib/python2.7/shutil.py", line 253, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/opt/plone/Python-2.7/lib/python2.7/shutil.py", line 251, in rmtree
    names = os.listdir(path)
OSError: [Errno 13] Permission denied: '/tmp/tmpBsifva/libreoffice/3'
(the same traceback is then logged a second time)
------
2018-08-10T22:03:07 INFO ftw.tika Converting document with tika JAXRS server: 測試.docx
The failing call is shutil.rmtree(libreOfficePath), with a Permission denied error. Besides the new PDF file, the temporary directory /tmp/tmpBsifva contains a libreoffice directory. Because docsplit ran as root through sudo, that directory was created by root, and plone_daemon is not allowed to delete it, hence the error. The fix is to delete the directory with /bin/rm through sudo instead of with shutil.rmtree; the sudoers file above already lets plone_daemon run /bin/rm through sudo without a password. Following the same principle, prefix every docsplit call with '/usr/bin/sudo', i.e. the lines with the four arguments "images", "text", "length" and 'pdf', and then turn both shutil.rmtree(libreOfficePath) and shutil.rmtree(storage_dir) into system calls:
os.system('/usr/bin/sudo /bin/rm -fr %s' % (libreOfficePath,))
os.system('/usr/bin/sudo /bin/rm -fr %s' % (storage_dir,))
During testing, the text after "finished with return code ... and output:" also caused an encoding error at one point. Go to line 106:
def _run_command(self, cmd):
    if isinstance(cmd, basestring):
        cmd = cmd.split()
    cmdformatted = ' '.join(cmd)
    logger.info("Running command %s" % cmdformatted)
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, close_fds=self.close_fds)
    output, error = process.communicate()
    process.stdout.close()
    process.stderr.close()
    if process.returncode != 0:
        error = """Command
%s
finished with return code
%i
and output:
%s
%s""" % (cmdformatted, process.returncode, output, error)
        logger.info(error)
        raise Exception(error)
    logger.info("Finished Running Command %s" % cmdformatted)
    return output
So formatting the error message itself raised an error, which is bizarre. I had no time to chase it, and fixing it would not have solved the original error anyway, so I simply deleted output and error and the two corresponding %s.
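Instead of deleting output and error from the message, they could be decoded leniently before formatting. A plausible cause of the encoding error, which the log does not confirm, is Python 2 implicitly decoding non-ASCII bytes from docsplit as ASCII when they are mixed with a unicode string. The sketch below behaves the same on Python 2.7 and 3; `to_text` and `format_failure` are hypothetical helper names, not part of collective.documentviewer.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes leniently; invalid sequences become U+FFFD instead of raising."""
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return value


def format_failure(cmdformatted, returncode, output, error):
    # Same layout as the original message, but every part is text before formatting
    return u"Command\n%s\nfinished with return code\n%i\nand output:\n%s\n%s" % (
        to_text(cmdformatted), returncode, to_text(output), to_text(error))
```

Inside _run_command, `error = format_failure(cmdformatted, process.returncode, output, error)` would then replace the original string formatting while keeping docsplit's output in the log.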
To generate a patch: diff -uN convert.py convert.py.ok > convert501.py.diff. When reinstalling later, run patch -p0 < convert501.py.diff in the directory that contains convert.py:
--- convert.py 2018-08-10 23:14:20.068147869 +0800
+++ convert.py.ok 2018-08-10 23:02:37.864147759 +0800
@@ -120,8 +120,7 @@
 finished with return code
 %i
 and output:
-%s
-%s""" % (cmdformatted, process.returncode, output, error)
+""" % (cmdformatted, process.returncode,)
             logger.info(error)
             raise Exception(error)
         logger.info("Finished Running Command %s" % cmdformatted)
@@ -224,7 +223,7 @@
         # docsplit images pdf.pdf --size 700x,300x,50x
         # --format gif --output
         cmd = [
-            self.binary, "images", filepath,
+            '/usr/bin/sudo', self.binary, "images", filepath,
             '--language', lang,
             '--size', ','.join([str(s[1]) + 'x' for s in sizes]),
             '--format', format,
@@ -251,7 +250,7 @@
         output_dir = os.path.join(output_dir, TEXT_REL_PATHNAME)
         ocr = not ocr and 'no-' or ''
         cmd = [
-            self.binary, "text", filepath,
+            '/usr/bin/sudo', self.binary, "text", filepath,
             '--language', lang,
             '--%socr' % ocr,
             '--pages', 'all',
@@ -265,7 +264,7 @@
         self._run_command(cmd)
 
     def get_num_pages(self, filepath):
-        cmd = [self.binary, "length", filepath]
+        cmd = ['/usr/bin/sudo', self.binary, "length", filepath]
         return int(self._run_command(cmd).strip())
 
     def convert_to_pdf(self, filepath, filename, output_dir):
@@ -275,7 +274,7 @@
         shutil.move(filepath, inputfilepath)
         orig_files = set(os.listdir(output_dir))
         cmd = [
-            self.binary, 'pdf', inputfilepath,
+            '/usr/bin/sudo', self.binary, 'pdf', inputfilepath,
             '--output', output_dir]
         self._run_command(cmd)
 
@@ -286,7 +285,9 @@
         # folder next to the generated PDF, removes it!
         libreOfficePath = os.path.join(output_dir, 'libreoffice')
         if os.path.exists(libreOfficePath):
-            shutil.rmtree(libreOfficePath)
+            os.system('/usr/bin/sudo /bin/rm -fr %s' % (libreOfficePath,))
+            #shutil.rmtree(libreOfficePath)
+            pass
 
         # move the file to the right location now
         files = set(os.listdir(output_dir))
@@ -481,7 +482,8 @@
                 files[filename] = saveFileToBlob(filepath)
             settings.blob_files = files
-            shutil.rmtree(storage_dir)
+            os.system('/usr/bin/sudo /bin/rm -fr %s' % (storage_dir,))
+            #shutil.rmtree(storage_dir)
 
             # check for old storage to remove... Just in case.
             old_storage_dir = os.path.join(gsettings.storage_location,
After restarting Plone so the changes take effect, the uploaded docx previews successfully. End of the prototype test.
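The same patch could be made a little less fragile by building the sudo prefix in one place and calling rm without a shell, so a path containing spaces or shell metacharacters is passed as a single argument instead of being re-parsed by os.system. A sketch with hypothetical helper names (`privileged`, `remove_tree_cmd`, `remove_tree`); the sudoers line above is still required when use_sudo is true.

```python
import subprocess

SUDO = "/usr/bin/sudo"


def privileged(cmd, use_sudo=True):
    """Prefix a command list with sudo in one place instead of at each call site."""
    return [SUDO] + list(cmd) if use_sudo else list(cmd)


def remove_tree_cmd(path, use_sudo=True):
    # '--' stops rm from treating a path that starts with '-' as an option
    return privileged(["/bin/rm", "-rf", "--", path], use_sudo)


def remove_tree(path, use_sudo=True):
    """Run rm without a shell; returns rm's exit status (0 on success)."""
    return subprocess.call(remove_tree_cmd(path, use_sudo))
```

In convert.py, `cmd = privileged([self.binary, 'pdf', inputfilepath, '--output', output_dir])` and `remove_tree(libreOfficePath)` would then stand in for the hand-edited lines.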
=================================
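Other features and requirements worth exploring
1. Asynchronous support
Converting a file for preview on upload is slow. collective.documentviewer supports both plone.app.async and collective.celery for asynchronous conversion: the upload returns quickly, the actual conversion runs in the background, and its progress can be checked. However, the articles online all cover Plone 4; I found only one about using collective.celery with Plone 5:
https://www.codesyntax.com/en/blog/collective-documentviewer-with-redis-backed-celery-tasks-on-plone-4-and-5
The two links in that article pointing to https://gist.github.com cannot be opened directly, but the pages do exist, for reasons ~!@#¥%&*()+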
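The idea behind asynchronous conversion is simple. The upload request only enqueues a job and returns, and a worker does the slow conversion and records its state. The sketch below shows that pattern with a standard-library thread pool. It is not the plone.app.async or collective.celery API, and `BackgroundConverter` is a hypothetical class. A real deployment, such as the Redis-backed Celery setup in the article above, runs its workers in separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundConverter(object):
    """Run slow conversions in worker threads; callers only enqueue and poll."""

    def __init__(self, convert, workers=1):
        self._convert = convert  # callable taking a file path
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def submit(self, doc_id, path):
        # Returns immediately; the conversion itself runs in the pool
        self._jobs[doc_id] = self._pool.submit(self._convert, path)
        return doc_id

    def status(self, doc_id):
        job = self._jobs.get(doc_id)
        if job is None:
            return "unknown"
        if not job.done():
            return "pending"
        return "failed" if job.exception() is not None else "done"

    def shutdown(self):
        # Wait for queued conversions to finish before exiting
        self._pool.shutdown(wait=True)
```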
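2. IDs for Chinese folder (path) names and Chinese file names
The IDs of uploaded files end up consisting of ASCII letters and digits, so the original file name is lost on download. In a document management system the file name is important information too. Ideally one could upload a whole directory tree of folders and files into Plone over FTP. Plone would then provide full-text search and preview for the formats it recognizes without destroying the original information, and when necessary everything could be downloaded back over FTP exactly as it was.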
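On the download side, keeping a Chinese file name is mostly a matter of the Content-Disposition header. RFC 6266 allows a UTF-8, percent-encoded filename* parameter alongside a plain ASCII fallback. Whether Plone 5.1.2 already sends such a header was not checked here; the sketch, with the hypothetical function `content_disposition`, only shows the encoding.

```python
from urllib.parse import quote


def content_disposition(filename, disposition="attachment"):
    """Header value carrying a non-ASCII file name (RFC 6266 / RFC 5987 style)."""
    # ASCII fallback for old clients: non-ASCII characters, quotes and backslashes become '_'
    fallback = "".join(
        c if 32 <= ord(c) < 127 and c not in '"\\' else "_" for c in filename)
    # filename* carries the real name as UTF-8, percent-encoded
    return "%s; filename=\"%s\"; filename*=UTF-8''%s" % (
        disposition, fallback, quote(filename, safe=""))
```

For the test file used in this article, `content_disposition("測試.docx")` gives `attachment; filename="__.docx"; filename*=UTF-8''%E6%B8%AC%E8%A9%A6.docx`.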
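3. Excerpts with matching context in search results and folder listings
Every search engine does this. In search results and folder listings, each entry should show an excerpt of its text. Ideally, search results should show the context around the keyword, with the keyword highlighted. This is very important for helping users quickly find the entry they need without opening full previews one by one.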
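The keyword-in-context excerpt described above can be sketched in a few lines. This is a plain-string illustration with a hypothetical `make_snippet` function, not Plone's catalog or search view. It escapes the text for HTML, wraps the first match in <b>, and falls back to the leading characters (as a folder listing would show) when the keyword does not occur.

```python
import html


def make_snippet(text, keyword, width=20):
    """Return an HTML excerpt around the first match of keyword, highlighted with <b>."""
    start = text.find(keyword)
    if start < 0:
        # No match: show the leading characters, as a folder listing would
        head = text[:2 * width]
        return html.escape(head) + ("..." if len(text) > len(head) else "")
    end = start + len(keyword)
    left = max(0, start - width)
    right = min(len(text), end + width)
    return "%s%s<b>%s</b>%s%s" % (
        "..." if left > 0 else "",
        html.escape(text[left:start]),
        html.escape(keyword),
        html.escape(text[end:right]),
        "..." if right < len(text) else "",
    )
```

Slicing works on characters, so Chinese text needs no special handling: `make_snippet("中文全文檢索測試", "檢索", width=2)` returns `...全文<b>檢索</b>測試`.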
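4. Permissions and workflow
Anonymous users must not be able to view any content, and some content must stay confidential even for logged-in users and must not appear in their search results.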
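5. Integration with Apache or nginx
When creating a new Plone site there is an option that seems related to Plone serving file-system resources to users directly, but the more common approach is integration with Apache or nginx:
Static resource storage A folder for storing and serving static resource files
First, very large files should live in the file system (or are already there), and serving them through Apache or nginx has a performance advantage. Second, integration also means integrating login authentication.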
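One common pattern that covers both points with nginx is X-Accel-Redirect. The application checks the login and replies with an internal path, and nginx then streams the file from disk itself. The sketch below is a bare WSGI callable for illustration only. It assumes a front end sets REMOTE_USER after authentication and that nginx has an internal location named /protected mapped to the file store; both are hypothetical. It also does not handle non-ASCII paths.

```python
import posixpath

INTERNAL_PREFIX = "/protected"  # hypothetical nginx location marked `internal`


def app(environ, start_response):
    """Authorise in Python, then let nginx serve the file via X-Accel-Redirect."""
    if not environ.get("REMOTE_USER"):
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"login required\n"]
    # Normalising an absolute path drops '..' segments, so the result
    # cannot climb above the internal prefix
    clean = posixpath.normpath("/" + environ.get("PATH_INFO", "").lstrip("/"))
    if clean == "/":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no file requested\n"]
    start_response("200 OK", [("X-Accel-Redirect", INTERNAL_PREFIX + clean)])
    return [b""]
```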
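6. Embedded online video player, image thumbnails, code highlighting
Video, images and code are also important forms of documents.
Video support should cover local videos as well as resources served by remote video servers.
There is a project called plumi, a video-sharing project based on Plone 4.2:
https://plumi.org
https://github.com/plumi/plumi.app/
The player it uses is:
https://pypi.org/project/collective.flowplayer/
Its home page lists Plone 4 support, but embedding an online video player should not be hard.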
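7. Pin all add-on versions
If not all versions are pinned, a future buildout may fetch the newest release of some add-on, which might no longer support Plone 5. To prevent that, pin every version number in the [versions] section of buildout.cfg.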
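Pinning can start from the picked versions that buildout reports at the end of a successful run. A sketch, assuming the log lists them as name = version lines under a [versions] header (check the format of your own buildout output before relying on it). `picked_versions` and `render_versions` are hypothetical helpers; comment lines such as "# Required by:" are skipped.

```python
import re

# "name = version" as written in a buildout [versions] section
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*=\s*(\S+)\s*$")


def picked_versions(buildout_output):
    """Collect name = version lines that appear under a [versions] header."""
    pins = {}
    in_versions = False
    for line in buildout_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Any other section header ends the [versions] block
            in_versions = stripped == "[versions]"
            continue
        if in_versions:
            match = PIN.match(line)
            if match:
                pins[match.group(1)] = match.group(2)
    return pins


def render_versions(pins):
    """Format pins as a [versions] section, sorted case-insensitively."""
    lines = ["[versions]"]
    lines += ["%s = %s" % (name, pins[name]) for name in sorted(pins, key=str.lower)]
    return "\n".join(lines) + "\n"
```

The rendered lines can then be merged into the existing [versions] section of buildout.cfg, next to the pins added earlier.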