A quick trial of debian-7.11.0-amd64 + Plone 5.1.2: full-text search and preview of Chinese Word and Chinese PDF files
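A word of warning first: if you have never touched Plone, you had better stop reading here, haha. Two articles by Lao Pan (Pan Junyong) of Everydo (易度) explain thoroughly just how painful Zope/Plone can be: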
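(on Zhihu) https://www.zhihu.com/question/19649024
(on Douban) https://www.douban.com/group/topic/11400495/
My own experience with Zope/Plone: few people in China use it and Chinese-language material is extremely scarce; most of what you learn about Zope/Plone is useless in other projects; even small problems take a great deal of effort to solve; and third-party add-ons often fail to keep up with core upgrades. In short, Zope/Plone has caused more problems than it has solved.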
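Back in 2007 I picked the wrong direction in a moment of confusion and, on Windows, built an electronic document management system for my own use with Plone-2.1.3 plus Lao Pan's (http://old.zope.org/Members/panjunyong) CJKSplitter and ZopeChinaPak and Ingeniweb's (http://ingeniweb.sourceforge.net/) PloneExFile, AttachmentField and FileSystemStorage. Binary files were stored in the file system rather than in the ZODB, files could be bulk-uploaded over FTP, Chinese full-text search and preview worked for Word 2003 and text-based PDF files, and folder listings and search results showed the first few dozen to a hundred or so characters below each item.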
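Over the years I upgraded it twice, once to 2.5.5 and once to 3.3.5 (ZODB 3.8 supports blobs, and wc.pageturner added preview of image-based PDFs). The system was built on PloneExFile more than on Plone itself. Since the PloneExFile project has stopped (no Plone 4 support), and the ODBCDA I had always used only supports Plone 3 (Python 2.4), I really didn't want to upgrade any further.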
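The system has served me well personally, but I can't share it with the other colleagues in my department, because PloneExFile doesn't support Office 2007 or later. Products.OpenXml can add Chinese full-text search for Office 2007+ files, but not preview or excerpt extraction. wc.pageturner also has a small bug: converting PDFs bulk-uploaded over FTP to SWF often crashes the ZODB, which then has to be repaired with fsrecover.py, while PDFs uploaded through the web work fine.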
A while ago, on this page
https://stackoverflow.com/questions/12420334/how-to-use-wc-pageturner-in-plone-4-1
I saw wc.pageturner's author, vangheem, reply:
If you want a pdf viewer, use collective.documentviewer. I am no longer updating wc.pageturner-- collective.documentviewer is a much better viewer and implementation. (vangheem, 2012-09-14)
He also says on his own website: It is recommended that you do not use this method anymore. Please use collective.documentviewer now which should cover all the use cases.
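https://www.nathanvangheem.com/posts/2011/04/14/using-plone-as-a-document-repository.html
I also wanted to see how far Plone has come and how convenient it now is for building a document management system, so I tried collective.documentviewer on the latest Plone release. I chose Debian because collective.documentviewer doesn't support Windows, and debian-7.11.0 because I happened to have the complete 10-DVD set downloaded; for a prototype test that is good enough. Note that collective.documentviewer uses docsplit, which in turn is built on LibreOffice or OpenOffice, so when installing Debian be sure to select "desktop support", "development support" and Chinese language support. There also seems to be a headless LibreOffice service mode that needs no GUI and offers conversion over a port (Alfresco uses it), but I haven't tried it. If you do a network install of debian-7.11.0 with the debian-7.11.0-amd64 DVDs served over FTP as an intranet APT repository, only DVD1 is used; installing directly from DVD1, however, asks you to switch through three DVDs, which shows the DVD install is more complete. Indeed, I found that only the DVD install came with a Chinese input method.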
Main references
http://documentcloud.github.io/docsplit
https://www.documentcloud.org/opensource
https://www.nathanvangheem.com/posts/2012/04/29/document-viewer-integration-in-plone.html
https://www.dangtrinh.com/2013/07/plone-review-documents-in-plone-with.html
http://tunmer.me/how-tos/installing-plone-on-ubuntu.html
1. Install the required support packages first (some are needed to install Plone, others to run Plone and its add-ons)
apt-get -y --force-yes install build-essential
apt-get -y --force-yes install gcc g++ sudo git
apt-get -y --force-yes install libxml2 libxml2-dev libxslt1-dev
apt-get -y --force-yes install zlibc zlib1g-dev libbz2-dev libssl-dev p7zip-full unzip
apt-get -y --force-yes install unace unp bzip2 gzip patch
apt-get -y --force-yes install python-dev libjpeg-dev
apt-get -y --force-yes install libsqlite3-dev
apt-get -y --force-yes install libreadline-dev
apt-get -y --force-yes install rubygems
apt-get -y --force-yes install graphicsmagick
apt-get -y --force-yes install poppler-utils poppler-data
apt-get -y --force-yes install ghostscript
apt-get -y --force-yes install tesseract-ocr
apt-get -y --force-yes install pdftk
2. Download Plone 5.1.2, extract it, and do the basic install
https://launchpad.net/plone/5.1/5.1.2/+download/Plone-5.1.2-UnifiedInstaller.tgz
Extract it and check the installer options:
tar zxvf Plone-5.1.2-UnifiedInstaller.tgz
cd Plone-5.1.2-UnifiedInstaller
./install.sh --help
debian-7.11.0-amd64 ships Python 2.7.3 (for reference, stretch currently has python 2.7.13-2 and sid has python 2.7.14-8), which fails Plone 5.1.2's requirement "Python version must be 2.7.9+", so the installer has to build its own Python with --build-python.
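The version rule is easy to check before running the installer. Below is a minimal sketch in Python. The helper names are my own and are not part of the Unified Installer. It assumes the rule means "2.7 series, 2.7.9 or later", because Plone 5.1 does not run on Python 3:

```python
import re
import sys


def parse_version(text):
    """Turn '2.7.3' (or '2.7.14+', as some distributions print) into (2, 7, 3)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("not a dotted version: %r" % (text,))
    return tuple(int(part) for part in match.groups())


def plone512_python_ok(version_text):
    """Plone 5.1.2 runs on Python 2.7 only and needs at least 2.7.9."""
    major, minor, micro = parse_version(version_text)
    return (major, minor) == (2, 7) and micro >= 9


if __name__ == "__main__":
    # Run this with the interpreter you intend to use, e.g. /usr/bin/python.
    current = "%d.%d.%d" % tuple(sys.version_info[:3])
    verdict = "OK" if plone512_python_ok(current) else "not suitable, use --build-python"
    print("Python %s: %s" % (current, verdict))
```

On wheezy's stock interpreter this reports 2.7.3 as not suitable, which matches the installer's refusal.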
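./install.sh --build-python --target=/opt/plone zeo
The installer downloads Python-2.7.14.tgz into Plone-5.1.2-UnifiedInstaller/packages and compiles it to build the virtualenv. (If you did not apt-get install libreadline-dev beforehand, you will see the following warning that the compiled Python lacks readline support. The installation still completes.)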
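Warning: This Python does not have readline support. It may still be usable for Zope, but interacting directly with Python will be painful.
If the installation fails, check the install log. Even a successful install leaves lines like "chmod: changing permissions of '***----***': Operation not permitted" in it: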
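Plone-5.1.2-UnifiedInstaller/install.log
Then wait for the long installation to finish. The quality of your network connection decides whether it completes and how fast. The second half of the installation runs buildout, and the files buildout downloads (including some unpacked from the installer's bundled packages) are kept in: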
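/opt/plone/buildout-cache/downloads/dist/
When the installation finishes it shows the admin user name and password. If you didn't note them down at that point, they are also recorded in a file: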
cat /opt/plone/zeocluster/adminPassword.txt
admin jkMq3sadkxJm
The messages also show that the installer created a group, plone_group, and two users. This is important and comes up again later:
ZEO & Client Daemons: plone_daemon
Code Resources & buildout: plone_buildout
Setting /opt/plone ownership to plone_buildout:plone_group
3. Create the first site
On Debian, start the service as root:
cd /opt/plone/zeocluster
bin/plonectl start
In a browser on a client machine, connect to port 8080:
http://10.16.97.205:8080
Choose to create a site (you must log in as admin). If you keep the default site name, Plone, the site's URL from then on is:
http://10.16.97.205:8080/Plone
4. Install docsplit with rubygems
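collective.documentviewer 5.0.1 depends on docsplit. collective.documentviewer is the Plone binding of NY Times' Document Viewer, one of the DocumentCloud projects. docsplit is another of those projects, and it relies on LibreOffice for conversion, which is why the Debian server must be installed with "desktop support" and "development support". The docsplit page is: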
https://rubygems.org/gems/docsplit/versions/0.7.6
collective.documentviewer only supports Plone 4 and Plone 5 and does not support Windows (apparently docsplit doesn't either). See:
https://stackoverflow.com/questions/14543419/can-collective-documentviewer-work-on-windows-2003-server-plone4-2
Install docsplit with gem:
gem install docsplit --version=0.7.6
If network problems prevent this, download the gem file (https://rubygems.org/downloads/docsplit-0.7.6.gem) and install it by hand:
gem install docsplit-0.7.6.gem
On Debian, the original .gem file of an installed gem can be found in:
/var/lib/gems/1.8/cache/docsplit-0.7.6.gem
5. Download the tika.cfg that matches Tika 1.11
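Tika is an Apache Java project that started as a subproject of Apache Lucene. It identifies the format and encoding of binary files and extracts their text content (plus metadata and other format information), and it reportedly also supports Windows. ftw.tika is its Plone binding, and full-text search depends on it.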
The first article I saw recommending ftw.tika: https://stackoverflow.com/questions/23151319/plone-full-text-indexing-excel-files Project page: https://github.com/4teamwork/ftw.tika
Download the zip of the master branch (currently the version for Tika 1.11) and extract it. Copy only tika.cfg into /opt/plone/zeocluster/, the directory that holds buildout.cfg.
6. Install Oracle's official JDK 1.8
Tika 1.11 requires at least Java 1.7. To run one Python project I have now installed not only Ruby but also Java. Will the pure Pythonistas look down on me?
The installation steps are omitted.
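To confirm the Java requirement from a script rather than by eye, you can parse the banner that `java -version` prints (on stderr). This is a sketch. It assumes only the two numbering styles mentioned in the docstring: the old "1.8.0_181" style and the newer "11.0.2" style. The function names are hypothetical:

```python
import re
import subprocess


def java_feature_version(version_output):
    """Extract the Java feature version (7, 8, 11, ...) from `java -version` output.

    Old releases report e.g. "1.8.0_181" (feature version 8); newer ones "11.0.2".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("unrecognised java -version output")
    first, second = int(match.group(1)), match.group(2)
    if first == 1 and second is not None:
        return int(second)  # legacy "1.x" numbering
    return first


def installed_java_output(java="java"):
    """Run `java -version` and return its banner; the JVM prints it on stderr."""
    proc = subprocess.Popen([java, "-version"], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    _, err = proc.communicate()
    return err
```

For Tika 1.11 the check is `java_feature_version(installed_java_output()) >= 7`.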
java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
7. Edit buildout.cfg
Back it up before editing. This is a habit worth forming, and I won't repeat it below.
cp /opt/plone/zeocluster/buildout.cfg /opt/plone/zeocluster/buildout.cfg.bk
This part follows the page below. It may not open directly, but the page does exist, for reasons ~!@#¥%&*()+
http://blog.abdullahsolutions.com/2016/08/installing-ftwtika-in-plone.html
Its main content is quoted below, with the collective.documentviewer entries added:
I love being able to search in all the documents uploaded into plone. I keep on forgetting that this was an add-on and not natively provided. The latest add-on I tried to enable that feature was ftw.tika. To install it, first download the tika.cfg file from their github page at https://github.com/4teamwork/ftw.tika. Once that has been downloaded, modify your buildout.cfg with:
############
[buildout]
extends =
    ...
    tika.cfg
eggs =
    ...
    ftw.tika
    collective.documentviewer
zcml =
    ...
    ftw.tika
    ftw.tika-meta
parts =
    ...
    tika-server-download
    tika-server

[client1]
...
zcml-additional += ${tika:zcml}
eggs += ftw.tika

[client2]
...
zcml-additional += ${tika:zcml}
eggs += ftw.tika

[versions]
collective.documentviewer = 5.0.1
############
Once that is done, run buildout. Then you can start the tika server with "bin/tika-server". Then you can start your plone instance. After that make sure you login and enable the tika add-on in your "site-setup", "add-ons" page.
Create a patch with diff -uN buildout.cfg buildout.cfg.ok > buildout512.cfg.diff. For a later reinstall, apply it from the directory that holds buildout.cfg with patch -p0 < buildout512.cfg.diff. The resulting patch:
--- buildout.cfg 2018-08-11 17:16:24.934227831 +0800
+++ buildout.cfg.ok 2018-08-11 17:16:09.310229089 +0800
@@ -38,6 +38,7 @@
 extends =
     base.cfg
     versions.cfg
+    tika.cfg
 # http://dist.plone.org/release/5.1.2/versions.cfg

 # If you change your Plone version, you'll also need to update
@@ -71,6 +72,8 @@
 eggs =
     Plone
     Pillow
+    ftw.tika
+    collective.documentviewer

 ############################################
 # ZCML Slugs
@@ -79,7 +82,8 @@
 # use them. This is increasingly rare.
 zcml =
 #    plone.reload
-
+    ftw.tika
+    ftw.tika-meta
 ############################################
 # Development Eggs
 # ----------------
@@ -149,7 +153,8 @@
     unifiedinstaller
     precompiler
     setpermissions
-
+    tika-server-download
+    tika-server
 ############################################
 # Major Parts
 # ----------------------
@@ -167,12 +172,17 @@
 recipe = plone.recipe.zope2instance
 zeo-address = ${zeoserver:zeo-address}
 http-address = 8080
+ftp-address = 8021
+zcml-additional += ${tika:zcml}
+eggs += ftw.tika

 [client2]
 <= client_base
 recipe = plone.recipe.zope2instance
 zeo-address = ${zeoserver:zeo-address}
 http-address = 8081
+zcml-additional += ${tika:zcml}
+eggs += ftw.tika

 ############################################
 # Versions Specification
@@ -197,3 +207,14 @@
 plone.recipe.unifiedinstaller = 4.3.2
 plone.recipe.command = 1.1
 plone.recipe.precompiler = 0.6
+
+certifi = 2017.11.5
+chardet = 3.0.4
+collective.recipe.scriptgen = 0.2
+ftw.tika = 2.9.0
+hexagonit.recipe.download = 1.7.1
+idna = 2.6
+requests = 2.18.4
+urllib3 = 1.22
+
+collective.documentviewer = 5.0.1
8. Run buildout
Be sure to stop the service before running buildout:
cd /opt/plone/zeocluster
bin/plonectl stop
Plone does not allow buildout to run as root. You have to run it from an ordinary account with sudo, as the plone_buildout user. Running it as root gives:
Buildout should not be run while superuser. Doing so allows untrusted code to be run as root. Instead, you probably wish to do something like:
sudo -u plone_buildout bin/buildout
If you have a good reason to bypass this restriction, remove the buildout.sanitycheck extension from your buildout.
On a freshly installed wheezy, an ordinary user may not yet be allowed to use sudo. Create a config file that allows it (here the ordinary account is assumed to be hero):
joe /etc/sudoers.d/hero
It contains just one line:
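hero ALL=(ALL:ALL) ALL
Normally you should first point buildout at a PyPI mirror, or even a local PyPI. Otherwise every buildout in the world pulls from the official site, and it cannot be fast. Just add the line index=http://mirrors.163.com/pypi/simple/ to the [buildout] section of base.cfg: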
joe /opt/plone/zeocluster/base.cfg
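[buildout]
...
...
index=http://mirrors.163.com/pypi/simple/
For a complete reinstall, you can save time by backing up what the previous install used: Plone-5.1.2-UnifiedInstaller/packages/Python-2.7.14.tgz, the modified buildout.cfg, and the contents of /opt/plone/buildout-cache/downloads/dist/. Copy them back into the same directories of the new installation at the right point, minding file owner and mode (covered below). Start buildout with the following command. Time to witness the nightmare...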
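hero@mydebian205:/opt/plone/zeocluster$ sudo -u plone_buildout bin/buildout -vvv
Buildout can run into all sorts of unpredictable situations:
- it aborts unexpectedly;
- it hangs, neither downloading nor compiling;
- it seems to finish, but the service won't start;
- the service starts, but the site can't be reached by URL;
- the site loads, but the add-ons don't appear in the control panel or don't take effect.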
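Most of these stalls come down to downloads. To find the exact file for the package and version buildout is stuck on, you can query PyPI's JSON API (https://pypi.org/pypi/<name>/<version>/json). Here is a sketch for Python 3. It assumes a source distribution is what you want to drop into the download cache, which is typical for this kind of buildout but not guaranteed. Note that it talks to pypi.org, not to the 163 mirror. The helper names are hypothetical:

```python
import json
from urllib.request import urlopen

PYPI_JSON = "https://pypi.org/pypi/{name}/{version}/json"


def release_json_url(name, version):
    """URL of PyPI's JSON metadata for one release, e.g. ftw.tika 2.9.0."""
    return PYPI_JSON.format(name=name, version=version)


def pick_sdist(release_data):
    """Return (filename, url) of the sdist in decoded release JSON, or None."""
    for entry in release_data.get("urls", []):
        if entry.get("packagetype") == "sdist":
            return entry["filename"], entry["url"]
    return None


def fetch_sdist_location(name, version, timeout=30):
    """Ask pypi.org (network access needed) where the sdist of name==version lives."""
    with urlopen(release_json_url(name, version), timeout=timeout) as response:
        return pick_sdist(json.loads(response.read().decode("utf-8")))
```

For example, `fetch_sdist_location("ftw.tika", "2.9.0")` returns the file name to save into /opt/plone/buildout-cache/downloads/dist/ and the URL to fetch it from. You can run it on another machine if the server's own connection is the problem.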
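The most common problem is failed downloads. To fix one, press Ctrl+C to interrupt buildout and note the file and version it needs. Download that file directly from https://pypi.org, save it to the following directory, and set its owner and mode: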
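ls /opt/plone/buildout-cache/downloads/dist/
chown -R plone_buildout:plone_group /opt/plone/buildout-cache/downloads/dist/
chmod 664 /opt/plone/buildout-cache/downloads/dist/*
(Apply 664 to the files only. A recursive chmod -R 664 on the directory would also strip the execute bit from dist/ itself, so it could no longer be entered.) Then start buildout again: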
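hero@mydebian205:/opt/plone/zeocluster$ sudo -u plone_buildout bin/buildout -vvv
Again and again, over and over, until every time you type the buildout command you put your palms together and pray to the heavens before pressing Enter. Only then have you truly got into character. Still, this article adds just two add-ons, ftw.tika and collective.documentviewer, so the pit isn't that deep and there is hope of succeeding before you go mad. When buildout completes successfully, it lists the picked add-on names and version numbers.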
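You can also do the chown/chmod step with a small script that touches only files, so directories keep their execute bit. Here is a sketch for Python 3 (shutil.chown needs 3.3 or later), meant to be run as root. fix_cache_files is a hypothetical helper:

```python
import os
import shutil


def fix_cache_files(directory, mode=0o664, user=None, group=None):
    """Apply `mode` (and optionally owner/group) to every regular file below `directory`.

    Directories are left alone on purpose: 664 carries no execute bit, and a
    directory without it can no longer be entered. Returns the changed paths.
    """
    changed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # don't touch symlink targets outside the cache
            if user is not None or group is not None:
                shutil.chown(path, user=user, group=group)  # needs root
            os.chmod(path, mode)
            changed.append(path)
    return sorted(changed)
```

On this install the call would be `fix_cache_files("/opt/plone/buildout-cache/downloads/dist", user="plone_buildout", group="plone_group")`.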
To speed things up, I edited tika.cfg. I downloaded the two largest and slowest files with Xunlei on Windows, uploaded them to /opt/share, and changed http:// to file:///.
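You can script the same edit, which helps when reinstalling. This sketch assumes every download in tika.cfg is a single `url = http(s)://...` line. It also assumes the local copy in /opt/share has the same file name as the remote one. localize_urls is a hypothetical helper:

```python
import posixpath
import re

# Matches an active download line such as "url = http://repo1.maven.org/...jar".
URL_LINE = re.compile(r"^(\s*)url\s*=\s*(https?://\S+)\s*$")


def localize_urls(cfg_text, local_dir="/opt/share"):
    """Comment out each remote `url =` line and add a file:// line for a local copy."""
    out = []
    for line in cfg_text.splitlines():
        match = URL_LINE.match(line)
        if match:
            indent, remote = match.groups()
            local_path = posixpath.join(local_dir, posixpath.basename(remote))
            out.append("%s#url = %s" % (indent, remote))
            out.append("%surl = file://%s" % (indent, local_path))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Writing the result back over tika.cfg gives the same commented-out/file:// pairs as the manual edit. Running it a second time changes nothing, because neither the commented lines nor the file:// lines match the pattern.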
#url = http://repo1.maven.org/maven2/org/apache/tika/tika-app/1.11/tika-app-1.11.jar
url = file:///opt/share/tika-app-1.11.jar
#url = http://repo1.maven.org/maven2/org/apache/tika/tika-server/1.11/tika-server-1.11.jar
url = file:///opt/share/tika-server-1.11.jar
9. Install and configure the add-ons in the site
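Start the service as root, open the site in a browser on the client machine, and log in as admin. (Nothing on the site can be changed without logging in; I won't repeat this below.)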
admin > Site Setup > Add-ons
| Available add-ons |
| Document Viewer: Installs the collective.documentviewer package (collective.documentviewer 5.0.1). Warning: this add-on can't be uninstalled! |
| ftw.tika: Apache Tika integration for Plone (ftw.tika 2.9.0) |
Install only ftw.tika first, then stop the service:
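cd /opt/plone/zeocluster
bin/plonectl stop
Open two terminals on Debian. As root, run tika-server in one and the Plone service in the other. The tika-server terminal keeps scrolling output (on the server, http://localhost:9998 shows a Tika page), while the Plone one returns to the command prompt: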
cd /opt/plone/zeocluster
bin/tika-server

cd /opt/plone/zeocluster
bin/plonectl start
Then, in the site: Add new item > File
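Upload a few doc, docx and pdf files (text-based, not image-based PDFs) with Chinese file names and Chinese content, and test whether full-text search works. Note that this means searching the file content. Searching by file name is built into Plone and doesn't need Tika. In fact, if wv is installed on Debian (apt-get install wv), Plone supports Chinese full-text search of doc files even without the Tika add-on, but docx full-text search is what Tika contributes. Tika isn't the only add-on that provides full-text search, but it probably has the best prospects. Full-text search only locates the file; it doesn't preview the actual content.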
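If a search finds nothing, check Tika on its own before blaming Plone. This Python 3 sketch tests whether tika-server is listening on port 9998 and can send it a document. It assumes tika-server's standard JAX-RS resource, where PUT /tika with `Accept: text/plain` returns the extracted text. It does not reproduce how ftw.tika itself calls the server. The helper names are hypothetical:

```python
import socket
from urllib.request import Request, urlopen

TIKA_URL = "http://localhost:9998/tika"  # tika-server's default port, as above


def port_is_open(host="localhost", port=9998, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_extract_request(data, url=TIKA_URL):
    """PUT the raw document bytes to Tika's /tika resource, asking for plain text."""
    return Request(url, data=data, method="PUT",
                   headers={"Accept": "text/plain"})


def extract_text(path, url=TIKA_URL, timeout=60):
    """Send a local file to a running tika-server and return the extracted text."""
    with open(path, "rb") as handle:
        request = build_extract_request(handle.read(), url)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

If `extract_text()` on one of the uploaded .docx files returns its Chinese text, Tika is doing its part and the problem lies on the Plone side. If it doesn't, the problem is upstream of Plone's catalog.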
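Once full-text search works, move on to document preview. Without an add-on, Plone can only offer files for download; it cannot preview their content.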
admin > Site Setup > Add-ons
Install Document Viewer.
Unlike Tika, this add-on also needs its own configuration:
admin > Site Setup > Add-on Configuration > Document management system settings > Auto layout by file type
Only PDF is checked. Also check Word Document, then save.
10. Working around a collective.documentviewer bug
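I uploaded a docx file with Chinese content. collective.documentviewer did not take effect, although full-text search worked. An error notice appeared on the page, but clicking the Show Document Viewer Conversion Error link did nothing: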
Info: There was an error trying to convert the document. Maybe the document is encrypted, corrupt or malformed? Check log for details.
測試.docx
Show Document Viewer Conversion Error
Open /opt/plone/zeocluster/var/client1/event.log in a text editor and look at the end:
Traceback (most recent call last):
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 598, in __call__
    pages = self.run_conversion()
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 428, in run_conversion
    return docsplit.convert(self.storage_dir, **args)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 324, in convert
    self.convert_to_pdf(path, filename, output_dir)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 280, in convert_to_pdf
    self._run_command(cmd)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 126, in _run_command
    raise Exception(error)
Exception: Command /usr/local/bin/docsplit pdf /tmp/tmpdfnKDQ/dump.docx --output /tmp/tmpdfnKDQ finished with return code 1 and output:
terminate called after throwing an instance of 'com::sun::star::uno::RuntimeException'
Aborted
/var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:33:in `libre_office?': undefined method `match' for nil:NilClass (NoMethodError)
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:128:in `extract'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:120:in `each'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb:120:in `extract'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit.rb:65:in `extract_pdf'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/../lib/docsplit/command_line.rb:47:in `run'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/../lib/docsplit/command_line.rb:37:in `initialize'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/docsplit:5:in `new'
    from /var/lib/gems/1.8/gems/docsplit-0.7.6/bin/docsplit:5
    from /usr/local/bin/docsplit:23:in `load'
    from /usr/local/bin/docsplit:23
------
2018-08-10T01:42:44 INFO ftw.tika Converting document with tika JAXRS server: 測試.docx
A web search turned up two articles that seemed to offer solutions.
The first:
https://github.com/collective/collective.documentviewer/issues/11
It suggests patching docsplit's /var/lib/gems/1.8/gems/docsplit-0.7.6/lib/docsplit/pdf_extractor.rb.
The second:
https://pypi.org/project/collective.documentviewer/
It suggests changing the permissions of /tmp and /var/tmp to add the sticky bit.
In my tests, neither solved the problem. Back to the error log, the failing command is:
/usr/local/bin/docsplit pdf /tmp/tmpdfnKDQ/dump.docx --output /tmp/tmpdfnKDQ
Look at that directory:
# ls -l /tmp/tmpdfnKDQ
-rw------- 1 plone_daemon plone_group 41139 8月 10 01:42 dump.docx
Then, as root, run the command from the error log:
# /usr/local/bin/docsplit pdf /tmp/tmpdfnKDQ/dump.docx --output /tmp/tmpdfnKDQ
Surprisingly, there was no error.
As root, look at the directory again:
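# ls -l /tmp/tmpdfnKDQ
-rw------- 1 plone_daemon plone_group 41139 8月 10 01:42 dump.docx
-rw-r--r-- 1 root root 143352 8月 10 01:51 dump.pdf
drwxr-xr-x 3 root root 4096 8月 10 01:51 libreoffice
The PDF conversion had succeeded, and the PDF opened fine in a viewer. Root can run the conversion but plone_daemon can't, so it must be a permissions problem. That rules out pdf_extractor.rb: that article fixes docsplit failing to recognize LibreOffice, and with that bug the command would fail for root and every other user alike. Adding the sticky bit does address permissions, but it is easy to show that it doesn't solve this problem either.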
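The reasoning above (root works, plone_daemon doesn't) comes down to who owns what inside the temporary directory. This small Python 3 sketch lists entries not owned by a given uid, which makes the ownership visible. After the root run shown above, dump.pdf and libreoffice/ would be reported for plone_daemon's uid. entries_not_owned_by is a hypothetical helper, and it must run as a user who can read the directory (root here):

```python
import os


def entries_not_owned_by(directory, uid):
    """Return the sorted paths under `directory` whose owner uid differs from `uid`.

    os.lstat is used so symlinks are judged by their own owner; subdirectories
    the caller cannot read are silently skipped by os.walk.
    """
    foreign = []
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            path = os.path.join(root, name)
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)

# Usage on the server, as root (pwd is Unix-only):
#   import pwd
#   entries_not_owned_by("/tmp/tmpdfnKDQ", pwd.getpwnam("plone_daemon").pw_uid)
```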
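Since running it as root works, I decided to make the code call sudo docsplit. The problem is that sudo prompts for a password, which only works interactively, so the code needed something extra. Then I found that sudoers can be configured so that no password is required. This has security implications, of course, but this is only a prototype test. I'll get the feature working first and look for the best solution when I have time. Create a config file: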
joe /etc/sudoers.d/plone_daemon
It has a single line (why /bin/rm is included is explained later):
plone_daemon ALL = NOPASSWD:/usr/local/bin/docsplit,/bin/rm
Then edit convert.py:
joe /opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py
and go to line 271:
    def convert_to_pdf(self, filepath, filename, output_dir):
        # get ext from filename
        ext = os.path.splitext(os.path.normcase(filename))[1][1:]
        inputfilepath = os.path.join(output_dir, 'dump.%s' % ext)
        shutil.move(filepath, inputfilepath)
        orig_files = set(os.listdir(output_dir))
        cmd = [
            self.binary, 'pdf', inputfilepath,
            '--output', output_dir]
        self._run_command(cmd)
Insert '/usr/bin/sudo', in front of self.binary:
    def convert_to_pdf(self, filepath, filename, output_dir):
        # get ext from filename
        ext = os.path.splitext(os.path.normcase(filename))[1][1:]
        inputfilepath = os.path.join(output_dir, 'dump.%s' % ext)
        shutil.move(filepath, inputfilepath)
        orig_files = set(os.listdir(output_dir))
        cmd = [
            '/usr/bin/sudo', self.binary, 'pdf', inputfilepath,
            '--output', output_dir]
        self._run_command(cmd)
I restarted the service and tested again. Still an error, and the message in the web UI told me nothing useful. Back to the log: the end of /opt/plone/zeocluster/var/client1/event.log had changed, which was a good sign. The log shows that the "sudo docsplit pdf" step completed and that the failure comes afterwards:
------
2018-08-10T22:02:54 INFO collective.documentviewer Running command /usr/bin/sudo /usr/local/bin/docsplit pdf /tmp/tmpBsifva/dump.docx --output /tmp/tmpBsifva
------
2018-08-10T22:03:07 INFO collective.documentviewer Finished Running Command /usr/bin/sudo /usr/local/bin/docsplit pdf /tmp/tmpBsifva/dump.docx --output /tmp/tmpBsifva
------
2018-08-10T22:03:07 ERROR collective.documentviewer Error converting PDF:
Traceback (most recent call last):
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 598, in __call__
    pages = self.run_conversion()
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 428, in run_conversion
    return docsplit.convert(self.storage_dir, **args)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 324, in convert
    self.convert_to_pdf(path, filename, output_dir)
  File "/opt/plone/buildout-cache/eggs/collective.documentviewer-5.0.1-py2.7.egg/collective/documentviewer/convert.py", line 289, in convert_to_pdf
    shutil.rmtree(libreOfficePath)
  File "/opt/plone/Python-2.7/lib/python2.7/shutil.py", line 261, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/opt/plone/Python-2.7/lib/python2.7/shutil.py", line 253, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/opt/plone/Python-2.7/lib/python2.7/shutil.py", line 251, in rmtree
    names = os.listdir(path)
OSError: [Errno 13] Permission denied: '/tmp/tmpBsifva/libreoffice/3'
------
2018-08-10T22:03:07 INFO ftw.tika Converting document with tika JAXRS server: 測試.docx
The failing call is shutil.rmtree(libreOfficePath), with a Permission denied error. Besides the newly generated PDF, the temporary directory /tmp/tmpBsifva also contains a libreoffice directory. Because docsplit now runs as root through sudo, root creates that directory, and plone_daemon has no permission to delete it, hence the error. The fix is to delete it with /bin/rm, called through sudo, instead of shutil.rmtree. The sudoers file already lets plone_daemon run /bin/rm through sudo without a password. On the same principle, prefix every docsplit call with '/usr/bin/sudo', that is, the lines for the four subcommands "images", "text", "length" and 'pdf'. Then replace both shutil.rmtree(libreOfficePath) and shutil.rmtree(storage_dir) with system calls:
os.system('/usr/bin/sudo /bin/rm -fr %s' % (libreOfficePath,))
os.system('/usr/bin/sudo /bin/rm -fr %s' % (storage_dir,))
During testing, the text after "finished with return code ... and output:" also triggered an encoding error. Go to line 106:
    def _run_command(self, cmd):
        if isinstance(cmd, basestring):
            cmd = cmd.split()
        cmdformatted = ' '.join(cmd)
        logger.info("Running command %s" % cmdformatted)
        process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE,
                                   close_fds=self.close_fds)
        output, error = process.communicate()
        process.stdout.close()
        process.stderr.close()
        if process.returncode != 0:
            error = """Command
%s
finished with return code
%i
and output:
%s
%s""" % (cmdformatted, process.returncode, output, error)
            logger.info(error)
            raise Exception(error)
        logger.info("Finished Running Command %s" % cmdformatted)
        return output
So the error happened while building the error message itself. Bizarre. I had no time to dig into it, and fixing it wouldn't have solved the original failure anyway, so I simply deleted output and error and the two matching %s placeholders.
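My unverified guess at the "error while formatting the error" is Python 2's mixing of text types. If any part of the command is a unicode string, the % formatting turns the whole message into unicode. It then implicitly decodes output and error as ASCII, which fails on UTF-8 bytes such as Chinese LibreOffice messages. This sketch, written in Python 3 terms, keeps the output instead of deleting it by decoding the bytes explicitly with errors="replace". format_command_error is a hypothetical helper and is not part of collective.documentviewer:

```python
def format_command_error(cmd, returncode, output, error, encoding="utf-8"):
    """Build the failure message for a finished command without choking on bytes."""

    def as_text(value):
        # Popen.communicate() returns bytes; undecodable bytes become U+FFFD.
        if isinstance(value, bytes):
            return value.decode(encoding, "replace")
        return "" if value is None else value

    return "Command\n%s\nfinished with return code\n%i\nand output:\n%s\n%s" % (
        " ".join(cmd), returncode, as_text(output), as_text(error))
```

The same idea carries over to Python 2 as `output.decode("utf-8", "replace")` before formatting.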
Create a patch with diff -uN convert.py convert.py.ok > convert501.py.diff. For a later reinstall, apply it from the directory that holds convert.py with patch -p0 < convert501.py.diff:
--- convert.py 2018-08-10 23:14:20.068147869 +0800
+++ convert.py.ok 2018-08-10 23:02:37.864147759 +0800
@@ -120,8 +120,7 @@
 finished with return code
 %i
 and output:
-%s
-%s""" % (cmdformatted, process.returncode, output, error)
+""" % (cmdformatted, process.returncode,)
             logger.info(error)
             raise Exception(error)
         logger.info("Finished Running Command %s" % cmdformatted)
@@ -224,7 +223,7 @@
         # docsplit images pdf.pdf --size 700x,300x,50x
         # --format gif --output
         cmd = [
-            self.binary, "images", filepath,
+            '/usr/bin/sudo', self.binary, "images", filepath,
             '--language', lang,
             '--size', ','.join([str(s[1]) + 'x' for s in sizes]),
             '--format', format,
@@ -251,7 +250,7 @@
         output_dir = os.path.join(output_dir, TEXT_REL_PATHNAME)
         ocr = not ocr and 'no-' or ''
         cmd = [
-            self.binary, "text", filepath,
+            '/usr/bin/sudo', self.binary, "text", filepath,
             '--language', lang,
             '--%socr' % ocr,
             '--pages', 'all',
@@ -265,7 +264,7 @@
         self._run_command(cmd)

     def get_num_pages(self, filepath):
-        cmd = [self.binary, "length", filepath]
+        cmd = ['/usr/bin/sudo', self.binary, "length", filepath]
         return int(self._run_command(cmd).strip())

     def convert_to_pdf(self, filepath, filename, output_dir):
@@ -275,7 +274,7 @@
         shutil.move(filepath, inputfilepath)
         orig_files = set(os.listdir(output_dir))
         cmd = [
-            self.binary, 'pdf', inputfilepath,
+            '/usr/bin/sudo', self.binary, 'pdf', inputfilepath,
             '--output', output_dir]
         self._run_command(cmd)

@@ -286,7 +285,9 @@
         # folder next to the generated PDF, removes it!
         libreOfficePath = os.path.join(output_dir, 'libreoffice')
         if os.path.exists(libreOfficePath):
-            shutil.rmtree(libreOfficePath)
+            os.system('/usr/bin/sudo /bin/rm -fr %s' % (libreOfficePath,))
+            #shutil.rmtree(libreOfficePath)
+            pass

         # move the file to the right location now
         files = set(os.listdir(output_dir))
@@ -481,7 +482,8 @@
                 files[filename] = saveFileToBlob(filepath)

             settings.blob_files = files
-            shutil.rmtree(storage_dir)
+            os.system('/usr/bin/sudo /bin/rm -fr %s' % (storage_dir,))
+            #shutil.rmtree(storage_dir)

         # check for old storage to remove... Just in case.
         old_storage_dir = os.path.join(gsettings.storage_location,
After restarting Plone so the changes took effect, the uploaded docx previewed successfully. That concludes the prototype test.
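The sudoers rule above lets plone_daemon run /bin/rm as root on any path, and os.system passes the path through a shell. If you keep the sudo workaround, the following sketch is a slightly tighter variant of those two calls. It checks the path against an allowed root and passes the command to subprocess as a list, so no shell is involved. The default allowed root is the system temp directory. That default is an assumption: it matches the /tmp/tmp... paths in the logs, but check where storage_dir lives in your configuration before relying on it. remove_tree is a hypothetical helper, the code avoids Python-3-only features so it could run under Plone's Python 2.7, and the Python-side check does not narrow what sudoers itself allows:

```python
import os
import shutil
import subprocess
import tempfile


def remove_tree(path, use_sudo=False, allowed_root=None):
    """Delete the directory tree at `path`, but only if it lies inside `allowed_root`.

    With use_sudo=True the deletion runs as `sudo /bin/rm -rf -- <path>`, with
    the arguments passed as a list so no shell ever parses the path.
    """
    root = os.path.realpath(allowed_root or tempfile.gettempdir())
    target = os.path.realpath(path)  # resolves symlinks before the check
    if not target.startswith(root.rstrip(os.sep) + os.sep):
        raise ValueError("refusing to delete %s (outside %s)" % (target, root))
    if not os.path.lexists(target):
        return
    if use_sudo:
        subprocess.check_call(["/usr/bin/sudo", "/bin/rm", "-rf", "--", target])
    else:
        shutil.rmtree(target)
```

In convert.py the calls would become `remove_tree(libreOfficePath, use_sudo=True)` and `remove_tree(storage_dir, use_sudo=True)`.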
=================================
Other features and requirements worth exploring
1. Asynchronous conversion
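Converting an uploaded file for preview is slow. collective.documentviewer supports both plone.app.async and collective.celery for asynchronous conversion: the upload returns immediately, the actual conversion runs in the background, and progress can be checked. But the articles online all target Plone 4. I found only one about using collective.celery with Plone 5: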
https://www.codesyntax.com/en/blog/collective-documentviewer-with-redis-backed-celery-tasks-on-plone-4-and-5
The two links to https://gist.github.com in that article can't be opened directly, but the pages do exist, for reasons ~!@#¥%&*()+
2. IDs for Chinese folder (path) names and Chinese file names
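Uploaded files get IDs made of ASCII letters and digits, so the original file name is lost on download. In a document management system the file name is important information too. Ideally, you could upload a whole directory tree with its files to Plone over FTP. Plone would then provide full-text search and preview for the documents it can read without destroying any of the original information, and when needed the files could be downloaded back over FTP unchanged.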
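One piece of this can be illustrated independently of Plone. Even when the object id is ASCII, a download can still carry the original Chinese name in the Content-Disposition header, using the RFC 6266 form with RFC 5987 encoding. This is a Python 3 sketch of building that header value. content_disposition is a hypothetical helper, and how Plone's own download view sets this header is not covered here:

```python
from urllib.parse import quote


def content_disposition(filename, inline=False):
    """Header value carrying a non-ASCII file name (RFC 6266 + RFC 5987 encoding)."""
    disposition = "inline" if inline else "attachment"
    # Plain ASCII fallback for old clients: non-ASCII characters become "?".
    fallback = filename.encode("ascii", "replace").decode("ascii").replace('"', "")
    encoded = quote(filename.encode("utf-8"), safe="")
    return "%s; filename=\"%s\"; filename*=UTF-8''%s" % (disposition, fallback, encoded)
```

Browsers that understand `filename*` save the file under its UTF-8 name, for example 測試.docx, rather than the ASCII id.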
3. Excerpts with matching context in search results and folder listings
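Search engines all do this: in search results and folder listings, each item shows an excerpt of its text. Ideally, a search result's excerpt shows the text around the keyword, with the keyword highlighted. This matters a lot, because it lets users quickly find the item they need without opening full previews one by one.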
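Here is a minimal Python 3 sketch of such an excerpt. It assumes plain extracted text is available for each item, for example what Tika returns. keyword_excerpt is a hypothetical helper. It only handles the first occurrence, and it HTML-escapes the text before wrapping the keyword in a tag:

```python
import html


def keyword_excerpt(text, keyword, context=20, tag="b"):
    """Return `context` characters either side of the first match, keyword highlighted.

    Without a match, fall back to the first 2*context characters, like the
    plain "first few dozen characters" listing described earlier.
    """
    index = text.find(keyword) if keyword else -1
    if index < 0:
        return html.escape(text[:2 * context])
    start = max(0, index - context)
    end = min(len(text), index + len(keyword) + context)
    before = html.escape(text[start:index])
    hit = html.escape(text[index:index + len(keyword)])
    after = html.escape(text[index + len(keyword):end])
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return "%s%s<%s>%s</%s>%s%s" % (prefix, before, tag, hit, tag, after, suffix)
```

For example, `keyword_excerpt("本系統支持中文全文檢索和預覽", "全文檢索", context=2)` gives `...中文<b>全文檢索</b>和預...`.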
4. Permissions and workflow
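Anonymous users must not be able to see any content. Some content must also stay confidential from logged-in users and must not appear in their search results.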
5. Integration with Apache or nginx
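When you create a new Plone site there is an option that seems related to Plone serving file-system resources directly to users, but the more common setup is integration with Apache or nginx: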
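Static resource storage: A folder for storing and serving static resource files
For one thing, very large files should be kept in the file system (or already are), and serving them through Apache or nginx has performance advantages. Integration would also cover login authentication.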
6. Embedded online video player, image thumbnails and code highlighting
Video, images and code are also important kinds of documents.
Video support should cover local videos as well as resources served by remote video servers.
There is a project called plumi, a video-sharing project based on Plone 4.2:
https://plumi.org
https://github.com/plumi/plumi.app/
The player this project uses is:
https://pypi.org/project/collective.flowplayer/
Its project page says it supports Plone 4. Embedding an online video player shouldn't be hard in any case.
7. Pin all add-on versions
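If not every add-on version is pinned, a future buildout may fetch the newest releases, which may no longer support Plone 5. To prevent this, pin every version number in the [versions] section of buildout.cfg.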
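If buildout prints the versions it picked (as noted in section 8), you can turn those lines into a [versions] block to paste into buildout.cfg. This Python 3 sketch assumes the picked lines look like `name = version` and ignores everything else, including "# Required by:" comments. versions_section is a hypothetical helper. zc.buildout 2 also offers show-picked-versions and update-versions-file options that do much of this itself, so check your buildout's documentation first.

```python
import re

# "name = version" lines; headers, prose and "# ..." comments don't match.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*=\s*(\S+)\s*$")


def versions_section(picked_text, existing=None):
    """Turn picked-version lines into a sorted [versions] section.

    `existing` pins (name -> version) take precedence over newly picked ones.
    """
    pins = {}
    for line in picked_text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    pins.update(existing or {})
    lines = ["[versions]"]
    lines += ["%s = %s" % (name, pins[name]) for name in sorted(pins, key=str.lower)]
    return "\n".join(lines) + "\n"
```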