Nutch 1.9 and Solr 4.5 integration: output
1. Querying the results crawled by Nutch through Solr
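The response below echoes its request parameters in `responseHeader.params` (`q`, `wt`, `indent`). A minimal sketch of how such a query might be issued and summarized, assuming Solr's standard `/select` handler; the base URL is a hypothetical placeholder (the real host and core path are not given here), and the helper names are illustrative only.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: substitute the real Solr host and core path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(field, term, base=SOLR_SELECT_URL):
    """Build a select URL using the parameters echoed in responseHeader.params."""
    params = {"q": f"{field}:{term}", "wt": "json", "indent": "true"}
    return f"{base}?{urlencode(params)}"


def summarize(response_text):
    """Return (numFound, [(id, title), ...]) from a Solr JSON response body."""
    body = json.loads(response_text)["response"]
    docs = [(doc["id"], doc.get("title", "")) for doc in body["docs"]]
    return body["numFound"], docs


if __name__ == "__main__":
    # Same query as in the response: title:幻想
    print(build_select_url("title", "幻想"))
    # Trimmed-down sample with the same shape as the real response.
    sample = (
        '{"response": {"numFound": 1, "start": 0, '
        '"docs": [{"id": "http://hxjh.zqgame.com/", "title": "sample"}]}}'
    )
    print(summarize(sample))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the body to `summarize` would list each hit's `id` and `title` without the bulky `content` field.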
{"responseHeader": {"status": 0,"QTime": 2,"params": {"indent": "true","q": "title:幻想","_": "1418266706916","wt": "json"}},"response": {"numFound": 7,"start": 0,"docs": [{"content": "幻想江湖-2.2資料片,巔峰對決,震撼來襲! 跳轉官網 裝備凝練 巔峰擂臺 萬圣之夜 新版時裝","id": "http://hxjh.zqgame.com/","title": "幻想江湖-2.2資料片,巔峰對決,震撼來襲!","segment": "20141211104005","boost": 0,"digest": "c61521c1861b1a7574c8920fd27d0155","tstamp": "2014-12-11T02:40:14.477Z","url": "http://hxjh.zqgame.com/","anchor": ["幻想江湖","幻想江湖"],"_version_": 1487159323035435000},{"content": "幻想江湖-鬼靈精怪萬圣節(jié) 開啟時間 : 10 月 30 日 萬圣節(jié)禮包領取> 萬圣節(jié)前夕,為了避免惡靈干擾,大俠們紛紛掛起了南瓜燈,驅逐鬼怪。江湖有一傳聞,一群糖果商人行經龍脈嶺時,因為身上的糖果、餅干、寶石而找來鬼魂附身,如果幫助他們驅逐了附身邪靈,將會獲得他們道謝的禮物哦~! 1 萬圣節(jié)天天有禮 2 練級打寶兩不誤 3 節(jié)日消費獎勵翻倍 4 獎勵兌換驚喜不斷 5 洗煉折扣大放送 溫馨提示: 活動期間,大俠們請每天提著南瓜燈,穿上蝙蝠衫,去龍脈去收集糖果餅干,要不停地說:“trick or treat.”(意思是給不給,不給就搗蛋)。要是不肯給的話,就用各種方法去懲罰他,例如:一招一個怪,“唰唰唰————”把龍脈掛個三小時! 關閉 恭喜你獲得幻想江湖萬圣節(jié)禮包! IOS用戶領取: 安卓和越獄用戶領取: 有效時間: 即日-2014.11.30 兌換次數: 只限兌換一次 兌換范圍: 全服 禮包使用方法: 登錄游戲后,點擊游戲右上方【領獎】-【福利】-【禮包】后輸入正確禮品卡號領取禮包獎勵!","id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html","title": "幻想江湖-鬼靈精怪萬圣節(jié)","segment": "20141211104057","boost": 0,"digest": "5ae39251ad06017e4e1854aae9129126","tstamp": "2014-12-11T02:41:37.669Z","url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html","anchor": ["萬圣之夜"],"_version_": 1487159633802952700},{"content": "幻想江湖-優(yōu)雅轉身華麗時裝首曝 夜魔游龍 西式時裝 全新時裝新品上架啦,這批時裝看上去是不是和以前大有不同呢,此次大膽革新,看到下面的時裝,不禁令人想到后面可能真的會有結婚系統(tǒng)咯,新版本新?lián)Q裝,不走平凡路~我們就是這樣的與眾不同! 
進入官網 返回活動首頁","id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html","title": "幻想江湖-優(yōu)雅轉身華麗時裝首曝","segment": "20141211104057","boost": 0,"digest": "e086540bf0f721f39560440c85d2161f","tstamp": "2014-12-11T02:41:47.879Z","url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html","anchor": ["新版時裝"],"_version_": 1487159633805049900},{"content": "《幻想江湖》官網-首部超萌動作武俠片!今天開始,做武俠片主人公 首頁 新聞中心 游戲資料 游戲論壇 分享到: 安卓下載 ios越獄下載 ios正版下載 禮包領取 1 2 3 4 幻想江湖絕尚發(fā)布會精彩視頻 最新 新聞 公告 活動 《幻想江湖》IOS18區(qū)“美人天下”12月10日火爆開啟 2014史上最萌武俠手游來襲!不用吃藥,放棄治療,12月10日上午11:00新區(qū)“美人天下”火爆開啟!快來沒日沒夜一起萌萌噠!... 查看詳情 > 2014-12-10 ? [新聞] 菜鳥進階強力黨 《幻想江湖》裝備屬性輕松堆 2014-12-10 ? [新聞] 全新資料片即將來襲《幻想江湖》四大活動任你玩 2014-12-09 ? [活動] 雙12 玩幻想送福利 2014-12-09 ? [新聞] 細節(jié)決定成敗 《幻想江湖》人物屬性全掌握 2014-12-09 ? [活動] 《幻想江湖》IOS18區(qū)”美人天下”十六大活動 2014-12-08 ? [新聞] 刀尖上的武俠 挑戰(zhàn)《幻想江湖》秦陵副本 2014-12-10 ? [新聞] 菜鳥進階強力黨 《幻想江湖》裝備屬性輕松堆 2014-12-10 ? [新聞] 全新資料片即將來襲《幻想江湖》四大活動任你玩 2014-12-09 ? [新聞] 細節(jié)決定成敗 《幻想江湖》人物屬性全掌握 2014-12-08 ? [新聞] 刀尖上的武俠 挑戰(zhàn)《幻想江湖》秦陵副本 2014-12-08 ? [新聞] 新版“姑姑”遭吐槽 《幻想江湖》還你女神夢 2014-12-05 ? [新聞] 《幻想江湖》我們結婚吧!——訂婚篇 2014-12-03 ? [公告] 幻想江湖-公測9~14區(qū) 數據互通公告 2014-12-01 ? [公告] 《幻想江湖》12月2日臨時維護公告 2014-11-26 ? [公告] 《幻想江湖》2.4版本更新 2014-11-25 ? [公告] 幻想江湖臨時維護公告 2014-11-25 ? [公告] 《幻想江湖》appstore1~8服數據互通完畢 2014-11-25 ? [公告] 幻想江湖-appstore數據互通延長公告 2014-12-09 ? [活動] 雙12 玩幻想送福利 2014-12-09 ? [活動] 《幻想江湖》IOS18區(qū)”美人天下”十六大活動 2014-12-08 ? [活動] 周末齊消費 歡樂享不停 2014-12-08 ? [活動] 《幻想江湖》美女主播齊聚樂———回顧 2014-12-08 ? [活動] 《幻想江湖》25區(qū)”獨步江湖”十六大活動 2014-12-05 ? [活動] 《幻想江湖》玩家體驗指南——做好產品,專注體驗 聯(lián)系人:施若熙 聯(lián)系QQ:744415486 手機:13510624817 郵箱:ruoxi.shi@zqgame.com 聯(lián)系人:方彥瓊 聯(lián)系QQ:611535985 手機:13603061895 郵箱:yanqiong.fang@zqgame.com 玩家群② 264103428 企業(yè)客服QQ:800056019 客服熱線:0755-86160520 特色玩法 玩家攻略 職業(yè)介紹 明教 唐門 天山 逍遙 18183 766 91手游網 合作媒體 ———————————————————— 微信公眾號 新浪微博 騰訊微博 掃描二維碼下載 快速注冊 通行證: 密 碼: 確認密碼: 驗證碼: 立即注冊 用戶名 恭喜你已經注冊成功! 關閉 恭喜您獲得幻想江湖公測新手禮包! 
你的禮包卡號是: 禮包使用方法: 登陸游戲后,點擊游戲右上方【領獎】-【福利】-【禮包】后輸入8位的禮包卡號領取禮包獎勵!內容包含:止血丹*2、白色強化石*20、成長丹*5、易功丹*10、進階丹*5。 關閉 微信公眾號","id": "http://hxjh.zqgame.com/index.html","title": "《幻想江湖》官網-首部超萌動作武俠片!今天開始,做武俠片主人公","segment": "20141211104057","boost": 0,"digest": "3f9a2060e12f95316ee0201ce8a21da0","tstamp": "2014-12-11T02:41:01.462Z","url": "http://hxjh.zqgame.com/index.html","anchor": ["進入官網"],"_version_": 1487159633828118500},{"content": "【仙幻奇緣】官網 12.6首次開放公測!無商城,真正免費! 進入官網 論壇中心 游戲下載 購卡充值 1 2 3 4 5 媒體友鏈 通行證賬號: 通行證密碼: 確認密碼: 驗證碼: 同意 《中青寶》協(xié)議 恭喜你!注冊成功! 用戶名是: 客戶端 立即下載 獲取特權禮包 版權所有:深圳中青寶互動網絡股份有限公司 客服傳真:0755-86368269 中華人民共和國增值電信業(yè)務經營許可證:粵B2-20030216 粵ICP備:09057836 網絡文化經營許可證:文網文[2008]088號 中華人民共和國互聯(lián)網出版許可證:新出網證(粵)字017號 每個IP只能參加一次抽獎, 謝謝您的參與! ","id": "http://xh.zqgame.com/","title": "【仙幻奇緣】官網 12.6首次開放公測!無商城,真正免費!","segment": "20141211104057","boost": 0,"digest": "471def081683b7c5f94a39382e4c00a1","tstamp": "2014-12-11T02:41:02.165Z","url": "http://xh.zqgame.com/","anchor": ["仙幻奇緣","仙幻奇緣"],"_version_": 1487159634570510300},{"content": "《諸神世界》官方網站—3D魔幻戰(zhàn)爭網游 諸神世界 首頁 新聞動態(tài) 游戲資料 下載微端 快速充值 官方論壇 下載微端 快速充值 VIP介紹 領取新手卡 選擇大區(qū) 請選擇服務器 風暴荒漠 戰(zhàn)爭血徑 無盡沙海 燃燒平原 雙線1-16服 領取中,請稍候…… 您的禮包號為: 更多服務器 《諸神世界》是一款MMORPG的3D國戰(zhàn)網頁游戲,采用魔幻風格,3D旋轉俯瞰視角,以國家戰(zhàn)爭、團隊冒險等玩法為特色,以大范圍多維度強PVP玩法為核心的超激情游戲,體驗游戲國戰(zhàn)pk激情就來諸神世界。 0755-26635899 客服郵箱:kefu@zqgame.com 客服傳真:0755-86368269 游戲QQ群:219759659 259942575 用戶名: * 以字母開頭由大小寫字母、數字、下劃線組成,長度為4-32位 密碼: * 6-20字母、數字、符號組成,不含空格鍵、「\"」及「'」 確認密碼: * 請再一次輸入密碼 1 2 3 4 最新 新聞 活動 公告 攻略 諸神世界混服部分區(qū)服數據互通公告 公告 06-04 諸神世界混服部分區(qū)服數據互通公告 公告 05-23 5月29日12點諸神新區(qū)-風暴荒漠火爆開啟 新聞 05-14 5月15日12點諸神新區(qū)-亡魂峽谷火爆開啟 公告 04-18 諸神世界混服合服活動精彩上線 公告 04-18 諸神世界混服部分區(qū)服數據互通公告 【新聞】 05-14 5月15日12點諸神新區(qū)-亡魂峽谷火爆開啟 【新聞】 04-16 4月17日12點諸神新區(qū)-呼嘯沙漠火爆開啟 【新聞】 03-24 3月27日12點諸神新區(qū)-巨龍之吼火爆開啟 【新聞】 03-17 3月20日12點諸神新區(qū)-塵風峽谷火爆開啟 【新聞】 03-11 3月13日12點諸神新區(qū)-耳語海岸火爆開啟 【活動】 04-02 《諸神世界》十大開服活動 【活動】 02-13 《諸神世界》元宵&情人節(jié)活動 【活動】 01-26 《諸神世界》春節(jié)活動 【活動】 11-21 諸神世界周末限時活動火爆上線 【活動】 11-08 雙十一《諸神世界》勁爆大酬賓 
【公告】 06-04 諸神世界混服部分區(qū)服數據互通公告 【公告】 05-23 5月29日12點諸神新區(qū)-風暴荒漠火爆開啟 【公告】 04-18 諸神世界混服合服活動精彩上線 【公告】 04-18 諸神世界混服部分區(qū)服數據互通公告 【公告】 03-25 3月28日平臺網絡升級公告 魔 牧 槍 炮 術 戰(zhàn) 魔 刃 狩獵靈魂 攻擊方式:近程魔法攻擊 核心屬性:智力 敏捷 職業(yè)特質:隱匿暗殺能力 職業(yè)說明:刀鋒舞者,狩獵著生者的靈魂。隱沒于黑暗,游走于光明。不被歷史描述,卻是歷史的主宰! 點擊查看詳情 牧 師 神的寵兒 攻擊方式:中程魔法攻擊 核心屬性:精神 智力 職業(yè)特質:恢復治愈能力 職業(yè)說明:神之使徒,捍衛(wèi)生者,拯救死者。信者永生,不信者也救贖。虔誠的信徒,是神的寵兒! 點擊查看詳情 槍 手 一擊必殺 攻擊方式:遠程物理攻擊 核心屬性:力量 精神 職業(yè)特質:傷害輸出 職業(yè)說明:獵命王者,半邊惡魔半邊天使。沉著冷靜,是他們的特質;一擊必殺,是他們的實力! 點擊查看詳情 魔 炮 焚天怒焰 攻擊方式:遠程魔法攻擊 核心屬性:智力 精神 職業(yè)特質:群體傷害 職業(yè)說明:焚天烈焰,吞噬罪孽與蒼生。沉穩(wěn)步伐,吼出戰(zhàn)歌嘹亮;怒放炮火,點亮生命奇跡! 點擊查看詳情 術 士 破碎虛空 攻擊方式:中程魔法攻擊 核心屬性:智力 精神 職業(yè)特質:戰(zhàn)斗節(jié)奏控制能力 職業(yè)說明:掌握法則,智慧象征。探索真理,識古通今,洞悉未來。以世間威能,抑惡揚善,改天逆命,破碎虛空! 點擊查看詳情 戰(zhàn) 士 金剛不壞 攻擊方式:近程物理攻擊 核心屬性:體質 力量 職業(yè)特質:生存能力 職業(yè)說明:移動城墻,金剛不壞。戰(zhàn),則掠地千里;守,則萬夫莫開。英勇的靈魂鑄造不滅傳奇! 點擊查看詳情 系統(tǒng)介紹 進階指導 特色系統(tǒng) 活動玩法 結婚系統(tǒng) | 職業(yè)介紹 | FAQ | VIP如何獲得 | 坐騎強化 | 轉職重修 | 戰(zhàn)友系統(tǒng) | 升級送祝福 | 日常任務 | 拍賣寄售 | 技能遺忘重生 | 道具商城 | 財產保護 煉金系統(tǒng) | 星耀石 | 裝備鑲嵌 | 裝備升階 | 裝備打孔 | 要塞守衛(wèi)站 | 神器合成 | 寵物潛力修改 | 寶石摘除 斗氣系統(tǒng) | 羽翼系統(tǒng) | 1V1模擬戰(zhàn) | 移民系統(tǒng) | 擊鼓傳花 | 情緣任務 | 神圣血脈 | 軍銜系統(tǒng) | 釣魚系統(tǒng) | 稱號系統(tǒng) | 封印進度 | 離線經驗 巴比倫塔 | 跨區(qū)國戰(zhàn) | 跨區(qū)巡游 | 跨區(qū)極速狂飆 | 跨區(qū)組隊爭奪戰(zhàn) | 超級血戰(zhàn)到底 | 血戰(zhàn)到底 | 小丑的夢境 | 王者試煉 | 探險者地宮 | 前線速遞 | 騎魂谷 | 冒險島 | 極速狂飆 | 毀滅神跡 | 國家正式戰(zhàn)爭 | 國家遠征 | 國家情報 | 國家BOSS | 藏寶峽谷 游戲壁紙 游戲截圖 玩家相冊 MORE 265G百科 073專區(qū) 新浪愛問 抵制不良游戲 拒絕盜版游戲 注意自我保護 謹防上當受騙 適度游戲益腦 沉迷游戲傷身 合理安排時間 享受健康生活 增值電信許可證:粵B2-20120680 網絡文化經營許可證: 粵網文[2014]0615-215號 粵ICP備09057836號 深圳市卓頁互動網絡科技有限公司 Copyright ? 
2012-2014 All Rights Reserved 本游戲適合18歲以上用戶,不含暴力、恐怖、殘酷、色情等妨害未成年人身心健康的內容,屬于綠色健康產品 yy","id": "http://zs.ucjoy.com/","title": "《諸神世界》官方網站—3D魔幻戰爭網游","cache": "content","segment": "20141211104057","boost": 0,"digest": "8d00af8aaa03c2cf68a69dc68892b764","tstamp": "2014-12-11T02:41:18.686Z","url": "http://zs.ucjoy.com/","anchor": ["官網","諸神世界"],"_version_": 1487159634641813500},{"content": "《諸神世界》官方網站—3D魔幻戰爭網游 諸神世界 首頁 新聞動態 游戲資料 下載微端 快速充值 官方論壇 下載微端 快速充值 VIP介紹 領取新手卡 選擇大區 請選擇服務器 風暴荒漠 戰爭血徑 無盡沙海 燃燒平原 雙線1-16服 領取中,請稍候…… 您的禮包號為: 更多服務器 《諸神世界》是一款MMORPG的3D國戰網頁游戲,采用魔幻風格,3D旋轉俯瞰視角,以國家戰爭、團隊冒險等玩法為特色,以大范圍多維度強PVP玩法為核心的超激情游戲,體驗游戲國戰pk激情就來諸神世界。 0755-26635899 客服郵箱:kefu@zqgame.com 客服傳真:0755-86368269 游戲QQ群:219759659 259942575 用戶名: * 以字母開頭由大小寫字母、數字、下劃線組成,長度為4-32位 密碼: * 6-20字母、數字、符號組成,不含空格鍵、「\"」及「'」 確認密碼: * 請再一次輸入密碼 您所在的位置: 首頁 > 服務器列表 推薦服務器列表 風暴荒漠 火爆 戰爭血徑 火爆 我的服務器列表 你還未進入過游戲,請先登錄游戲! 所有服務器 1-10 11-20 諸神混服 雙線1-16服 火爆 風暴荒漠 火爆 戰爭血徑 火爆 無盡沙海 火爆 燃燒平原 火爆 抵制不良游戲 拒絕盜版游戲 注意自我保護 謹防上當受騙 適度游戲益腦 沉迷游戲傷身 合理安排時間 享受健康生活 增值電信許可證:粵B2-20120680 網絡文化經營許可證: 粵網文[2014]0615-215號 粵ICP備09057836號 深圳市卓頁互動網絡科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戲適合18歲以上用戶,不含暴力、恐怖、殘酷、色情等妨害未成年人身心健康的內容,屬于綠色健康產品 yy","id": "http://zs.ucjoy.com/serverlist.app","title": "《諸神世界》官方網站—3D魔幻戰爭網游","cache": "content","segment": "20141211104057","boost": 0,"digest": "30a836aae5886924d1a87d3ab1ad42c8","tstamp": "2014-12-11T02:41:13.476Z","url": "http://zs.ucjoy.com/serverlist.app","anchor": ["進入新服","開始游戲"],"_version_": 1487159634643910700}]} }
2. Screenshot of the results as displayed in Solr
bin/crawl urls crawl http://xx.xx.xx.xx:8983/solr 5
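The four positional arguments of the crawl command can be read as seed directory, crawl directory, Solr URL, and number of rounds. That reading is an assumption based on the Nutch 1.x `bin/crawl` usage text, though it agrees with the `urlDir: urls` and `crawlDb: crawl/crawldb` entries in the log. A small sketch that names each position; the `CrawlArgs` class is purely illustrative and not part of Nutch.

```python
import shlex
from dataclasses import dataclass


@dataclass
class CrawlArgs:
    """Illustrative wrapper naming the positional arguments of bin/crawl."""
    seed_dir: str   # directory holding the seed URL list ("urls")
    crawl_dir: str  # where crawldb, linkdb and segments are written ("crawl")
    solr_url: str   # Solr instance that receives the indexed documents
    rounds: int     # number of generate/fetch/parse/update/index rounds

    def command(self, script="bin/crawl"):
        """Return the argument list in the order the script expects."""
        return [script, self.seed_dir, self.crawl_dir, self.solr_url, str(self.rounds)]


if __name__ == "__main__":
    args = CrawlArgs("urls", "crawl", "http://xx.xx.xx.xx:8983/solr", 5)
    # Prints the same command line as above.
    print(shlex.join(args.command()))
```

The resulting list could be handed to `subprocess.run` from the Nutch runtime directory; it is only printed here so that the sketch has no side effects.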
3. Nutch log output during the crawl:
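Each entry in this log has the layout timestamp, level, logger, message, and every stage (Injector, Generator, Fetcher, ParseSegment, CrawlDb update, LinkDb, Deduplication, Indexer, CleaningJob) reports a "finished at ..., elapsed: ..." entry. A minimal sketch for pulling those stage timings out of the log, assuming one entry per line; the regular expressions and function names are illustrative and not part of Nutch.

```python
import re

# Assumed layout: "<date> <time>,<millis> <LEVEL> <logger> - <message>"
ENTRY_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<message>.*)$"
)
FINISHED_RE = re.compile(r"finished at .*, elapsed: (?P<elapsed>\d{2}:\d{2}:\d{2})")


def parse_entry(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None


def finished_stages(lines):
    """Return (logger, elapsed) for every entry that reports a finished stage."""
    stages = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # continuation lines such as the IndexWriter description
        done = FINISHED_RE.search(entry["message"])
        if done:
            stages.append((entry["logger"], done.group("elapsed")))
    return stages


if __name__ == "__main__":
    sample = [
        "2014-12-11 10:23:05,612 INFO crawl.Injector - Injector: finished at 2014-12-11 10:23:05, elapsed: 00:00:02",
        "2014-12-11 10:23:03,266 WARN snappy.LoadSnappy - Snappy native library not loaded",
        "2014-12-11 10:23:09,993 INFO crawl.Generator - Generator: finished at 2014-12-11 10:23:09, elapsed: 00:00:03",
    ]
    for logger, elapsed in finished_stages(sample):
        print(logger, elapsed)
```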
<pre name="code" class="plain">2014-12-11 10:23:02,927 INFO crawl.Injector - Injector: starting at 2014-12-11 10:23:02
2014-12-11 10:23:02,928 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:23:02,928 INFO crawl.Injector - Injector: urlDir: urls
2014-12-11 10:23:02,929 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:23:03,210 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:03,266 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:23:03,748 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:23:04,779 INFO crawl.Injector - Injector: overwrite: false
2014-12-11 10:23:04,779 INFO crawl.Injector - Injector: update: false
2014-12-11 10:23:05,606 INFO crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:23:05,611 INFO crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:23:05,612 INFO crawl.Injector - Injector: finished at 2014-12-11 10:23:05, elapsed: 00:00:02
2014-12-11 10:23:06,551 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: starting at 2014-12-11 10:23:06
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:23:07,201 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,202 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,202 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,211 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:07,267 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,267 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,267 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,272 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:23:07,875 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:23:08,875 INFO crawl.Generator - Generator: segment: crawl/segments/20141211102308
2014-12-11 10:23:09,051 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:09,993 INFO crawl.Generator - Generator: finished at 2014-12-11 10:23:09, elapsed: 00:00:03
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:23:10
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211102308
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418275390681
2014-12-11 10:23:10,956 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:23:11,435 INFO fetcher.Fetcher - QueueFeeder finished: total 18 records + hit by time limit :0
2014-12-11 10:23:11,585 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/5/001 (queue crawl delay=5000ms)
2014-12-11 10:23:11,587 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,597 INFO http.Http - http.proxy.host = null
2014-12-11 10:23:11,597 INFO http.Http - http.proxy.port = 8080
2014-12-11 10:23:11,597 INFO http.Http - http.timeout = 10000
2014-12-11 10:23:11,597 INFO http.Http - http.content.limit = 65536
2014-12-11 10:23:11,597 INFO http.Http - http.agent = My Nutch Spider/Nutch-1.9
2014-12-11 10:23:11,597 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-12-11 10:23:11,597 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-12-11 10:23:11,597 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - Fetcher: throughput threshold: -1
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - Fetcher: throughput threshold retries: 5
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - fetcher.maxNum.threads can't be < than 50 : using 50 instead
2014-12-11 10:23:12,622 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:13,622 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:14,623 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:15,623 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,624 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,891 INFO fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/3/3 (queue crawl delay=5000ms)
2014-12-11 10:23:17,624 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:18,625 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:19,625 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:20,626 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,626 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,935 INFO fetcher.Fetcher - fetching http://v.zqgame.com/view/index (queue crawl delay=5000ms)
2014-12-11 10:23:22,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:23,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:24,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:25,628 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:27:15,997 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:27:15,997 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2014-12-11 10:27:16,004 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2014-12-11 10:27:16,005 INFO fetcher.Fetcher - -activeThreads=0
2014-12-11 10:27:16,629 INFO fetcher.Fetcher - Fetcher: finished at 2014-12-11 10:27:16, elapsed: 00:00:07
2014-12-11 10:27:17,320 INFO parse.ParseSegment - ParseSegment: starting at 2014-12-11 10:27:17
2014-12-11 10:27:17,320 INFO parse.ParseSegment - ParseSegment: segment: crawl/segments/20141211102707
2014-12-11 10:27:17,591 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:18,518 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2014-12-11 10:27:18,528 INFO parse.ParseSegment - Parsed (12ms):http://v.zqgame.com/indexmain
2014-12-11 10:27:18,571 INFO parse.ParseSegment - Parsed (1ms):http://v.zqgame.com/moviePlay/goMoviePlay/4/4
2014-12-11 10:27:18,659 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2014-12-11 10:27:18,871 INFO parse.ParseSegment - ParseSegment: finished at 2014-12-11 10:27:18, elapsed: 00:00:01
2014-12-11 10:27:19,794 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:19,810 INFO crawl.CrawlDb - CrawlDb update: starting at 2014-12-11 10:27:19
2014-12-11 10:27:19,810 INFO crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20141211102707]
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: false
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: URL filtering: false
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: 404 purging: false
2014-12-11 10:27:19,812 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2014-12-11 10:27:20,639 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:27:20,639 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:27:20,639 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:27:21,120 INFO crawl.CrawlDb - CrawlDb update: finished at 2014-12-11 10:27:21, elapsed: 00:00:01
2014-12-11 10:27:22,066 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: starting at 2014-12-11 10:27:22
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: URL normalize: true
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: URL filter: true
2014-12-11 10:27:22,068 INFO crawl.LinkDb - LinkDb: internal links will be ignored.
2014-12-11 10:27:22,068 INFO crawl.LinkDb - LinkDb: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:23,376 INFO crawl.LinkDb - LinkDb: merging with existing linkdb: crawl/linkdb
2014-12-11 10:27:23,688 INFO regex.RegexURLNormalizer - can't find rules for scope 'linkdb', using default
2014-12-11 10:27:24,510 INFO crawl.LinkDb - LinkDb: finished at 2014-12-11 10:27:24, elapsed: 00:00:02
2014-12-11 10:27:25,209 INFO crawl.DeduplicationJob - DeduplicationJob: starting at 2014-12-11 10:27:25
2014-12-11 10:27:25,483 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:26,760 INFO crawl.DeduplicationJob - Deduplication: 2 documents marked as duplicates
2014-12-11 10:27:26,760 INFO crawl.DeduplicationJob - Deduplication: Updating status of duplicate urls into crawl db.
2014-12-11 10:27:27,931 INFO crawl.DeduplicationJob - Deduplication finished at 2014-12-11 10:27:27, elapsed: 00:00:02
2014-12-11 10:27:28,623 INFO indexer.IndexingJob - Indexer: starting at 2014-12-11 10:27:28
2014-12-11 10:27:28,711 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
2014-12-11 10:27:28,711 INFO indexer.IndexingJob - Indexer: URL filtering: false
2014-12-11 10:27:28,718 INFO indexer.IndexingJob - Indexer: URL normalizing: false
2014-12-11 10:27:28,933 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:28,933 INFO indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication

2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:29,087 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:29,585 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:27:29,995 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:30,022 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:27:30,054 INFO solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:27:30,175 INFO solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: starting at 2014-12-11 10:39:34
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: urlDir: urls
2014-12-11 10:39:34,708 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:39:34,989 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:35,046 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:39:35,528 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:39:36,577 INFO crawl.Injector - Injector: overwrite: false
2014-12-11 10:39:36,577 INFO crawl.Injector - Injector: update: false
2014-12-11 10:39:37,387 INFO crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:39:37,392 INFO crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:39:37,392 INFO crawl.Injector - Injector: finished at 2014-12-11 10:39:37, elapsed: 00:00:02
2014-12-11 10:39:38,327 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: starting at 2014-12-11 10:39:38
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:39:38,978 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:38,978 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:38,978 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:38,987 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:39,040 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:39,040 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:39,040 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:39,045 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:39:39,649 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:39:40,649 INFO crawl.Generator - Generator: segment: crawl/segments/20141211103940
2014-12-11 10:39:40,814 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:41,755 INFO crawl.Generator - Generator: finished at 2014-12-11 10:39:41, elapsed: 00:00:03
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:39:42
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211103940
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418276382447
2014-12-11 10:39:42,720 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:39:43,182 INFO fetcher.Fetcher - QueueFeeder finished: total 1 records + hit by time limit :0
2014-12-11 10:39:43,336 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - fetching http://passport.zqgame.com/common/agreement.jsp (queue crawl delay=5000ms)
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,341 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:57,352 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:39:57,352 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:39:57,353 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211103940
2014-12-11 10:39:57,501 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:57,970 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:39:58,376 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:39:58,434 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:00,130 INFO indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:03
2014-12-11 10:40:00,830 INFO indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:00
2014-12-11 10:40:01,101 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:01,748 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:01,963 INFO indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:02,382 INFO indexer.CleaningJob - CleaningJob: finished at 2014-12-11 10:40:02, elapsed: 00:00:01
2014-12-11 10:40:03,313 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: starting at 2014-12-11 10:40:03
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:40:03,315 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:40:03,963 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:03,964 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:03,964 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:03,972 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:04,062 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:04,062 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:04,062 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:04,067 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:40:04,635 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:40:05,636 INFO crawl.Generator - Generator: segment: crawl/segments/20141211104005
2014-12-11 10:40:05,803 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:06,747 INFO crawl.Generator - Generator: finished at 2014-12-11 10:40:06, elapsed: 00:00:03
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:40:07
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211104005
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418276407435
2014-12-11 10:40:07,707 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:08,157 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,158 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:40:08,158 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:40:08,187 INFO fetcher.Fetcher - QueueFeeder finished: total 40 records + hit by time limit :0
2014-12-11 10:40:08,326 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - fetching http://lt.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - fetching http://zscq.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,329 INFO fetcher.Fetcher - fetching http://lj2.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:39:58,434 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:00,130 INFO 
indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:033532 2014-12-11 10:40:00,830 INFO indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:003533 2014-12-11 10:40:01,101 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable3534 2014-12-11 10:40:01,748 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter3535 2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: content dest: content 14550 2014-12-11 10:59:29,551 INFO fetcher.Fetcher - fetching http://pay.zqgame.com/pay/toPayPage/dxpc/107 (queue crawl delay=5000ms) 14551 2014-12-11 10:59:29,703 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49, fetchQueues.getQueueCount=1總結
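A log of this size is easier to scan with a small script. The following is only a minimal Python sketch, not part of Nutch: it assumes each entry sits on its own line in the layout `date time,millis LEVEL logger - message`, and the names `LOG_LINE`, `parse_lines`, and `summarize` are invented here for illustration. The sample lines are copied from the output above.

```python
import re
from collections import Counter

# Assumed layout of one entry: "date time,millis LEVEL logger - message".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<logger>\S+) - (?P<message>.*)$"
)

# A few entries taken from the log above, one per line.
SAMPLE = """\
2014-12-11 10:39:58,434 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:40:01,963 INFO indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - fetching http://lt.zqgame.com/ (queue crawl delay=5000ms)
"""


def parse_lines(text):
    """Yield one dict per line that matches LOG_LINE; skip everything else."""
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            yield match.groupdict()


def summarize(text):
    """Return (entry count per log level, list of URLs the fetcher requested)."""
    levels = Counter()
    urls = []
    for entry in parse_lines(text):
        levels[entry["level"]] += 1
        is_fetch = (
            entry["logger"] == "fetcher.Fetcher"
            and entry["message"].startswith("fetching ")
        )
        if is_fetch:
            # The URL is the second whitespace-separated token of the message.
            urls.append(entry["message"].split()[1])
    return levels, urls


levels, urls = summarize(SAMPLE)
print(dict(levels))
print(urls)
```

Pointing `summarize` at the text of a real log file instead of `SAMPLE` would give a quick count of WARN entries and the list of fetched URLs, provided the file uses the same line layout.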