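Implementing a Chinese character dictionary in Java
Environment: Eclipse, JDK 1.6. No third-party packages are used; everything comes from the JDK.
Note: all project source files are saved as UTF-8. With any other encoding, the Chinese comments in the code show up as garbled text.
To set UTF-8 in Eclipse: Window -> Preferences -> General -> Workspace, then under "Text file encoding" select Other: UTF-8.
Project structure:
1.Dictionary file
dic.txt download: http://download.csdn.net/detail/wssiqi/5056993
Only an excerpt is shown here; the full file contains 20902 Chinese characters.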
19968,一,一,1,1,GGLL,A,yi1,yī
19969,丁,一,2,12,SGH,AI,ding1,dīng,zheng1,zhēng
19970,丂,一,2,15,GNV,AZVV,kao3,kǎo,qiao3,qiǎo,yu2,yú
19971,七,一,2,15,AGN,HD,qi1,qī
19972,丄,一,2,21,HGD,IAVV,shang4,shàng
19973,丅,一,2,12,GHK,AIAA,xia4,xià
19974,丆,一,2,13,DGT,GDAA,han3,hǎn
19975,万,一,3,153,DNV,,wan4,wàn,mo4,mò
19976,丈,一,3,134,DYI,AOS,zhang4,zhàng
19977,三,一,3,111,DGGG,CD,san1,sān
19978,上,一,3,211,HHGG,IDA,shang3,shǎng,shang4,shàng
19979,下,一,3,124,GHI,AID,xia4,xià
19980,丌,一,3,132,GJK,AND,ji1,jī,qi2,qí
19981,不,一,4,1324,GII,GI,fou3,fǒu,bu4,bù
19982,与,一,3,151,GNGD,AZA,yu4,yù,yu3,yǔ,yu2,yú
19983,丏,一,4,1255,GHNN,AIZY,mian3,miǎn
19984,丐,一,4,1215,GHNV,AIZ,gai4,gài
19985,丑,一,4,5211,NFD,XED,chou3,chǒu
19986,丒,一,4,5341,VYGF,YDSA,chou3,chǒu
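Each record is one comma-separated line. Going by the field positions documented in Dic.java, the first seven fields are the decimal Unicode value, the character, its radical, the stroke count, the stroke order, the Wubi code and the Zhengma code; the remaining fields appear to be alternating pairs of ASCII pinyin and tone-marked pinyin. A minimal sketch of splitting one record into named fields follows. The class `DicRecord` and its field names are hypothetical and not part of the project, and the non-ASCII characters are written as Unicode escapes so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one dic.txt record; not part of the original project. */
class DicRecord {
    final int codePoint;
    final String character;
    final String radical;
    final int strokeCount;
    final String strokeOrder;
    final String wubi;
    final String zhengma;
    final List<String> pinyinAscii = new ArrayList<String>();
    final List<String> pinyinToned = new ArrayList<String>();

    DicRecord(String line) {
        // limit -1 keeps empty fields, e.g. a missing Zhengma code
        String[] f = line.split(",", -1);
        if (f.length < 9) {
            throw new IllegalArgumentException("Malformed record: " + line);
        }
        codePoint = Integer.parseInt(f[0]);
        character = f[1];
        radical = f[2];
        strokeCount = Integer.parseInt(f[3]);
        strokeOrder = f[4];
        wubi = f[5];
        zhengma = f[6];
        // Assumption: readings come in (ASCII, tone-marked) pairs from field 7 on
        for (int i = 7; i + 1 < f.length; i += 2) {
            pinyinAscii.add(f[i]);
            pinyinToned.add(f[i + 1]);
        }
    }

    public static void main(String[] args) {
        // Record for U+4E01 (ding1 / zheng1), taken from the excerpt above
        DicRecord r = new DicRecord(
                "19969,\u4e01,\u4e00,2,12,SGH,AI,ding1,d\u012bng,zheng1,zh\u0113ng");
        System.out.println(r.character + " " + r.pinyinAscii + " " + r.pinyinToned);
    }
}
```

Keeping both pinyin lists makes the polyphonic readings available, whereas Dic.java only ever returns the first one.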
2.Dic.java
package com.siqi.dict;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

/**
 * Local Chinese character dictionary. <br/>
 * The dictionary data comes from <a href="http://www.zdic.net/search/?c=2">汉典 (zdic.net)</a>.
 * Covers some common needs: pinyin, Wubi code, pinyin first letter, stroke count and stroke order.
 *
 * @author siqi
 */
public class Dic {

    /** Whether to print debug information. */
    private static boolean DEBUG = true;

    /** Default encoding. */
    public static final Charset DEFAULT_CHARSET = Charset.forName("UTF-8");

    /** Lowest Unicode value treated as a Chinese character. */
    public static final int CN_U16_CODE_MIN = 0x4e00;

    /** Highest Unicode value treated as a Chinese character. */
    public static final int CN_U16_CODE_MAX = 0x9fa5;

    /** Local dictionary file name. */
    public static final String DIC_FILENAME = "dic.txt";

    /** Raw dictionary data. */
    public static byte[] bytes = new byte[0];

    /** Number of characters in the dictionary. */
    public static int count = 0;

    /*
     * Field positions within one record, for example:
     * "25171,打,扌,5,12112,RSH,DAI,da3,dǎ,da2,dá"
     */

    /** Position of the Unicode value. */
    public static int INDEX_UNICODE = 0;
    /** Position of the character itself. */
    public static int INDEX_CHARACTER = 1;
    /** Position of the radical. */
    public static int INDEX_BUSHOU = 2;
    /** Position of the stroke count. */
    public static int INDEX_BIHUA = 3;
    /** Position of the stroke order. */
    public static int INDEX_BISHUN = 4;
    /** Position of the Wubi code. */
    public static int INDEX_WUBI = 5;
    /** Position of the Zhengma code. */
    public static int INDEX_ZHENGMA = 6;
    /** Position of the first pinyin reading in ASCII letters, e.g. "da3". */
    public static int INDEX_PINYIN_EN = 7;
    /** Position of the first pinyin reading with tone marks, e.g. "dǎ". */
    public static int INDEX_PINYIN_CN = 8;

    /**
     * Loads the dictionary when the class is initialized.
     */
    static {
        long time = System.currentTimeMillis();

        try {
            LoadDictionary();
            count = count();
            if (DEBUG) {
                System.out.println("Dictionary loaded: " + new File(DIC_FILENAME).getCanonicalPath()
                        + ", took " + (System.currentTimeMillis() - time) + " ms, characters loaded: " + count);
            }
        } catch (Exception e) {
            try {
                System.out.println("Failed to load dictionary: " + new File(DIC_FILENAME).getCanonicalPath() + "\r\n");
            } catch (Exception e1) {
                // ignored: the path is only needed for the message
            }
            e.printStackTrace();
        }

    }

    /**
     * Gets the Unicode value of a Chinese character.
     *
     * @param ch
     *            a Chinese character
     * @return the Unicode value as stored in the dictionary (decimal string)
     * @throws Exception
     */
    public static String GetUnicode(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_UNICODE);
    }

    /**
     * Gets the pinyin in ASCII letters.
     *
     * @param ch
     *            a single Chinese character
     * @return the ASCII pinyin of the character, e.g. "大" -> "da4"
     * @throws Exception
     */
    public static String GetPinyinEn(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_PINYIN_EN);
    }

    /**
     * Returns the pinyin (ASCII letters) of a Chinese string.
     *
     * @param str
     *            a string containing Chinese characters
     * @return the string with every Chinese character replaced by its pinyin followed by a
     *         space; other characters are left unchanged. Note: only the first reading in
     *         the dictionary is used, so the result may be wrong for polyphonic characters.
     * @throws Exception
     */
    public static String GetPinyinEn(String str) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isChineseChar(ch)) {
                sb.append(GetPinyinEn(ch)).append(' ');
            } else {
                sb.append(ch);
            }
        }

        return sb.toString().trim();
    }

    /**
     * Gets the pinyin with tone marks.
     *
     * @param ch
     *            a single Chinese character
     * @return the tone-marked pinyin of the character, e.g. "打" -> "dǎ"
     * @throws Exception
     */
    public static String GetPinyinCn(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_PINYIN_CN);
    }

    /**
     * Returns the pinyin (with tone marks) of a Chinese string.
     *
     * @param str
     *            a string containing Chinese characters
     * @return the string with every Chinese character replaced by its pinyin followed by a
     *         space; other characters are left unchanged. Note: only the first reading in
     *         the dictionary is used, so the result may be wrong for polyphonic characters.
     * @throws Exception
     */
    public static String GetPinyinCn(String str) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isChineseChar(ch)) {
                sb.append(GetPinyinCn(ch)).append(' ');
            } else {
                sb.append(ch);
            }
        }

        return sb.toString().trim();
    }

    /**
     * Returns the first letter of the pinyin.
     *
     * @param ch
     *            a single character
     * @return the first letter of the ASCII pinyin, or "" if ch is not a Chinese character
     * @throws Exception
     */
    public static String GetFirstLetter(Character ch) throws Exception {
        if (isChineseChar(ch)) {
            return GetPinyinEn(ch).substring(0, 1);
        } else {
            return "";
        }
    }

    /**
     * Returns the pinyin first letters of a Chinese string. Characters that are not
     * Chinese are skipped.
     *
     * @param str
     *            a string containing Chinese characters
     * @return the concatenated first letters
     * @throws Exception
     */
    public static String GetFirstLetter(String str) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isChineseChar(ch)) {
                sb.append(GetFirstLetter(ch));
            }
        }

        return sb.toString().trim();
    }

    /**
     * Gets the radical of a Chinese character.
     *
     * @param ch
     *            a Chinese character
     * @return the radical
     * @throws Exception
     */
    public static String GetBushou(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_BUSHOU);
    }

    /**
     * Gets the stroke count of a Chinese character.
     *
     * @param ch
     *            a Chinese character
     * @return the stroke count
     * @throws Exception
     */
    public static String GetBihua(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_BIHUA);
    }

    /**
     * Gets the stroke order of a Chinese character.
     *
     * @param ch
     *            a Chinese character
     * @return the stroke order
     * @throws Exception
     */
    public static String GetBishun(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_BISHUN);
    }

    /**
     * Gets the Wubi code of a Chinese character.
     *
     * @param ch
     *            a Chinese character
     * @return the Wubi code
     * @throws Exception
     */
    public static String GetWubi(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_WUBI);
    }

    /**
     * Gets the Zhengma code of a Chinese character.
     *
     * @param ch
     *            a Chinese character
     * @return the Zhengma code
     * @throws Exception
     */
    public static String GetZhengma(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_ZHENGMA);
    }

    /**
     * Looks up the record of a Chinese character in the dictionary.
     *
     * @param ch
     *            the character to look up
     * @return the record, e.g. "25171,打,扌,5,12112,RSH,DAI,da3,dǎ,da2,dá" <br/>
     *         1st field: Unicode value<br/>
     *         2nd field: the character<br/>
     *         3rd field: radical<br/>
     *         4th field: stroke count<br/>
     *         5th field: stroke order ("12345" stand for "横竖撇捺折")<br/>
     *         6th field: Wubi code<br/>
     *         7th field: Zhengma code<br/>
     *         8th field onward: pinyin readings (ASCII and tone-marked)<br/>
     *         Returns "" if the character is not in the dictionary.
     * @throws Exception
     */
    public static String GetCharInfo(Character ch) throws Exception {
        if (!isChineseChar(ch)) {
            throw new Exception("'" + ch + "' is not a Chinese character!");
        }

        // Every record starts with the decimal Unicode value followed by a comma
        String prefix = (int) ch.charValue() + ",";
        String result = "";

        // Decode explicitly as UTF-8; the platform default charset may differ
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), DEFAULT_CHARSET));
        try {
            String strWord;
            while ((strWord = br.readLine()) != null) {
                if (strWord.startsWith(prefix)) {
                    result = strWord;
                    break;
                }
            }
        } finally {
            br.close();
        }

        return result;
    }

    /**
     * Returns one field of a character's record.
     *
     * @param ch
     *            a Chinese character
     * @param index
     *            position of the field in the record
     * @return the field value
     * @throws Exception
     */
    private static String GetCharInfo(Character ch, int index) throws Exception {
        if (!isChineseChar(ch)) {
            throw new Exception("'" + ch + "' is not a Chinese character!");
        }

        // Fetch the whole record
        String charInfo = GetCharInfo(ch);

        String result = "";
        try {
            result = charInfo.split(",")[index];
        } catch (Exception e) {
            throw new Exception("Check whether the dictionary record for '" + ch + "' is correct!");
        }

        return result;
    }

    /**
     * Loads the dictionary file into memory.
     * @throws Exception
     */
    private static void LoadDictionary() throws Exception {
        File file = new File(DIC_FILENAME);
        byte[] data = new byte[(int) file.length()];
        FileInputStream fis = new FileInputStream(file);
        try {
            // read() may return fewer bytes than requested, so loop until the buffer is full
            int off = 0;
            while (off < data.length) {
                int n = fis.read(data, off, data.length - off);
                if (n < 0) {
                    throw new java.io.EOFException("Unexpected end of " + DIC_FILENAME);
                }
                off += n;
            }
        } finally {
            fis.close();
        }
        bytes = data;
    }

    /**
     * Checks whether a character is a Chinese character. The original version compared
     * ch.hashCode(), which works because Character.hashCode() returns the char value
     * itself; comparing the char value directly says the same thing more clearly.
     *
     * @param ch
     *            the character to test
     * @return true if ch lies in [CN_U16_CODE_MIN, CN_U16_CODE_MAX], false otherwise
     */
    public static boolean isChineseChar(Character ch) {
        char c = ch.charValue();
        return c >= CN_U16_CODE_MIN && c <= CN_U16_CODE_MAX;
    }

    /**
     *
     * @return the number of characters (lines) in the dictionary
     * @throws Exception
     */
    private static int count() throws Exception {
        int cnt = 0;
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), DEFAULT_CHARSET));
        try {
            while (br.readLine() != null) {
                cnt++;
            }
        } finally {
            br.close();
        }

        return cnt;
    }
}
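GetCharInfo re-reads the in-memory file line by line on every lookup, so converting a long string can scan up to 20902 lines per character. One possible alternative is to index the records once in a HashMap keyed by the Unicode value. The sketch below assumes every record starts with the decimal Unicode value and a comma; the class `DicIndex` and its methods are hypothetical and not part of the project. Non-ASCII characters are written as Unicode escapes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory index over dic.txt records; not part of the original project. */
class DicIndex {
    private final Map<Integer, String[]> records = new HashMap<Integer, String[]>();

    /** Builds the index from record lines such as "25171,...,RSH,DAI,da3,...". */
    DicIndex(Iterable<String> lines) {
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma <= 0) {
                continue; // blank or malformed line
            }
            try {
                int codePoint = Integer.parseInt(line.substring(0, comma));
                records.put(codePoint, line.split(",", -1));
            } catch (NumberFormatException e) {
                // first field is not a number: skip the line
            }
        }
    }

    /** Returns the field at the given position, or null if the character or field is missing. */
    String field(char ch, int index) {
        String[] f = records.get((int) ch);
        return (f != null && index >= 0 && index < f.length) ? f[index] : null;
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        DicIndex idx = new DicIndex(Arrays.asList(
                "19968,\u4e00,\u4e00,1,1,GGLL,A,yi1,y\u012b",
                "25171,\u6253,\u624c,5,12112,RSH,DAI,da3,d\u01ce,da2,d\u00e1"));
        System.out.println(idx.field('\u6253', 7)); // ASCII pinyin field of U+6253
        System.out.println(idx.field('\u6253', 5)); // Wubi field of U+6253
    }
}
```

The trade-off is memory: the map holds every record as a split array, while the original keeps only the raw bytes.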
3.Sample.java
How to use the dictionary
package com.siqi.dict;

/**
 * Two examples showing how to get the pinyin and other information for Chinese characters.
 *
 * @author siqi
 */
public class Sample {

    /**
     * Dictionary usage example.
     *
     * @param args
     */
    public static void main(String[] args) {
        try {
            long time = System.currentTimeMillis();

            char ch = '打';
            // A single Chinese character
            System.out.println("==== Info for 打: start ====");
            System.out.println("First letter: " + Dic.GetFirstLetter(ch));
            System.out.println("Pinyin (tone marks): " + Dic.GetPinyinCn(ch));
            System.out.println("Pinyin (ASCII): " + Dic.GetPinyinEn(ch));
            System.out.println("Radical: " + Dic.GetBushou(ch));
            System.out.println("Stroke count: " + Dic.GetBihua(ch));
            System.out.println("Stroke order: " + Dic.GetBishun(ch));
            System.out.println("Wubi: " + Dic.GetWubi(ch));
            System.out.println("==== Info for 打: end ====");

            // A string of Chinese characters
            System.out.println("\r\n==== Chinese string ====");
            System.out.println(Dic.GetPinyinEn("返回汉字字符串的拼音。"));
            System.out.println(Dic.GetPinyinCn("返回汉字字符串的拼音。"));
            System.out.println(Dic.GetFirstLetter("返回汉字字符串的拼音。"));
            System.out.println("==== Chinese string ====\r\n");

            System.out.println("Elapsed: " + (System.currentTimeMillis() - time) + " ms");

        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}
4.Result
==== Info for 打: start ====
Dictionary loaded: C:\workspaces\01_java\DictLocal\dic.txt, took 15 ms, characters loaded: 20902
First letter: d
Pinyin (tone marks): dǎ
Pinyin (ASCII): da3
Radical: 扌
Stroke count: 5
Stroke order: 12112
Wubi: RSH
==== Info for 打: end ====

==== Chinese string ====
fan3 hui2 han4 zi4 zi4 fu2 chuan4 di2 pin1 yin1 。
fǎn huí hàn zì zì fú chuàn dí pīn yīn 。
fhhzzfcdpy
==== Chinese string ====

Memory(Used/Total) : 1539/15872 KB
Elapsed: 218 ms
I will upload the code that builds the dictionary file later; I obtained the data by collecting pages from http://www.zdic.net/zd/.
=============Supplement: how to obtain the character information================
=============All of the information was taken from the 汉典 (zdic.net) website=========
Directory structure:
Environment: Eclipse, JDK 1.6. No third-party packages are used; everything comes from the JDK.
Note: all project source files are saved as UTF-8. With any other encoding, the Chinese comments in the code show up as garbled text.
To set UTF-8 in Eclipse: Window -> Preferences -> General -> Workspace, then under "Text file encoding" select Other: UTF-8.
Packages:
com.siqi.http
    Httpclient.java: a simple class I wrote for fetching web pages; it is used to get the page content.
com.siqi.dict
    DictMain.java: downloads the character pages, extracts each character's pinyin and saves it to data.dat
    DownloadThread.java: downloads the pages (multi-threaded)
com.siqi.pinyin
    PinYin.java: running DictMain.java generates data.dat; copy that file into the com.siqi.pinyin package and the functions in PinYin.java can then be called to get a character's pinyin
    PinYinEle.java: a character -> pinyin -> Unicode model
Source code:
Httpclient.java: fetches a web page and exposes its content, its encoding and its header (simplified version)
package com.siqi.http;

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simple HTTP GET and POST over a plain socket.
 *
 * @author siqi
 */
public class Httpclient {

    /** processUrl parameter: HTTP GET */
    public static final int METHOD_GET = 0;
    /** processUrl parameter: HTTP POST */
    public static final int METHOD_POST = 1;
    /** Request header for HTTP GET (simplified) */
    public static final String HEADER_GET = "GET %s HTTP/1.0\r\nHOST: %s\r\n\r\n";
    /** Request header for HTTP POST (simplified) */
    public static final String HEADER_POST = "POST %s HTTP/1.0\r\nHOST: %s\r\nContent-Length: 0\r\n\r\n";
    /** Separator between the response header and the body */
    public static final String CONTENT_SEPARATOR = "\r\n\r\n";
    /** Raw bytes of the response */
    private byte[] bytes = new byte[0];
    /** Response header */
    private String header = "";
    /** Response body */
    private String content = "";

    /** Default page encoding: UTF-8 */
    public static final String CHARSET_DEFAULT = "UTF-8";
    /** Page encoding */
    private String charset = CHARSET_DEFAULT;

    /**
     * Example of using Httpclient.
     *
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        Httpclient httpclient = new Httpclient();
        // Request the Baidu home page (mobile version)
        httpclient.processUrl("http://m.baidu.com/");
        System.out.println("Fetched http://m.baidu.com/");
        System.out.println("Header:\r\n" + httpclient.getHeader());
        System.out.println("Content:\r\n" + httpclient.getContent());
        System.out.println("Charset:\r\n" + httpclient.getCharset());
        System.out.println("************************************");

        // Search Baidu (mobile version) for "中国"
        // Source of the mobile Baidu search box: <input id="word" type="text" size="20" maxlength="64"
        // name="word">
        String url = String.format("http://m.baidu.com/s?word=%s",
                URLEncoder.encode("中国", CHARSET_DEFAULT));
        httpclient.processUrl(url, METHOD_POST);
        System.out.println("Fetched http://m.baidu.com/s?word=中国");
        System.out.println("Header:\r\n" + httpclient.getHeader());
        System.out.println("Content:\r\n" + httpclient.getContent());
        System.out.println("Charset:\r\n" + httpclient.getCharset());
    }

    /**
     * Resets all fields to their default values.
     */
    private void init() {
        this.bytes = new byte[0];
        this.charset = CHARSET_DEFAULT;
        this.header = "";
        this.content = "";

    }

    /**
     * Returns the response header.
     *
     * @return the header of the last response
     */
    public String getHeader() {
        return header;
    }

    /**
     * Returns the page content.
     *
     * @return the body of the last response
     */
    public String getContent() {
        return content;
    }

    /**
     * Returns the page encoding.
     *
     * @return the charset detected for the last response
     */
    public String getCharset() {
        return charset;
    }

    /**
     * Requests a page using HTTP GET.
     *
     * @param url
     * @throws Exception
     */
    public void processUrl(String url) throws Exception {
        processUrl(url, METHOD_GET);
    }

    /**
     * Requests (fetches) a page over a socket.<br/>
     * For example:<br/>
     * processUrl("http://www.baidu.com/", METHOD_GET) fetches the Baidu home page.<br/>
     *
     * @param url
     *            address of the page or resource
     * @param method
     *            request method: METHOD_GET or METHOD_POST
     * @throws Exception
     */
    public void processUrl(String url, int method) throws Exception {

        init();

        // url = "http://www.zdic.net/search/?c=2&q=%E5%A4%A7";
        // Normalize the link: http://www.baidu.com becomes http://www.baidu.com/
        Matcher mat = Pattern.compile("https?://[^/]+").matcher(url);
        if (mat.find() && mat.group().equals(url)) {
            url += "/";
        }

        Socket socket = new Socket(getHostUrl(url), 80); // connect to the server (always port 80)
        try {
            socket.setSoTimeout(3000); // read timeout: 3 seconds

            String request = null;
            // Build the request; see the HTTP specification (RFC 2616) for details
            if (method == METHOD_POST) {
                request = String.format(HEADER_POST, getSubUrl(url),
                        getHostUrl(url));
            } else {
                request = String
                        .format(HEADER_GET, getSubUrl(url), getHostUrl(url));
            }

            socket.getOutputStream().write(request.getBytes()); // send the request

            this.bytes = InputStream2ByteArray(socket.getInputStream()); // read the response
        } finally {
            socket.close(); // close the socket even if the request fails
        }

        // Detect the page encoding. Only the first 4096 bytes are searched;
        // the charset declaration is normally found there
        String temp = new String(this.bytes, 0,
                bytes.length < 4096 ? bytes.length : 4096);
        mat = Pattern.compile("(?<=<meta.{0,100}?charset=)[a-z-0-9]*",
                Pattern.CASE_INSENSITIVE).matcher(temp);
        if (mat.find()) {
            this.charset = mat.group();
        } else {
            this.charset = CHARSET_DEFAULT;
        }

        // Decode header and body with the detected encoding
        temp = new String(this.bytes, this.charset);
        int headerEnd = temp.indexOf(CONTENT_SEPARATOR);
        if (headerEnd < 0) {
            // No blank line in the response: treat all of it as header
            this.header = temp;
            return;
        }
        this.header = temp.substring(0, headerEnd);
        this.content = temp.substring(headerEnd + CONTENT_SEPARATOR.length());
    }

    /**
     * Extracts the server address from a URL.<br/>
     * For example:<br/>
     * http://m.weathercn.com/common/province.jsp
     * <p>
     * returns:<br/>
     * m.weathercn.com
     *
     * @param url
     *            the URL
     * @return the host, or "" if none was found
     */
    public static String getHostUrl(String url) {
        String host = "";
        Matcher mat = Pattern.compile("(?<=https?://).+?(?=/)").matcher(url);
        if (mat.find()) {
            host = mat.group();
        }

        return host;
    }

    /**
     * Extracts the page path from a URL. For example:<br/>
     * http://m.weathercn.com/common/province.jsp
     * <p>
     * returns:<br/>
     * /common/province.jsp
     *
     * @param url
     * @return the path, or "" if none was found
     */
    public static String getSubUrl(String url) {
        String subUrl = "";
        Matcher mat = Pattern.compile("https?://.+?(?=/)").matcher(url);
        if (mat.find()) {
            subUrl = url.substring(mat.group().length());
        }

        return subUrl;
    }

    /**
     * Concatenates two byte arrays: result = b1 + b2.
     *
     * @param b1
     * @param b2
     * @return a new array holding b1 followed by b2
     */
    public static byte[] ByteArrayCat(byte[] b1, byte[] b2) {
        byte[] b = new byte[b1.length + b2.length];
        System.arraycopy(b1, 0, b, 0, b1.length);
        System.arraycopy(b2, 0, b, b1.length, b2.length);
        return b;
    }

    /**
     * Reads an input stream into a byte array. Bytes are returned instead of a string
     * because the stream's encoding is not known yet, and decoding with the wrong
     * encoding would produce garbled text.
     *
     * @param is
     *            the input stream
     * @return all bytes read from the stream
     * @throws IOException
     */
    public static byte[] InputStream2ByteArray(InputStream is)
            throws IOException {
        byte[] b = new byte[0];
        byte[] bb = new byte[4096]; // buffer

        int len = 0;
        while ((len = is.read(bb)) != -1) {
            byte[] newb = new byte[b.length + len];
            System.arraycopy(b, 0, newb, 0, b.length);
            System.arraycopy(bb, 0, newb, b.length, len);
            b = newb;
        }

        return b;
    }
}
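getHostUrl and getSubUrl split the URL with regular expressions, and processUrl always connects to port 80. As an alternative sketch, the JDK's java.net.URI can do the splitting and also exposes an explicit port when the URL has one. The helper class `UrlParts` below is hypothetical; it only illustrates the idea and is not what Httpclient does.

```java
import java.net.URI;

/** Hypothetical helper that splits a URL into the parts a raw HTTP request needs. */
class UrlParts {
    final String host;
    final int port;
    /** Path plus optional query string, as it would appear on the request line. */
    final String requestTarget;

    UrlParts(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on invalid syntax
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        host = uri.getHost();

        // Fall back to the scheme's default port when none is given
        boolean https = "https".equalsIgnoreCase(uri.getScheme());
        port = uri.getPort() != -1 ? uri.getPort() : (https ? 443 : 80);

        // Raw (still percent-encoded) path and query; an empty path becomes "/"
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        requestTarget = (query == null) ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        UrlParts p = new UrlParts("http://www.zdic.net/search/?c=2&q=%E5%A4%A7");
        System.out.println(p.host + " " + p.port + " " + p.requestTarget);
    }
}
```

Using the raw path and query keeps percent-encoded characters exactly as they were in the input URL.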
DictMain.java
package com.siqi.dict;

import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Downloads the character pages from 汉典 (zdic.net) and extracts the pinyin information.
 *
 * @author siqi
 */
public class DictMain {
    /** Directory where downloaded pages are saved */
    public static final String SAVEPATH = "dict/pages/";
    /** File name pattern for a downloaded character page */
    public static final String FILEPATH = SAVEPATH + "%s.html";

    /** Dictionary data file name (data.dat, the name the comments and PinYin.java refer to) */
    public static final String DATA_FILENAME = "data.dat";

    /** Lowest Unicode value of the Chinese character range */
    public static final int UNICODE_MIN = 0x4E00;

    /** Highest Unicode value of the Chinese character range */
    public static final int UNICODE_MAX = 0x9FFF;

    /**
     * Preparation:
     * 1. Download the pages of all Chinese characters from the zdic.net site. Do not open
     *    the folder holding the saved pages inside Eclipse: there is one page per character,
     *    20000+ pages in total, which can easily freeze Eclipse.
     * 2. Extract the pinyin information from the pages and generate data.dat.
     * 3. Copy the generated data.dat into com.siqi.pinyin.
     * 4. com.siqi.pinyin.PinYin.java can now be used.
     */
    static {
        // Download the pages
        java.util.List<Thread> threads = new java.util.ArrayList<Thread>();
        for (int i = UNICODE_MIN; i <= UNICODE_MAX; i++) {
            // Skip pages that already exist
            String filePath = String.format(FILEPATH, i); // file name
            File file = new File(filePath);
            if (!file.exists()) {
                Thread t = new DownloadThread(i);
                t.start();
                threads.add(t);
            }
        }

        // Wait for the downloads still running, so that parsing sees every page
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Parse the pages, collect the pinyin information
        StringBuffer sb = new StringBuffer();
        for (int i = UNICODE_MIN; i <= UNICODE_MAX; i++) {
            String word = new String(Character.toChars(i));
            String pinyin = getPinYinFromWebpageFile(String.format(FILEPATH, i));
            String str = String.format("%s,%s,%s\r\n", i, word, pinyin);
            System.out.print(str);
            sb.append(str);
        }

        // Save to the data file
        try {
            FileWriter fw = new FileWriter(DATA_FILENAME);
            fw.write(sb.toString());
            fw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void main(String[] args) {

        System.out.println("All prepared!");
    }

    /**
     * Extracts the pinyin information from a saved page file.
     * @param file path of the saved page
     * @return the first pinyin found, or "" if none was found or the file could not be read
     */
    private static String getPinYinFromWebpageFile(String file) {
        try {

            char[] buff = new char[(int) new File(file).length()];

            // read() may return fewer chars than requested, so loop until end of file
            int len = 0;
            FileReader reader = new FileReader(file);
            try {
                int n;
                while ((n = reader.read(buff, len, buff.length - len)) > 0) {
                    len += n;
                }
            } finally {
                reader.close();
            }

            String content = new String(buff, 0, len);
            // spf("yi1")
            Matcher mat = Pattern.compile("(?<=spf\\(\")[a-z1-4]{0,100}",
                    Pattern.CASE_INSENSITIVE).matcher(content);
            if (mat.find()) {
                return mat.group();
            }
            //<span class="dicpy">cal</span> spf("xin1")
            mat = Pattern.compile("(?<=class=\"dicpy\">)[a-z1-4]{0,100}",
                    Pattern.CASE_INSENSITIVE).matcher(content);
            if (mat.find()) {
                return mat.group();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return "";

    }
}
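The extraction step relies on two lookbehind patterns, tried in order. The sketch below runs the same two patterns against short hand-written fragments. The fragments are made up to resemble the `spf("...")` call and the `dicpy` span mentioned in the code comments; they are not copied from real zdic.net pages, and the class name `PinyinExtractDemo` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo of the two extraction patterns used in DictMain. */
class PinyinExtractDemo {
    // Text right after spf(" : letters plus tone digits 1-4
    private static final Pattern SPF = Pattern.compile(
            "(?<=spf\\(\")[a-z1-4]{0,100}", Pattern.CASE_INSENSITIVE);
    // Fallback: text right after class="dicpy">
    private static final Pattern DICPY = Pattern.compile(
            "(?<=class=\"dicpy\">)[a-z1-4]{0,100}", Pattern.CASE_INSENSITIVE);

    /** Returns the first pinyin found by either pattern, or "" if neither matches. */
    static String extract(String html) {
        Matcher m = SPF.matcher(html);
        if (m.find()) {
            return m.group();
        }
        m = DICPY.matcher(html);
        if (m.find()) {
            return m.group();
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(extract("<script>spf(\"yi1\");</script>"));
        System.out.println(extract("<span class=\"dicpy\">xin1</span>"));
    }
}
```

Compiling the patterns once as constants avoids recompiling them for each of the 20000+ pages.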
DownloadThread.java
package com.siqi.dict;

import java.io.File;
import java.io.FileWriter;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.siqi.http.Httpclient;

/**
 * Downloads a character page from the 汉典 (zdic.net) site and stores it locally.
 * http://www.zdic.net/search/?c=2
 *
 * @author siqi
 */
public class DownloadThread extends Thread {

    /** Maximum number of concurrent threads */
    public static int THREAD_MAX = 10;

    /** Maximum number of download attempts */
    public static int RETRY_MAX = 5;

    /** zdic.net search URL */
    public static String SEARCH_URL = "http://www.zdic.net/search/?q=%s";

    /** Current number of threads */
    private static int threadCnt = 0;

    /** Unicode value of the character handled by this thread */
    private int unicode = 0;

    /**
     * Creates the save directory if it does not exist.
     */
    static {
        try {
            File file = new File(DictMain.SAVEPATH);
            if (!file.exists()) {
                file.mkdirs();
            }
        } catch (Exception e) {
            // ignored: a missing directory shows up later as a failed save
        }
    }

    /**
     * Adjusts and returns the current thread count.
     * @param i delta to apply: threadCnt += i (pass 0 to only read the count)
     * @return the thread count after the change
     */
    public static synchronized int threadCnt(int i) {
        threadCnt += i;
        return threadCnt;
    }

    /**
     * Downloads the page of the character with the given Unicode value.
     * @param unicode Unicode value of the character
     */
    public DownloadThread(int unicode) {
        // Wait until the current thread count is below THREAD_MAX
        while (threadCnt(0) >= THREAD_MAX) {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                // ignored: keep waiting for a free slot
            }
        }

        threadCnt(1); // thread count + 1
        this.unicode = unicode;
    }

    @Override
    public void run() {
        long t1 = System.currentTimeMillis(); // start time

        String filePath = String.format(DictMain.FILEPATH, unicode); // file name

        String word = new String(Character.toChars(unicode)); // Unicode value -> character

        boolean downloaded = false;
        int retryCnt = 0; // number of failed attempts so far
        while (!downloaded && retryCnt < RETRY_MAX) {
            try {
                String content = DownloadPage(word);
                SaveToFile(filePath, content);
                downloaded = true;

                threadCnt(-1);
                System.out.println(String.format("%s, %s, downloaded. Threads: %s, elapsed: %s ms",
                        unicode, word, threadCnt(0), System.currentTimeMillis()
                                - t1));
                return;
            } catch (Exception e) {
                retryCnt++;
            }
        }

        threadCnt(-1);
        System.err.println(String.format("%s, %s, download failed. Threads: %s, elapsed: %s ms", unicode,
                word, threadCnt(0), System.currentTimeMillis() - t1));
    }

-     /**
-      * Looks up a character on the zdic.net (漢典) site and returns the content of its dictionary page.
-      * @param word
-      * @return
-      * @throws Exception
-      */
-     public String DownloadPage(String word) throws Exception{
-         // Search for word
-         Httpclient httpclient = new Httpclient();
-         String url = String.format(SEARCH_URL, URLEncoder.encode(word, "UTF-8"));
-         httpclient.processUrl(url, Httpclient.METHOD_POST);
-
-         // The response is a redirect page,
-         // so extract the redirect link and follow it
-         Matcher mat = Pattern.compile("(?<=HREF=\")[^\"]+").matcher(httpclient.getContent());
-         if(mat.find()){
-             httpclient.processUrl(mat.group());
-         }
-
-         return httpclient.getContent();
-     }
-
-     /**
-      * Writes content to the file named file.
-      * Note: FileWriter uses the platform default encoding.
-      * @param file
-      * @param content
-      */
-     public void SaveToFile(String file, String content){
-         FileWriter fw = null;
-         try {
-             fw = new FileWriter(file);
-             fw.write(content);
-         } catch (Exception e) {
-             e.printStackTrace();
-         } finally {
-             // Close the writer even if the write failed
-             if (fw != null) {
-                 try {
-                     fw.close();
-                 } catch (Exception ignored) {
-                 }
-             }
-         }
-     }
- }
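The DownloadThread constructor above limits concurrency by polling a shared counter every 500 ms. As an alternative sketch (not the author's code), a fixed-size pool from java.util.concurrent can enforce the same kind of limit without sleeping in a loop. The class name `BoundedDownloads`, the pool size, and the placeholder task body are assumptions made purely for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedDownloads {

    /**
     * Runs taskCount placeholder tasks on a pool of poolSize threads and
     * returns the highest number of tasks observed running at the same time.
     */
    public static int runAll(int taskCount, int poolSize) {
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < taskCount; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    int now = running.incrementAndGet();
                    // Raise the recorded peak if this task pushed it higher.
                    int seen = peak.get();
                    while (now > seen && !peak.compareAndSet(seen, now)) {
                        seen = peak.get();
                    }
                    try {
                        Thread.sleep(5); // stands in for the real page download
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }
            });
        }

        pool.shutdown(); // accept no new tasks; queued ones still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + runAll(40, 4));
    }
}
```

With a pool, the limit is enforced by the executor itself, so the task code needs no counter bookkeeping and a failed task cannot leave the count too high.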
PinYin.java
[java]
- package com.siqi.pinyin;
-
- import java.io.BufferedReader;
- import java.io.InputStreamReader;
- import java.util.HashMap;
- import java.util.Map;
-
- public class PinYin {
-
-     private static Map<Integer, PinYinEle> map = new HashMap<Integer, PinYinEle>();
-
-     /**
-      * Load the pinyin data file.
-      * Note: the reader uses the platform default encoding.
-      */
-     static {
-         try {
-             BufferedReader bReader = new BufferedReader(new InputStreamReader(
-                     PinYin.class.getResourceAsStream("data.dat")));
-             String aLine = null;
-             while ((aLine = bReader.readLine()) != null) {
-                 PinYinEle ele = new PinYinEle(aLine);
-                 map.put(ele.getUnicode(), ele);
-             }
-             bReader.close();
-         } catch (Exception e) {
-             e.printStackTrace();
-         }
-     }
-
-     /**
-      * Quick manual test.
-      *
-      * @param args
-      */
-     public static void main(String[] args) {
-         System.out.println("With tones: " + PinYin.getPinYin("大家haome12345"));
-         System.out.println("Without tones: " + PinYin.getPinYin("大家haome12345", false));
-     }
-
-     /**
-      * Gets the pinyin of a string; containsNumber controls whether the tone digits 1, 2, 3, 4 are kept.
-      *
-      * @param str
-      * @param containsNumber
-      *            true = keep tones, false = drop tones
-      * @return
-      */
-     public static String getPinYin(String str, boolean containsNumber) {
-         StringBuilder sb = new StringBuilder();
-         for (Character ch : str.toCharArray()) {
-             sb.append(getPinYin(ch, containsNumber));
-         }
-
-         return sb.toString();
-     }
-
-     /**
-      * Gets the pinyin of a string, including tones.
-      *
-      * @param str
-      * @return
-      */
-     public static String getPinYin(String str) {
-         StringBuilder sb = new StringBuilder();
-         for (Character ch : str.toCharArray()) {
-             sb.append(getPinYin(ch));
-         }
-
-         return sb.toString();
-     }
-
-     /**
-      * Gets the pinyin of a single Chinese character, including the tone.
-      *
-      * @param ch
-      * @return
-      */
-     public static String getPinYin(Character ch) {
-         return getPinYin(ch, true);
-     }
-
-     /**
-      * Gets the pinyin of a single Chinese character.
-      *
-      * @param ch
-      *            a Chinese character. A non-Chinese character is returned unchanged; null yields an empty string.
-      * @param containsNumber
-      *            true = keep tones, false = drop tones
-      * @return
-      */
-     public static String getPinYin(Character ch, boolean containsNumber) {
-         if (ch == null) {
-             return "";
-         }
-         // The map is keyed by the numeric value of the char
-         PinYinEle ele = map.get((int) ch.charValue());
-         if (ele == null || ele.getPinyin() == null) {
-             return ch.toString();
-         }
-         if (containsNumber) {
-             return ele.getPinyin();
-         }
-         return ele.getPinyin().replaceAll("[0-9]", "");
-     }
- }
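To make the per-character lookup above easier to try without data.dat, here is a minimal, self-contained sketch of the same idea. The class name `PinYinLookupSketch` and the two hard-coded entries are assumptions for illustration; the real class loads its table from data.dat.

```java
import java.util.HashMap;
import java.util.Map;

public class PinYinLookupSketch {

    // Code point -> pinyin with a trailing tone digit, in the style of data.dat.
    private static final Map<Integer, String> TABLE = new HashMap<Integer, String>();

    static {
        TABLE.put((int) '\u5927', "da4");  // U+5927
        TABLE.put((int) '\u5bb6', "jia1"); // U+5BB6
    }

    /** Converts every character found in the table; all others pass through. */
    public static String toPinYin(String text, boolean keepTone) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            String pinyin = TABLE.get((int) ch);
            if (pinyin == null) {
                sb.append(ch); // not in the table: copy unchanged
            } else if (keepTone) {
                sb.append(pinyin);
            } else {
                sb.append(pinyin.replaceAll("[0-9]", "")); // drop the tone digit
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPinYin("\u5927\u5bb6haome12345", true));  // da4jia1haome12345
        System.out.println(toPinYin("\u5927\u5bb6haome12345", false)); // dajiahaome12345
    }
}
```

The tone digit is stripped only from table entries, which is why the digits in "12345" survive in both outputs.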
PinYinEle.java
[java]
- package com.siqi.pinyin;
-
- /**
-  * One record of data.dat: code point, character, pinyin.
-  */
- public class PinYinEle {
-     private int unicode;
-     private String ch;
-     private String pinyin;
-
-     public PinYinEle(){}
-
-     /**
-      * Parses a line of the form "code,character,pinyin".
-      * Lines with any other number of fields leave the object empty.
-      */
-     public PinYinEle(String str){
-         if(str!=null){
-             String[] strs = str.split(",");
-             if(strs.length == 3){
-                 try{
-                     this.unicode = Integer.parseInt(strs[0]);
-                 }catch(NumberFormatException e){
-                     // malformed code point: unicode stays 0
-                 }
-                 this.ch = strs[1];
-                 this.pinyin = strs[2];
-             }
-         }
-     }
-
-     public int getUnicode() {
-         return unicode;
-     }
-     public void setUnicode(int unicode) {
-         this.unicode = unicode;
-     }
-     public String getCh() {
-         return ch;
-     }
-     public void setCh(String ch) {
-         this.ch = ch;
-     }
-     public String getPinyin() {
-         return pinyin;
-     }
-     public void setPinyin(String pinyin) {
-         this.pinyin = pinyin;
-     }
- }
Part of the generated data.dat:
[plain]
- 19968,一,yi1
- 19969,丁,ding1
- 19970,丂,kao3
- 19971,七,qi1
- 19972,丄,shang4
- 19973,丅,xia4
- 19974,丆,han3
- 19975,万,wan4
- 19976,丈,zhang4
- 19977,三,san1
- 19978,上,shang4
- 19979,下,xia4
- 19980,丌,qi2
- 19981,不,bu4
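Each record in the excerpt has three comma-separated fields: a decimal code, the character, and its pinyin with a tone digit. The following is a hedged sketch of a stricter parser than the PinYinEle constructor. The class name `DataLineSketch` is hypothetical, and the check that the first field equals the character's code point is an assumption inferred from the sample rows (for example, 19968 is U+4E00).

```java
public class DataLineSketch {

    /**
     * Splits one record into {code, character, pinyin}.
     * Returns null when the line does not have exactly three fields, when the
     * first field is not an integer, or when it does not match the character.
     */
    public static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        String[] parts = line.trim().split(",");
        if (parts.length != 3 || parts[1].length() == 0) {
            return null;
        }
        int code;
        try {
            code = Integer.parseInt(parts[0]);
        } catch (NumberFormatException e) {
            return null;
        }
        // Assumed consistency check: field 1 is the decimal code point of field 2.
        if (parts[1].codePointAt(0) != code) {
            return null;
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] rec = parse("19968,\u4e00,yi1");
        System.out.println(rec == null ? "malformed" : rec[0] + " -> " + rec[2]);
    }
}
```

Rejecting malformed lines up front avoids storing half-filled entries under key 0, which is what the lenient constructor would do.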
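Result of running DictMain.java
The run can take anywhere from tens of minutes to several hours. In total it downloads more than 200 MB of pages (over 20,000 of them). Each run first checks whether a page has already been downloaded, so terminating the program partway through does no harm.
When "All prepared!" is printed, everything is ready. Refresh the project folder and you will see the pages saved under dict/pages. Opening that folder in Eclipse is not recommended: it holds more than 20,000 files and will freeze Eclipse.
A data.txt file is also generated; rename it to data.dat and copy it into the pinyin folder.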
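The "already downloaded" check is what makes the run resumable. The DictMain code for it is not shown in this part, so the following is only a sketch of one way such a check could work. The class name `ResumeSketch` and the file-name pattern `PAGE_PATTERN` are hypothetical; the real pattern is DictMain.FILEPATH.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ResumeSketch {

    /** Hypothetical file-name pattern, used here for illustration only. */
    public static final String PAGE_PATTERN = "%s.html";

    /** Lists the codes in [from, to] that have no non-empty page file in dir yet. */
    public static List<Integer> pending(File dir, int from, int to) {
        List<Integer> todo = new ArrayList<Integer>();
        for (int code = from; code <= to; code++) {
            File page = new File(dir, String.format(PAGE_PATTERN, code));
            // A missing or empty file counts as "not downloaded".
            if (!page.isFile() || page.length() == 0) {
                todo.add(code);
            }
        }
        return todo;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(pending(dir, 0x4e00, 0x4e0f).size() + " pages still to download");
    }
}
```

Treating empty files as missing means a download that was interrupted before any bytes were written is retried on the next run.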
Running PinYin.java
This prints the pinyin of "大家haome12345":
[plain]
- With tones: da4jia1haome12345
- Without tones: dajiahaome12345
The above only shows how to get pinyin. Retrieving stroke information and the like works the same way, so it is not demonstrated here.
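As a rough illustration of the stroke case, the sketch below reads two fields from a dic.txt line. It assumes, based only on the dic.txt excerpt at the start of this article, that the fourth field is the stroke count and the fifth is the stroke order encoded as digits. The class name `StrokeSketch` is hypothetical and this is not the author's Dic implementation.

```java
public class StrokeSketch {

    /** Returns the stroke count from a dic.txt line, or -1 if it is missing or invalid. */
    public static int strokeCount(String dicLine) {
        String[] fields = split(dicLine);
        if (fields == null) {
            return -1;
        }
        try {
            return Integer.parseInt(fields[3]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns the stroke-order digit string from a dic.txt line, or "" if it is missing. */
    public static String strokeOrder(String dicLine) {
        String[] fields = split(dicLine);
        return fields == null ? "" : fields[4];
    }

    // Assumed layout: code, character, radical, stroke count, stroke order, ...
    private static String[] split(String dicLine) {
        if (dicLine == null) {
            return null;
        }
        String[] fields = dicLine.trim().split(",");
        return fields.length < 5 ? null : fields;
    }

    public static void main(String[] args) {
        String line = "19969,\u4e01,\u4e00,2,12,SGH,AI,ding1,d\u012bng,zheng1,zh\u0113ng";
        System.out.println(strokeCount(line) + " strokes, order " + strokeOrder(line));
    }
}
```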