
Implementing a Chinese Character Dictionary in Java

Published 2023/12/2 by 生活家

This article was collected and curated by 生活随笔. It describes how to implement a Chinese character dictionary in Java and is shared here for reference.

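Environment: Eclipse, JDK 1.6. No third-party libraries are used. Everything comes with the JDK.

Note: all project source files are saved as UTF-8. If your files use a different encoding, the Chinese comments in the code will appear garbled.

To set UTF-8 in Eclipse, open Window -> Preferences -> General -> Workspace, then under "Text file encoding" choose Other: UTF-8.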

Project structure:

1. Dictionary file

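dic.txt download: http://download.csdn.net/detail/wssiqi/5056993

Only an excerpt is shown here. The full file has one record per line and covers 20,902 Chinese characters, which is exactly the range U+4E00 to U+9FA5.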

```text
19968,一,一,1,1,GGLL,A,yi1,yī
19969,丁,一,2,12,SGH,AI,ding1,dīng,zheng1,zhēng
19970,丂,一,2,15,GNV,AZVV,kao3,kǎo,qiao3,qiǎo,yu2,yú
19971,七,一,2,15,AGN,HD,qi1,qī
19972,丄,一,2,21,HGD,IAVV,shang4,shàng
19973,丅,一,2,12,GHK,AIAA,xia4,xià
19974,丆,一,2,13,DGT,GDAA,han3,hǎn
19975,万,一,3,153,DNV,,wan4,wàn,mo4,mò
19976,丈,一,3,134,DYI,AOS,zhang4,zhàng
19977,三,一,3,111,DGGG,CD,san1,sān
19978,上,一,3,211,HHGG,IDA,shang3,shǎng,shang4,shàng
19979,下,一,3,124,GHI,AID,xia4,xià
19980,丌,一,3,132,GJK,AND,ji1,jī,qi2,qí
19981,不,一,4,1324,GII,GI,fou3,fǒu,bu4,bù
19982,与,一,3,151,GNGD,AZA,yu4,yù,yu3,yǔ,yu2,yú
19983,丏,一,4,1255,GHNN,AIZY,mian3,miǎn
19984,丐,一,4,1215,GHNV,AIZ,gai4,gài
19985,丑,一,4,5211,NFD,XED,chou3,chǒu
19986,丒,一,4,5341,VYGF,YDSA,chou3,chǒu
```
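Each record is one comma-separated line. Following the field list documented on `GetCharInfo` in Dic.java, the fields are: code point, character, radical, stroke count, stroke order, Wubi code, Zhengma code, and then one or more (tone-number pinyin, tone-mark pinyin) pairs. Polyphonic characters such as 丁 carry several pairs, and a field can be empty, like the Zhengma code of 万. The sketch below splits one record into those fields. The class `DicRecord` and its field names are hypothetical and not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one dic.txt record (not part of the original project). */
public class DicRecord {
    final int codePoint;
    final String character, radical, strokeOrder, wubi, zhengma;
    final int strokes;
    final List<String> pinyinNumbered = new ArrayList<String>(); // e.g. "da3"
    final List<String> pinyinMarked = new ArrayList<String>();   // e.g. "dǎ"

    DicRecord(String line) {
        // Limit -1 keeps empty fields, such as the missing Zhengma code of 万
        String[] f = line.split(",", -1);
        // 7 fixed fields plus at least one (numbered, marked) pinyin pair
        if (f.length < 9 || (f.length - 7) % 2 != 0) {
            throw new IllegalArgumentException("Malformed record: " + line);
        }
        codePoint = Integer.parseInt(f[0]);
        character = f[1];
        radical = f[2];
        strokes = Integer.parseInt(f[3]);
        strokeOrder = f[4];
        wubi = f[5];
        zhengma = f[6];
        // Fields 7, 8, 9, ... alternate: tone-number pinyin, tone-mark pinyin
        for (int i = 7; i + 1 < f.length; i += 2) {
            pinyinNumbered.add(f[i]);
            pinyinMarked.add(f[i + 1]);
        }
    }

    public static void main(String[] args) {
        DicRecord r = new DicRecord("25171,打,扌,5,12112,RSH,DAI,da3,dǎ,da2,dá");
        // Prints: 25171 5 [da3, da2]
        System.out.println(r.codePoint + " " + r.strokes + " " + r.pinyinNumbered);
    }
}
```

Keeping every reading, rather than only the first one, is what a caller would need in order to handle polyphonic characters.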

2. Dic.java

```java
package com.siqi.dict;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

/**
 * Local Chinese character dictionary. <br/>
 * The dictionary data comes from <a href=http://www.zdic.net/search/?c=2>漢典 (zdic)</a>.
 * Implements some common lookups: pinyin, Wubi code, pinyin initial,
 * stroke count and stroke order.
 *
 * @author siqi
 */
public class Dic {

    /** Whether to print debug output */
    private static boolean DEBUG = true;

    /** Encoding of the dictionary file */
    public static final Charset DEFAULT_CHARSET = Charset.forName("UTF-8");

    /** Smallest Unicode value of a Chinese character */
    public static final int CN_U16_CODE_MIN = 0x4e00;

    /** Largest Unicode value of a Chinese character */
    public static final int CN_U16_CODE_MAX = 0x9fa5;

    /** File name of the local dictionary */
    public static final String DIC_FILENAME = "dic.txt";

    /** Dictionary data (the raw bytes of dic.txt) */
    public static byte[] bytes = new byte[0];

    /** Number of characters in the dictionary */
    public static int count = 0;

    // Field positions within one record, for example
    // "25171,打,扌,5,12112,RSH,DAI,da3,dǎ,da2,dá"

    /** Position of the Unicode value */
    public static int INDEX_UNICODE = 0;
    /** Position of the character itself */
    public static int INDEX_CHARACTER = 1;
    /** Position of the radical */
    public static int INDEX_BUSHOU = 2;
    /** Position of the stroke count */
    public static int INDEX_BIHUA = 3;
    /** Position of the stroke order */
    public static int INDEX_BISHUN = 4;
    /** Position of the Wubi code */
    public static int INDEX_WUBI = 5;
    /** Position of the Zhengma code */
    public static int INDEX_ZHENGMA = 6;
    /** Position of the first pinyin reading written with tone numbers (e.g. da3) */
    public static int INDEX_PINYIN_EN = 7;
    /** Position of the first pinyin reading written with tone marks (e.g. dǎ) */
    public static int INDEX_PINYIN_CN = 8;

    // Load the dictionary when the class is initialized
    static {
        long time = System.currentTimeMillis();

        try {
            LoadDictionary();
            count = count();
            if (DEBUG) {
                System.out.println("成功載入字典" + new File(DIC_FILENAME).getCanonicalPath() + " ,用時:"
                        + (System.currentTimeMillis() - time) + "毫秒,載入字符數" + count);
            }
        } catch (Exception e) {
            try {
                System.out.println("載入字典失敗" + new File(DIC_FILENAME).getCanonicalPath() + "\r\n");
            } catch (Exception e1) {
                // Ignored: the path is only needed for the error message
            }
            e.printStackTrace();
        }
    }

    /**
     * Returns the Unicode value of a character.
     *
     * @param ch a Chinese character
     * @return the character's Unicode value, as stored in the dictionary
     * @throws Exception if ch is not a Chinese character
     */
    public static String GetUnicode(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_UNICODE);
    }

    /**
     * Returns the pinyin written with tone numbers.
     *
     * @param ch a single Chinese character
     * @return the pinyin with tone numbers, e.g. "大" -> "da4"
     * @throws Exception
     */
    public static String GetPinyinEn(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_PINYIN_EN);
    }

    /**
     * Returns the pinyin (with tone numbers) of a string.
     *
     * @param str a string containing Chinese characters
     * @return the string with each Chinese character replaced by its pinyin,
     *         syllables separated by spaces; other characters are unchanged.
     *         Note: for polyphonic characters only the first reading in the
     *         dictionary is used, so the result may be wrong.
     * @throws Exception
     */
    public static String GetPinyinEn(String str) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isChineseChar(ch)) {
                sb.append(GetPinyinEn(ch)).append(' ');
            } else {
                sb.append(ch);
            }
        }

        return sb.toString().trim();
    }

    /**
     * Returns the pinyin written with tone marks.
     *
     * @param ch a single Chinese character
     * @return the pinyin with tone marks, e.g. "打" -> "dǎ"
     * @throws Exception
     */
    public static String GetPinyinCn(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_PINYIN_CN);
    }

    /**
     * Returns the pinyin (with tone marks) of a string.
     *
     * @param str a string containing Chinese characters
     * @return the string with each Chinese character replaced by its pinyin,
     *         syllables separated by spaces; other characters are unchanged.
     *         Note: for polyphonic characters only the first reading in the
     *         dictionary is used, so the result may be wrong.
     * @throws Exception
     */
    public static String GetPinyinCn(String str) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isChineseChar(ch)) {
                sb.append(GetPinyinCn(ch)).append(' ');
            } else {
                sb.append(ch);
            }
        }

        return sb.toString().trim();
    }

    /**
     * Returns the first letter of the pinyin.
     *
     * @param ch a character
     * @return the first pinyin letter, or "" if ch is not a Chinese character
     * @throws Exception
     */
    public static String GetFirstLetter(Character ch) throws Exception {
        if (isChineseChar(ch)) {
            return GetPinyinEn(ch).substring(0, 1);
        } else {
            return "";
        }
    }

    /**
     * Returns the pinyin initials of a string; non-Chinese characters are skipped.
     *
     * @param str a string containing Chinese characters
     * @return the concatenated initials
     * @throws Exception
     */
    public static String GetFirstLetter(String str) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isChineseChar(ch)) {
                sb.append(GetFirstLetter(ch));
            }
        }

        return sb.toString().trim();
    }

    /** Returns the radical of a Chinese character. */
    public static String GetBushou(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_BUSHOU);
    }

    /** Returns the stroke count of a Chinese character. */
    public static String GetBihua(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_BIHUA);
    }

    /** Returns the stroke order of a Chinese character. */
    public static String GetBishun(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_BISHUN);
    }

    /** Returns the Wubi code of a Chinese character. */
    public static String GetWubi(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_WUBI);
    }

    /** Returns the Zhengma code of a Chinese character. */
    public static String GetZhengma(Character ch) throws Exception {
        return GetCharInfo(ch, INDEX_ZHENGMA);
    }

    /**
     * Looks up a character's record in the dictionary.
     *
     * @param ch the Chinese character to look up
     * @return the record, e.g. "25171,打,扌,5,12112,RSH,DAI,da3,dǎ,da2,dá" <br/>
     *         1st: Unicode value<br/>
     *         2nd: the character<br/>
     *         3rd: radical<br/>
     *         4th: stroke count<br/>
     *         5th: stroke order ("12345" stand for 橫 horizontal, 豎 vertical,
     *              撇 left-falling, 捺 right-falling, 折 turning)<br/>
     *         6th: Wubi code<br/>
     *         7th: Zhengma code<br/>
     *         8th onward: pinyin readings (tone numbers, then tone marks)<br/>
     *         Returns "" if the character is not in the dictionary.
     * @throws Exception if ch is not a Chinese character
     */
    public static String GetCharInfo(Character ch) throws Exception {
        if (!isChineseChar(ch)) {
            throw new Exception("'" + ch + "' 不是一個漢字!"); // "... is not a Chinese character!"
        }

        // Every record starts with the decimal code point followed by a comma
        String prefix = (int) ch.charValue() + ",";
        String result = "";

        // Decode with the dictionary's charset, not the platform default
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), DEFAULT_CHARSET));
        try {
            String strWord;
            while ((strWord = br.readLine()) != null) {
                if (strWord.startsWith(prefix)) {
                    result = strWord;
                    break;
                }
            }
        } finally {
            br.close();
        }

        return result;
    }

    /**
     * Returns one field of a character's record.
     *
     * @param ch    a Chinese character
     * @param index position of the field (one of the INDEX_* constants)
     * @return the field value
     * @throws Exception
     */
    private static String GetCharInfo(Character ch, int index) throws Exception {
        if (!isChineseChar(ch)) {
            throw new Exception("'" + ch + "' 不是一個漢字!");
        }

        // Get the character's record
        String charInfo = GetCharInfo(ch);

        String result = "";
        try {
            result = charInfo.split(",")[index];
        } catch (Exception e) {
            // "Please check whether the dictionary record for this character is correct!"
            throw new Exception("請查看字典中" + ch + "漢字記錄是否正確!");
        }

        return result;
    }

    /**
     * Loads the dictionary file into memory.
     * @throws Exception
     */
    private static void LoadDictionary() throws Exception {
        File file = new File(DIC_FILENAME);
        bytes = new byte[(int) file.length()];
        DataInputStream dis = new DataInputStream(new FileInputStream(file));
        try {
            // read() may return fewer bytes than requested; readFully() loops until the buffer is full
            dis.readFully(bytes);
        } finally {
            dis.close();
        }
    }

    /**
     * Checks whether a character is Chinese. While testing I found that the
     * hashCode of a Character equals its UTF-16 (Unicode) value, so that value
     * can be compared against the Chinese range; charValue() returns the same
     * number and states the intent directly.
     *
     * @param ch a character
     * @return true if it is a Chinese character, false otherwise
     */
    public static boolean isChineseChar(Character ch) {
        int code = ch.charValue();
        return code >= CN_U16_CODE_MIN && code <= CN_U16_CODE_MAX;
    }

    /**
     * @return the number of characters (lines) in the dictionary
     * @throws Exception
     */
    private static int count() throws Exception {
        int cnt = 0;
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), DEFAULT_CHARSET));
        try {
            while (br.readLine() != null) {
                cnt++;
            }
        } finally {
            br.close();
        }

        return cnt;
    }
}
```
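`GetCharInfo` decodes and scans the whole dictionary for every lookup, so converting a string of n characters performs n scans over roughly 20,000 records. One alternative is to index the records once in a `HashMap` keyed by code point. The sketch below assumes, as in the excerpt above, that every line begins with the decimal code point followed by a comma. The class `DicIndex` and its methods are hypothetical and not part of the original project.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical one-pass index over the text of dic.txt (not part of the original project). */
public class DicIndex {
    // Key: code point of the character; value: the whole record line
    private final Map<Integer, String> records = new HashMap<Integer, String>();

    public DicIndex(String dictionaryText) {
        // Drop a byte-order mark if the file was saved with one
        if (dictionaryText.startsWith("\uFEFF")) {
            dictionaryText = dictionaryText.substring(1);
        }
        for (String line : dictionaryText.split("\r\n|\r|\n")) {
            int comma = line.indexOf(',');
            if (comma <= 0) {
                continue; // blank or malformed line
            }
            try {
                records.put(Integer.valueOf(line.substring(0, comma).trim()), line);
            } catch (NumberFormatException e) {
                // Skip lines that do not start with a decimal code point
            }
        }
    }

    /** Number of indexed records. */
    public int size() {
        return records.size();
    }

    /** Whole record for ch, or "" when ch is not in the dictionary. */
    public String getCharInfo(char ch) {
        String record = records.get(Integer.valueOf(ch));
        return record == null ? "" : record;
    }

    /** Field number index of ch's record (same positions as Dic.INDEX_*), or "". */
    public String getField(char ch, int index) {
        String[] fields = getCharInfo(ch).split(",", -1);
        return index < fields.length ? fields[index] : "";
    }

    public static void main(String[] args) {
        DicIndex idx = new DicIndex("19968,一,一,1,1,GGLL,A,yi1,yī\r\n"
                + "19969,丁,一,2,12,SGH,AI,ding1,dīng,zheng1,zhēng\r\n");
        // Prints: 2 ding1
        System.out.println(idx.size() + " " + idx.getField('\u4e01', 7));
    }
}
```

In Dic, such an index could be built once in the static initializer from `new String(bytes, DEFAULT_CHARSET)`. This trades a little memory for constant-time lookups.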

3. Sample.java

How to use the dictionary:

```java
package com.siqi.dict;

/**
 * Two examples showing how to get the pinyin and other information of Chinese characters.
 * @author siqi
 */
public class Sample {

    /**
     * Dictionary usage example.
     *
     * @param args
     */
    public static void main(String[] args) {
        try {
            long time = System.currentTimeMillis();

            char ch = '打';
            // A single character. Labels: 首字母 = pinyin initial,
            // 拼音(中) = pinyin with tone marks, 拼音(英) = pinyin with tone numbers,
            // 部首 = radical, 筆畫數目 = stroke count, 筆畫 = stroke order, 五筆 = Wubi code
            System.out.println("====打字信息開始====");
            System.out.println("首字母:" + Dic.GetFirstLetter(ch));
            System.out.println("拼音(中):" + Dic.GetPinyinCn(ch));
            System.out.println("拼音(英):" + Dic.GetPinyinEn(ch));
            System.out.println("部首:" + Dic.GetBushou(ch));
            System.out.println("筆畫數目:" + Dic.GetBihua(ch));
            System.out.println("筆畫:" + Dic.GetBishun(ch));
            System.out.println("五筆:" + Dic.GetWubi(ch));
            System.out.println("====打字信息結束====");

            // A string of Chinese characters
            System.out.println("\r\n====漢字字符串====");
            System.out.println(Dic.GetPinyinEn("返回漢字字符串的拼音。"));
            System.out.println(Dic.GetPinyinCn("返回漢字字符串的拼音。"));
            System.out.println(Dic.GetFirstLetter("返回漢字字符串的拼音。"));
            System.out.println("====漢字字符串====\r\n");

            System.out.println("用時:" + (System.currentTimeMillis() - time) + "毫秒"); // elapsed time in ms

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

4. Result

```text
====打字信息開始====
成功載入字典C:\workspaces\01_java\DictLocal\dic.txt ,用時:15毫秒,載入字符數20902
首字母:d
拼音(中):dǎ
拼音(英):da3
部首:扌
筆畫數目:5
筆畫:12112
五筆:RSH
====打字信息結束====

====漢字字符串====
fan3 hui2 han4 zi4 zi4 fu2 chuan4 di2 pin1 yin1 。
fǎn huí hàn zì zì fú chuàn dí pīn yīn 。
fhhzzfcdpy
====漢字字符串====

Memory(Used/Total) : 1539/15872 KB
用時:218毫秒
```
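The stroke-order field encodes each stroke as a digit. According to the `GetCharInfo` documentation in Dic.java, 1 to 5 stand for 橫 (horizontal), 豎 (vertical), 撇 (left-falling), 捺 (right-falling) and 折 (turning). A small helper can turn the `筆畫:12112` value in the output above back into stroke names. The class `StrokeOrder` below is a hypothetical sketch and not part of the original project.

```java
/** Hypothetical helper that spells out a stroke-order string from dic.txt. */
public class StrokeOrder {
    // Index 0 corresponds to digit '1', following the mapping documented in Dic.GetCharInfo
    private static final String[] STROKES = {"橫", "豎", "撇", "捺", "折"};

    /** Converts digits such as "12112" into stroke names; rejects digits outside 1-5. */
    public static String decode(String digits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            char d = digits.charAt(i);
            if (d < '1' || d > '5') {
                throw new IllegalArgumentException("Unexpected stroke digit: " + d);
            }
            sb.append(STROKES[d - '1']);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 打 has stroke order 12112 in the dictionary
        System.out.println(decode("12112"));
    }
}
```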

I will upload how I obtained the dictionary file later. I built it by collecting the pages of http://www.zdic.net/zd/.

============= Supplement: how to obtain the character information =============

============= All of the information comes from the 漢典 (zdic) website =============

Directory structure:


Packages:

com.siqi.http

Httpclient.java: a simple class I wrote for fetching web pages and their content.

com.siqi.dict

DictMain.java: downloads the character pages, extracts each character's pinyin from them, and saves the result to data.dat.

DownloadThread.java: downloads the pages (multithreaded).

com.siqi.pinyin

PinYin.java: running DictMain.java generates a data.dat file. Copy that file into the com.siqi.pinyin package, and the functions in PinYin.java can then return the pinyin of a character.

PinYinEle.java: a model mapping character -> pinyin -> Unicode.

Source code:

Httpclient.java fetches a web page and exposes its content, its encoding, and its headers (a simplified version).

```java
package com.siqi.http;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simple HTTP GET and POST implemented with a Socket.
 * Limitations: it always connects to port 80 with plain HTTP/1.0, so https://
 * URLs are not supported, and redirects are not followed.
 *
 * @author siqi
 */
public class Httpclient {

    /** processUrl parameter: HTTP GET */
    public static final int METHOD_GET = 0;
    /** processUrl parameter: HTTP POST */
    public static final int METHOD_POST = 1;
    /** Simplified HTTP GET request header */
    public static final String HEADER_GET = "GET %s HTTP/1.0\r\nHOST: %s\r\n\r\n";
    /** Simplified HTTP POST request header */
    public static final String HEADER_POST = "POST %s HTTP/1.0\r\nHOST: %s\r\nContent-Length: 0\r\n\r\n";
    /** Separator between the response headers and the body */
    public static final String CONTENT_SEPARATOR = "\r\n\r\n";
    /** Raw bytes of the response */
    private byte[] bytes = new byte[0];
    /** Response headers */
    private String header = "";
    /** Response body */
    private String content = "";

    /** Default page encoding: UTF-8 */
    public static final String CHARSET_DEFAULT = "UTF-8";
    /** Page encoding */
    private String charset = CHARSET_DEFAULT;

    /**
     * Example of using Httpclient.
     *
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        Httpclient httpclient = new Httpclient();
        // Request the Baidu home page (mobile version)
        httpclient.processUrl("http://m.baidu.com/");
        System.out.println("獲取網頁http://m.baidu.com/");
        System.out.println("報頭為:\r\n" + httpclient.getHeader());
        System.out.println("內容為:\r\n" + httpclient.getContent());
        System.out.println("編碼為:\r\n" + httpclient.getCharset());
        System.out.println("************************************");

        // Search Baidu (mobile version) for "中國".
        // Source of the mobile search box: <input id="word" type="text" size="20" maxlength="64" name="word">
        String url = String.format("http://m.baidu.com/s?word=%s",
                URLEncoder.encode("中國", CHARSET_DEFAULT));
        httpclient.processUrl(url, METHOD_POST);
        System.out.println("獲取網頁http://m.baidu.com/s?word=中國");
        System.out.println("報頭為:\r\n" + httpclient.getHeader());
        System.out.println("內容為:\r\n" + httpclient.getContent());
        System.out.println("編碼為:\r\n" + httpclient.getCharset());
    }

    /** Resets all fields to their default values. */
    private void init() {
        this.bytes = new byte[0];
        this.charset = CHARSET_DEFAULT;
        this.header = "";
        this.content = "";
    }

    /** @return the response headers */
    public String getHeader() {
        return header;
    }

    /** @return the response body */
    public String getContent() {
        return content;
    }

    /** @return the detected page encoding */
    public String getCharset() {
        return charset;
    }

    /**
     * Requests a page with HTTP GET.
     *
     * @param url
     * @throws Exception
     */
    public void processUrl(String url) throws Exception {
        processUrl(url, METHOD_GET);
    }

    /**
     * Requests (fetches) a page over a Socket.<br/>
     * For example:<br/>
     * processUrl("http://www.baidu.com/", METHOD_GET) fetches the Baidu home page.<br/>
     *
     * @param url    address of the page or resource
     * @param method request method: METHOD_GET or METHOD_POST
     * @throws Exception
     */
    public void processUrl(String url, int method) throws Exception {

        init();

        // e.g. url = "http://www.zdic.net/search/?c=2&q=%E5%A4%A7";
        // Normalize the URL: "http://www.baidu.com" becomes "http://www.baidu.com/"
        Matcher mat = Pattern.compile("https?://[^/]+").matcher(url);
        if (mat.find() && mat.group().equals(url)) {
            url += "/";
        }

        Socket socket = new Socket(getHostUrl(url), 80); // connect to the server on port 80
        try {
            socket.setSoTimeout(3000); // read timeout: 3 seconds

            // Build the request; see the HTTP protocol (RFC 2616) for details
            String request;
            if (method == METHOD_POST) {
                request = String.format(HEADER_POST, getSubUrl(url), getHostUrl(url));
            } else {
                request = String.format(HEADER_GET, getSubUrl(url), getHostUrl(url));
            }

            socket.getOutputStream().write(request.getBytes()); // send the request
            this.bytes = InputStream2ByteArray(socket.getInputStream()); // read the response
        } finally {
            socket.close(); // close the socket even if the request fails
        }

        // Detect the page encoding. The declaration is normally within the first
        // 4096 bytes; ISO-8859-1 maps each byte to one char, which is safe for ASCII markup.
        String temp = new String(this.bytes, 0, Math.min(bytes.length, 4096), "ISO-8859-1");
        // Matches charset=utf-8 as well as charset="utf-8"
        mat = Pattern.compile("(?<=<meta.{0,100}?charset=[\"']?)[a-z0-9_-]+",
                Pattern.CASE_INSENSITIVE).matcher(temp);
        if (mat.find()) {
            this.charset = mat.group();
        } else {
            this.charset = CHARSET_DEFAULT;
        }

        // Decode the headers and body with the detected encoding
        temp = new String(this.bytes, this.charset);
        int headerEnd = temp.indexOf(CONTENT_SEPARATOR);
        if (headerEnd < 0) {
            // No separator found: treat the whole response as headers
            this.header = temp;
            this.content = "";
        } else {
            this.header = temp.substring(0, headerEnd);
            this.content = temp.substring(headerEnd + CONTENT_SEPARATOR.length());
        }
    }

    /**
     * Gets the server host from a URL.<br/>
     * For example:<br/>
     * http://m.weathercn.com/common/province.jsp
     * <p>
     * returns:<br/>
     * m.weathercn.com
     *
     * @param url the URL
     * @return the host, or "" if none is found
     */
    public static String getHostUrl(String url) {
        String host = "";
        Matcher mat = Pattern.compile("(?<=https?://).+?(?=/)").matcher(url);
        if (mat.find()) {
            host = mat.group();
        }

        return host;
    }

    /**
     * Gets the path part of a URL. For example:<br/>
     * http://m.weathercn.com/common/province.jsp
     * <p>
     * returns:<br/>
     * /common/province.jsp
     *
     * @param url
     * @return the path, or "" if none is found
     */
    public static String getSubUrl(String url) {
        String subUrl = "";
        Matcher mat = Pattern.compile("https?://.+?(?=/)").matcher(url);
        if (mat.find()) {
            subUrl = url.substring(mat.group().length());
        }

        return subUrl;
    }

    /**
     * Concatenates byte arrays b1 and b2: result = b1 + b2.
     *
     * @param b1
     * @param b2
     * @return a new array holding b1 followed by b2
     */
    public static byte[] ByteArrayCat(byte[] b1, byte[] b2) {
        byte[] b = new byte[b1.length + b2.length];
        System.arraycopy(b1, 0, b, 0, b1.length);
        System.arraycopy(b2, 0, b, b1.length, b2.length);
        return b;
    }

    /**
     * Reads an input stream into a byte array. It does not return a String
     * because the stream's encoding is not known yet, and decoding with the
     * wrong charset produces garbled text.
     *
     * @param is the input stream
     * @return all bytes read from the stream
     * @throws IOException
     */
    public static byte[] InputStream2ByteArray(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096]; // read buffer

        int len;
        while ((len = is.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }

        return out.toByteArray();
    }
}
```
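`processUrl` only looks for a `<meta ... charset=...>` declaration in the first 4096 bytes and otherwise falls back to UTF-8, although servers often announce the encoding in the `Content-Type` response header instead. The sketch below checks the header first and the meta tag second. The class `CharsetSniffer` and its `detect` method are hypothetical, and the regular expressions are illustrative rather than a full implementation of the HTML encoding-sniffing rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical charset detection: Content-Type header first, then a <meta> tag. */
public class CharsetSniffer {
    private static final Pattern HEADER_CHARSET = Pattern.compile(
            "^Content-Type:[^\\r\\n]*?charset=\"?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]{0,200}?charset=[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    /**
     * @param head     start of the raw response, decoded one byte per char (e.g. ISO-8859-1)
     * @param fallback charset returned when nothing is declared
     */
    public static String detect(String head, String fallback) {
        int end = head.indexOf("\r\n\r\n");
        String headers = end >= 0 ? head.substring(0, end) : head;
        Matcher m = HEADER_CHARSET.matcher(headers);
        if (m.find()) {
            return m.group(1);
        }
        m = META_CHARSET.matcher(head);
        if (m.find()) {
            return m.group(1);
        }
        return fallback;
    }

    public static void main(String[] args) {
        String response = "HTTP/1.0 200 OK\r\nContent-Type: text/html; charset=GBK\r\n\r\n"
                + "<html><meta charset=\"utf-8\"></html>";
        // Prints: GBK (the header wins over the meta tag)
        System.out.println(detect(response, "UTF-8"));
    }
}
```

In `processUrl`, the same call could replace the meta-only regex, e.g. `detect(new String(bytes, 0, Math.min(bytes.length, 4096), "ISO-8859-1"), CHARSET_DEFAULT)`.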


DictMain.java

[java]?view plaincopy

  1. package?com.siqi.dict;??
  2. ??
  3. import?java.io.File;??
  4. import?java.io.FileReader;??
  5. import?java.io.FileWriter;??
  6. import?java.io.IOException;??
  7. import?java.util.regex.Matcher;??
  8. import?java.util.regex.Pattern;??
  9. ??
  10. /**?
  11. ?*?從漢典下載漢字網(wǎng)頁(yè),并提取拼音信息?
  12. ?*?@author?siqi?
  13. ?*?
  14. ?*/??
  15. public?class?DictMain?{??
  16. ????/**?
  17. ?????*?網(wǎng)頁(yè)保存路徑?
  18. ?????*/??
  19. ????public?static?final?String?SAVEPATH?=?"dict/pages/";??
  20. ????/**?
  21. ?????*?下載的漢字網(wǎng)頁(yè)名稱(chēng)?
  22. ?????*/??
  23. ????public?static?final?String?FILEPATH?=?SAVEPATH?+?"%s.html";??
  24. ??????
  25. ????/**?
  26. ?????*?字典數(shù)據(jù)文件名稱(chēng)?
  27. ?????*/??
  28. ????public?static?final?String?DATA_FILENAME?=?"data.txt";??
  29. ??????
  30. ????/**?
  31. ?????*?漢字unicode最小?
  32. ?????*/??
  33. ????public?static?final?int?UNICODE_MIN?=?0x4E00;??
  34. ??????
  35. ????/**?
  36. ?????*?漢字unicode最大?
  37. ?????*/??
  38. ????public?static?final?int?UNICODE_MAX?=?0x9FFF;??
  39. ??????
  40. ????/**?
  41. ?????*?準(zhǔn)備工作:?
  42. ?????*?1.從漢典網(wǎng)站下載所有漢字的頁(yè)面,注意,不要在eclipse中打開(kāi)保存頁(yè)面的文件夾,?
  43. ?????*?因?yàn)槊總€(gè)漢字一個(gè)頁(yè)面,總共有20000+個(gè)頁(yè)面,容易卡死eclipse?
  44. ?????*?2.從漢字頁(yè)面獲取漢字拼音信息,生成data.dat文件?
  45. ?????*?3.生成的data.dat復(fù)制到com.siqi.pinyin下面?
  46. ?????*?4.可以使用com.siqi.pinyin.PinYin.java了?
     */
    static {
        // Download the pages
        for (int i = UNICODE_MIN; i <= UNICODE_MAX; i++) {
            // Skip characters whose page has already been downloaded
            String filePath = String.format(FILEPATH, i); // file name
            File file = new File(filePath);
            if (!file.exists()) {
                new DownloadThread(i).start();
            }
        }

        // Wait until every DownloadThread has finished before parsing the pages
        while (DownloadThread.threadCnt(0) > 0) {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                break;
            }
        }

        // Parse the pages to get the pinyin and collect the results
        StringBuffer sb = new StringBuffer();
        for (int i = UNICODE_MIN; i <= UNICODE_MAX; i++) {
            String word = new String(Character.toChars(i));
            String pinyin = getPinYinFromWebpageFile(String.format(FILEPATH, i));
            String str = String.format("%s,%s,%s\r\n", i, word, pinyin);
            System.out.print(str);
            sb.append(str);
        }

        // Save the results to DATA_FILENAME
        try {
            FileWriter fw = new FileWriter(DATA_FILENAME);
            fw.write(sb.toString());
            fw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        // All the work is done by the static initializer above, which runs when the class is loaded
        System.out.println("All prepared!");
    }

    /**
     * Extracts the pinyin from a downloaded web page file.
     * @param file path of the saved page
     * @return the pinyin with tone number (e.g. "yi1"), or "" if none is found
     */
    private static String getPinYinFromWebpageFile(String file) {
        try {
            // Read the whole file; a single read() call may return fewer chars than requested
            StringBuilder content = new StringBuilder();
            char[] buff = new char[8192];
            FileReader reader = new FileReader(file);
            try {
                int n;
                while ((n = reader.read(buff)) != -1) {
                    content.append(buff, 0, n);
                }
            } finally {
                reader.close();
            }

            // e.g. spf("yi1")
            Matcher mat = Pattern.compile("(?<=spf\\(\")[a-z1-4]{0,100}",
                    Pattern.CASE_INSENSITIVE).matcher(content);
            if (mat.find()) {
                return mat.group();
            }
            // e.g. <span class="dicpy">cal</span> spf("xin1")
            mat = Pattern.compile("(?<=class=\"dicpy\">)[a-z1-4]{0,100}",
                    Pattern.CASE_INSENSITIVE).matcher(content);
            if (mat.find()) {
                return mat.group();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return "";
    }
}
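getPinYinFromWebpageFile relies on regex lookbehind: `(?<=spf\(")` matches the position just after `spf("` without consuming it, so `group()` returns only the pinyin itself. The sketch below runs the same two patterns, in the same order, against hand-written HTML strings instead of downloaded pages. The class name `PinyinExtractor` is hypothetical, and the snippets only mimic the markup these regexes expect; zdic.net's real pages may well have changed since the article was written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PinyinExtractor {
    // Primary pattern: the pinyin inside an spf("...") call, e.g. spf("yi1")
    private static final Pattern SPF =
            Pattern.compile("(?<=spf\\(\")[a-z1-4]{0,100}", Pattern.CASE_INSENSITIVE);
    // Fallback pattern: the text right after <span class="dicpy">
    private static final Pattern DICPY =
            Pattern.compile("(?<=class=\"dicpy\">)[a-z1-4]{0,100}", Pattern.CASE_INSENSITIVE);

    /** Returns the first pinyin found in the page content, or "" if neither pattern matches. */
    static String extract(String content) {
        for (Pattern p : new Pattern[] {SPF, DICPY}) {
            Matcher m = p.matcher(content);
            if (m.find()) {
                return m.group(); // lookbehind text is not part of the match
            }
        }
        return "";
    }
}
```

For example, `PinyinExtractor.extract("<script>spf(\"yi1\")</script>")` yields `yi1`, and a page containing neither marker yields an empty string, just as the original method does.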


DownloadThread.java

```java
package com.siqi.dict;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.siqi.http.Httpclient;

/**
 * Downloads a character's page from the Handian (zdic.net) website and saves it locally.
 * http://www.zdic.net/search/?c=2
 * @author siqi
 *
 */
public class DownloadThread extends Thread {

    /**
     * Maximum number of concurrent download threads
     */
    public static int THREAD_MAX = 10;

    /**
     * Maximum number of download attempts per page
     */
    public static int RETRY_MAX = 5;

    /**
     * Handian search URL
     */
    public static String SEARCH_URL = "http://www.zdic.net/search/?q=%s";

    /**
     * Number of threads currently running
     */
    private static int threadCnt = 0;

    /**
     * Unicode code point of the character handled by this thread
     */
    private int unicode = 0;

    /**
     * Create the SAVEPATH folder if it does not exist
     */
    static {
        try {
            File file = new File(DictMain.SAVEPATH);
            if (!file.exists()) {
                file.mkdirs();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Adjusts and returns the current thread count.
     * @param i amount to add to the count (threadCnt += i); pass 0 to just read it
     * @return the thread count after the change
     */
    public static synchronized int threadCnt(int i) {
        threadCnt += i;
        return threadCnt;
    }

    /**
     * Prepares the download of the page for the character with code point unicode.
     * Blocks the caller until fewer than THREAD_MAX download threads are running.
     * @param unicode code point of the character to download
     */
    public DownloadThread(int unicode) {
        // Wait until the current number of threads is below THREAD_MAX
        while (threadCnt(0) >= THREAD_MAX) {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
            }
        }

        threadCnt(1); // one more running thread
        this.unicode = unicode;
    }

    @Override
    public void run() {
        long t1 = System.currentTimeMillis(); // start time

        String filePath = String.format(DictMain.FILEPATH, unicode); // file name

        String word = new String(Character.toChars(unicode)); // convert the code point to the character

        int retryCnt = 0; // number of failed attempts so far
        while (retryCnt < RETRY_MAX) {
            try {
                String content = DownloadPage(word);
                SaveToFile(filePath, content);

                threadCnt(-1);
                System.out.println(String.format("%s, %s, downloaded! Threads: %s, time: %s ms",
                        unicode, word, threadCnt(0), System.currentTimeMillis() - t1));
                return;
            } catch (Exception e) {
                retryCnt++;
            }
        }

        threadCnt(-1);
        System.err.println(String.format("%s, %s, download failed! Threads: %s, time: %s ms",
                unicode, word, threadCnt(0), System.currentTimeMillis() - t1));
    }

    /**
     * Looks the character up on Handian and returns the content of its dictionary page.
     * @param word the character to look up
     * @return the HTML of the character's page
     * @throws Exception if the request fails
     */
    public String DownloadPage(String word) throws Exception {
        // Search for word
        Httpclient httpclient = new Httpclient();
        String url = String.format(SEARCH_URL, URLEncoder.encode(word, "UTF-8"));
        httpclient.processUrl(url, Httpclient.METHOD_POST);

        // The response is a redirect page:
        // extract the link it points to and fetch that page
        Matcher mat = Pattern.compile("(?<=HREF=\")[^\"]+").matcher(httpclient.getContent());
        if (mat.find()) {
            httpclient.processUrl(mat.group());
        }

        return httpclient.getContent();
    }

    /**
     * Writes content to file.
     * @param file path of the file to write
     * @param content text to write
     * @throws IOException if the file cannot be written, so that run() retries
     */
    public void SaveToFile(String file, String content) throws IOException {
        FileWriter fw = new FileWriter(file);
        try {
            fw.write(content);
        } finally {
            fw.close();
        }
    }
}
```
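DownloadThread limits concurrency with a hand-maintained static counter and a sleep loop in its constructor. The java.util.concurrent package (available since Java 5, so also on the article's JDK 1.6) provides a fixed-size thread pool that does the same bookkeeping and also tells you when all work has finished. The sketch below is that swapped-in technique, not the author's code: `BoundedDownloader`, `downloadAll` and the `Downloader` callback are hypothetical names, and the actual fetch-and-save step is passed in so the pooling and retry logic can be exercised without network access.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedDownloader {
    /** Hypothetical callback: fetch and save the page for one code point; throw on failure. */
    interface Downloader {
        void download(int unicode) throws Exception;
    }

    /**
     * Processes every code point with at most maxThreads worker threads,
     * trying each one up to retryMax times. Blocks until all tasks are done
     * and returns how many code points succeeded.
     */
    static int downloadAll(List<Integer> codes, int maxThreads, final int retryMax,
                           final Downloader downloader) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        final AtomicInteger succeeded = new AtomicInteger();
        for (final int unicode : codes) {
            pool.execute(new Runnable() {
                public void run() {
                    for (int attempt = 1; attempt <= retryMax; attempt++) {
                        try {
                            downloader.download(unicode);
                            succeeded.incrementAndGet();
                            return;
                        } catch (Exception e) {
                            // failed attempt: loop around and retry
                        }
                    }
                    System.err.println(unicode + " failed after " + retryMax + " attempts");
                }
            });
        }
        pool.shutdown();                         // accept no new tasks
        pool.awaitTermination(1, TimeUnit.DAYS); // wait for the queued ones to finish
        return succeeded.get();
    }
}
```

Because `downloadAll` returns only after every page has been attempted, the parsing step could start immediately after it, with no thread counter to poll.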


PinYin.java

```java
package com.siqi.pinyin;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class PinYin {

    private static Map<Integer, PinYinEle> map = new HashMap<Integer, PinYinEle>();

    /**
     * Load the pinyin data file data.dat from this class's package
     */
    static {
        try {
            // Reads with the platform default charset, the same one FileWriter used to write the file
            BufferedReader bReader = new BufferedReader(new InputStreamReader(
                    PinYin.class.getResourceAsStream("data.dat")));
            String aLine = null;
            while ((aLine = bReader.readLine()) != null) {
                PinYinEle ele = new PinYinEle(aLine);
                map.put(ele.getUnicode(), ele);
            }
            bReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Quick test
     *
     * @param args
     */
    public static void main(String[] args) {
        System.out.println("With tones:    " + PinYin.getPinYin("大家haome12345"));
        System.out.println("Without tones: " + PinYin.getPinYin("大家haome12345", false));
    }

    /**
     * Gets the pinyin of a string; containsNumber controls whether the tone numbers 1 to 4 are kept.
     *
     * @param str the input text
     * @param containsNumber
     *            true = with tones, false = without tones
     * @return the pinyin; characters without an entry are copied unchanged
     */
    public static String getPinYin(String str, boolean containsNumber) {
        StringBuffer sb = new StringBuffer();
        for (Character ch : str.toCharArray()) {
            sb.append(getPinYin(ch, containsNumber));
        }

        return sb.toString();
    }

    /**
     * Gets the pinyin of a string, with tones.
     *
     * @param str the input text
     * @return the pinyin with tone numbers
     */
    public static String getPinYin(String str) {
        return getPinYin(str, true);
    }

    /**
     * Gets the pinyin of a single character, with tones.
     *
     * @param ch the character
     * @return the pinyin with tone number
     */
    public static String getPinYin(Character ch) {
        return getPinYin(ch, true);
    }

    /**
     * Gets the pinyin of a single character.
     *
     * @param ch
     *            a Chinese character. For any other character ch itself is returned; for null, "" is returned.
     * @param containsNumber
     *            true = with tones, false = without tones
     * @return the pinyin of ch
     */
    public static String getPinYin(Character ch, boolean containsNumber) {
        if (ch == null) {
            return "";
        }
        int code = ch.charValue(); // the char value is the Unicode code point (BMP only)
        PinYinEle ele = map.get(code);
        if (ele == null) {
            return ch.toString(); // not in the dictionary: return it unchanged
        }
        // Tone digits are removed only from dictionary pinyin, never from the input text
        return containsNumber ? ele.getPinyin() : ele.getPinyin().replaceAll("[0-9]", "");
    }
}
```
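The core of getPinYin is one table lookup per character, with tone digits removed only from pinyin that came out of the table. That is why "12345" survives in the tone-less output: it never goes through the table. The stand-alone sketch below restates that logic with hypothetical names (`PinYinLookup`, `put`, `toPinyin`) and a two-entry table in place of data.dat, and walks the string by code point instead of by char (Java 5 or later). That makes no difference for the 20,902 characters in dic.txt, which appear to start at 19968 (U+4E00) and all fit in a single char, but it would keep surrogate pairs intact if the table were ever extended.

```java
import java.util.HashMap;
import java.util.Map;

class PinYinLookup {
    private final Map<Integer, String> table = new HashMap<Integer, String>();

    void put(int codePoint, String pinyin) {
        table.put(codePoint, pinyin);
    }

    /** Converts str to pinyin; characters missing from the table are copied unchanged. */
    String toPinyin(String str, boolean withTones) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);        // reads a whole surrogate pair if present
            String py = table.get(cp);
            if (py == null) {
                sb.appendCodePoint(cp);         // unknown character: keep it as-is
            } else {
                // strip tone digits only from looked-up pinyin, never from the input text
                sb.append(withTones ? py : py.replaceAll("[0-9]", ""));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

With 大 mapped to `da4` and 家 to `jia1`, `toPinyin("大家haome12345", false)` returns `dajiahaome12345`, matching the output of the article's test.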

PinYinEle.java

```java
package com.siqi.pinyin;

/**
 * One line of data.dat, in the form "unicode,character,pinyin", e.g. "19968,一,yi1"
 */
public class PinYinEle {
    private int unicode;
    private String ch;
    private String pinyin;

    public PinYinEle() {}

    public PinYinEle(String str) {
        if (str != null) {
            String[] strs = str.split(",");
            if (strs.length == 3) {
                try {
                    this.unicode = Integer.parseInt(strs[0]);
                } catch (NumberFormatException e) {
                    // malformed code point: unicode stays 0
                }
                this.ch = strs[1];
                this.pinyin = strs[2];
            }
        }
    }

    public int getUnicode() {
        return unicode;
    }
    public void setUnicode(int unicode) {
        this.unicode = unicode;
    }
    public String getCh() {
        return ch;
    }
    public void setCh(String ch) {
        this.ch = ch;
    }
    public String getPinyin() {
        return pinyin;
    }
    public void setPinyin(String pinyin) {
        this.pinyin = pinyin;
    }
}
```
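PinYinEle quietly accepts lines it cannot parse: a line with the wrong number of fields or a non-numeric first column still produces an element whose `unicode` is 0, and PinYin stores it in its map under key 0. A loader that skips such lines instead might look like the sketch below (`DataFileLoader` is a hypothetical name). It takes a `Reader` so the caller chooses the charset; that charset has to match the one data.dat was written with, which for FileWriter is the platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

class DataFileLoader {
    /**
     * Reads "unicode,char,pinyin" lines into a map from code point to pinyin.
     * Blank or malformed lines are skipped instead of being stored under key 0.
     */
    static Map<Integer, String> load(Reader in) throws IOException {
        Map<Integer, String> map = new HashMap<Integer, String>();
        BufferedReader reader = new BufferedReader(in);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split(",");
                if (parts.length != 3) {
                    continue;                   // wrong field count (or no pinyin found)
                }
                try {
                    map.put(Integer.parseInt(parts[0]), parts[2]);
                } catch (NumberFormatException e) {
                    // non-numeric code point column: skip the line
                }
            }
        } finally {
            reader.close();
        }
        return map;
    }
}
```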


Part of the generated data.dat:

```text
19968,一,yi1
19969,丁,ding1
19970,丂,kao3
19971,七,qi1
19972,丄,shang4
19973,丅,xia4
19974,丆,han3
19975,万,wan4
19976,丈,zhang4
19977,三,san1
19978,上,shang4
19979,下,xia4
19980,丌,qi2
19981,不,bu4
```
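Each data.dat line has the form `code point,character,pinyin with tone number`. The first column is the decimal Unicode code point of the second: DictMain builds the character with `Character.toChars(i)`, and 19968 is U+4E00, 一. The hypothetical helper below (`DataLineCheck`, not part of the project) performs the same conversion to check that a line is self-consistent, which may be handy after copying or editing the file by hand.

```java
class DataLineCheck {
    /** Returns true if the code point column matches the character column. */
    static boolean isConsistent(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            return false;
        }
        int codePoint;
        try {
            codePoint = Integer.parseInt(parts[0]);
        } catch (NumberFormatException e) {
            return false;
        }
        if (!Character.isValidCodePoint(codePoint)) {
            return false; // Character.toChars would throw for these
        }
        // Same conversion DictMain uses to produce the second column
        String expected = new String(Character.toChars(codePoint));
        return expected.equals(parts[1]);
    }
}
```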

Running DictMain.java

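A run can take anywhere from tens of minutes to several hours, because it downloads more than 200 MB of web pages (over 20,000 pages). Each run first checks whether a page has already been downloaded, so stopping the program partway through does no harm: the next run picks up where the last one stopped.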
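The resume behavior comes from the existence check in DictMain's static block: a DownloadThread is started only for code points whose page file is missing. The sketch below isolates that check, for example to report how many pages are still outstanding before starting a run. `ResumeCheck` and `missingPages` are hypothetical names, and `filePattern` stands in for DictMain.FILEPATH, whose actual value is defined earlier in that class and not shown here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ResumeCheck {
    /**
     * Lists the code points in [min, max] whose page file does not exist yet.
     * filePattern is a String.format pattern with one slot for the code point,
     * used the same way DictMain uses FILEPATH.
     */
    static List<Integer> missingPages(String filePattern, int min, int max) {
        List<Integer> missing = new ArrayList<Integer>();
        for (int i = min; i <= max; i++) {
            if (!new File(String.format(filePattern, i)).exists()) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

Called with FILEPATH and the same UNICODE_MIN/UNICODE_MAX bounds DictMain uses, the size of the returned list is the number of pages the next run still has to download.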

When "All prepared!" is printed, everything is ready. Refresh the project folder and you will see the pages saved under dict/pages. Avoid opening that folder in Eclipse: it holds more than 20,000 files and can freeze Eclipse.

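You will also see a generated data.txt file. Rename it to data.dat and copy it into the pinyin folder (the com.siqi.pinyin package, next to PinYin.java).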
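The rename-and-copy step can also be scripted. Below is a minimal sketch using java.nio.file, which needs Java 7 or later (newer than the article's JDK 1.6); the class name `InstallDataFile` is hypothetical. The target directory must be where PinYin.class loads resources from, because `getResourceAsStream("data.dat")` resolves the name relative to the class's package, com/siqi/pinyin.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class InstallDataFile {
    /** Copies the generated file into targetDir as data.dat, overwriting any older copy. */
    static Path install(Path generated, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);          // no-op if it already exists
        Path target = targetDir.resolve("data.dat");
        return Files.copy(generated, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, `InstallDataFile.install(Paths.get("data.txt"), Paths.get("src/com/siqi/pinyin"))`, where both paths are placeholders for a typical Eclipse project layout; refresh the project in Eclipse afterwards so it picks up the file.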

Running PinYin.java

It prints the pinyin of "大家haome12345", with and without tone numbers:

```text
With tones:    da4jia1haome12345
Without tones: dajiahaome12345
```

The example above only shows how to get the pinyin. Getting the stroke count and the other fields works the same way, so it is not shown here.
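For the other fields the approach is the same: split a dic.txt line and pick a column. Judging only from the sample dic.txt lines at the start of the article, column 0 appears to be the code point, 1 the character, 2 the radical, 3 the stroke count and 4 the stroke order (1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot, 5 = turning), and from column 7 on the readings come in pairs of tone-number and tone-mark pinyin. This layout is inferred, not documented, and the meaning of columns 5 and 6 is left out. The sketch below (`DicLine` is a hypothetical name) reads the columns listed above.

```java
import java.util.ArrayList;
import java.util.List;

class DicLine {
    final int unicode;
    final String character;
    final String radical;
    final int strokeCount;
    final String strokeOrder;                              // e.g. "134" for 丈
    final List<String> readings = new ArrayList<String>(); // tone-number forms, e.g. "ding1"

    DicLine(String line) {
        // split keeps empty inner fields, so column positions stay stable
        String[] f = line.trim().split(",");
        if (f.length < 9) {
            throw new IllegalArgumentException("unexpected dic.txt line: " + line);
        }
        unicode = Integer.parseInt(f[0]);
        character = f[1];
        radical = f[2];
        strokeCount = Integer.parseInt(f[3]);
        strokeOrder = f[4];
        // From column 7 on: numbered pinyin, tone-marked pinyin, numbered, tone-marked, ...
        for (int i = 7; i + 1 < f.length; i += 2) {
            readings.add(f[i]);
        }
    }
}
```

For the sample line `19969,丁,一,2,12,SGH,AI,ding1,dīng,zheng1,zhēng` this gives a stroke count of 2, stroke order `12` and the readings `ding1` and `zheng1`.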
