Reading a txt File and Counting Word Occurrences

Published: 2023/12/10 · Programming Q&A · 豆豆


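//Introduction:
//InputStream is the superclass of all byte input streams; in practice you use one of its subclasses, such as FileInputStream. It supplies a stream of bytes flowing into the application from somewhere else, i.e. it reads data from an external source into the program.
//InputStreamReader is the bridge between byte streams and character streams: it decodes bytes into characters using a charset you can specify, so the data can be read character by character.
//FileInputStream extends InputStream and reads byte data from a local file; it is a file operation.
//BufferedReader provides buffered text reading: it reads text from a character input stream and buffers the characters so that single characters, arrays, and lines can be read efficiently. readLine reads one line of text.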
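The way these classes stack can be shown with a minimal sketch. The class name ReaderChainSketch and the method readLines are hypothetical, and the example decodes UTF-8 bytes from memory instead of a GBK file so that it runs anywhere; a FileInputStream can be passed in the same way.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the article's TxtFile.
class ReaderChainSketch {

    // Decodes the bytes with the given charset and joins the lines with single spaces.
    static String readLines(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        // byte stream -> InputStreamReader (decoding) -> BufferedReader (line reading).
        // try-with-resources closes the BufferedReader, which closes the wrapped streams.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "hello world\nhello java\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
        // prints: hello world hello java
    }
}
```

The charset given to InputStreamReader has to match the encoding the bytes were written in; a mismatch is what produces garbled Chinese text.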

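//File represents a file (or directory) path. The operations available on it include creating and deleting the file, checking whether it exists, and checking whether it has any content.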
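As a rough illustration of those File operations, the sketch below creates a scratch file, checks it, writes to it, and deletes it. FileOpsSketch and demo are hypothetical names, and the scratch file location in the system temp directory is an assumption made only for this example.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the article's TxtFile.
class FileOpsSketch {

    // Runs create / exists / length / delete on the given file and returns each result in order.
    static boolean[] demo(File file) {
        try {
            boolean created = file.createNewFile();     // true only if the file did not exist yet
            boolean existsAfterCreate = file.exists();
            boolean emptyAtFirst = file.length() == 0;  // length() is the size in bytes
            try (FileWriter writer = new FileWriter(file)) {
                writer.write("hello");
            }
            boolean hasContent = file.length() > 0;     // the file now contains data
            boolean deleted = file.delete();
            boolean existsAfterDelete = file.exists();
            return new boolean[] {created, existsAfterCreate, emptyAtFirst,
                                  hasContent, deleted, existsAfterDelete};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A unique scratch file in the system temp directory (assumed writable).
        File scratch = new File(System.getProperty("java.io.tmpdir"),
                "fileops-sketch-" + System.nanoTime() + ".txt");
        boolean[] r = demo(scratch);
        System.out.println("created=" + r[0] + " exists=" + r[1] + " empty=" + r[2]
                + " hasContent=" + r[3] + " deleted=" + r[4] + " existsAfterDelete=" + r[5]);
    }
}
```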

I split the task of reading a txt file, counting how often each word occurs, and sorting the result into four small problems:

1) Read the text content

2) Split the text into an array of strings (the words) and count how often each word occurs

3) Sort the counts

4) Output the result

The complete Java code is as follows:

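import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TxtFile {

    private String path;

    public TxtFile() {
    }

    /**
     * Creates a TxtFile for the text file at the given path.
     * @param _path path of the text file
     */
    public TxtFile(String _path) {
        this.path = _path;
    }

    public void setPath(String path) {
        this.path = path;
    }

    public String getPath() {
        return path;
    }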

    /**
     * Reads every line of the txt file and returns the content as a single string.
     * @return the text content, with each line followed by one space
     * @throws IOException if the file cannot be read
     */
    public String readWords() throws IOException {

        // From the application's point of view, reading a file is input: the bytes of the
        // file are decoded into characters and read line by line through a buffer.
        // What matters when building the reader is which file is read and how it is encoded.
        String encoding = "GBK"; // must match the file's actual encoding, otherwise Chinese text is garbled
        File file = new File(this.path);

        StringBuilder words = new StringBuilder();

        if (file.isFile()) { // isFile() is false when the path does not exist
            // try-with-resources closes the whole reader chain, even if readLine throws
            try (BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), encoding))) {
                String lineTXT;

                // Read line by line until readLine returns null (end of file). The text is
                // later split on whitespace, so a space is appended after every line.
                while ((lineTXT = bufferedReader.readLine()) != null) {
                    words.append(lineTXT).append(' ');
                }
            }
        }
        return words.toString();
    }

    /**
     * Splits the text into words and counts how often each word occurs.
     * @param text the text content as a string
     * @return a TreeMap whose keys are the words and whose values are their occurrence counts
     */
    public Map<String, Integer> countAWord(String text) {
        Map<String, Integer> map = new TreeMap<String, Integer>();
        // Split on runs of whitespace so that consecutive spaces do not produce empty "words"
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace (or empty text) yields one empty token
            }
            // One pass over the words: increment the stored count, or start at 1
            Integer num = map.get(word);
            map.put(word, num == null ? 1 : num + 1);
        }
        return map;
    }

    /**
     * Sorts the word counts in descending order of count.
     * @param map the count of each word in the text (a TreeMap: key is the word, value is its count)
     * @return the counts as a list of entries, sorted by count
     */
    public List<Map.Entry<String, Integer>> wordsSort(Map<String, Integer> map) {
        // A Map stores its data as Map.Entry<key, value> pairs; copy them into a list
        List<Map.Entry<String, Integer>> words = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());

        // Sort with a comparator that orders the entries by value, largest first
        Collections.sort(words, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // Compare the counts as numbers; comparing their string forms would rank "9" above "10"
                return o2.getValue().compareTo(o1.getValue());
            }
        });

        return words;
    }

    /**
     * Prints the word counts to the console.
     * @param words the sorted word counts, as a list of map entries
     */
    public void print(List<Map.Entry<String, Integer>> words) {
        for (Map.Entry<String, Integer> entry : words) {
            // %.8s prints at most the first 8 characters of the word
            System.out.printf("%.8s %d\n", entry.getKey(), entry.getValue());
        }
    }

}
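When sorting by count, it matters whether the counts are compared as numbers or as text. The sketch below sorts the same list both ways to show the difference; CountOrderSketch and its two methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the article's TxtFile.
class CountOrderSketch {

    // Descending order by the text form of each count (lexicographic comparison).
    static List<Integer> sortedAsStrings(List<Integer> counts) {
        List<Integer> copy = new ArrayList<>(counts);
        copy.sort((a, b) -> b.toString().compareTo(a.toString()));
        return copy;
    }

    // Descending order by numeric value.
    static List<Integer> sortedAsNumbers(List<Integer> counts) {
        List<Integer> copy = new ArrayList<>(counts);
        copy.sort(Comparator.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(9, 10, 2, 100);
        System.out.println(sortedAsStrings(counts)); // prints: [9, 2, 100, 10]
        System.out.println(sortedAsNumbers(counts)); // prints: [100, 10, 9, 2]
    }
}
```

String comparison looks at the first character first, so "9" outranks "100". For word counts above 9, only the numeric comparison gives a true descending order.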

Calling it from the main function:

        // readWords() throws IOException, so the enclosing main must declare or handle it
        TxtFile txtFile = new TxtFile("f:/data.txt");

        String words = txtFile.readWords();
        Map<String, Integer> map = txtFile.countAWord(words);

        List<Map.Entry<String, Integer>> list = txtFile.wordsSort(map);
        txtFile.print(list);
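For comparison, steps 2 to 4 can also be written more compactly. This is only a sketch under a few assumptions: the class WordCountSketch and its methods are hypothetical, the text is already in memory, words are separated by whitespace, and ties are broken alphabetically (a choice the article does not specify).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the article's TxtFile.
class WordCountSketch {

    // Counts each word in a single pass; Map.merge adds 1 to an existing count or stores 1.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sorts by count, largest first; equal counts are ordered alphabetically by word.
    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        String text = "the cat and the dog and the bird";
        for (Map.Entry<String, Integer> entry : sortByCount(count(text))) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
        // prints: the 3, and 2, bird 1, cat 1, dog 1 (one pair per line)
    }
}
```

Counting in one pass keeps the work proportional to the number of words, whereas comparing every word with every other word grows quadratically with the length of the text.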
