Python Exercise (1): Implementing a Code Statistics Tool

Published: 2023/12/15

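Our department recently set up a Python study group so that everyone can grow by working on real projects. The boss assigned the first small task, a code statistics tool, with the following requirements:

1. Count the lines of code in the C++ programs under a given directory.

2. C++ program files include .cpp and .h files.

3. If the given directory has subdirectories, traverse all of them recursively.

4. Distinguish comments from code.

5. Statements that span several lines need not be considered.

6. Output the number of .cpp and .h files, the number of code lines, the number of comment lines, and the processing time.

7. Develop in Python 3.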

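At first glance it looked a bit hard: so many features at once, counting files of specific types, traversing subdirectories, and counting comment lines. After some searching online, it turned out that Python's built-in functions are extremely convenient: os.walk handles the directory traversal, and os.path.splitext returns a file's extension. That left the main problem: how to recognize comments in the code.

In C++, comments are mainly line comments (//) and block comments (/* */). At first I briefly had two ideas for implementing this:

1. Treat each line read as a string and check whether it contains a line-comment or block-comment marker.

2. Use regular expressions.

I soon realized neither would work. Take a simple example, a code line like this: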

"// test /* */";

It is just a simple string, yet with either approach, odd cases like this are very hard to handle. There are other bizarre cases as well:

//int c = 0; \

int b = 0;

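Both lines above should count as comments, which neither approach 1 nor approach 2 can handle, because lines are not independent: a line may depend on the one before it. While I was stuck, a developer colleague gave me a keyword: state machine. Read each line character by character and set the state according to each character. Many IDEs count code lines this way, so I decided to use this method to separate comments from code.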
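To make the failure concrete, here is a minimal sketch of approach 1 applied to the two examples above. The helper name naive_has_comment is made up for illustration and is not code from the study group; it simply checks each line in isolation.

```python
def naive_has_comment(line: str) -> bool:
    """Approach 1 (hypothetical helper): a line is a comment if it contains a marker."""
    return "//" in line or "/*" in line


# A pure code line: the markers sit inside a string literal.
string_line = '"// test /* */";'

# A line comment spliced onto the next line with a trailing backslash.
spliced_lines = ["//int c = 0; \\", "int b = 0;"]

if __name__ == "__main__":
    # Wrongly reported as a comment.
    print(naive_has_comment(string_line))
    # The second line is part of the comment, but the check cannot know that.
    print([naive_has_comment(line) for line in spliced_lines])
```

The check gets the string line wrong because it has no notion of "inside a string", and it gets the second spliced line wrong because it keeps no state between lines.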

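At the start of the implementation I was rather lost, because I did not know which states were needed. So, using VS and some suggestions from my developer colleague, I put together a particularly bizarre cpp file, shown below: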

/*123*/

/*/**/

int g_i;/*" /*

int g_j;*/#include

int main(int argc, char *argv[])

{/*" test*/int i; //*/

int j; /*test*/

int k; //test

/*test*/ j = 0;const char *test = "test //";int m; /*/*

const char *test2 = "test /*";

const char *test3 = "*/ "test"; //*/

/** /

**/

/*//*//* */int n;const char *test4 = "/*"

"//"

" "

"";//t1 \

2222;

cout<<"i am \

loleina";

//"test\"

int a=0,\

c;//\test

b=1cout<<"i am \r loleima";return 0;

}

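From this file I noticed a pattern: the state of a line can change only when the character is ", /, *, or \. Suppose there are three states: initial, normal, and string. When a file is first opened, the line state is set to initial and the first line is read. If the character read is ", we enter the string state. Inside the string state, // and /* are ignored; only another " changes the state, back to normal. If the character read is /, we may be entering a comment, so the state is set to pre-comment: if the next character is /, we enter the line-comment state, and if it is *, we enter the block-comment state. Inside a line comment, only a newline or the line-splice character \ matters; no other character changes the state. A block comment is even simpler: only the normal exit needs to be considered. After sorting out the logic, I drew a state transition diagram (the image is not preserved here).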
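The string-state rule can be tried out in isolation. The following is a simplified sketch with only two states (normal and string) that looks one character ahead instead of using a separate pre-comment state; the function name line_comment_start is an assumption for illustration, and block comments are deliberately left out.

```python
def line_comment_start(line: str) -> int:
    """Return the index of the first // outside a string literal, or -1.

    Simplified sketch: two states only, and /* */ comments are ignored.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 1              # skip the escaped character, e.g. \"
            elif ch == '"':
                in_string = False   # closing quote: back to the normal state
        elif ch == '"':
            in_string = True        # opening quote: enter the string state
        elif ch == "/" and line[i + 1:i + 2] == "/":
            return i                # a real line comment starts here
        i += 1
    return -1


if __name__ == "__main__":
    for sample in ('"// test /* */";', 'int k; //test'):
        print(repr(sample), line_comment_start(sample))
```

The full tool needs more states than this because block comments and line splices can span lines, which a per-line function cannot track.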

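With the state diagram drawn, the transitions between states were mostly clear, but two questions remained:

1. How do we count effective code lines?

2. How do we handle lines that depend on each other?

For the first question, give each line a flag, is_effective_code, which defaults to false. As soon as an effective character (anything other than /) is read in the normal state, the current line is an effective code line and the flag is set to true. At the end of each line the flag is checked, and if it is true the code line count is increased by 1.

For the second question, only the splice state and the block-comment state affect the state of the first character of the next line. So at the end of each line we save the current state, and if it is the splice state or the block-comment state, we carry it over as the starting state of the next line.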
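The two answers can be sketched together on a reduced problem. The function below is an illustration under simplifying assumptions, not the tool itself: it handles only /* */ comments (no strings, line comments, or splices), resets the is_effective_code flag on every line, and carries a single in_block state from one line to the next. The names count_code_lines and in_block are made up for this sketch.

```python
def count_code_lines(lines):
    """Count lines that contain code outside /* */ comments (simplified sketch)."""
    in_block = False                    # state carried across lines
    total = 0
    for line in lines:
        is_effective_code = False       # per-line flag, reset for every line
        i = 0
        while i < len(line):
            two = line[i:i + 2]
            if in_block:
                if two == "*/":
                    in_block = False
                    i += 1              # consume both characters of */
            elif two == "/*":
                in_block = True
                i += 1                  # consume both characters of /*
            elif not line[i].isspace():
                is_effective_code = True
            i += 1
        if is_effective_code:
            total += 1
    return total


if __name__ == "__main__":
    sample = [
        "int a; /* start\n",            # code before the comment: counted
        "still comment\n",              # inside the block: not counted
        "end */ int b;\n",              # code after the comment ends: counted
        "/* only comment */\n",         # comment only: not counted
    ]
    print(count_code_lines(sample))
```

Because in_block survives the end of a line, the second sample line is recognized as a comment even though it contains no comment marker itself.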

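With the problems basically solved, I started on the implementation. It has two parts: the function CountFies collects the valid files under a path, and the function ReadEffetiveCodeNumber counts the effective code lines in all of those files: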

import os
import time
from enum import IntEnum


# 0 initial, 1 normal, 2 string, 3 pre-comment, 4 line comment, 5 block comment,
# 6 pre-exit from block comment, 7 pre-splice, 8 splice
class Status(IntEnum):
    Init = 0
    Common = 1
    CharString = 2
    PreComment = 3
    LineComment = 4
    BlockComments = 5
    PreExitComment = 6
    PreCombination = 7
    Combination = 8


def CountFies(path):
    """Walk path recursively and return the full paths of all .h and .cpp files."""
    filesList = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if os.path.splitext(file)[1] in ('.h', '.cpp'):
                filesList.append(os.path.join(root, file))
    return filesList


def ReadEffetiveCodeNumber(filesList):
    """Return the total number of effective code lines in the given files."""
    codes_numbers = 0
    for filename in filesList:
        # errors='ignore' keeps an undecodable byte from aborting the whole run
        with open(filename, encoding='gbk', errors='ignore') as infp:
            lines = infp.readlines()
        row_cur_status = Status.Common
        for li in lines:
            row_pre_status = Status.Init
            is_effective_code = False
            # Read the line character by character and update the state.
            # Blank lines need no special case: they hold no effective character.
            for charli in li:
                if row_cur_status == Status.Common:
                    if charli == '/':
                        row_cur_status = Status.PreComment
                    elif charli == '"':
                        row_cur_status = Status.CharString
                        is_effective_code = True
                    elif not charli.isspace():
                        is_effective_code = True
                elif row_cur_status == Status.CharString:
                    is_effective_code = True
                    if charli == '"':
                        row_cur_status = Status.Common
                    elif charli == '\\':
                        row_pre_status = row_cur_status
                        row_cur_status = Status.PreCombination
                elif row_cur_status == Status.PreComment:
                    if charli == '/':
                        row_cur_status = Status.LineComment
                    elif charli == '*':
                        row_cur_status = Status.BlockComments
                    else:
                        # The '/' was an ordinary operator such as division, so it is code.
                        is_effective_code = True
                        row_cur_status = Status.CharString if charli == '"' else Status.Common
                elif row_cur_status == Status.LineComment:
                    if charli == '\n':
                        row_cur_status = Status.Common
                    elif charli == '\\':
                        row_pre_status = row_cur_status
                        row_cur_status = Status.PreCombination
                elif row_cur_status == Status.BlockComments:
                    if charli == '*':
                        row_cur_status = Status.PreExitComment
                elif row_cur_status == Status.PreExitComment:
                    if charli == '/':
                        row_cur_status = Status.Common
                    elif charli != '*':
                        # Stay here on '*' so that a run like **/ still closes the comment.
                        row_cur_status = Status.BlockComments
                elif row_cur_status == Status.PreCombination:
                    # Backslash + newline splices two lines; anything else was an escape.
                    if charli == '\n':
                        row_cur_status = Status.Combination
                    else:
                        row_cur_status = row_pre_status
            if is_effective_code:
                codes_numbers += 1
            # Only a splice or an open block comment carries over to the next line.
            if row_cur_status == Status.Combination:
                row_cur_status = row_pre_status
            elif row_cur_status != Status.BlockComments:
                row_cur_status = Status.Common
    return codes_numbers


if __name__ == "__main__":
    userdir = input("input the directory :")
    while not os.path.isdir(userdir):
        userdir = input("it is wrong,input the directory again :")
    # 1. Collect the valid files (.cpp, .h) under the given path and return
    #    their full paths in a list.
    # 2. Read each file in the list and return the total number of effective
    #    code lines:
    #    A. read every line of every file
    #    B. read every character of every line and update the state
    start = time.time()
    filesList = CountFies(userdir)
    codes_numbers = ReadEffetiveCodeNumber(filesList)
    end = time.time()
    dealTime = end - start
    print("it has %d files(.cpp,.h) and all effective code lines count:%d" % (len(filesList), codes_numbers))
    print("Code execution consumes %s seconds" % dealTime)

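So a simple method met the overall requirements. Seven people in the group wrote a solution. Most used approach 1 to decide whether a line is code, and one used approach 2, regular expressions. None of them covered all the cases, so they could only count code written by so-called "normal people". The regular-expression version also had a serious performance problem: most solutions finished in 1 s or less, while the regex one took nearly 7 s. In the end the boss reviewed them and said mine was the best implementation, and I smiled.

I learned a lot from this exercise, because I ran into several problems while writing it. For example, after finishing the code I ran it on Linux against a folder the boss had specified, and the program crashed with a character-encoding error, while a folder I had uploaded myself worked fine. Also, when running on Linux, the prompt messages were garbled.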

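Finally, here is a summary of what I learned over the two weeks:

1. When reading files in bulk, provide exception handling. A mismatch between a file's actual encoding and the encoding used to read it can crash the program outright.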
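One way to follow this advice is sketched below. It is an assumption-laden illustration rather than the code used in the exercise: the helper name read_lines_safely and the choice to try utf-8 before gbk are mine. In Python 3, IOError is an alias of OSError, so catching OSError also covers files that cannot be opened.

```python
def read_lines_safely(path, encodings=("utf-8", "gbk")):
    """Try each encoding in turn; return [] instead of crashing on a bad file.

    Hypothetical helper: the encoding list is an assumption, adjust as needed.
    """
    for enc in encodings:
        try:
            with open(path, encoding=enc) as fp:
                return fp.readlines()
        except UnicodeDecodeError:
            continue                    # wrong guess, try the next encoding
        except OSError as exc:
            print("skipping %s: %s" % (path, exc))
            return []
    print("skipping %s: none of %s could decode it" % (path, list(encodings)))
    return []
```

Compared with errors='ignore', this keeps the text intact when one of the candidate encodings is right, and one unreadable file only costs that file rather than the whole run.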

2. Use regular expressions sparingly. Heavy use of them can become a performance bottleneck.

3. Learn the os.walk function, which belongs to the os module. Official documentation: https://docs.python.org/2/library/os.html

os.walk(top, topdown=True, onerror=None, followlinks=False)

Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
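As a small illustration of the 3-tuple, the sketch below tallies files by extension across a whole tree, which is one way the separate .cpp and .h file counts from the requirements could be produced. The function name count_by_extension is hypothetical.

```python
import os
from collections import Counter


def count_by_extension(top):
    """Return a Counter mapping each file extension under top to its file count."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # name has no path component; os.path.join(dirpath, name) gives the full path
            counts[os.path.splitext(name)[1]] += 1
    return counts


if __name__ == "__main__":
    counts = count_by_extension(".")
    print(".cpp files: %d, .h files: %d" % (counts[".cpp"], counts[".h"]))
```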

4. Learn the file-opening function open, which belongs to the io module. https://docs.python.org/2/library/io.html

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)

Open file and return a corresponding stream. If the file cannot be opened, an IOError is raised.

5. Learn the splitext function, which belongs to the os.path module. https://docs.python.org/2/library/os.path.html

os.path.splitext(path)

Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').

Changed in version 2.6: Earlier versions could produce an empty root when the only period was the first character.
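A few calls show how the (root, ext) pair behaves in practice. The helper is_cpp_source is hypothetical, and lower-casing the extension is my own assumption; the tool above compares the extension case-sensitively.

```python
import os.path


def is_cpp_source(name):
    """Hypothetical filter: True for .cpp and .h files, ignoring letter case."""
    return os.path.splitext(name)[1].lower() in (".cpp", ".h")


if __name__ == "__main__":
    for name in ("main.cpp", "Util.H", "archive.tar.gz", ".cshrc", "README"):
        print(name, os.path.splitext(name), is_cpp_source(name))
```

Only the last period counts, so archive.tar.gz has the extension .gz, and a name with no period or only a leading period has an empty extension.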

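6. Studying the official documentation for each Python module has become all the more urgent. Reading assorted documents and other people's blogs always left me feeling that something was missing, and sure enough, every time I still have to check the official docs to use things correctly. In the coming days I will settle down and work through the official Python documentation, and try to write my own summary after each piece I study.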
