當前位置：首頁 > 编程语言 > python >内容正文

python

【编译原理】让我们来构建一个简单的解释器（Let’s Build A Simple Interpreter. Part 1.）（python/c/c++版）（笔记）

發布時間：2025/3/20 python 31 豆豆

生活随笔收集整理的這篇文章主要介紹了【编译原理】让我们来构建一个简单的解释器（Let’s Build A Simple Interpreter. Part 1.）（python/c/c++版）（笔记）小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

原文：Let’s Build A Simple Interpreter. Part 1.

文章目錄

- 【編譯原理】讓我們來構建一個簡單的解釋器（Let’s Build A Simple Interpreter. Part 1.）（python/c/c++版）（part 1）
- - 將輸入字符串分解為token標記的過程稱為詞法分析（lexical analysis），詞法分析器（ lexical analyzer或 lexer ）也叫掃描器（scanner ）或標記器（tokenizer）。
  - 用c/c++實現（版本一：原版）
  - 版本二（版本一的升級版，[自認為]解決了版本一的一些混亂邏輯）

【編譯原理】讓我們來構建一個簡單的解釋器（Let’s Build A Simple Interpreter. Part 1.）（python/c/c++版）（part 1）

pascal代碼，我們要為這種代碼做一個解釋器

program factorial;function factorial(n: integer): longint; beginif n = 0 thenfactorial := 1elsefactorial := n * factorial(n - 1); end;varn: integer;beginfor n := 0 to 16 dowriteln(n, '! = ', factorial(n)); end.

首先我們先來實現pascal編譯器的一個小功能，兩個數的加法，我們準備用python實現，但是如果你想用其他語言實現也是可行的，這是它的代碼：

# -*- coding: utf-8 -*- """ @File : 1.py @Time : 2021/5/19 14:44 @Author : Dontla @Email : sxana@qq.com @Software: PyCharm """ # Token types # # EOF (end-of-file) token is used to indicate that # there is no more input left for lexical analysis INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF'class Token(object):def __init__(self, type_, value):# token type: INTEGER, PLUS, or EOFself.type = type_# token value: 0, 1, 2. 3, 4, 5, 6, 7, 8, 9, '+', or Noneself.value = valuedef __str__(self):"""String representation of the class instance.Examples:Token(INTEGER, 3)Token(PLUS '+')"""return 'Token({type}, {value})'.format(type=self.type,value=repr(self.value))def __repr__(self):return self.__str__()class Interpreter(object):def __init__(self, text):# 客戶端字符串輸入, 比如 "3+5"self.text = text# self.pos is an index into self.textself.pos = 0# current token instance（當前標記實例）self.current_token = Nonedef error(self):raise Exception('Error parsing input') # 語法分析輸入出錯def get_next_token(self):"""Lexical analyzer (also known as scanner or tokenizer)This method is responsible for breaking a sentenceapart into tokens. One token at a time.詞法分析器（也稱為掃描器scanner或標記器tokenizer）這個方法負責將一個句子分解成標記tokens。一次一個標記"""text = self.text# is self.pos index past the end of the self.text ?# if so, then return EOF token because there is no more# input left to convert into tokens# self.pos索引是否超過self.text的結尾？# 如果是，則返回EOF標記，因為沒有更多的標記# 向左輸入以轉換為標記if self.pos > len(text) - 1:return Token(EOF, None)# get a character at the position self.pos and decide# what token to create based on the single character# 在self.pos位置獲取一個字符，并根據單個字符決定要創建的標記current_char = text[self.pos]# if the character is a digit then convert it to# integer, create an INTEGER token, increment self.pos# index to point to the next character after the digit,# and return the INTEGER token# 如果字符是數字，則將其轉換為整型，創建整型標記，增加self.pos索引以指向數字后面的下一個字符，然后返回整型標記if current_char.isdigit(): # isdigit()函數，全是數字返回True，否則返回Falsetoken = Token(INTEGER, int(current_char)) # 創建一個tokenself.pos += 1return tokenif current_char == '+':token = Token(PLUS, current_char)self.pos += 1return tokenself.error()def eat(self, token_type):# compare the current token type with the passed token# type and if they match then "eat" the current token# and assign the next token to the self.current_token,# otherwise raise an exception.if self.current_token.type == token_type:self.current_token = self.get_next_token()else:self.error()def expr(self):"""expr -> INTEGER PLUS INTEGER"""# set current token to the first token taken from the inputself.current_token = self.get_next_token()# we expect the current token to be a single-digit integerleft = self.current_tokenself.eat(INTEGER)# we expect the current token to be a '+' tokenop = self.current_tokenself.eat(PLUS)# we expect the current token to be a single-digit integerright = self.current_tokenself.eat(INTEGER)# after the above call the self.current_token is set to# EOF token# at this point INTEGER PLUS INTEGER sequence of tokens# has been successfully found and the method can just# return the result of adding two integers, thus# effectively interpreting client inputresult = left.value + right.valuereturn resultdef main():while True:try:# 要在Python3下運行，請將“raw_input”調用替換為“input”# text = raw_input('calc> ')text = input('calc> ') # 獲取鍵盤輸入，參數為提示信息except EOFError: # 不知是什么異常breakif not text:continueinterpreter = Interpreter(text)result = interpreter.expr()print(result)if __name__ == '__main__':main()

運行結果：

D:\python_virtualenv\my_flask\Scripts\python.exe C:/Users/Administrator/Desktop/新建文件夾/1.py calc> 1+2 3 calc>

將輸入字符串分解為token標記的過程稱為詞法分析（lexical analysis），詞法分析器（ lexical analyzer或 lexer ）也叫掃描器（scanner ）或標記器（tokenizer）。

>>> from calc1 import Interpreter >>> >>> interpreter = Interpreter('3+5') >>> interpreter.get_next_token() Token(INTEGER, 3) >>> >>> interpreter.get_next_token() Token(PLUS, '+') >>> >>> interpreter.get_next_token() Token(INTEGER, 5) >>> >>> interpreter.get_next_token() Token(EOF, None) >>>

讓我們回顧一下您的解釋器如何評估算術表達式：

解釋器接受一個輸入字符串，比如說“3+5”
解釋器調用expr方法在詞法分析器get_next_token返回的標記流中查找結構。它試圖找到的結構是INTEGER PLUS INTEGER的形式。在確認結構后，它通過添加兩個INTEGER標記的值來解釋輸入，因為此時解釋器很清楚它需要做的是添加兩個整數 3 和 5。

檢查理解：
什么是解釋器interpreter？
什么是編譯器compiler？
解釋器和編譯器有什么區別？
什么是標記token？
將輸入分解為標記的過程的名稱是什么？（詞法分析）
進行詞法分析lexical analysis的解釋器的部分是什么？
解釋器或編譯器的那部分的其他常見名稱是什么？

用c/c++實現（版本一：原版）

#include <stdio.h> #include <stdlib.h> #include <memory.h> #include <string.h>struct Interpreter {char* text;int pos;struct Token (*get_next_token)(struct Interpreter*); };struct Token {int type;char value; };struct Token get_next_token(struct Interpreter* pipt) {if (pipt->pos > (strlen(pipt->text)-1)) {struct Token token = {3, '\0'};//3表示EOF，2表示+，1表示數字return token;}char current_char = pipt->text[pipt->pos];if (current_char>='0' && current_char<='9') {struct Token token = {1, current_char};pipt->pos++;return token;}if (current_char == '+') {struct Token token = { 2, current_char };pipt->pos++;return token;}printf("輸入非法！\n");exit(-1);//如果都不是以上的字符，則報錯并退出程序 }char eat(struct Token* pcurrent_token, struct Interpreter* pipt, int type) {char former_token_value = pcurrent_token->value;if (pcurrent_token->type == type) {*pcurrent_token = pipt->get_next_token(pipt);}else {printf("輸入非法！\n");exit(-1);}return former_token_value; }int expr(char* text) {struct Interpreter ipt = {text, 0, get_next_token};struct Token current_token = ipt.get_next_token(&ipt);char temp;temp = eat(&current_token, &ipt, 1);//斷言第一個字符是數字int left = temp - '0';eat(&current_token, &ipt, 2);//斷言第三個字符是加號temp = eat(&current_token, &ipt, 1);//斷言第三個字符是數字int right = temp - '0';int result = left + right;return result;}int main() {char text[10];while (1){printf("請輸入算式：\n");scanf_s("%s", text, sizeof(text));int result = expr(text);printf("= %d\n\n", result);}return 0; }

運行結果：

請輸入算式： 2+8 = 10請輸入算式： 1+5 = 6請輸入算式： 3+4 = 7請輸入算式：

版本二（版本一的升級版，[自認為]解決了版本一的一些混亂邏輯）

#include <stdio.h> #include <stdlib.h> #include <memory.h> #include <string.h>#define is_digital 0 #define is_plus 1 #define is_minus 2 #define is_EOF 3struct Token {int type;char value; };struct Interpreter {char* text;int pos;struct Token current_token;//struct Token(*get_next_token)(struct Interpreter*); };void error() {printf("輸入非法！\n");exit(-1); }void get_next_token(struct Interpreter* pipt) {if (pipt->pos > (strlen(pipt->text) - 1)) {pipt->current_token = { is_EOF, '\0' };return;//struct Token token = { is_EOF, '\0' };//3表示EOF，2表示+，1表示數字//pipt->current_token = token;}char current_char = pipt->text[pipt->pos];if (current_char >= '0' && current_char <= '9') {//struct Token token = { is_digital, current_char };pipt->current_token = { is_digital, current_char };pipt->pos++;return;//pipt->current_token = token;}if (current_char == '+') {//struct Token token = { is_plus , current_char };pipt->current_token = { is_plus, current_char };pipt->pos++;return;//pipt->current_token = token;}if (current_char == '-') {//struct Token token = { is_minus , current_char };pipt->current_token = { is_minus, current_char };pipt->pos++;return;//pipt->current_token = token;}error();//如果都不是以上的字符，則報錯并退出程序 }char eat(struct Interpreter* pipt, int type) {get_next_token(pipt);char token_value = pipt->current_token.value;if (pipt->current_token.type == type) {return token_value;}else {error();} }int expr(char* text) {struct Interpreter ipt = { text, 0 };//get_next_token(&ipt);char temp;temp = eat( &ipt, is_digital);//斷言第一個字符是數字int left = temp - '0';eat( &ipt, is_plus);//斷言第三個字符是加號temp = eat( &ipt, is_digital);//斷言第三個字符是數字int right = temp - '0';int result = left + right;return result;}int main() {char text[10];while (1){printf("請輸入算式：\n");scanf_s("%s", text, sizeof(text));int result = expr(text);printf("= %d\n\n", result);}return 0; }

運行結果：

請輸入算式： 3+6 = 9請輸入算式： 8+3 = 11請輸入算式：

總結

以上是生活随笔為你收集整理的【编译原理】让我们来构建一个简单的解释器（Let’s Build A Simple Interpreter. Part 1.）（python/c/c++版）（笔记）的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇： C语言函数指针（结构体函数指针）
下一篇：【编译原理】让我们来构建一个简单的解释器

python

【编译原理】让我们来构建一个简单的解释器（Let’s Build A Simple Interpreter. Part 1.）（python/c/c++版）（笔记）

文章目錄

【編譯原理】讓我們來構建一個簡單的解釋器（Let’s Build A Simple Interpreter. Part 1.）（python/c/c++版）（part 1）

將輸入字符串分解為token標記的過程稱為詞法分析（lexical analysis），詞法分析器（ lexical analyzer或 lexer ）也叫掃描器（scanner ）或標記器（tokenizer）。

用c/c++實現（版本一：原版）

版本二（版本一的升級版，[自認為]解決了版本一的一些混亂邏輯）

總結

將輸入字符串分解為token標記的過程稱為詞法分析（lexical analysis），詞法分析器（ lexical analyzer或 lexer ）也叫掃描器（scanner ）或標記器（tokenizer）。