日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 编程资源 > 编程问答 >内容正文

编程问答

从零写一个编译器(九):语义分析之构造抽象语法树(AST)

發(fā)布時間:2023/12/20 编程问答 29 豆豆
生活随笔 收集整理的這篇文章主要介紹了 从零写一个编译器(九):语义分析之构造抽象语法树(AST) 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

項目的完整代碼在 C2j-Compiler

前言

在上一篇完成了符號表的構建,下一步就是輸出抽象語法樹(Abstract Syntax Tree,AST)

抽象語法樹(abstract syntax tree 或者縮寫為 AST),是源代碼的抽象語法結構的樹狀表現形式,這里特指編程語言的源代碼。樹上的每個節(jié)點都表示源代碼中的一種結構。

AST對于編譯器是至關重要的,現在的編譯型語言一般通過AST來生成IR,解釋型語言也可以不用虛擬機而直接遍歷AST來解釋執(zhí)行,之后要寫解釋器和編譯器都依賴這個AST

這一篇主要文件有:

  • AstBuilder.java
  • AstNode.java
  • AstNodeImpl.java
  • NodeKey.java
  • NodeFactory.java

主要數據結構

AST節(jié)點的表示

public interface AstNode {AstNode addChild(AstNode node);AstNode getParent();ArrayList<AstNode> getChildren();void setAttribute(NodeKey key, Object value);Object getAttribute(NodeKey key);boolean isChildrenReverse();void reverseChildren();AstNode copy();}

這是對AstNode接口的實現,并且繼承HashMap,這里的NodeKey是

TokenType, VALUE, SYMBOL, PRODUCTION, TEXT

對應的value,

  • TokenType就是非終結符的類型
  • Text用來存儲解析對象的文本信息
  • Symbol對應的就是變量的符號對象
  • Value是對應對象解析的值,比如int a = 1,那么value的值就為1
  • public class AstNodeImpl extends HashMap<NodeKey, Object> implements AstNode {private Token type;private AstNodeImpl parent;private ArrayList<AstNode> children;String name;private boolean isChildrenReverse = false;public AstNodeImpl(Token type) {this.type = type;this.parent = null;this.children = new ArrayList<>();setAttribute(NodeKey.TokenType, type);}@Overridepublic AstNode addChild(AstNode node) {if (node != null) {children.add(node);((AstNodeImpl) node).parent = this;}return node;}@Overridepublic AstNode getParent() {return parent;}@Overridepublic void reverseChildren() {if (isChildrenReverse) {return;}Collections.reverse(children);isChildrenReverse = true;}@Overridepublic boolean isChildrenReverse() {return isChildrenReverse;}@Overridepublic ArrayList<AstNode> getChildren() {reverseChildren();return children;}@Overridepublic void setAttribute(NodeKey key, Object value) {if (key == NodeKey.TEXT) {name = (String) value;}put(key, value);}@Overridepublic Object getAttribute(NodeKey key) {return get(key);}@Overridepublic String toString() {String info = "";if (get(NodeKey.VALUE) != null) {info += "Node Value is " + get(NodeKey.VALUE).toString();}if (get(NodeKey.TEXT) != null) {info += "\nNode Text is " + get(NodeKey.TEXT).toString();}if (get(NodeKey.SYMBOL) != null) {info += "\nNode Symbol is " + get(NodeKey.SYMBOL).toString();}return info + "\n Node Type is " + type.toString();}@Overridepublic AstNode copy() {AstNodeImpl copy = (AstNodeImpl) NodeFactory.createICodeNode(type);Set<Entry<NodeKey, Object>> attributes = entrySet();Iterator<Map.Entry<NodeKey, Object>> it = attributes.iterator();while (it.hasNext()) {Map.Entry<NodeKey, Object> attribute = it.next();copy.put(attribute.getKey(), attribute.getValue());}return copy;} }

    NodeFactory

    NodeFactory就是簡單的返回一個節(jié)點的實現

    public class NodeFactory {public static AstNode createICodeNode(Token type) {return new AstNodeImpl(type);} }

    構造AST

    AST的創(chuàng)建也是需要在語法分析過程中根據reduce操作進行操作的。也就是在takeActionForReduce方法中調用AstBuilder的buildSyntaxTree方法

    在AstBuilder里面還是需要兩個堆棧來輔助操作

    private Stack<AstNode> nodeStack = new Stack<>();private LRStateTableParser parser = null; private TypeSystem typeSystem = null; private Stack<Object> valueStack = null; private String functionName; private HashMap<String, AstNode> funcMap = new HashMap<>();private static AstBuilder instance;

    構造AST的主要邏輯在buildSyntaxTree方法里,需要注意的是有一些節(jié)點在解釋執(zhí)行和代碼生成的時候是不一樣的,有時代碼生成需要的節(jié)點解釋執(zhí)行的話并不需要

    在這里提一下UNARY這個非終結符,這個非終結符和NAME很像,但是它一般是代表進行運算和一些操作的時候,比如數組,++,--或者函數調用的時候

    其實構建AST的過程和符號表的構建過程有點兒類似,都是根據reduce操作來創(chuàng)建信息和組合信息,符號表是組合修飾符說明符等,而AST則是組合節(jié)點間的關系變成一棵樹

    我們只看幾個操作

    • Specifiers_DeclList_Semi_TO_Def

      這個節(jié)點需要注意的是,從堆棧的什么地方拿到Symbol,這個需要從reduce次數和推導式中得出
    * DEF -> SPECIFIERS DECL_LIST SEMI * DECL -> VAR_DECL * VAR_DECL -> NEW_NAME * | VAR_DECL LP RP * | VAR_DECL LP VAR_LIST RP * | LP VAR_DECL RP * | START VAR_DECL

    從推導式可以看出,DEF節(jié)點的符號應該在valueStack.size() - 3,但是DECL和VAR_DECL沒有做reduce操作,所以符號應該在valueStack.size() - 2。這其實和前面的符號表構建算出之前符號的位置是一樣的。

    • TO_UNARY

    這里則是變量、數字或者字符串的節(jié)點,如果是個變量的號,這個節(jié)點就需要一個Symbol的value了

    case SyntaxProductionInit.Number_TO_Unary: case SyntaxProductionInit.Name_TO_Unary: case SyntaxProductionInit.String_TO_Unary:node = NodeFactory.createICodeNode(Token.UNARY);if (production == SyntaxProductionInit.Name_TO_Unary) {assignSymbolToNode(node, text);}node.setAttribute(NodeKey.TEXT, text);break;

    其余的節(jié)點無非是把一些語句拆分它的邏輯然后組成節(jié)點,真正的求值部分像Name_TO_Unary比較少,更多是比如把一個if else塊分成if節(jié)點、判斷節(jié)點、else節(jié)點,之后再按照這棵樹進行解釋執(zhí)行或者代碼生成

    public AstNode buildSyntaxTree(int production, String text) {AstNode node = null;Symbol symbol = null;AstNode child = null;if (Start.STARTTYPE == Start.INTERPRETER) {int p1 = SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def;int p2 = SyntaxProductionInit.Def_To_DefList;int p3 = SyntaxProductionInit.DefList_Def_TO_DefList;boolean isReturn = production == p1 || production == p2 || production == p3;if (isReturn) {return null;}}switch (production) {case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:node = NodeFactory.createICodeNode(Token.DEF);symbol = (Symbol) valueStack.get(valueStack.size() - 2);node.setAttribute(NodeKey.SYMBOL, symbol);break;case SyntaxProductionInit.Def_To_DefList:node = NodeFactory.createICodeNode(Token.DEF_LIST);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.DefList_Def_TO_DefList:node = NodeFactory.createICodeNode(Token.DEF_LIST);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Number_TO_Unary:case SyntaxProductionInit.Name_TO_Unary:case SyntaxProductionInit.String_TO_Unary:node = NodeFactory.createICodeNode(Token.UNARY);if (production == SyntaxProductionInit.Name_TO_Unary) {assignSymbolToNode(node, text);}node.setAttribute(NodeKey.TEXT, text);break;case SyntaxProductionInit.Unary_LP_RP_TO_Unary:node = NodeFactory.createICodeNode(Token.UNARY);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Unary_LP_ARGS_RP_TO_Unary:node = NodeFactory.createICodeNode(Token.UNARY);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Unary_Incop_TO_Unary:case SyntaxProductionInit.Unary_DecOp_TO_Unary:case SyntaxProductionInit.LP_Expr_RP_TO_Unary:case SyntaxProductionInit.Start_Unary_TO_Unary:node = NodeFactory.createICodeNode(Token.UNARY);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Unary_LB_Expr_RB_TO_Unary:node = NodeFactory.createICodeNode(Token.UNARY);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Uanry_TO_Binary:node = NodeFactory.createICodeNode(Token.BINARY);child = nodeStack.pop();node.setAttribute(NodeKey.TEXT, child.getAttribute(NodeKey.TEXT));node.addChild(child);break;case SyntaxProductionInit.Binary_TO_NoCommaExpr:case SyntaxProductionInit.NoCommaExpr_Equal_NoCommaExpr_TO_NoCommaExpr:node = NodeFactory.createICodeNode(Token.NO_COMMA_EXPR);child = nodeStack.pop();String t = (String) child.getAttribute(NodeKey.TEXT);node.addChild(child);if (production == SyntaxProductionInit.NoCommaExpr_Equal_NoCommaExpr_TO_NoCommaExpr) {child = nodeStack.pop();t = (String) child.getAttribute(NodeKey.TEXT);node.addChild(child);}break;case SyntaxProductionInit.Binary_Plus_Binary_TO_Binary:case SyntaxProductionInit.Binary_DivOp_Binary_TO_Binary:case SyntaxProductionInit.Binary_Minus_Binary_TO_Binary:case SyntaxProductionInit.Binary_Start_Binary_TO_Binary:node = NodeFactory.createICodeNode(Token.BINARY);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Binary_RelOP_Binary_TO_Binray:node = NodeFactory.createICodeNode(Token.BINARY);node.addChild(nodeStack.pop());AstNode operator = NodeFactory.createICodeNode(Token.RELOP);operator.setAttribute(NodeKey.TEXT, parser.getRelOperatorText());node.addChild(operator);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.NoCommaExpr_TO_Expr:node = NodeFactory.createICodeNode(Token.EXPR);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Expr_Semi_TO_Statement:case SyntaxProductionInit.CompountStmt_TO_Statement:node = NodeFactory.createICodeNode(Token.STATEMENT);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.LocalDefs_TO_Statement:node = NodeFactory.createICodeNode(Token.STATEMENT);if (Start.STARTTYPE == Start.CODEGEN) {node.addChild(nodeStack.pop());}break;case SyntaxProductionInit.Statement_TO_StmtList:node = NodeFactory.createICodeNode(Token.STMT_LIST);if (nodeStack.size() > 0) {node.addChild(nodeStack.pop());}break;case SyntaxProductionInit.FOR_OptExpr_Test_EndOptExpr_Statement_TO_Statement:node = NodeFactory.createICodeNode(Token.STATEMENT);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.StmtList_Statement_TO_StmtList:node = NodeFactory.createICodeNode(Token.STMT_LIST);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Expr_TO_Test:node = NodeFactory.createICodeNode(Token.TEST);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.If_Test_Statement_TO_IFStatement:node = NodeFactory.createICodeNode(Token.IF_STATEMENT);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.IfElseStatemnt_Else_Statemenet_TO_IfElseStatement:node = NodeFactory.createICodeNode(Token.IF_ELSE_STATEMENT);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.While_LP_Test_Rp_TO_Statement:case SyntaxProductionInit.Do_Statement_While_Test_To_Statement:node = NodeFactory.createICodeNode(Token.STATEMENT);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Expr_Semi_TO_OptExpr:case SyntaxProductionInit.Semi_TO_OptExpr:node = NodeFactory.createICodeNode(Token.OPT_EXPR);if (production == SyntaxProductionInit.Expr_Semi_TO_OptExpr) {node.addChild(nodeStack.pop());}break;case SyntaxProductionInit.Expr_TO_EndOpt:node = NodeFactory.createICodeNode(Token.END_OPT_EXPR);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.LocalDefs_StmtList_TO_CompoundStmt:node = NodeFactory.createICodeNode(Token.COMPOUND_STMT);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.NewName_LP_RP_TO_FunctDecl:case SyntaxProductionInit.NewName_LP_VarList_RP_TO_FunctDecl:node = NodeFactory.createICodeNode(Token.FUNCT_DECL);node.addChild(nodeStack.pop());child = node.getChildren().get(0);functionName = (String) child.getAttribute(NodeKey.TEXT);symbol = assignSymbolToNode(node, functionName);break;case SyntaxProductionInit.NewName_TO_VarDecl:nodeStack.pop();break;case SyntaxProductionInit.NAME_TO_NewName:node = NodeFactory.createICodeNode(Token.NEW_NAME);node.setAttribute(NodeKey.TEXT, text);break;case SyntaxProductionInit.OptSpecifiers_FunctDecl_CompoundStmt_TO_ExtDef:node = NodeFactory.createICodeNode(Token.EXT_DEF);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());funcMap.put(functionName, node);break;case SyntaxProductionInit.NoCommaExpr_TO_Args:node = NodeFactory.createICodeNode(Token.ARGS);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.NoCommaExpr_Comma_Args_TO_Args:node = NodeFactory.createICodeNode(Token.ARGS);node.addChild(nodeStack.pop());node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Return_Semi_TO_Statement:node = NodeFactory.createICodeNode(Token.STATEMENT);break;case SyntaxProductionInit.Return_Expr_Semi_TO_Statement:node = NodeFactory.createICodeNode(Token.STATEMENT);node.addChild(nodeStack.pop());break;case SyntaxProductionInit.Unary_StructOP_Name_TO_Unary:node = NodeFactory.createICodeNode(Token.UNARY);node.addChild(nodeStack.pop());node.setAttribute(NodeKey.TEXT, text);break;default:break;}if (node != null) {node.setAttribute(NodeKey.PRODUCTION, production);nodeStack.push(node);}return node; }

    小結

    其實構造AST和創(chuàng)建符號表上非常相似,都是依據reduce操作的信息來完成。在AST的構建中的主要任務就是對源代碼語句里的邏輯進行分塊,比如對于一個ifelse語句:

    上面的圖是我依據這個意思話的,和上面構造出來的AST不完全一致

    另外我的github博客:https://dejavudwh.cn/

    轉載于:https://www.cnblogs.com/secoding/p/11379216.html

    總結

    以上是生活随笔為你收集整理的从零写一个编译器(九):语义分析之构造抽象语法树(AST)的全部內容,希望文章能夠幫你解決所遇到的問題。

    如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。