Building My Own Programming Language on a Low-Spec Machine (15)

January 3, 2025 20:03

Up to placing a string as an expression inside the fmain block

Turning the fmain block into an AST

I can now build an AST for the code block that follows an fmain declaration.
I also made it possible to put comments inside the code block.

  
fmain link {
    /# fmain block
    // comment oneline
    //< comment block //>
    /#< comment doc block /#>
}

Displaying the AST gives the following:

-- to AST --
ast::Root(File)
 => Module(File)
    |-> filename: ../example/bootstrap/fmain2.trot
    |-> package name: main
    |-> module name: fmain2
     => fmain
        |-> entryPointId: link
         => Control block
            |-> beginLocation: [3,12]
            |-> endLocation: [8,1]
             => CommentOut For Doc
                |-> location: [4,5]
                |-> body: fmain block
             => CommentOut
                |-> location: [5,5]
                |-> body: comment oneline
             => CommentOut
                |-> beginLocation: [6,5]
                |-> endLocation: [6,25]
                |-> body: comment block
             => CommentOut For Doc
                |-> beginLocation: [7,5]
                |-> endLocation: [7,29]
                |-> body: comment doc block

The block part is added as a child node of the fmain declaration.
Other nodes can then be appended to that block one after another.

Emitting the fmain block as LLVM IR

Here is an excerpt of the LLVM IR generation for the fmain block:

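llvm::Value *AstNodeControlBlock::generateLlvmIr(IrGenContext *const context_) {
  llvm::IRBuilder<> *lBuilder = context_->getBuilder();
  // Generate IR for each statement in the block: first the child,
  // then every sibling reached through getNext().
  if (hasChild()) {
    AstNode *child = getChild();
    child->generateLlvmIr(context_);
    while (child->hasNext()) {
      child = child->getNext();
      child->generateLlvmIr(context_);
    }
  }
  // If this block belongs to an fmain declaration and has no explicit
  // return value, emit "ret i32 0" so that main() exits with status 0.
  if (parent->getAstNodeType() == AstNodeTypeEnum::declareFmain &&
      !hasReturnVal()) {
    lBuilder->CreateRet(lBuilder->getInt32(0));
  }
  // The control block itself produces no value.
  return nullptr;
}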

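I changed the return type of every node's generateLlvmIr() method to a pointer to llvm::Value.
The reason is that when strings or other values are used during LLVM output, a pointer to the value is needed.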

Also, if no return value is set inside the block attached to an fmain declaration, the return value is automatically set to 0.

That was a bit of a detour, but the LLVM IR actually displayed looks like this:

-- ../example/bootstrap/fmain2.trot to LLVM IR --
; ModuleID = 'main.fmain2'
source_filename = "../example/bootstrap/fmain2.trot"

define i32 @main() {
entrypoint:
  ret i32 0
}

The output is the same as the result from last time.

Parsing string types

I added parsing for string types. This time I'll describe the raw string type.

About raw strings

First, I need to correct something from the explanation in part 10.

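There, I treated backquotes as the regular-expression pattern type, but I have changed them to the raw string type.
I think raw strings can be useful for other purposes besides regular-expression patterns.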

Inside a raw string, backslash escape sequences apply in only two cases:

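Consecutive backslashes before a newline. If a single backslash is followed by a newline, the backslash is not read and the line is joined with the next one. If there are consecutive backslashes, the lines are not joined and the backslashes are replaced with a single backslash.
A backslash before a backquote. Putting a backslash in front of a backquote, the raw string delimiter, lets you include a backquote inside the raw string.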

So let's convert the following code.



fmain link {
    /# fmain block
    // comment block
    ` \\ra\ \\
     \
 this is \
 string \\ \`\"\'
\o001013
\x192343
\u2831`;
}

What happens when it is converted to an AST?


-- to AST --
ast::Root(File)
 => Module(File)
    |-> filename: ../example/bootstrap/fmain_rawstr1.trot
    |-> package name: main
    |-> module name: fmain_rawstr1
     => fmain
        |-> entryPointId: link
         => Control block
            |-> beginLocation: [3,12]
            |-> endLocation: [13,1]
             => CommentOut For Doc
                |-> location: [4,5]
                |-> body: fmain block
             => CommentOut
                |-> location: [5,5]
                |-> body: comment block
             => String
                |-> beginLocation: [6,5]
                |-> endLocation: [12,7]
                |-> value: " \\ra\ \
     this is string \\ `\"\'
\o001013
\x192343
\u2831"
                |-> length: 62

As shown, string replacement was applied only to the consecutive backslashes before a newline and to the backslash before a backquote.

That's it for this time. Next time, I'll explain the string type that interprets escape sequences, and the character type.
