PRQL Compiler Architecture

    1. Lexing & Parsing: PRQL source text is split into tokens with the Chumsky parser named “lexer”. The stream of tokens is then parsed into an Abstract Syntax Tree (AST).
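
The two phases above can be illustrated with a hand-written toy front end. This is only a sketch under stated assumptions: it does not use Chumsky, it covers a tiny fragment of the syntax, and every name in it (Token, Call, lex, parse) is hypothetical rather than part of the compiler's API.

```rust
// Toy two-phase front end: text -> tokens -> AST.
// All names here are hypothetical; this is not the compiler's real lexer.

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Int(i64),
    Op(char),
    Pipe, // `|` or a newline; both separate pipeline steps
}

/// Split source text into tokens, skipping spaces and tabs.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == ' ' || c == '\t' {
            i += 1;
        } else if c == '|' || c == '\n' {
            tokens.push(Token::Pipe);
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            tokens.push(Token::Int(text.parse().map_err(|e| format!("{e}"))?));
        } else if c.is_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else if "+-*/><=".contains(c) {
            tokens.push(Token::Op(c));
            i += 1;
        } else {
            return Err(format!("unexpected character {c:?} at index {i}"));
        }
    }
    Ok(tokens)
}

/// One pipeline step: a function name followed by its raw argument tokens.
#[derive(Debug, PartialEq)]
struct Call {
    name: String,
    args: Vec<Token>,
}

/// Group the token stream into calls, one per pipeline step.
fn parse(tokens: &[Token]) -> Result<Vec<Call>, String> {
    let mut calls = Vec::new();
    for step in tokens.split(|t| *t == Token::Pipe) {
        match step {
            [] => continue, // blank line or trailing separator
            [Token::Ident(name), args @ ..] => calls.push(Call {
                name: name.clone(),
                args: args.to_vec(),
            }),
            [other, ..] => return Err(format!("expected a function name, found {other:?}")),
        }
    }
    Ok(calls)
}
```

Calling lex on "from employees | filter age > 30" and passing the result to parse yields two Call values, one per step. Keeping the phases separate means the parser only ever sees tokens, never raw characters.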

    2. Semantic Analysis: This stage resolves names (identifiers), extracts declarations, and determines frames (the table columns available at each step). A Context is declared containing the root module, which maps accessible names to their declarations. Resolving involves the following operations:

      • Assign an ID to each node (Expr and Stmt).
      • Look up identifiers in the module and find the associated declaration. The identifier is replaced with a fully qualified name that is guaranteed to be unique in root_mod. In some cases, Expr::target is also set.
      • Convert function calls to transforms (from, derive, filter) from FuncCall to TransformCall, which is more convenient for later processing.
      • Determine the type of expressions. If an expression is a reference to a table, use the frame of the table as the type. If it is a TransformCall, apply the transform to the input frame to obtain the resulting type. For simple expressions, try to infer from ExprKind.
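
The lookup and frame-inference operations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the resolver itself: RootModule, Decl, infer_frame, and the "db." and "std." prefixes are hypothetical, and this TransformCall is a three-variant stand-in for the real type.

```rust
use std::collections::HashMap;

/// What a name in the root module can refer to (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
enum Decl {
    /// A table together with its frame (the columns it exposes).
    Table(Vec<String>),
    /// A function; it has no frame of its own.
    Function,
}

/// Simplified stand-in for a transform after conversion from a function call.
enum TransformCall {
    From(String),   // table identifier as written by the user
    Derive(String), // name of the column being added
    Filter,         // changes rows, not columns
}

/// Maps fully qualified names to declarations.
struct RootModule {
    decls: HashMap<String, Decl>,
}

impl RootModule {
    /// Replace a bare identifier with its fully qualified name.
    /// Fails if the name is unknown or matches more than one declaration.
    fn resolve(&self, ident: &str) -> Result<String, String> {
        let matches: Vec<&String> = self
            .decls
            .keys()
            .filter(|fq| fq.rsplit('.').next() == Some(ident))
            .collect();
        match matches.as_slice() {
            [only] => Ok(only.to_string()),
            [] => Err(format!("unknown name `{ident}`")),
            _ => Err(format!("ambiguous name `{ident}`")),
        }
    }
}

/// Apply each transform to the incoming frame to obtain the next frame.
fn infer_frame(root: &RootModule, pipeline: &[TransformCall]) -> Result<Vec<String>, String> {
    let mut frame: Vec<String> = Vec::new();
    for transform in pipeline {
        match transform {
            TransformCall::From(table) => {
                let fq = root.resolve(table)?;
                match root.decls.get(&fq) {
                    // A table reference takes the table's frame as its type.
                    Some(Decl::Table(columns)) => frame = columns.clone(),
                    _ => return Err(format!("`{fq}` is not a table")),
                }
            }
            TransformCall::Derive(column) => frame.push(column.clone()),
            TransformCall::Filter => {}
        }
    }
    Ok(frame)
}
```

With a root module that declares db.employees with columns name and salary, the pipeline from employees, filter, derive bonus ends with the frame name, salary, bonus.
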
    3. SQL Backend: This stage converts RQ (Relational Query, the compiler's intermediate representation) into SQL. Each relation is transformed into an SQL query. Pipelines are analyzed and split at appropriate positions into “AtomicPipelines”, each of which can be represented by a single SELECT statement.

      This process is also called anchoring, as it anchors a column definition to a specific location in the output query.
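
To make the splitting concrete, here is a rough sketch of one possible rule. It assumes a simplified heuristic that is not the compiler's actual algorithm: a transform stays in the current atomic pipeline only if its SQL clause comes later in the SELECT statement than the previous transform's clause. The names Transform, clause_rank, and split_into_atomic are hypothetical.

```rust
/// A reduced set of transforms, enough to show the idea (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Transform {
    From,
    Filter,
    Aggregate,
    Sort,
    Take,
}

/// Position of the SQL clause each transform maps to within one SELECT.
fn clause_rank(t: Transform) -> u8 {
    match t {
        Transform::From => 0,      // FROM
        Transform::Filter => 1,    // WHERE
        Transform::Aggregate => 2, // GROUP BY
        Transform::Sort => 3,      // ORDER BY
        Transform::Take => 4,      // LIMIT
    }
}

/// Split a pipeline into runs that each fit a single SELECT statement.
fn split_into_atomic(pipeline: &[Transform]) -> Vec<Vec<Transform>> {
    let mut result: Vec<Vec<Transform>> = Vec::new();
    let mut current: Vec<Transform> = Vec::new();
    for &t in pipeline {
        let fits = match current.last() {
            None => true,
            // Consecutive filters can be AND-ed into one WHERE clause.
            Some(&prev) if prev == Transform::Filter && t == Transform::Filter => true,
            // Otherwise the clause must come later than the previous one.
            Some(&prev) => clause_rank(t) > clause_rank(prev),
        };
        if !fits {
            // Close the current SELECT; the next one reads from it.
            result.push(std::mem::take(&mut current));
        }
        current.push(t);
    }
    if !current.is_empty() {
        result.push(current);
    }
    result
}
```

Under this rule, a filter that follows a take starts a new atomic pipeline, because a WHERE clause cannot be applied after LIMIT within the same SELECT.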

      During this process, sql::context keeps track of:

      • Table instances in the query (to prevent mixing up multiple instances of the same table)
      • Column definitions, whether computed or a reference to a table column
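
The bookkeeping listed above might look roughly like this. It is a sketch only: SqlContext, TableInstanceId, ColumnId, ColumnDef, and the table_N alias scheme are hypothetical and do not mirror the real sql::context.

```rust
/// Identifies one occurrence of a table in the query (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TableInstanceId(usize);

/// Identifies one column definition (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnId(usize);

#[derive(Debug, Clone, PartialEq)]
enum ColumnDef {
    /// A column read directly from a specific table instance.
    TableColumn { instance: TableInstanceId, name: String },
    /// A computed expression, kept as SQL text for simplicity.
    Computed(String),
}

#[derive(Default)]
struct SqlContext {
    /// (table name, alias) for every table instance in the query.
    instances: Vec<(String, String)>,
    columns: Vec<ColumnDef>,
}

impl SqlContext {
    /// Register a new instance of a table and give it a unique alias,
    /// so that two uses of the same table are never confused.
    fn add_table_instance(&mut self, table: &str) -> TableInstanceId {
        let id = TableInstanceId(self.instances.len());
        self.instances.push((table.to_string(), format!("table_{}", id.0)));
        id
    }

    /// Record a column definition and return its id.
    fn add_column(&mut self, def: ColumnDef) -> ColumnId {
        self.columns.push(def);
        ColumnId(self.columns.len() - 1)
    }

    /// Render a column as SQL text, qualifying table columns by alias.
    fn render(&self, id: ColumnId) -> String {
        match &self.columns[id.0] {
            ColumnDef::TableColumn { instance, name } => {
                format!("{}.{}", self.instances[instance.0].1, name)
            }
            ColumnDef::Computed(sql) => sql.clone(),
        }
    }
}
```

In a self-join, registering employees twice produces two distinct instances, so the name column of each side renders with a different alias.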