PRQL Compiler Architecture
Lexing & Parsing: PRQL source text is split into tokens with the Chumsky parser named “lexer”. The stream of tokens is then parsed into an Abstract Syntax Tree (AST).
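As a rough illustration of the lexing step, the sketch below splits a small PRQL-like string into tokens by hand. It does not use Chumsky and is not the compiler's lexer; the `Token` enum and the `lex` function are hypothetical names chosen for this example.

```rust
/// Hypothetical token type; the real lexer's tokens are richer.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Number(String),
    Symbol(char),
}

/// Split source text into tokens, skipping whitespace.
fn lex(source: &str) -> Vec<Token> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_alphabetic() || c == '_' {
            // Identifier or keyword: letters, digits and underscores.
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else if c.is_ascii_digit() {
            // Integer literal: a run of ASCII digits.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Token::Number(chars[start..i].iter().collect()));
        } else {
            // Anything else is treated as a single-character symbol.
            tokens.push(Token::Symbol(c));
            i += 1;
        }
    }
    tokens
}
```

Calling `lex("from employees | filter age > 30")` yields a flat token stream that a parser could then turn into a tree.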
Semantic Analysis: This stage resolves names (identifiers), extracts declarations, and determines frames (the table columns available at each step). A structure is declared containing the root module, which maps accessible names to their declarations.
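A module that maps names to declarations might look roughly like the sketch below. The `Decl` and `Module` types, the `resolve` method, and the dotted naming scheme are assumptions made for illustration, not the compiler's actual definitions.

```rust
use std::collections::HashMap;

/// Hypothetical declaration kinds, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Decl {
    Table { columns: Vec<String> },
    Function,
}

/// Hypothetical module: maps fully qualified names to declarations.
struct Module {
    names: HashMap<String, Decl>,
}

impl Module {
    /// Resolve `ident` as seen from `scope`, trying the scoped name first
    /// and then the identifier as written. Returns the fully qualified
    /// name together with its declaration.
    fn resolve(&self, scope: &str, ident: &str) -> Option<(String, &Decl)> {
        let scoped = format!("{}.{}", scope, ident);
        for candidate in vec![scoped, ident.to_string()] {
            if let Some(decl) = self.names.get(&candidate) {
                return Some((candidate, decl));
            }
        }
        None
    }
}
```

Because every lookup returns the fully qualified key, later stages can refer to a declaration unambiguously even when the source used a short name.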
- Assign an ID to each node (`Expr` and `Stmt`).
- Look up identifiers in the module and find the associated declaration. The identifier is replaced with a fully qualified name that is guaranteed to be unique within the root module. In some cases, `Expr::target` is also set.
- Convert function calls to transforms (`from`, `derive`, `filter`) into `TransformCall`, which is more convenient for later processing.
- Determine the type of expressions. If an expression is a reference to a table, use the frame of the table as the type. If it is a `TransformCall`, apply the transform to the input frame to obtain the resulting type. For simple expressions, try to infer the type from `ExprKind`.
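The last operation, determining frames by applying transforms to an input frame, can be sketched as follows. This is a minimal model under stated assumptions: `Frame` is reduced to a list of column names, and `SimpleTransform`, `apply`, and `pipeline_frame` are hypothetical stand-ins rather than the compiler's `TransformCall` machinery.

```rust
use std::collections::HashMap;

/// A frame: the ordered list of column names available at a pipeline step.
type Frame = Vec<String>;

/// Hypothetical, simplified stand-in for a transform call.
enum SimpleTransform {
    /// Start from a table; the frame is the table's columns.
    From(String),
    /// Add a computed column with the given name.
    Derive(String),
    /// Keep only matching rows; the columns are unchanged.
    Filter,
}

/// Apply one transform to the input frame to obtain the resulting frame.
fn apply(
    transform: &SimpleTransform,
    input: Frame,
    tables: &HashMap<String, Frame>,
) -> Result<Frame, String> {
    match transform {
        SimpleTransform::From(table) => tables
            .get(table)
            .cloned()
            .ok_or_else(|| format!("unknown table `{}`", table)),
        SimpleTransform::Derive(column) => {
            let mut output = input;
            output.push(column.clone());
            Ok(output)
        }
        SimpleTransform::Filter => Ok(input),
    }
}

/// Determine the frame of a whole pipeline by applying each transform in turn.
fn pipeline_frame(
    pipeline: &[SimpleTransform],
    tables: &HashMap<String, Frame>,
) -> Result<Frame, String> {
    let mut frame = Frame::new();
    for transform in pipeline {
        frame = apply(transform, frame, tables)?;
    }
    Ok(frame)
}
```

For a table `employees` with columns `name` and `salary`, a pipeline of `From`, `Filter`, and `Derive("bonus")` ends with the frame `name`, `salary`, `bonus`.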
SQL Backend: This stage converts RQ into SQL. Each relation is transformed into an SQL query. Pipelines are analyzed and split at appropriate positions into “AtomicPipelines”, each of which can be represented by a single SELECT statement.
This process is also called anchoring, as it anchors a column definition to a specific location in the output query.
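To make the idea of splitting concrete, here is a small sketch that groups transforms so that each group could fit one SELECT. The splitting rule is an assumption made for this example (clauses must appear in the order FROM, WHERE, GROUP BY, ORDER BY, LIMIT, and only filters may repeat); the compiler's actual rules and traversal are more involved. `Kind`, `clause_rank`, and `split_atomic` are hypothetical names.

```rust
/// Hypothetical, simplified transform kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    From,
    Filter,
    Aggregate,
    Sort,
    Take,
}

/// Position of the matching clause within one SELECT statement
/// (assumed order: FROM, WHERE, GROUP BY, ORDER BY, LIMIT).
fn clause_rank(kind: Kind) -> u8 {
    match kind {
        Kind::From => 0,
        Kind::Filter => 1,
        Kind::Aggregate => 2,
        Kind::Sort => 3,
        Kind::Take => 4,
    }
}

/// Split a pipeline into groups that each fit one SELECT under the assumed
/// rule: a transform starts a new group when its clause would come before a
/// clause already in the group, or would repeat a non-filter clause.
fn split_atomic(pipeline: &[Kind]) -> Vec<Vec<Kind>> {
    let mut groups: Vec<Vec<Kind>> = Vec::new();
    let mut current: Vec<Kind> = Vec::new();
    let mut highest = 0u8;
    for &kind in pipeline {
        let rank = clause_rank(kind);
        let out_of_order = rank < highest;
        let repeated = rank == highest && kind != Kind::Filter;
        if !current.is_empty() && (out_of_order || repeated) {
            // Close the current group and start a new one.
            groups.push(std::mem::take(&mut current));
            highest = 0;
        }
        highest = highest.max(rank);
        current.push(kind);
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

Under this rule, a filter that follows an aggregate opens a second group, which corresponds to wrapping the first SELECT in a subquery or CTE.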
During this process, `sql::context` keeps track of:
- Table instances in the query (to prevent mixing up multiple instances of the same table)
- Column definitions, whether computed or a reference to a table column
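The bookkeeping listed above could be modeled along these lines. This is only a sketch: `QueryContext`, `TableInstanceId`, `ColumnId`, and `ColumnDef` are hypothetical names and shapes, not the contents of `sql::context`.

```rust
use std::collections::HashMap;

/// Hypothetical ID of one table instance in the query.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TableInstanceId(usize);

/// Hypothetical ID of one column definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ColumnId(usize);

/// A column is either a reference to a table column or a computed expression.
#[derive(Debug, Clone, PartialEq)]
enum ColumnDef {
    TableColumn { instance: TableInstanceId, name: String },
    Computed { expr: String },
}

/// Hypothetical bookkeeping structure for the SQL translation.
#[derive(Default)]
struct QueryContext {
    /// Table name for each instance; the same table may appear several times.
    table_instances: Vec<String>,
    /// Every column definition seen so far, keyed by its ID.
    column_defs: HashMap<ColumnId, ColumnDef>,
    next_column: usize,
}

impl QueryContext {
    /// Register a new instance of `table`, even if the table was seen before.
    fn add_table_instance(&mut self, table: &str) -> TableInstanceId {
        self.table_instances.push(table.to_string());
        TableInstanceId(self.table_instances.len() - 1)
    }

    /// Register a column definition and return its ID.
    fn add_column(&mut self, def: ColumnDef) -> ColumnId {
        let id = ColumnId(self.next_column);
        self.next_column += 1;
        self.column_defs.insert(id, def);
        id
    }
}
```

In a self-join, registering `employees` twice gives two distinct instance IDs, so an `id` column from each side stays distinguishable even though the table and column names are identical.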