Compiler Internals
A comprehensive technical deep-dive into the MathViz compiler architecture, implementation details, and optimization strategies. This document is intended for contributors, language implementers, and anyone interested in compiler design.
Overview
The MathViz compiler is a multi-stage source-to-source transpiler that transforms
.mviz source files into optimized Python code. It follows a traditional
compiler pipeline architecture with additional analysis passes for performance optimization.
The compiler prioritizes correctness, readable output, and seamless Python interoperability. Generated code is clean, idiomatic Python that can be inspected and debugged easily.
Key Design Decisions
- Immutable AST: All AST nodes are frozen dataclasses for thread safety and predictability
- Visitor Pattern: Tree traversal via the visitor pattern enables clean separation of concerns
- Multi-pass Architecture: Separate analysis passes for type checking, purity, complexity, and parallelization
- Optional Strictness: Type errors can be warnings or fatal errors depending on configuration
- Zero Runtime Overhead: No MathViz runtime library required; output is standalone Python
Compilation Pipeline
The compiler executes the following stages in order. Each stage is independent and
can be enabled/disabled via the CompilationPipeline configuration.
```
                      MathViz Compilation Pipeline

Source Code (.mviz)
       │
       ▼
┌─────────────┐
│    LEXER    │  Tokenization: Unicode math symbols, f-strings, keywords
│  lexer.py   │  Output: Token stream
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   PARSER    │  Recursive descent parser with operator precedence
│  parser.py  │  Output: Abstract Syntax Tree (AST)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   MODULE    │  Resolves `use` statements, loads dependencies
│   LOADER    │  Detects circular dependencies
└──────┬──────┘
       │
       ▼
┌─────────────┐
│    TYPE     │  Hindley-Milner style inference with bidirectional hints
│   CHECKER   │  Output: Type errors, function signatures
└──────┬──────┘
       │
       ├──────────────────┬──────────────────┐
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   PURITY    │    │ COMPLEXITY  │    │  PARALLEL   │
│  ANALYZER   │    │  ANALYZER   │    │  ANALYZER   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
                          ▼
                   ┌─────────────┐
                   │    CODE     │  Emits Python with Numba decorators
                   │  GENERATOR  │  Output: Executable Python code
                   └──────┬──────┘
                          │
                          ▼
                Generated Python (.py)
```
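The per-stage toggling described above can be sketched as follows. This is an illustrative model only: the real `CompilationPipeline` configuration lives in `__init__.py`, and the `PipelineConfig` name and its fields here are hypothetical.

```python
# Hypothetical sketch of stage toggling; the actual CompilationPipeline
# API may differ. Required stages always run; analysis passes are optional.
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    type_check: bool = True
    purity_analysis: bool = True
    complexity_analysis: bool = False
    parallel_analysis: bool = False


def run_pipeline(source: str, config: PipelineConfig) -> list:
    """Return the names of the stages that would run, in pipeline order."""
    stages = ["lexer", "parser", "module_loader"]
    if config.type_check:
        stages.append("type_checker")
    for name, enabled in [
        ("purity_analyzer", config.purity_analysis),
        ("complexity_analyzer", config.complexity_analysis),
        ("parallel_analyzer", config.parallel_analysis),
    ]:
        if enabled:
            stages.append(name)
    stages.append("codegen")
    return stages
```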
File Structure
- src/mathviz/compiler/
  - __init__.py (31 KB) - Pipeline orchestration
  - tokens.py (14 KB) - Token definitions
  - lexer.py (28 KB) - Lexical analysis
  - parser.py (115 KB) - Syntax parser
  - ast_nodes.py (69 KB) - AST node definitions
  - type_checker.py (143 KB) - Type inference
  - codegen.py (122 KB) - Python code generation
  - module_loader.py (22 KB) - Module resolution
  - purity_analyzer.py (28 KB) - Side effect analysis
  - complexity_analyzer.py (23 KB) - Big-O estimation
  - parallel_analyzer.py (34 KB) - Loop parallelization
  - jit_optimizer.py (63 KB) - Numba optimization
  - vectorizer.py (63 KB) - SIMD vectorization
  - memory_optimizer.py (61 KB) - Memory optimization
  - const_fold.py (99 KB) - Constant folding
  - linter.py (56 KB) - Static analysis
  - diagnostics.py (25 KB) - Error formatting
  - call_graph.py (24 KB) - Call graph analysis
Stage 1: Lexical Analysis
The lexer (lexer.py) transforms source code into a stream of tokens.
It supports both ASCII and Unicode operators for mathematical notation.
Lexer Implementation
Token Categories
| Category | Examples | Token Types |
|---|---|---|
| Keywords | `fn`, `let`, `struct`, `scene` | FN, LET, STRUCT, SCENE |
| Literals | `42`, `3.14`, `"hello"`, `true` | INTEGER, FLOAT, STRING, BOOLEAN |
| Math Unicode | ∈, ∪, ∩, ⊆, π | ELEMENT_OF, UNION, INTERSECTION, SUBSET, PI |
| Operators | `+`, `-`, `*`, `^`, `->`, `::` | PLUS, MINUS, STAR, CARET, THIN_ARROW, DOUBLE_COLON |
| F-strings | `f"x = {x}"` | FSTRING_START, FSTRING_PART, FSTRING_EXPR_* |
Special Features
- F-string Tokenization: Handles nested expressions in `f"..."` strings with proper brace matching
- Range Syntax: Recognizes `..` and `..=` for exclusive/inclusive ranges
- Doc Comments: Parses `///` and `/** */` documentation comments
- Unicode Support: First-class support for mathematical symbols (∈, ∪, ∩, ⊆, ⊇, π, ∞)
```python
# Input: let x = {1, 2} ∪ {3, 4}
# Output tokens:
Token(LET, "let")
Token(IDENTIFIER, "x")
Token(ASSIGN, "=")
Token(LBRACE, "{")
Token(INTEGER, 1)
Token(COMMA, ",")
Token(INTEGER, 2)
Token(RBRACE, "}")
Token(UNION, "∪")  # Unicode operator!
Token(LBRACE, "{")
Token(INTEGER, 3)
Token(COMMA, ",")
Token(INTEGER, 4)
Token(RBRACE, "}")
Token(EOF, None)
```
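One way a lexer can dispatch on Unicode math operators is a simple lookup table. The sketch below is hypothetical: the token-type names follow the table above, but the real lexer.py also handles multi-character operators, ranges, and f-string modes.

```python
# Hypothetical single-character Unicode operator dispatch; not the
# actual lexer.py implementation.
from enum import Enum, auto
from typing import Optional


class TokenType(Enum):
    ELEMENT_OF = auto()
    UNION = auto()
    INTERSECTION = auto()
    SUBSET = auto()


UNICODE_OPERATORS = {
    "∈": TokenType.ELEMENT_OF,
    "∪": TokenType.UNION,
    "∩": TokenType.INTERSECTION,
    "⊆": TokenType.SUBSET,
}


def classify(ch: str) -> Optional[TokenType]:
    """Map a Unicode operator character to its token type, else None."""
    return UNICODE_OPERATORS.get(ch)
```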
Stage 2: Parsing
The parser (parser.py) is a hand-written recursive descent parser with
Pratt parsing for expressions. It produces an immutable Abstract Syntax Tree.
Parser Implementation
Parsing Strategy
- Recursive Descent: Top-down parsing for statements and declarations
- Pratt Parsing: Operator precedence parsing for expressions
- Error Recovery: Continues parsing after errors to report multiple issues
- Location Tracking: Every AST node includes source location for error messages
Operator Precedence (Low to High)
| Level | Operators | Associativity |
|---|---|---|
| 1 | or | Left |
| 2 | and | Left |
| 3 | not | Prefix |
| 4 | ==, !=, <, >, <=, >=, ∈, ⊆ | Left |
| 5 | ∪, ∩, ∖ | Left |
| 6 | +, - | Left |
| 7 | *, /, % | Left |
| 8 | ^, ** | Right |
| 9 | - (unary), not | Prefix |
| 10 | ., ::, (), [] | Left |
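The expression grammar above can be handled with a compact precedence-climbing loop, which is the core of Pratt parsing. The sketch below uses a toy token list and tuple-based AST rather than the real parser's token and node types:

```python
# Simplified Pratt / precedence-climbing expression parser. Atoms are
# plain values; operators are strings. A subset of the precedence table
# above, with ^ right-associative.
PRECEDENCE = {"or": 1, "and": 2, "+": 6, "-": 6, "*": 7, "/": 7, "^": 8}
RIGHT_ASSOC = {"^"}


def parse_expression(tokens, pos=0, min_prec=1):
    """Parse tokens into a nested (op, left, right) tuple AST.

    Returns (ast, next_pos)."""
    left, pos = tokens[pos], pos + 1          # consume an atom
    while pos < len(tokens):
        op = tokens[pos]
        prec = PRECEDENCE.get(op)
        if prec is None or prec < min_prec:
            break
        # Right-associative operators recurse at the same precedence,
        # left-associative ones at one level higher.
        next_min = prec if op in RIGHT_ASSOC else prec + 1
        right, pos = parse_expression(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos
```

For example, `parse_expression([1, "+", 2, "*", 3])` groups the multiplication tighter, and `2 ^ 3 ^ 2` nests to the right.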
Abstract Syntax Tree
The AST (ast_nodes.py) consists of over 100 node types organized into
hierarchies for expressions, statements, patterns, and types.
AST Design
Node Hierarchy
```
ASTNode (abstract base)
├── Expression
│   ├── Literal (Integer, Float, String, Boolean, None)
│   ├── Identifier
│   ├── BinaryExpression
│   ├── UnaryExpression
│   ├── CallExpression
│   ├── MemberAccess
│   ├── IndexExpression
│   ├── LambdaExpression (PipeLambda)
│   ├── MatchExpression
│   ├── ConditionalExpression (ternary)
│   ├── RangeExpression
│   ├── CollectionLiteral (List, Set, Dict, Tuple)
│   └── FString
├── Statement
│   ├── LetStatement
│   ├── AssignmentStatement
│   ├── FunctionDef
│   ├── ClassDef / SceneDef
│   ├── StructDef / EnumDef / ImplBlock
│   ├── IfStatement / IfLetStatement
│   ├── ForStatement / WhileStatement / LoopStatement
│   ├── MatchStatement
│   ├── ReturnStatement / BreakStatement / ContinueStatement
│   └── UseStatement / ModuleDecl
├── Pattern (for pattern matching)
│   ├── LiteralPattern
│   ├── IdentifierPattern
│   ├── TuplePattern / ListPattern
│   ├── ConstructorPattern
│   ├── OrPattern / BindingPattern
│   └── WildcardPattern / RestPattern
└── TypeAnnotation
    ├── SimpleType
    ├── GenericType
    └── FunctionType
```
Immutability
All AST nodes are defined as frozen dataclasses with slots:
```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True, slots=True)
class BinaryExpression(Expression):
    left: Expression
    operator: BinaryOperator
    right: Expression
    location: Optional[SourceLocation] = None

    def accept(self, visitor: ASTVisitor) -> Any:
        return visitor.visit_binary_expression(self)
```
Stage 3: Type Checking
The type checker (type_checker.py) implements bidirectional type inference
with support for generics, structs, enums, and algebraic data types.
Type System
Type Representation
| Type Class | Description | Examples |
|---|---|---|
| `PrimitiveType` | Built-in scalar types | Int, Float, Bool, String |
| `GenericTypeInstance` | Parameterized types | List[Int], Dict[String, Float], Optional[T] |
| `FunctionType` | Function signatures | (Int, Int) -> Int |
| `ClassType` | User-defined structs/classes | Point, Vector3D |
| `ArrayType` | NumPy-style arrays | Array[Float, 2] |
| `UnknownType` | Unresolved type variable | Used during inference |
Inference Algorithm
- Bidirectional: Combines synthesis (bottom-up) and checking (top-down)
- Local Inference: Types propagate within functions without global constraint solving
- Explicit Annotations: Type annotations serve as ground truth
- Coercion: Implicit Int → Float promotion where safe
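A minimal model of the synthesis/checking split, including the safe Int → Float coercion, might look like the sketch below. Types are plain strings here; the real checker uses the type classes above, and this is an illustration rather than its actual algorithm.

```python
# Toy bidirectional checker: synthesize infers a type bottom-up,
# check validates against an expected type top-down.
def synthesize(expr, env):
    """Infer the type of a literal or identifier (bottom-up)."""
    if isinstance(expr, bool):       # bool first: bool is an int subclass
        return "Bool"
    if isinstance(expr, int):
        return "Int"
    if isinstance(expr, float):
        return "Float"
    if isinstance(expr, str):        # identifier: look up in the environment
        return env[expr]
    raise TypeError(f"cannot synthesize a type for {expr!r}")


def check(expr, expected, env):
    """Check an expression against an expected type (top-down)."""
    actual = synthesize(expr, env)
    if actual == expected:
        return True
    # Safe implicit promotion: an Int is accepted where a Float is expected.
    return actual == "Int" and expected == "Float"
```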
Built-in Types
# Primitive types
INT_TYPE = PrimitiveType("Int")
FLOAT_TYPE = PrimitiveType("Float")
BOOL_TYPE = PrimitiveType("Bool")
STRING_TYPE = PrimitiveType("String")
NONE_TYPE = PrimitiveType("None")
# Math types
VEC_TYPE = ArrayType(FLOAT_TYPE, 1) # Vector
MAT_TYPE = ArrayType(FLOAT_TYPE, 2) # Matrix
# Special types
ANY_TYPE = AnyType() # Accepts anything
Analysis Passes
Multiple analysis passes run in parallel to gather information for optimization and code generation.
Purity Analyzer
Determines function purity by tracking side effects. Pure functions are candidates for memoization and JIT compilation.
Side Effect Categories
- IO: print, file operations
- Mutation: Modifying non-local state
- Manim: Animation calls (play, wait)
- External: Calling impure functions
```python
from enum import Enum, auto

class Purity(Enum):
    PURE = auto()       # No side effects
    READ_ONLY = auto()  # Reads external state
    IMPURE = auto()     # Has side effects
    UNKNOWN = auto()    # Cannot determine
```
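Purity information composes: a function is at most as pure as the least-pure thing it does. A sketch of that combination rule follows; the severity ranking (with UNKNOWN treated most conservatively) is an assumption of this example, not necessarily the analyzer's exact lattice.

```python
# Sketch of purity combination across a function's operations and callees.
from enum import Enum, auto


class Purity(Enum):  # redefined here so the sketch is self-contained
    PURE = auto()
    READ_ONLY = auto()
    IMPURE = auto()
    UNKNOWN = auto()


# Assumed severity order: PURE < READ_ONLY < IMPURE < UNKNOWN.
_SEVERITY = {Purity.PURE: 0, Purity.READ_ONLY: 1,
             Purity.IMPURE: 2, Purity.UNKNOWN: 3}


def combine(*purities):
    """A function is at most as pure as its least-pure operation."""
    return max(purities, key=_SEVERITY.__getitem__, default=Purity.PURE)
```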
Complexity Analyzer
Estimates Big-O algorithmic complexity by analyzing loop nesting and recursion patterns.
```python
from enum import Enum

class Complexity(Enum):
    O_1 = "O(1)"              # Constant
    O_LOG_N = "O(log n)"      # Logarithmic
    O_N = "O(n)"              # Linear
    O_N_LOG_N = "O(n log n)"  # Linearithmic
    O_N_SQUARED = "O(n²)"     # Quadratic
    O_N_CUBED = "O(n³)"       # Cubic
    O_2_N = "O(2ⁿ)"           # Exponential
```
Parallel Analyzer
Identifies loops that can be parallelized with Numba's prange. Analyzes
data dependencies, reduction patterns, and loop-carried dependencies.
Parallelization Criteria
- No loop-carried dependencies (each iteration independent)
- Reduction variables detected (sum, product, min, max)
- No I/O or mutation of shared state
- Iteration count known or estimable
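These criteria can be approximated on a toy representation of a loop body, where each statement is a `(target, op, reads)` triple. This illustrates the dependency test only; the real analyzer works over the AST, not this IR.

```python
# Toy loop-carried dependency check. A loop body is a list of
# (target, op, reads) triples describing each statement.
REDUCTION_OPS = {"+=", "*=", "min=", "max="}


def is_parallelizable(body):
    """True when no statement re-reads its own previous-iteration value,
    except through a recognized reduction pattern (sum, product, min, max)."""
    for target, op, reads in body:
        if op in REDUCTION_OPS:
            continue              # reductions are handled specially by prange
        if target in reads:
            return False          # e.g. x = f(x): loop-carried dependency
    return True
```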
Stage 4: Code Generation
The code generator (codegen.py) transforms the AST into executable Python
code, handling operator translation, Manim integration, and Numba decorators.
Code Generator
Operator Translation
| MathViz | Python | Description |
|---|---|---|
| `x ^ y` | `x ** y` | Exponentiation |
| `A ∪ B` | `A \| B` or `A.union(B)` | Set union |
| `A ∩ B` | `A & B` or `A.intersection(B)` | Set intersection |
| `x ∈ S` | `x in S` | Set membership |
| `A ⊆ B` | `A.issubset(B)` | Subset |
| `sqrt(x)` | `np.sqrt(x)` | Square root |
| `sin(x)` | `np.sin(x)` | Trigonometric |
| `PI` | `np.pi` | Pi constant |
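A template-driven lowering of these operators might look like the following sketch. The real codegen.py works over the typed AST and chooses between the operator and method forms based on operand types; the template table here is hypothetical.

```python
# Hypothetical binary-operator lowering to Python source text.
BINOP_TEMPLATES = {
    "^": "({left}) ** ({right})",
    "∪": "({left}) | ({right})",
    "∩": "({left}) & ({right})",
    "∈": "({left}) in ({right})",
    "⊆": "({left}).issubset({right})",
}


def emit_binop(op, left, right):
    """Render one binary expression as Python source; operands are
    already-rendered source strings, parenthesized for safety."""
    return BINOP_TEMPLATES[op].format(left=left, right=right)
```

For example, `emit_binop("∪", "A", "B")` yields `(A) | (B)`.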
Struct to Dataclass
```
// MathViz
struct Point {
    x: Float
    y: Float
}

impl Point {
    fn new(x: Float, y: Float) -> Point {
        return Point { x: x, y: y }
    }
}
```

```python
# Generated Python
@dataclass
class Point:
    x: float
    y: float

def Point_new(x: float, y: float) -> Point:
    return Point(x=x, y=y)
```
Scene to Manim Class
```
// MathViz
scene HelloWorld {
    fn construct(self) {
        let text = Text("Hello")
        play(Write(text))
    }
}
```

```python
# Generated Python
from manim import *

class HelloWorld(Scene):
    def construct(self):
        text = Text('Hello')
        self.play(Write(text))
```
Optimization Passes
The compiler includes multiple optimization passes that can be enabled for high-performance numerical code.
Numba JIT Integration
Functions that pass purity analysis are automatically decorated with @njit
for Just-In-Time compilation to native code.
```python
from numba import njit

# Automatic JIT decoration
@njit(cache=True)
def compute_sum(n: int) -> int:
    total = 0
    for i in range(0, n):
        total = (total + i)
    return total
```
Constant Folding
The const_fold.py module (99 KB) implements multiple optimization passes:
- Constant Folding: Evaluate constant expressions at compile time
- Constant Propagation: Replace variables with known constant values
- Dead Code Elimination: Remove unreachable code
- Algebraic Simplification: Simplify expressions (x * 1 → x, x + 0 → x)
- Strength Reduction: Replace expensive ops (x * 2 → x << 1)
- Common Subexpression Elimination: Reuse repeated computations
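The folding and algebraic-simplification passes can be illustrated with a tiny folder over `(op, left, right)` expression tuples. This is a toy representation, not the compiler's AST, and it covers only a few operators:

```python
# Tiny constant folder plus the algebraic simplifications
# x * 1 -> x and x + 0 -> x listed above.
def fold(expr):
    if not isinstance(expr, tuple):
        return expr                          # literal or variable name
    op, left, right = expr
    left, right = fold(left), fold(right)    # fold children first
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        # Both operands are constants: evaluate at compile time.
        return {"+": left + right, "-": left - right, "*": left * right}[op]
    if op == "*" and 1 in (left, right):     # x * 1 -> x
        return left if right == 1 else right
    if op == "+" and 0 in (left, right):     # x + 0 -> x
        return left if right == 0 else right
    return (op, left, right)
```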
Vectorization
The vectorizer.py module detects patterns that can be vectorized to
NumPy operations for SIMD acceleration:
- Element-wise array operations
- Reduction operations (sum, product, min, max)
- Stencil computations
- Broadcasting patterns
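For example, an element-wise scaling loop and the NumPy form the vectorizer targets. This pair is an illustration of the rewrite, not actual compiler output, and it assumes NumPy is installed:

```python
# An element-wise loop and its vectorized equivalent: the scalar loop
# processes one element per iteration, while the NumPy version performs
# a single array multiply that can use SIMD under the hood.
import numpy as np


def scale_loop(xs, factor):
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = xs[i] * factor
    return out


def scale_vectorized(xs, factor):
    return (np.asarray(xs) * factor).tolist()
```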
Memory Optimization
The memory_optimizer.py module optimizes memory access patterns:
- Buffer reuse analysis
- In-place operation detection
- Cache-friendly loop ordering
- Temporary variable elimination
- Memory pool generation
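In-place operation detection can be illustrated with NumPy's `out=` parameter, which writes the result into an existing buffer instead of allocating a temporary array. The example below is illustrative; it is not generated compiler output.

```python
import numpy as np


def accumulate(acc, delta):
    """Add delta into acc in place; no temporary array is allocated."""
    np.add(acc, delta, out=acc)
    return acc
```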
Summary
The MathViz compiler is a production-grade source-to-source transpiler with:
- 31,000+ lines of carefully crafted Python code
- 22 modules covering all aspects of compilation
- 8-stage pipeline with optional analysis passes
- Full type inference with generics and algebraic data types
- Automatic optimization via Numba JIT, vectorization, and constant folding
- Rich diagnostics with source locations and suggestions
The compiler is open source. Contributions are welcome! See the GitHub repository for contribution guidelines.