Compiler Internals

A comprehensive technical deep-dive into the MathViz compiler architecture, implementation details, and optimization strategies. This document is intended for contributors, language implementers, and anyone interested in compiler design.

  • 31,310 lines of code
  • 22 compiler modules
  • 8 pipeline stages
  • 100+ AST node types

Overview

The MathViz compiler is a multi-stage source-to-source transpiler that transforms .mviz source files into optimized Python code. It follows a traditional compiler pipeline architecture with additional analysis passes for performance optimization.

Design Philosophy

The compiler prioritizes correctness, readable output, and seamless Python interoperability. Generated code is clean, idiomatic Python that can be inspected and debugged easily.

Compilation Pipeline

The compiler executes the following stages in order. Each stage is independent and can be enabled/disabled via the CompilationPipeline configuration.

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MathViz Compilation Pipeline                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Source Code (.mviz)                                                       │
│         │                                                                   │
│         ▼                                                                   │
│   ┌─────────────┐                                                           │
│   │   LEXER     │  Tokenization: Unicode math symbols, f-strings, keywords  │
│   │  lexer.py   │  Output: Token stream                                     │
│   └──────┬──────┘                                                           │
│          │                                                                  │
│          ▼                                                                  │
│   ┌─────────────┐                                                           │
│   │   PARSER    │  Recursive descent parser with operator precedence        │
│   │  parser.py  │  Output: Abstract Syntax Tree (AST)                       │
│   └──────┬──────┘                                                           │
│          │                                                                  │
│          ▼                                                                  │
│   ┌─────────────┐                                                           │
│   │   MODULE    │  Resolves `use` statements, loads dependencies            │
│   │   LOADER    │  Detects circular dependencies                            │
│   └──────┬──────┘                                                           │
│          │                                                                  │
│          ▼                                                                  │
│   ┌─────────────┐                                                           │
│   │    TYPE     │  Hindley-Milner style inference with bidirectional hints  │
│   │   CHECKER   │  Output: Type errors, function signatures                 │
│   └──────┬──────┘                                                           │
│          │                                                                  │
│          ├────────────────────┬────────────────────┐                        │
│          ▼                    ▼                    ▼                        │
│   ┌─────────────┐      ┌─────────────┐      ┌─────────────┐                 │
│   │   PURITY    │      │ COMPLEXITY  │      │  PARALLEL   │                 │
│   │  ANALYZER   │      │  ANALYZER   │      │  ANALYZER   │                 │
│   └──────┬──────┘      └──────┬──────┘      └──────┬──────┘                 │
│          │                    │                    │                        │
│          └────────────────────┼────────────────────┘                        │
│                               │                                             │
│                               ▼                                             │
│                        ┌─────────────┐                                      │
│                        │    CODE     │  Emits Python with Numba decorators  │
│                        │  GENERATOR  │  Output: Executable Python code      │
│                        └──────┬──────┘                                      │
│                               │                                             │
│                               ▼                                             │
│                        Generated Python (.py)                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
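The stage ordering and per-stage enable/disable switch described above can be sketched as a simple ordered pipeline. This is an illustrative sketch only: the `Stage` and `CompilationPipeline` shapes here are hypothetical, not the actual classes in `__init__.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch: each stage is a named transform over the current
# artifact, and disabled stages are skipped entirely.
@dataclass
class Stage:
    name: str
    run: Callable[[object], object]
    enabled: bool = True

@dataclass
class CompilationPipeline:
    stages: List[Stage] = field(default_factory=list)

    def compile(self, source: object) -> object:
        artifact = source
        for stage in self.stages:
            if stage.enabled:  # configuration can switch stages off
                artifact = stage.run(artifact)
        return artifact

pipeline = CompilationPipeline([
    Stage("lexer", lambda src: f"tokens({src})"),
    Stage("parser", lambda toks: f"ast({toks})"),
    Stage("codegen", lambda ast: f"python({ast})"),
])
print(pipeline.compile("x + 1"))  # python(ast(tokens(x + 1)))
```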

File Structure

  • src/mathviz/compiler/
    • __init__.py (31 KB) - Pipeline orchestration
    • tokens.py (14 KB) - Token definitions
    • lexer.py (28 KB) - Lexical analysis
    • parser.py (115 KB) - Syntax parser
    • ast_nodes.py (69 KB) - AST node definitions
    • type_checker.py (143 KB) - Type inference
    • codegen.py (122 KB) - Python code generation
    • module_loader.py (22 KB) - Module resolution
    • purity_analyzer.py (28 KB) - Side effect analysis
    • complexity_analyzer.py (23 KB) - Big-O estimation
    • parallel_analyzer.py (34 KB) - Loop parallelization
    • jit_optimizer.py (63 KB) - Numba optimization
    • vectorizer.py (63 KB) - SIMD vectorization
    • memory_optimizer.py (61 KB) - Memory optimization
    • const_fold.py (99 KB) - Constant folding
    • linter.py (56 KB) - Static analysis
    • diagnostics.py (25 KB) - Error formatting
    • call_graph.py (24 KB) - Call graph analysis

Stage 1: Lexical Analysis

The lexer (lexer.py) transforms source code into a stream of tokens. It supports both ASCII and Unicode operators for mathematical notation.

Lexer Implementation

Token Categories

Category       Examples                    Token Types
Keywords       fn, let, struct, scene      FN, LET, STRUCT, SCENE
Literals       42, 3.14, "hello", true     INTEGER, FLOAT, STRING, BOOLEAN
Math Unicode   ∈, ∪, ∩, ⊆, π               ELEMENT_OF, UNION, INTERSECTION, SUBSET, PI
Operators      +, -, *, ^, ->, ::          PLUS, MINUS, STAR, CARET, THIN_ARROW, DOUBLE_COLON
F-strings      f"x = {x}"                  FSTRING_START, FSTRING_PART, FSTRING_EXPR_*

Special Features

  • F-string Tokenization: Handles nested expressions in f"..." strings with proper brace matching
  • Range Syntax: Recognizes .. and ..= for exclusive/inclusive ranges
  • Doc Comments: Parses /// and /** */ documentation comments
  • Unicode Support: First-class support for mathematical symbols (∈, ∪, ∩, ⊆, ⊇, π, ∞)
Lexer Token Example
# Input: let x = {1, 2} ∪ {3, 4}
# Output tokens:
Token(LET, "let")
Token(IDENTIFIER, "x")
Token(ASSIGN, "=")
Token(LBRACE, "{")
Token(INTEGER, 1)
Token(COMMA, ",")
Token(INTEGER, 2)
Token(RBRACE, "}")
Token(UNION, "∪")        # Unicode operator!
Token(LBRACE, "{")
Token(INTEGER, 3)
Token(COMMA, ",")
Token(INTEGER, 4)
Token(RBRACE, "}")
Token(EOF, None)
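The Unicode operator handling shown above can be sketched as a single-character dispatch table. The table and `Token` shape here are illustrative, not lexer.py's actual internals.

```python
from typing import NamedTuple, Optional

class Token(NamedTuple):
    type: str
    value: object

# Illustrative dispatch table for single-character Unicode math operators.
UNICODE_OPERATORS = {
    "∈": "ELEMENT_OF",
    "∪": "UNION",
    "∩": "INTERSECTION",
    "⊆": "SUBSET",
    "π": "PI",
}

def lex_symbol(ch: str) -> Optional[Token]:
    """Return a token for a Unicode math symbol, or None if it isn't one."""
    token_type = UNICODE_OPERATORS.get(ch)
    return Token(token_type, ch) if token_type else None

print(lex_symbol("∪"))  # Token(type='UNION', value='∪')
```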

Stage 2: Parsing

The parser (parser.py) is a hand-written recursive descent parser with Pratt parsing for expressions. It produces an immutable Abstract Syntax Tree.

Parser Implementation

Parsing Strategy

  • Recursive Descent: Top-down parsing for statements and declarations
  • Pratt Parsing: Operator precedence parsing for expressions
  • Error Recovery: Continues parsing after errors to report multiple issues
  • Location Tracking: Every AST node includes source location for error messages

Operator Precedence (Low to High)

Level   Operators                    Associativity
1       or                           Left
2       and                          Left
3       not                          Prefix
4       ==, !=, <, >, <=, >=, ∈      Left
5       ∪, ∩                         Left
6       +, -                         Left
7       *, /, %                      Left
8       ^, **                        Right
9       - (unary), not               Prefix
10      ., ::, (), []                Left
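The Pratt-parsing idea behind this table can be sketched in a few lines. Token handling is heavily simplified here, and the binding-power table covers only a slice of the levels above; it is a sketch, not parser.py's implementation.

```python
import re

# Simplified binding powers mirroring part of the precedence table.
BINDING_POWER = {"or": 1, "and": 2, "+": 6, "-": 6, "*": 7, "/": 7, "^": 8}
RIGHT_ASSOC = {"^"}  # exponentiation binds right-to-left

def tokenize(src):
    return re.findall(r"\d+|[a-z]+|[+\-*/^()]", src)

def parse_expr(tokens, min_bp=0):
    """Pratt-style expression parser producing nested (op, lhs, rhs) tuples."""
    left = tokens.pop(0)
    while tokens and tokens[0] in BINDING_POWER:
        op = tokens[0]
        bp = BINDING_POWER[op]
        if bp < min_bp:
            break
        tokens.pop(0)
        # Right-associative operators recurse at the same binding power,
        # left-associative ones at one higher.
        right = parse_expr(tokens, bp if op in RIGHT_ASSOC else bp + 1)
        left = (op, left, right)
    return left

print(parse_expr(tokenize("1 + 2 * 3")))  # ('+', '1', ('*', '2', '3'))
print(parse_expr(tokenize("2 ^ 3 ^ 2")))  # ('^', '2', ('^', '3', '2'))
```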

Abstract Syntax Tree

The AST (ast_nodes.py) consists of over 100 node types organized into hierarchies for expressions, statements, patterns, and types.

AST Design

Node Hierarchy

ASTNode (abstract base)
├── Expression
│   ├── Literal (Integer, Float, String, Boolean, None)
│   ├── Identifier
│   ├── BinaryExpression
│   ├── UnaryExpression
│   ├── CallExpression
│   ├── MemberAccess
│   ├── IndexExpression
│   ├── LambdaExpression (PipeLambda)
│   ├── MatchExpression
│   ├── ConditionalExpression (ternary)
│   ├── RangeExpression
│   ├── CollectionLiteral (List, Set, Dict, Tuple)
│   └── FString
├── Statement
│   ├── LetStatement
│   ├── AssignmentStatement
│   ├── FunctionDef
│   ├── ClassDef / SceneDef
│   ├── StructDef / EnumDef / ImplBlock
│   ├── IfStatement / IfLetStatement
│   ├── ForStatement / WhileStatement / LoopStatement
│   ├── MatchStatement
│   ├── ReturnStatement / BreakStatement / ContinueStatement
│   └── UseStatement / ModuleDecl
├── Pattern (for pattern matching)
│   ├── LiteralPattern
│   ├── IdentifierPattern
│   ├── TuplePattern / ListPattern
│   ├── ConstructorPattern
│   ├── OrPattern / BindingPattern
│   └── WildcardPattern / RestPattern
└── TypeAnnotation
    ├── SimpleType
    ├── GenericType
    └── FunctionType

Immutability

All AST nodes are defined as frozen dataclasses with slots:

@dataclass(frozen=True, slots=True)
class BinaryExpression(Expression):
    left: Expression
    operator: BinaryOperator
    right: Expression
    location: Optional[SourceLocation] = None

    def accept(self, visitor: ASTVisitor) -> Any:
        return visitor.visit_binary_expression(self)

Stage 3: Type Checking

The type checker (type_checker.py) implements bidirectional type inference with support for generics, structs, enums, and algebraic data types.

Type System

Type Representation

Type Class            Description                    Examples
PrimitiveType         Built-in scalar types          Int, Float, Bool, String
GenericTypeInstance   Parameterized types            List[Int], Dict[String, Float], Optional[T]
FunctionType          Function signatures            (Int, Int) -> Int
ClassType             User-defined structs/classes   Point, Vector3D
ArrayType             NumPy-style arrays             Array[Float, 2]
UnknownType           Unresolved type variable       Used during inference

Inference Algorithm

  1. Bidirectional: Combines synthesis (bottom-up) and checking (top-down)
  2. Local Inference: Types propagate within functions without global constraint solving
  3. Explicit Annotations: Type annotations serve as ground truth
  4. Coercion: Implicit Int → Float promotion where safe
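The synthesis/checking split can be sketched on a toy expression language. This is illustrative only: the real checker handles generics, structs, and far richer coercion; types here are plain strings.

```python
# Toy expressions: ints, floats, bools, and ("+", left, right) tuples.

def synthesize(expr):
    """Bottom-up (synthesis): compute a type from the expression alone."""
    if isinstance(expr, bool):     # bool before int: True is an int in Python
        return "Bool"
    if isinstance(expr, int):
        return "Int"
    if isinstance(expr, float):
        return "Float"
    if isinstance(expr, tuple) and expr[0] == "+":
        left, right = synthesize(expr[1]), synthesize(expr[2])
        # Implicit Int -> Float promotion, as in the coercion rule above.
        if {left, right} == {"Int", "Float"}:
            return "Float"
        if left == right and left in ("Int", "Float"):
            return left
        raise TypeError(f"cannot add {left} and {right}")
    raise TypeError(f"unknown expression: {expr!r}")

def check(expr, expected):
    """Top-down (checking): verify the expression against an expected type."""
    actual = synthesize(expr)
    if actual == "Int" and expected == "Float":
        return True  # safe Int -> Float coercion
    return actual == expected

print(synthesize(("+", 1, 2.0)))    # Float
print(check(("+", 1, 2), "Float"))  # True, via Int -> Float coercion
```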

Built-in Types

# Primitive types
INT_TYPE = PrimitiveType("Int")
FLOAT_TYPE = PrimitiveType("Float")
BOOL_TYPE = PrimitiveType("Bool")
STRING_TYPE = PrimitiveType("String")
NONE_TYPE = PrimitiveType("None")

# Math types
VEC_TYPE = ArrayType(FLOAT_TYPE, 1)  # Vector
MAT_TYPE = ArrayType(FLOAT_TYPE, 2)  # Matrix

# Special types
ANY_TYPE = AnyType()  # Accepts anything

Analysis Passes

Multiple analysis passes run in parallel to gather information for optimization and code generation.

Purity Analyzer

Determines function purity by tracking side effects. Pure functions are candidates for memoization and JIT compilation.

Side Effect Categories

  • IO: print, file operations
  • Mutation: Modifying non-local state
  • Manim: Animation calls (play, wait)
  • External: Calling impure functions
class Purity(Enum):
    PURE = auto()           # No side effects
    READ_ONLY = auto()      # Reads external state
    IMPURE = auto()         # Has side effects
    UNKNOWN = auto()        # Cannot determine
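A function's purity is bounded by the purity of everything it calls, which makes the analysis a join over this lattice. The `join` and `function_purity` helpers below are a hypothetical sketch, not the analyzer's real API.

```python
from enum import Enum, auto
from functools import reduce

class Purity(Enum):
    PURE = auto()
    READ_ONLY = auto()
    IMPURE = auto()
    UNKNOWN = auto()

# Lattice order: each level is "at least as impure" as the one before it;
# UNKNOWN dominates for safety.
_ORDER = [Purity.PURE, Purity.READ_ONLY, Purity.IMPURE, Purity.UNKNOWN]

def join(a: Purity, b: Purity) -> Purity:
    """A caller can never be purer than its worst callee."""
    return max(a, b, key=_ORDER.index)

def function_purity(own_effects: Purity, callees: list) -> Purity:
    return reduce(join, callees, own_effects)

print(function_purity(Purity.PURE, [Purity.PURE, Purity.READ_ONLY]))
# Purity.READ_ONLY
```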

Complexity Analyzer

Estimates Big-O algorithmic complexity by analyzing loop nesting and recursion patterns.

class Complexity(Enum):
    O_1 = "O(1)"           # Constant
    O_LOG_N = "O(log n)"   # Logarithmic
    O_N = "O(n)"           # Linear
    O_N_LOG_N = "O(n log n)"
    O_N_SQUARED = "O(n²)"  # Quadratic
    O_N_CUBED = "O(n³)"    # Cubic
    O_2_N = "O(2ⁿ)"        # Exponential
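Loop-nesting depth alone already gives a first approximation of these classes. The toy statement shape below is illustrative, not complexity_analyzer.py's real model (which also accounts for recursion).

```python
# Toy statement form: ("for", body_statements) for loops, ("stmt",) otherwise.

def max_loop_depth(statements) -> int:
    """Deepest chain of nested loops in a statement list."""
    depth = 0
    for stmt in statements:
        if stmt[0] == "for":
            depth = max(depth, 1 + max_loop_depth(stmt[1]))
    return depth

def estimate_complexity(statements) -> str:
    depth = max_loop_depth(statements)
    return {0: "O(1)", 1: "O(n)", 2: "O(n²)", 3: "O(n³)"}.get(
        depth, f"O(n^{depth})")

nested = [("for", [("for", [("stmt",)])])]  # two nested loops
print(estimate_complexity(nested))  # O(n²)
```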

Parallel Analyzer

Identifies loops that can be parallelized with Numba's prange. Analyzes data dependencies, reduction patterns, and loop-carried dependencies.

Parallelization Criteria

  • No loop-carried dependencies (each iteration independent)
  • Reduction variables detected (sum, product, min, max)
  • No I/O or mutation of shared state
  • Iteration count known or estimable
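The core independence test is whether each iteration only writes locations indexed by its own loop variable. A much-simplified sketch over a toy loop body (the record shapes and helper name are hypothetical):

```python
# Toy loop-body records: ("write", array, index_expr) for element writes,
# ("reduce", accumulator, kind) for reduction updates.

def is_parallelizable(body, loop_var="i", reductions=("sum", "prod", "min", "max")):
    """True if every iteration touches only its own data (safe for prange)."""
    for op, target, detail in body:
        if op == "write" and detail != loop_var:
            return False  # potential loop-carried dependency
        if op == "reduce" and detail not in reductions:
            return False  # not a recognized reduction pattern
    return True

# out[i] = a[i] + b[i] writes only to its own slot -> parallel-safe.
print(is_parallelizable([("write", "out", "i")]))    # True
# out[i+1] = out[i] carries state between iterations -> not safe.
print(is_parallelizable([("write", "out", "i+1")]))  # False
```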

Stage 4: Code Generation

The code generator (codegen.py) transforms the AST into executable Python code, handling operator translation, Manim integration, and Numba decorators.

Code Generator

Operator Translation

MathViz    Python                       Description
x ^ y      x ** y                       Exponentiation
A ∪ B      A | B or A.union(B)          Set union
A ∩ B      A & B or A.intersection(B)   Set intersection
x ∈ S      x in S                       Set membership
A ⊆ B      A.issubset(B)                Subset
sqrt(x)    np.sqrt(x)                   Square root
sin(x)     np.sin(x)                    Trigonometric
PI         np.pi                        Pi constant

Struct to Dataclass

MathViz → Python
// MathViz
struct Point {
    x: Float
    y: Float
}

impl Point {
    fn new(x: Float, y: Float) -> Point {
        return Point { x: x, y: y }
    }
}
# Generated Python
@dataclass
class Point:
    x: float
    y: float

def Point_new(x: float, y: float) -> Point:
    return Point(x=x, y=y)

Scene to Manim Class

MathViz → Python
// MathViz
scene HelloWorld {
    fn construct(self) {
        let text = Text("Hello")
        play(Write(text))
    }
}
# Generated Python
from manim import *

class HelloWorld(Scene):
    def construct(self):
        text = Text('Hello')
        self.play(Write(text))

Optimization Passes

The compiler includes multiple optimization passes that can be enabled for high-performance numerical code.

Numba JIT Integration

Functions that pass purity analysis are automatically decorated with @njit for Just-In-Time compilation to native code.

# Automatic JIT decoration
@njit(cache=True)
def compute_sum(n: int) -> int:
    total = 0
    for i in range(0, n):
        total = (total + i)
    return total
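Generated code can degrade gracefully when Numba is not installed. The optional-JIT shim below is a hedged sketch of one way to do that; the codegen's actual fallback strategy may differ.

```python
# Fall back to the undecorated function when Numba is unavailable,
# so the generated module still runs (just without native compilation).
try:
    from numba import njit
except ImportError:
    def njit(*args, **kwargs):
        if args and callable(args[0]):   # bare @njit usage
            return args[0]
        return lambda fn: fn             # parameterized @njit(cache=True)

@njit(cache=True)
def compute_sum(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total

print(compute_sum(10))  # 45
```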

Constant Folding

The const_fold.py module (99 KB) evaluates constant expressions at compile time, replacing them with their computed values.
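The core idea can be sketched as a bottom-up rewrite that evaluates operator nodes whose operands are already literals. The tuple-based tree shape is a toy, not const_fold.py's actual pass structure.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Recursively replace constant subtrees with their computed values."""
    if not isinstance(node, tuple):
        return node  # literal or identifier: nothing to fold
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if (op in OPS and isinstance(left, (int, float))
            and isinstance(right, (int, float))):
        return OPS[op](left, right)
    return (op, left, right)  # keep symbolic parts untouched

# (2 * 3) + x folds the constant half and keeps the symbolic half.
print(fold(("+", ("*", 2, 3), "x")))  # ('+', 6, 'x')
```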

Vectorization

The vectorizer.py module detects loop patterns that can be rewritten as NumPy operations for SIMD acceleration.
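Pattern detection can be sketched as recognizing an accumulation loop and emitting the equivalent NumPy call as source text. The loop IR and function name here are illustrative assumptions, not the vectorizer's real representation.

```python
# Toy loop IR: ("for", var, array, ("accum", target, "+")) meaning
#   for var in array: target += var

def vectorize(loop):
    """Rewrite a recognized accumulation loop into a NumPy reduction."""
    kind, var, array, body = loop
    if kind == "for" and body[0] == "accum" and body[2] == "+":
        target = body[1]
        return f"{target} = np.sum({array})"  # SIMD-friendly rewrite
    return None  # pattern not recognized: keep the scalar loop

print(vectorize(("for", "x", "data", ("accum", "total", "+"))))
# total = np.sum(data)
```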

Memory Optimization

The memory_optimizer.py module optimizes memory access patterns in the generated code.
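One common memory optimization is rewriting `a = a + b` into the in-place `a += b` so no temporary array is allocated. A sketch over toy assignment triples (illustrative, not memory_optimizer.py's real representation):

```python
def make_inplace(assign):
    """Rewrite ('assign', target, ('+', target, rhs)) to an in-place add."""
    kind, target, value = assign
    if (kind == "assign" and isinstance(value, tuple)
            and value[0] == "+" and value[1] == target):
        return ("iadd", target, value[2])  # a += rhs: no temporary array
    return assign  # any other assignment is left unchanged

print(make_inplace(("assign", "a", ("+", "a", "b"))))  # ('iadd', 'a', 'b')
```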

Summary

The MathViz compiler is a production-grade source-to-source transpiler: a traditional lexer/parser/type-checker pipeline augmented with purity, complexity, and parallelism analysis, Numba-backed optimization passes, and a code generator that emits readable, idiomatic Python.

Contributing

The compiler is open source. Contributions are welcome! See the GitHub repository for contribution guidelines.