ROOT-MCP Architecture

Overview

ROOT-MCP is a production-grade Model Context Protocol (MCP) server that enables AI assistants to interact with CERN ROOT files. It features a dual-mode architecture that balances simplicity and power, allowing users to choose between lightweight file operations and comprehensive physics analysis capabilities.

Design Philosophy

Zero-Config First: Start with --data-path or ROOT_MCP_DATA_PATH — no YAML file required
Configuration-Driven: Full control available through config.yaml for advanced use cases
Lazy Loading: Components loaded only when needed for memory efficiency
Runtime Flexibility: Switch between modes without server restart
Security First: All file operations validated through security checks; permissive-by-default for local use
Pure Python Core: Uses uproot for crash-resistant ROOT file access without requiring a ROOT installation
Optional Native ROOT: When a ROOT installation is available, additional tools unlock automatically

Architecture Tiers

Core Mode

Purpose: Lightweight file operations and basic statistics
Dependencies: uproot, awkward, numpy, pandas, mcp
Use Cases: File inspection, data reading, basic statistics, data export

Extended Mode

Purpose: Full physics analysis capabilities
Dependencies: Core dependencies + scipy, matplotlib
Use Cases: Histogram fitting, kinematics calculations, correlations, plotting

Native ROOT (Optional)

Purpose: Execute arbitrary PyROOT/C++ code; RDataFrame; ROOT macros
Dependencies: Extended mode + a working ROOT/PyROOT installation
Use Cases: Custom PyROOT scripts, RDataFrame event loops, C++ macro execution, RooFit
Enable: Set features.enable_root: true in config.yaml

Mode Selection

Controlled via config.yaml:

server:
  mode: "extended"  # or "core"

features:
  enable_root: true  # optional — unlocks ROOT native tools

Runtime switching is available via the switch_mode tool, without a server restart. Native ROOT tools appear automatically when enable_root: true is set and a ROOT installation is detected.
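
The selection rules can be summarised in a few lines of Python. This is only a sketch: the function name `visible_tool_groups` and the group labels are hypothetical, not part of ROOT-MCP's API, and gating the native tools on extended mode is an assumption taken from the "Extended mode + ROOT" dependency line in the tier description.

```python
def visible_tool_groups(mode: str, enable_root: bool, root_detected: bool) -> list[str]:
    """Return the tool groups a server would expose for the given settings."""
    if mode not in ("core", "extended"):
        raise ValueError(f"unknown mode: {mode!r}")

    groups = ["core"]  # core tools are always available
    if mode == "extended":
        groups.append("extended")
        # Assumption: native tools sit on top of extended mode and need
        # both the config flag and a detected ROOT installation.
        if enable_root and root_detected:
            groups.append("root_native")
    return groups
```

With `enable_root` set but no ROOT installation detected, the sketch simply omits the native group rather than raising, which mirrors the "tools appear automatically" behaviour described above.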

System Architecture

┌─────────────────────────────────────────────────────────┐
│                    AI Assistant (Claude)                │
└────────────────────┬────────────────────────────────────┘
                     │ MCP Protocol (JSON-RPC)
┌────────────────────┴────────────────────────────────────┐
│                  ROOT-MCP Server                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │            Mode-Aware Dispatcher                 │  │
│  │  • Tool routing based on current mode            │  │
│  │  • Dynamic component loading/unloading           │  │
│  │  • Mode validation and switching                 │  │
│  └────────────────┬─────────────────────────────────┘  │
│                   │                                     │
│  ┌────────────────┴─────────────────────────────────┐  │
│  │         Core Components (Always Loaded)          │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ I/O Layer                                  │  │  │
│  │  │  • FileManager: File caching & lifecycle  │  │  │
│  │  │  • TreeReader: TTree data reading         │  │  │
│  │  │  • HistogramReader: Histogram reading     │  │  │
│  │  │  • PathValidator: Security checks         │  │  │
│  │  │  • DataExporter: JSON/CSV/Parquet export  │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ Operations Layer                           │  │  │
│  │  │  • BasicStatistics: min/max/mean/std      │  │  │
│  │  │  • Basic histograms (no fitting)          │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ Core Tools                                 │  │  │
│  │  │  • DiscoveryTools: File/tree inspection   │  │  │
│  │  │  • DataAccessTools: Branch reading        │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────┘  │
│                   │                                     │
│  ┌────────────────┴─────────────────────────────────┐  │
│  │    Extended Components (Lazy Loaded)             │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ Analysis Layer                             │  │  │
│  │  │  • HistogramOperations: 1D/2D/Profile     │  │  │
│  │  │  • KinematicsOperations: 4-vectors, mass  │  │  │
│  │  │  • CorrelationAnalysis: Stats correlations│  │  │
│  │  │  • Fitting: 1D/2D model fitting           │  │  │
│  │  │  • Plotting: Visualization generation     │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ Extended Tools                             │  │  │
│  │  │  • AnalysisTools: High-level analysis     │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────┘  │
│                   │                                     │
│  ┌────────────────┴─────────────────────────────────┐  │
│  │  Native ROOT Components (Optional, Subprocess)  │  │
│  │  Requires: ROOT installation + enable_root: true│  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ ROOT Native Layer                          │  │  │
│  │  │  • RootAvailability: version/feature probe │  │  │
│  │  │  • RootExecutor: subprocess runner        │  │  │
│  │  │  • ASTSandbox: code safety validation     │  │  │
│  │  │  • Templates: RDataFrame/RooFit boilerplate│  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ Native ROOT Tools                          │  │  │
│  │  │  • run_root_code: arbitrary PyROOT code   │  │  │
│  │  │  • run_rdataframe: RDataFrame histograms  │  │  │
│  │  │  • run_root_macro: C++ macro execution    │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────┘  │
│                   │                                     │
│  ┌────────────────┴─────────────────────────────────┐  │
│  │         Common Utilities                         │  │
│  │  • LRUCache: Generic caching                    │  │
│  │  • Error types: Typed exceptions                │  │
│  │  • Utils: Path handling, formatting             │  │
│  └──────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                     │
┌────────────────────┴────────────────────────────────────┐
│                  Storage Layer                          │
│  • Local files (file://)                               │
│  • Remote files (root://, http://, https://)           │
│  • XRootD protocol support (optional)                  │
└─────────────────────────────────────────────────────────┘

Directory Structure

src/root_mcp/
├── core/                    # Core mode components
│   ├── io/                  # File I/O operations
│   │   ├── file_manager.py  # File caching and lifecycle
│   │   ├── readers.py       # TTree and histogram readers
│   │   ├── validators.py    # Security validation
│   │   └── exporters.py     # Data export (JSON/CSV/Parquet)
│   ├── operations/          # Basic operations
│   │   └── basic_stats.py   # Statistics without scipy
│   └── tools/               # Core MCP tools
│       ├── discovery.py     # File/tree inspection
│       └── data_access.py   # Branch reading
├── extended/                # Extended mode components
│   ├── analysis/            # Advanced analysis
│   │   ├── histograms.py    # 1D/2D/Profile histograms
│   │   ├── kinematics.py    # 4-vector calculations
│   │   ├── correlations.py  # Statistical correlations
│   │   ├── fitting.py       # Model fitting (1D/2D)
│   │   ├── plotting.py      # Visualization
│   │   ├── operations.py    # High-level operations
│   │   └── expression.py    # Expression evaluation
│   ├── root_native/         # Optional native ROOT integration
│   │   ├── executor.py      # Subprocess executor with timeout
│   │   ├── sandbox.py       # AST-based code safety checks
│   │   └── templates.py     # Code templates (RDataFrame, RooFit…)
│   └── tools/               # Extended + native ROOT MCP tools
│       ├── analysis.py      # Analysis tool handlers
│       ├── plotting.py      # Plotting tool handlers
│       └── root_native.py   # run_root_code/rdataframe/macro handlers
├── common/                  # Shared utilities
│   ├── cache.py             # LRU cache implementation
│   ├── errors.py            # Error types
│   ├── root_availability.py # ROOT detection & version probing
│   └── utils.py             # Utility functions
├── config.py                # Configuration management
└── server.py                # Mode-aware MCP server + CLI

Component Details

Core Components

FileManager

Purpose: Centralized file handle management
Features:
- LRU cache for open file handles (configurable size)
- Protocol support: file://, root://, http://, https://
- Connection pooling for remote files
- Automatic cleanup and resource management
Security: All paths validated through PathValidator
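
To make the handle-caching idea concrete, here is a minimal sketch of an LRU cache that closes handles on eviction. The class name `HandleCache` and its methods are illustrative assumptions, not the actual FileManager interface; the default size of 50 only echoes the `file_cache_size` value shown in the configuration section.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

H = TypeVar("H")


class HandleCache(Generic[H]):
    """LRU cache of open handles; the least recently used one is closed on eviction."""

    def __init__(self, opener: Callable[[str], H], max_size: int = 50) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._opener = opener
        self._max_size = max_size
        self._handles: OrderedDict[str, H] = OrderedDict()

    def __len__(self) -> int:
        return len(self._handles)

    def get(self, path: str) -> H:
        if path in self._handles:
            self._handles.move_to_end(path)  # mark as most recently used
            return self._handles[path]
        handle = self._opener(path)
        self._handles[path] = handle
        if len(self._handles) > self._max_size:
            _, evicted = self._handles.popitem(last=False)  # oldest entry
            self._close(evicted)
        return handle

    def close_all(self) -> None:
        while self._handles:
            _, handle = self._handles.popitem(last=False)
            self._close(handle)

    @staticmethod
    def _close(handle: H) -> None:
        close = getattr(handle, "close", None)
        if callable(close):
            close()
```

The opener is injected so the cache stays independent of the I/O library; in a uproot-based setup it could plausibly be `uproot.open`, whose return value also exposes `close()`.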

TreeReader

Purpose: High-level TTree data access
Features:
- Columnar reading with awkward arrays
- Streaming support for large files (chunked reading)
- Selection expressions (cuts)
- Branch sampling and statistics
Performance: Lazy loading, column pruning
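
Chunked reading pairs naturally with statistics that can be updated incrementally. The sketch below uses Welford's online algorithm over an iterable of chunks so that no more than one chunk is held in memory at a time. It is a dependency-free illustration, not TreeReader's code, and `streaming_mean_std` is a hypothetical name.

```python
import math
from typing import Iterable, Sequence


def streaming_mean_std(chunks: Iterable[Sequence[float]]) -> tuple[int, float, float]:
    """Return (count, mean, population std) computed one chunk at a time."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for chunk in chunks:
        for value in chunk:
            count += 1
            delta = value - mean
            mean += delta / count
            m2 += delta * (value - mean)
    if count == 0:
        raise ValueError("no values supplied")
    return count, mean, math.sqrt(m2 / count)
```

With uproot, the chunks might come from `uproot.iterate(..., step_size=...)`, with each step reduced to a flat sequence for the branch of interest; that wiring is left out here because it depends on the branch layout.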

PathValidator

Purpose: Security validation for all file operations
Features:
- Allowed root directory enforcement (permissive when list is empty)
- Path traversal prevention
- Protocol auto-elevation from resource URI declarations
- Write operation validation (input ≠ output)
- Audit logging for write operations
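
The allowed-roots rule, including the permissive behaviour for an empty list, can be sketched with `pathlib`. The function `validate_path` and the local `SecurityError` class are stand-ins for illustration; the real PathValidator may differ in naming and in how it treats symlinks or remote URIs.

```python
from pathlib import Path


class SecurityError(Exception):
    """Stand-in for the project's security exception."""


def validate_path(candidate: str, allowed_roots: list[str]) -> Path:
    """Resolve a local path and check it against the allowed roots."""
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below is made on the real target location.
    resolved = Path(candidate).resolve()
    if not allowed_roots:
        return resolved  # empty list: permissive mode
    for root in allowed_roots:
        if resolved.is_relative_to(Path(root).resolve()):
            return resolved
    raise SecurityError(f"{resolved} is outside the allowed roots")
```

Resolving before comparing is the important step: a string-prefix check on the raw input would accept a path such as `/data/../etc/passwd`.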

DataExporter

Purpose: Export data to standard formats
Formats: JSON, CSV, Parquet
Features: Compression support, streaming for large datasets

BasicStatistics

Purpose: Statistics without scipy dependency
Operations: min, max, mean, std, median, percentiles
Features: Basic 1D histograms (no fitting)

Native ROOT Components

RootAvailability

Purpose: Detect ROOT installation and probe version/feature flags
Features: Subprocess-based detection; returns the version string plus RDataFrame, RooFit, and TMVA support flags
Resilience: The server still starts, and all core and extended tools keep working, when ROOT is absent

RootExecutor

Purpose: Run PyROOT code in an isolated subprocess
Features:
- Configurable execution timeout
- Structured stdout/stderr capture
- Working directory isolation
- Non-blocking process management

ASTSandbox

Purpose: Validate Python/PyROOT code safety before execution
Checks: Blocks os, sys, subprocess, socket imports; dangerous builtins (exec, eval, __import__); unsafe attribute access patterns
Policy: Allow-list approach; only physics-relevant modules are permitted

Templates

Purpose: Pre-built code templates for common ROOT patterns
Available: RDataFrame histogram, TCanvas plotting, RooFit mass fit, file writing, C++ macro wrapper, batch processing

Extended Components

HistogramOperations

Purpose: Advanced histogram creation
Features:
- 1D histograms with error propagation
- 2D histograms with proper weighting
- Profile histograms (mean of Y vs X)
- Configurable bin limits
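
For readers unfamiliar with profile histograms, the following sketch shows the core idea: bin the x values, then report the mean of y in each bin. It is a plain-Python illustration under simple assumptions (uniform bins, values outside the range dropped, no weights or error bars); `profile_histogram` is a hypothetical name and not the HistogramOperations API.

```python
from typing import Optional, Sequence


def profile_histogram(
    x: Sequence[float],
    y: Sequence[float],
    n_bins: int,
    x_min: float,
    x_max: float,
) -> tuple[list[Optional[float]], list[int]]:
    """Return (mean of y per x-bin, entries per bin); empty bins give None."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_bins < 1 or x_max <= x_min:
        raise ValueError("need n_bins >= 1 and x_max > x_min")

    width = (x_max - x_min) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        if not (x_min <= xi < x_max):
            continue  # out-of-range entries are ignored in this sketch
        index = min(int((xi - x_min) / width), n_bins - 1)  # guard rounding
        sums[index] += yi
        counts[index] += 1

    means = [s / c if c else None for s, c in zip(sums, counts)]
    return means, counts
```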

KinematicsOperations

Purpose: Particle physics calculations
Features:
- Invariant mass from 4-vectors
- Dalitz plot variables
- Lorentz boost to CM frame
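
The invariant mass of a system follows from the summed 4-vectors via m² = E² − |p|² in natural units (c = 1). A minimal sketch, assuming each particle is given as an `(E, px, py, pz)` tuple; the function name and the choice to clamp small negative m² values to zero are illustrative decisions, not necessarily what KinematicsOperations does.

```python
import math
from typing import Iterable, Tuple

FourVector = Tuple[float, float, float, float]  # (E, px, py, pz), c = 1


def invariant_mass(particles: Iterable[FourVector]) -> float:
    """Invariant mass of the summed 4-vectors."""
    e_tot = px_tot = py_tot = pz_tot = 0.0
    for e, px, py, pz in particles:
        e_tot += e
        px_tot += px
        py_tot += py
        pz_tot += pz
    m2 = e_tot**2 - (px_tot**2 + py_tot**2 + pz_tot**2)
    # Floating-point rounding can push m2 slightly below zero for
    # massless systems; clamp instead of raising.
    return math.sqrt(max(m2, 0.0))
```

For example, two back-to-back photons of energy 5 each, `[(5, 0, 0, 5), (5, 0, 0, -5)]`, have zero net momentum and total energy 10, so the invariant mass is 10.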

CorrelationAnalysis

Purpose: Statistical correlation analysis
Features:
- Pearson correlation
- Spearman rank correlation
- Correlation/covariance matrices
- Mutual information
- Significance testing

Fitting

Purpose: Model fitting to histograms
Models: Gaussian, exponential, polynomial, Crystal Ball, 2D Gaussian
Features:
- Fixed parameter support
- Bounds constraints
- Chi-square calculation
- Error propagation
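
The chi-square figure reported after a fit can be illustrated as follows. This sketch assumes per-bin uncertainties are supplied and that bins with a non-positive uncertainty are skipped; both conventions, and the name `chi_square`, are assumptions rather than the documented behaviour of the Fitting component.

```python
from typing import Sequence


def chi_square(
    observed: Sequence[float],
    expected: Sequence[float],
    errors: Sequence[float],
    n_params: int,
) -> tuple[float, int]:
    """Return (chi2, ndf) for binned data against model predictions."""
    if not (len(observed) == len(expected) == len(errors)):
        raise ValueError("inputs must have equal length")

    chi2 = 0.0
    used_bins = 0
    for obs, exp, err in zip(observed, expected, errors):
        if err <= 0:
            continue  # a bin without a usable uncertainty carries no weight
        chi2 += ((obs - exp) / err) ** 2
        used_bins += 1

    # Degrees of freedom: bins that contributed minus free fit parameters.
    return chi2, used_bins - n_params
```

Fixed parameters would not count towards `n_params`, since only free parameters reduce the degrees of freedom.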

Tool Categories

Core Tools (Always Available)

| Tool | Description | Mode |
|------|-------------|------|
| `list_files` | List ROOT files in resource | Core |
| `inspect_file` | Get file structure and metadata | Core |
| `list_branches` | List branches in TTree | Core |
| `validate_file` | Check file integrity | Core |
| `read_branches` | Read branch data | Core |
| `get_branch_stats` | Compute basic statistics | Core |
| `export_data` | Export to JSON/CSV/Parquet | Core |
| `switch_mode` | Switch between core/extended | Core |
| `get_server_info` | Get server capabilities | Core |

Extended Tools (Extended Mode Only)

| Tool | Description | Requires |
|------|-------------|----------|
| `compute_histogram` | Create 1D histogram with fitting | Extended |
| `compute_histogram_2d` | Create 2D histogram | Extended |
| `fit_histogram` | Fit model to histogram | Extended |
| `compute_invariant_mass` | Calculate invariant mass | Extended |
| `compute_correlation` | Correlation analysis | Extended |
| `histogram_arithmetic` | Histogram arithmetic operations | Extended |
| `plot_histogram_1d` | Create 1D plot | Extended |
| `plot_histogram_2d` | Create 2D plot | Extended |

Configuration

Merge Priority

Settings are applied from four sources in ascending priority (later wins):

Built-in Pydantic defaults          (always lowest)
YAML config file                    (ROOT_MCP_CONFIG / --config / auto-discovery)
ROOT_MCP_* environment variables    (override everything from YAML)
CLI flags                           (always highest)

This means every field in config.yaml can be overridden in a container simply by setting an environment variable, and a user can always override that with a CLI flag. YAML users see no behaviour change unless they also set conflicting env vars.
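
The "later wins" rule amounts to a layered merge of nested settings. The sketch below models each source as a plain dictionary and merges them in priority order. It is a simplified illustration: the real configuration is built on Pydantic models, and `deep_merge` and `resolve_config` are hypothetical helper names.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values from `override` win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(
    defaults: dict[str, Any],
    yaml_cfg: dict[str, Any],
    env_cfg: dict[str, Any],
    cli_cfg: dict[str, Any],
) -> dict[str, Any]:
    """Apply the four sources in ascending priority."""
    result: dict[str, Any] = {}
    for layer in (defaults, yaml_cfg, env_cfg, cli_cfg):
        result = deep_merge(result, layer)
    return result
```

Because the merge is per key, an environment variable that sets only `features.enable_root` leaves the YAML value of `server.mode` untouched.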

Zero-Config Quick Start

No config file is required. Pass a data directory at invocation time:

root-mcp --data-path /path/to/data

Or via environment variables (fully env-driven — useful for containers):

export ROOT_MCP_DATA_PATH=/data
export ROOT_MCP_MODE=extended
export ROOT_MCP_EXPORT_PATH=/exports
export ROOT_MCP_ENABLE_ROOT=1
root-mcp

Generate a starter config pre-filled with the current directory:

root-mcp init --permissive

`apply_env_overrides` / `apply_cli_overrides`

Two functions in config.py implement the merge:

apply_env_overrides(config)     # reads all ROOT_MCP_* vars, mutates config in-place
apply_cli_overrides(config, args)  # applies parsed argparse.Namespace, in-place

Both are called in main() after load_config() and apply_data_paths() — env vars first, then CLI flags, so CLI always wins. Log level is the exception: it is applied before load_config() so that config-loading messages also respect the requested verbosity.
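
The call order can be illustrated with simplified stand-ins for the two functions. Everything here is a sketch: `DemoConfig` is a hypothetical three-field config, the `_demo` functions are not the real implementations, and the set of strings accepted as "true" is an assumption. Only the variable names `ROOT_MCP_MODE`, `ROOT_MCP_DATA_PATH`, `ROOT_MCP_ENABLE_ROOT` and the `--data-path` flag come from this document.

```python
import argparse
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spelling of boolean env values


@dataclass
class DemoConfig:
    mode: str = "core"
    data_paths: list[str] = field(default_factory=list)
    enable_root: bool = False


def apply_env_overrides_demo(
    config: DemoConfig, environ: Optional[Mapping[str, str]] = None
) -> None:
    """Mutate config in place from ROOT_MCP_* variables that are set."""
    env = os.environ if environ is None else environ
    if "ROOT_MCP_MODE" in env:
        config.mode = env["ROOT_MCP_MODE"]
    if "ROOT_MCP_DATA_PATH" in env:
        config.data_paths = [env["ROOT_MCP_DATA_PATH"]]
    if "ROOT_MCP_ENABLE_ROOT" in env:
        config.enable_root = env["ROOT_MCP_ENABLE_ROOT"].strip().lower() in _TRUTHY


def apply_cli_overrides_demo(config: DemoConfig, args: argparse.Namespace) -> None:
    """Mutate config in place from CLI flags that were actually passed."""
    data_path = getattr(args, "data_path", None)
    if data_path is not None:
        config.data_paths = [data_path]


config = DemoConfig()
apply_env_overrides_demo(config, {"ROOT_MCP_MODE": "extended", "ROOT_MCP_DATA_PATH": "/data"})
apply_cli_overrides_demo(config, argparse.Namespace(data_path="/cli/data"))
# The CLI value replaces the env value for data_paths; mode keeps the env value.
```

Treating an absent flag as "no override" (rather than overwriting with a default) is what lets the lower-priority sources show through.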

Full Configuration Structure

# QUICK START
server:
  mode: "extended"  # "core" or "extended"

resources:
  - name: "my_data"
    uri: "file:///path/to/data"
    allowed_patterns: ["*.root"]

features:
  enable_root: false    # set true to unlock run_root_code / run_rdataframe / run_root_macro
  enable_export: true

# ADVANCED
core:
  cache:
    enabled: true
    file_cache_size: 50
  limits:
    max_rows_per_call: 1_000_000
    max_export_rows: 10_000_000

extended:
  histogram:
    max_bins_1d: 10_000
    max_bins_2d: 1_000
  plotting:
    figure_width: 10.0
    figure_height: 6.0
    dpi: 100
  fitting_max_iterations: 10_000

security:
  allowed_roots: []   # empty = permissive (any local path); add explicit paths to restrict
  allowed_protocols: ["file", "root", "http", "https"]

output:
  export_base_path: "/path/to/exports"
  allowed_formats: ["json", "csv", "parquet"]

root_native:          # only relevant when enable_root: true
  execution_timeout: 60
  working_directory: "/tmp/root_mcp_native"
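
The two `root_native` settings map directly onto arguments of Python's `subprocess.run`. The sketch below shows one way a timeout-bounded runner could look. It is not RootExecutor's code: `run_python_snippet` and `ExecutionResult` are hypothetical names, and the snippet is run with the current interpreter rather than a ROOT-enabled one.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionResult:
    returncode: Optional[int]  # None when the process was killed on timeout
    stdout: str
    stderr: str
    timed_out: bool


def run_python_snippet(
    code: str, timeout: float = 60.0, working_directory: Optional[str] = None
) -> ExecutionResult:
    """Run code in a child interpreter with a time limit and its own cwd."""
    with tempfile.TemporaryDirectory() as scratch:
        cwd = working_directory or scratch  # fall back to a throwaway directory
        try:
            completed = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=cwd,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising.
            return ExecutionResult(None, "", "", True)
    return ExecutionResult(completed.returncode, completed.stdout, completed.stderr, False)
```

Running the code in a child process means a crash or runaway loop in user code cannot take the server down with it, which is the main reason for preferring a subprocess over in-process execution.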

Security Model

Path Validation

Allowed Roots: When security.allowed_roots is non-empty, all file paths must be under those roots. An empty list (the default) permits access to any OS-readable path — permissive mode for local development.
Protocol Validation: Only allowed protocols can be used. Protocols are auto-elevated when a resources[].uri declares a remote scheme (e.g. root://, https://).
Path Traversal Prevention: `..` path components and symlinks are validated
Write Protection: Input and output paths must differ
Code Sandbox: Native ROOT code execution validated via AST analysis before subprocess launch — blocks dangerous imports, builtins, and attribute access
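
A pre-execution AST check of the kind described for the code sandbox can be sketched with the standard `ast` module. This version is deliberately small and uses a deny-list only, covering the imports and builtins named in this document; it does not implement the stricter allow-list policy or the attribute-access checks, and `find_violations` is a hypothetical name.

```python
import ast

BLOCKED_MODULES = {"os", "sys", "subprocess", "socket"}
BLOCKED_BUILTINS = {"exec", "eval", "__import__"}


def find_violations(source: str) -> list[str]:
    """Parse source without executing it and list blocked constructs."""
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            top_level = (node.module or "").split(".")[0]
            if top_level in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_BUILTINS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

A static check like this is a first filter, not a complete sandbox: names can be reached indirectly (for example through `getattr`), which is why it is combined with subprocess isolation and a timeout.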

Resource Limits

Row Limits: Maximum rows per read operation
Export Limits: Maximum rows for export operations
Bin Limits: Maximum histogram bins (1D/2D)
Cache Limits: Maximum open file handles

Audit Trail

All write operations logged
Mode switches logged
Failed validation attempts logged

Performance Considerations

Memory Management

Lazy Loading: Extended components loaded only when needed
File Caching: LRU cache for file handles (configurable)
Streaming: Chunked reading for large files
Column Pruning: Read only requested branches

Optimization Strategies

Predicate Pushdown: Apply selections during read
Chunk Size Tuning: Configurable chunk sizes for streaming
Connection Pooling: Reuse connections for remote files
Compression: Support for compressed exports

Error Handling

Error Types

ROOTMCPError: Base exception
SecurityError: Security constraint violations
ValidationError: Input validation failures
FileOperationError: File I/O errors
AnalysisError: Analysis operation failures
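
The hierarchy can be pictured as a small set of exception classes sharing one base. The class names below are the ones listed above; the optional `hint` attribute and the `format_error` helper are assumptions added for illustration.

```python
from typing import Optional


class ROOTMCPError(Exception):
    """Base exception; carries an optional hint for resolving the problem."""

    def __init__(self, message: str, hint: Optional[str] = None) -> None:
        super().__init__(message)
        self.hint = hint


class SecurityError(ROOTMCPError):
    """Security constraint violations."""


class ValidationError(ROOTMCPError):
    """Input validation failures."""


class FileOperationError(ROOTMCPError):
    """File I/O errors."""


class AnalysisError(ROOTMCPError):
    """Analysis operation failures."""


def format_error(error: ROOTMCPError) -> str:
    """Render an error as a single line for a tool response."""
    text = f"{type(error).__name__}: {error}"
    return f"{text} (hint: {error.hint})" if error.hint else text
```

A shared base class lets a tool handler catch `ROOTMCPError` once and still report the specific category to the client.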

Graceful Degradation

Extended mode falls back to core if dependencies are missing
Clear error messages with hints for resolution
Mode-specific error handling
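
The fallback from extended to core can be sketched by probing for the optional dependencies before committing to a mode. `importlib.util.find_spec` checks whether a module is importable without importing it. The function names are hypothetical; the dependency list (scipy, matplotlib) is the one given for extended mode in this document.

```python
import importlib.util
from typing import Sequence

EXTENDED_DEPENDENCIES = ("scipy", "matplotlib")


def missing_modules(names: Sequence[str]) -> list[str]:
    """Return the top-level modules that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def resolve_mode(
    requested: str, required: Sequence[str] = EXTENDED_DEPENDENCIES
) -> tuple[str, list[str]]:
    """Return (effective mode, missing modules) for a requested mode."""
    if requested != "extended":
        return requested, []
    missing = missing_modules(required)
    if missing:
        return "core", missing  # degrade instead of failing at startup
    return "extended", []
```

Returning the list of missing modules alongside the effective mode gives the caller what it needs to produce an actionable message, such as which packages to install.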

Future Considerations

Potential Enhancements

Caching Layer: Redis/memcached for distributed caching
Parallel Processing: Multi-file parallel operations
Query Optimization: Advanced query planning
Additional Formats: ROOT file writing (new files only)
Plugin System: Extensible analysis modules

Non-Goals

Modification of existing ROOT files
GUI/web interface
Full RooFit model library (templates cover the common cases)