ROOT-MCP Architecture
Overview
ROOT-MCP is a production-grade Model Context Protocol (MCP) server that enables AI assistants to interact with CERN ROOT files. It features a dual-mode architecture that balances simplicity and power, allowing users to choose between lightweight file operations and comprehensive physics analysis capabilities.
Design Philosophy
Zero-Config First: Start with --data-path or ROOT_MCP_DATA_PATH; no YAML file required
Configuration-Driven: Full control available through config.yaml for advanced use cases
Lazy Loading: Components loaded only when needed for memory efficiency
Runtime Flexibility: Switch between modes without server restart
Security First: All file operations validated through security checks; permissive-by-default for local use
Pure Python Core: Uses uproot for crash-resistant ROOT file access without requiring a ROOT installation
Optional Native ROOT: When a ROOT installation is available and features.enable_root is true, additional tools unlock automatically
Architecture Tiers
Core Mode
Purpose: Lightweight file operations and basic statistics
Dependencies: uproot, awkward, numpy, pandas, mcp
Use Cases: File inspection, data reading, basic statistics, data export
Extended Mode
Purpose: Full physics analysis capabilities
Dependencies: Core dependencies + scipy, matplotlib
Use Cases: Histogram fitting, kinematics calculations, correlations, plotting
Native ROOT (Optional)
Purpose: Execute arbitrary PyROOT/C++ code; RDataFrame; ROOT macros
Dependencies: Extended mode + a working ROOT/PyROOT installation
Use Cases: Custom PyROOT scripts, RDataFrame event loops, C++ macro execution, RooFit
Enable: Set features.enable_root: true in config.yaml
Mode Selection
Controlled via config.yaml:
server:
  mode: "extended"      # or "core"
features:
  enable_root: true     # optional; unlocks ROOT native tools
Runtime switching available via switch_mode tool without restart. Native ROOT tools appear automatically when both enable_root: true and a ROOT installation are detected.
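The mode gating described above can be illustrated with a minimal sketch. It is not the ROOT-MCP implementation: the class name, its methods, and every tool name other than switch_mode are hypothetical, and the real server presumably also loads and unloads components when the mode changes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch; only the mode names and "switch_mode" come from the document.
MODES = ("core", "extended")  # ordered from least to most capable


class ModeAwareDispatcher:
    """Routes tool calls and hides tools the current mode does not allow."""

    def __init__(self, mode: str = "core") -> None:
        self._check_mode(mode)
        self.mode = mode
        # tool name -> (minimum required mode, handler)
        self._tools: Dict[str, Tuple[str, Callable[..., object]]] = {}
        self.register("switch_mode", "core", self.switch_mode)

    @staticmethod
    def _check_mode(mode: str) -> None:
        if mode not in MODES:
            raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")

    def register(self, name: str, required_mode: str, handler: Callable[..., object]) -> None:
        self._check_mode(required_mode)
        self._tools[name] = (required_mode, handler)

    def available_tools(self) -> List[str]:
        rank = MODES.index(self.mode)
        return sorted(
            name for name, (required, _) in self._tools.items()
            if MODES.index(required) <= rank
        )

    def switch_mode(self, mode: str) -> str:
        """Change mode at runtime; no restart is involved."""
        self._check_mode(mode)
        self.mode = mode
        return mode

    def call(self, name: str, **kwargs: object) -> object:
        if name not in self.available_tools():
            raise PermissionError(f"tool {name!r} is not available in {self.mode!r} mode")
        return self._tools[name][1](**kwargs)
```

A tool registered with required mode "extended" is rejected while the dispatcher is in core mode and becomes callable after `call("switch_mode", mode="extended")`.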
System Architecture
┌─────────────────────────────────────────────────────────┐
│ AI Assistant (Claude) │
└────────────────────┬────────────────────────────────────┘
│ MCP Protocol (JSON-RPC)
┌────────────────────┴────────────────────────────────────┐
│ ROOT-MCP Server │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Mode-Aware Dispatcher │ │
│ │ • Tool routing based on current mode │ │
│ │ • Dynamic component loading/unloading │ │
│ │ • Mode validation and switching │ │
│ └────────────────┬─────────────────────────────────┘ │
│ │ │
│ ┌────────────────┴─────────────────────────────────┐ │
│ │ Core Components (Always Loaded) │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ I/O Layer │ │ │
│ │ │ • FileManager: File caching & lifecycle │ │ │
│ │ │ • TreeReader: TTree data reading │ │ │
│ │ │ • HistogramReader: Histogram reading │ │ │
│ │ │ • PathValidator: Security checks │ │ │
│ │ │ • DataExporter: JSON/CSV/Parquet export │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ Operations Layer │ │ │
│ │ │ • BasicStatistics: min/max/mean/std │ │ │
│ │ │ • Basic histograms (no fitting) │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ Core Tools │ │ │
│ │ │ • DiscoveryTools: File/tree inspection │ │ │
│ │ │ • DataAccessTools: Branch reading │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────┴─────────────────────────────────┐ │
│ │ Extended Components (Lazy Loaded) │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ Analysis Layer │ │ │
│ │ │ • HistogramOperations: 1D/2D/Profile │ │ │
│ │ │ • KinematicsOperations: 4-vectors, mass │ │ │
│ │ │ • CorrelationAnalysis: Stats correlations│ │ │
│ │ │ • Fitting: 1D/2D model fitting │ │ │
│ │ │ • Plotting: Visualization generation │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ Extended Tools │ │ │
│ │ │ • AnalysisTools: High-level analysis │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────┴─────────────────────────────────┐ │
│ │ Native ROOT Components (Optional, Subprocess) │ │
│ │ Requires: ROOT installation + enable_root: true│ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ ROOT Native Layer │ │ │
│ │ │ • RootAvailability: version/feature probe │ │ │
│ │ │ • RootExecutor: subprocess runner │ │ │
│ │ │ • ASTSandbox: code safety validation │ │ │
│ │ │ • Templates: RDataFrame/RooFit boilerplat│ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ Native ROOT Tools │ │ │
│ │ │ • run_root_code: arbitrary PyROOT code │ │ │
│ │ │ • run_rdataframe: RDataFrame histograms │ │ │
│ │ │ • run_root_macro: C++ macro execution │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────┴─────────────────────────────────┐ │
│ │ Common Utilities │ │
│ │ • LRUCache: Generic caching │ │
│ │ • Error types: Typed exceptions │ │
│ │ • Utils: Path handling, formatting │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
┌────────────────────┴────────────────────────────────────┐
│ Storage Layer │
│ • Local files (file://) │
│ • Remote files (root://, http://, https://) │
│ • XRootD protocol support (optional) │
└─────────────────────────────────────────────────────────┘
Directory Structure
src/root_mcp/
├── core/                          # Core mode components
│   ├── io/                        # File I/O operations
│   │   ├── file_manager.py        # File caching and lifecycle
│   │   ├── readers.py             # TTree and histogram readers
│   │   ├── validators.py          # Security validation
│   │   └── exporters.py           # Data export (JSON/CSV/Parquet)
│   ├── operations/                # Basic operations
│   │   └── basic_stats.py         # Statistics without scipy
│   └── tools/                     # Core MCP tools
│       ├── discovery.py           # File/tree inspection
│       └── data_access.py         # Branch reading
├── extended/                      # Extended mode components
│   ├── analysis/                  # Advanced analysis
│   │   ├── histograms.py          # 1D/2D/Profile histograms
│   │   ├── kinematics.py          # 4-vector calculations
│   │   ├── correlations.py        # Statistical correlations
│   │   ├── fitting.py             # Model fitting (1D/2D)
│   │   ├── plotting.py            # Visualization
│   │   ├── operations.py          # High-level operations
│   │   └── expression.py          # Expression evaluation
│   ├── root_native/               # Optional native ROOT integration
│   │   ├── executor.py            # Subprocess executor with timeout
│   │   ├── sandbox.py             # AST-based code safety checks
│   │   └── templates.py           # Code templates (RDataFrame, RooFit…)
│   └── tools/                     # Extended + native ROOT MCP tools
│       ├── analysis.py            # Analysis tool handlers
│       ├── plotting.py            # Plotting tool handlers
│       └── root_native.py         # run_root_code/rdataframe/macro handlers
├── common/                        # Shared utilities
│   ├── cache.py                   # LRU cache implementation
│   ├── errors.py                  # Error types
│   ├── root_availability.py       # ROOT detection & version probing
│   └── utils.py                   # Utility functions
├── config.py                      # Configuration management
└── server.py                      # Mode-aware MCP server + CLI
Component Details
Core Components
FileManager
Purpose: Centralized file handle management
Features:
LRU cache for open file handles (configurable size)
Protocol support: file://, root://, http://, https://
Connection pooling for remote files
Automatic cleanup and resource management
Security: All paths validated through PathValidator
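As a rough illustration of an LRU cache for open handles, the sketch below evicts and closes the least recently used handle once the cache is full. The class name and the opener/closer callbacks are assumptions made for the example; the actual FileManager may be organised differently.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

H = TypeVar("H")


class HandleCache(Generic[H]):
    """Hypothetical LRU cache that closes handles when they are evicted."""

    def __init__(self, opener: Callable[[str], H], closer: Callable[[H], None],
                 max_size: int = 50) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._opener = opener
        self._closer = closer
        self._max_size = max_size
        self._handles: "OrderedDict[str, H]" = OrderedDict()

    def get(self, path: str) -> H:
        if path in self._handles:
            # Cache hit: mark as most recently used.
            self._handles.move_to_end(path)
            return self._handles[path]
        handle = self._opener(path)
        self._handles[path] = handle
        while len(self._handles) > self._max_size:
            # Evict the least recently used entry and release its resources.
            _, evicted = self._handles.popitem(last=False)
            self._closer(evicted)
        return handle

    def close_all(self) -> None:
        while self._handles:
            _, handle = self._handles.popitem(last=False)
            self._closer(handle)

    def __len__(self) -> int:
        return len(self._handles)
```

In a real server the opener would be something like a call into uproot and the closer would call the handle's close method; here both are injected so the eviction logic can be exercised without ROOT files.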
TreeReader
Purpose: High-level TTree data access
Features:
Columnar reading with awkward arrays
Streaming support for large files (chunked reading)
Selection expressions (cuts)
Branch sampling and statistics
Performance: Lazy loading, column pruning
PathValidator
Purpose: Security validation for all file operations
Features:
Allowed root directory enforcement (permissive when list is empty)
Path traversal prevention
Protocol auto-elevation from resource URI declarations
Write operation validation (input ≠ output)
Audit logging for write operations
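The allowed-root, traversal, and input ≠ output checks listed above can be sketched with pathlib. The function and exception names are hypothetical, and the real PathValidator also handles protocols and audit logging, which this sketch omits.

```python
from pathlib import Path
from typing import Iterable


class PathSecurityError(Exception):
    """Hypothetical error type for this sketch."""


def validate_read_path(path: str, allowed_roots: Iterable[str]) -> Path:
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below is applied to the real target rather than the literal string.
    resolved = Path(path).resolve()
    roots = [Path(root).resolve() for root in allowed_roots]
    if not roots:
        # An empty list means permissive mode: any local path is accepted.
        return resolved
    for root in roots:
        if resolved == root or root in resolved.parents:
            return resolved
    raise PathSecurityError(f"{resolved} is outside the allowed roots")


def validate_write_path(input_path: str, output_path: str,
                        allowed_roots: Iterable[str]) -> Path:
    roots = list(allowed_roots)
    source = validate_read_path(input_path, roots)
    target = validate_read_path(output_path, roots)
    if source == target:
        raise PathSecurityError("output path must differ from input path")
    return target
```

Comparing resolved paths rather than raw strings is the design choice that matters here: a request such as `data/../secret.root` is rejected because it resolves outside the allowed root.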
Native ROOT Components
RootAvailability
Purpose: Detect ROOT installation and probe version/feature flags
Features: Subprocess-based detection; returns version string, RDataFrame support, RooFit, TMVA flags
Resilience: The server starts normally when ROOT is absent; core and extended tools are unaffected, and only the native ROOT tools are unavailable
RootExecutor
Purpose: Run PyROOT code in an isolated subprocess
Features:
Configurable execution timeout
Structured stdout/stderr capture
Working directory isolation
Non-blocking process management
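A minimal sketch of subprocess isolation with a timeout, captured output, and a throwaway working directory follows. The function and dataclass names are assumptions, the sketch runs plain Python through sys.executable rather than PyROOT, and it blocks until the child exits, unlike the non-blocking management mentioned above.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ExecutionResult:
    returncode: Optional[int]  # None when the run timed out
    stdout: str
    stderr: str
    timed_out: bool


def _as_text(value: Union[str, bytes, None]) -> str:
    # On timeout the captured output may be bytes or None, even in text mode.
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode(errors="replace")
    return value


def run_isolated(code: str, timeout_s: float = 60.0) -> ExecutionResult:
    """Run code in a child interpreter inside a temporary working directory."""
    with tempfile.TemporaryDirectory(prefix="root_mcp_native_") as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired as exc:
            # subprocess.run kills the child before raising TimeoutExpired.
            return ExecutionResult(None, _as_text(exc.stdout), _as_text(exc.stderr), True)
    return ExecutionResult(proc.returncode, proc.stdout, proc.stderr, False)
```

Because the code runs in a separate process, a crash or hang in the child cannot take down the server process; the caller only sees a return code, the captured streams, and a timeout flag.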
ASTSandbox
Purpose: Validate Python/PyROOT code safety before execution
Checks: Blocks os, sys, subprocess, socket imports; dangerous builtins (exec, eval, __import__); unsafe attribute access patterns
Policy: Allow-list approach; only physics-relevant modules permitted
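The kind of static check described here can be sketched with the standard ast module. The allow-list contents, the function name, and the choice to treat dunder attribute access as the "unsafe pattern" are all assumptions for illustration; the real ASTSandbox rules may be broader or different.

```python
import ast
from typing import List

# Assumed allow-list for the sketch; the actual permitted modules are not listed in this document.
ALLOWED_MODULES = {"ROOT", "math", "numpy", "array"}
# Builtins named in the document as blocked.
BLOCKED_BUILTINS = {"exec", "eval", "__import__"}


def find_violations(source: str) -> List[str]:
    """Return human-readable violations; an empty list means the code passed.

    Raises SyntaxError if the source cannot be parsed.
    """
    violations: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name and are rejected as well.
            modules = [node.module or "."]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_MODULES:
                violations.append(f"line {node.lineno}: import of {module!r} is not allowed")
        if isinstance(node, ast.Name) and node.id in BLOCKED_BUILTINS:
            violations.append(f"line {node.lineno}: use of builtin {node.id!r} is blocked")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            violations.append(f"line {node.lineno}: access to {node.attr!r} is blocked")
    return violations
```

With an allow-list, os, sys, subprocess, and socket are rejected simply because they are not on the list, so newly discovered dangerous modules do not need to be enumerated. Static checks of this kind reduce risk but are not a complete sandbox, which is presumably why execution also happens in a separate subprocess.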
Templates
Purpose: Pre-built code templates for common ROOT patterns
Available: RDataFrame histogram, TCanvas plotting, RooFit mass fit, file writing, C++ macro wrapper, batch processing
DataExporter
Purpose: Export data to standard formats
Formats: JSON, CSV, Parquet
Features: Compression support, streaming for large datasets
BasicStatistics
Purpose: Statistics without scipy dependency
Operations: min, max, mean, std, median, percentiles
Features: Basic 1D histograms (no fitting)
Extended Components
HistogramOperations
Purpose: Advanced histogram creation
Features:
1D histograms with error propagation
2D histograms with proper weighting
Profile histograms (mean of Y vs X)
Configurable bin limits
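To make "mean of Y vs X" concrete, the following sketch computes a profile over equal-width bins in pure Python. The function name and signature are hypothetical, and the real implementation most likely operates on NumPy or awkward arrays and also tracks per-bin errors.

```python
from typing import List, Optional, Sequence


def profile_histogram(x: Sequence[float], y: Sequence[float],
                      n_bins: int, lo: float, hi: float) -> List[Optional[float]]:
    """Mean of y in each equal-width x bin over [lo, hi); None for empty bins."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for xi, yi in zip(x, y):
        if not (lo <= xi < hi):
            continue  # underflow and overflow entries are ignored in this sketch
        index = min(int((xi - lo) / width), n_bins - 1)  # guard against rounding at the edge
        sums[index] += yi
        counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

For example, with two bins on [0, 1), the points (0.1, 1), (0.2, 3), (0.6, 10), (0.9, 20) give bin means 2.0 and 15.0.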
KinematicsOperations
Purpose: Particle physics calculations
Features:
Invariant mass from 4-vectors
Dalitz plot variables
Lorentz boost to CM frame
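The invariant mass of a system of particles follows from the summed four-vector: m² = (ΣE)² − |Σp|² in natural units (c = 1). The sketch below applies that formula directly; the function name and the (E, px, py, pz) tuple layout are assumptions, not the KinematicsOperations API.

```python
import math
from typing import Iterable, Tuple

# (E, px, py, pz) in consistent units, e.g. GeV with c = 1 (layout assumed for this sketch)
FourVector = Tuple[float, float, float, float]


def invariant_mass(particles: Iterable[FourVector]) -> float:
    """Invariant mass of the summed four-vector of the given particles."""
    e_tot = px_tot = py_tot = pz_tot = 0.0
    for e, px, py, pz in particles:
        e_tot += e
        px_tot += px
        py_tot += py
        pz_tot += pz
    m_squared = e_tot ** 2 - (px_tot ** 2 + py_tot ** 2 + pz_tot ** 2)
    # Rounding can push a massless system slightly negative; clamp before the root.
    return math.sqrt(max(m_squared, 0.0))
```

Two back-to-back massless particles with E = 5 each, `(5, 0, 0, 5)` and `(5, 0, 0, -5)`, have zero net momentum and therefore an invariant mass of 10.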
CorrelationAnalysis
Purpose: Statistical correlation analysis
Features:
Pearson correlation
Spearman rank correlation
Correlation/covariance matrices
Mutual information
Significance testing
Fitting
Purpose: Model fitting to histograms
Models: Gaussian, exponential, polynomial, Crystal Ball, 2D Gaussian
Features:
Fixed parameter support
Bounds constraints
Chi-square calculation
Error propagation
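The chi-square of a fit compares observed bin contents with the model prediction, weighted by the per-bin uncertainty: χ² = Σ ((observed − expected) / σ)². A minimal sketch, with a hypothetical function name and signature, is shown below; fixed parameters are not counted as free parameters when computing the degrees of freedom.

```python
from typing import Sequence, Tuple


def chi_square(observed: Sequence[float], expected: Sequence[float],
               errors: Sequence[float], n_free_params: int) -> Tuple[float, int]:
    """Return (chi2, ndf); bins with non-positive error are skipped."""
    if not (len(observed) == len(expected) == len(errors)):
        raise ValueError("observed, expected and errors must have the same length")
    chi2 = 0.0
    used_bins = 0
    for obs, exp, err in zip(observed, expected, errors):
        if err <= 0.0:
            continue  # e.g. empty bins, where a sqrt(N) error would be zero
        chi2 += ((obs - exp) / err) ** 2
        used_bins += 1
    return chi2, used_bins - n_free_params
```

For observed [10, 20, 30], expected [12, 20, 27], and errors [2, 4, 3] with one free parameter, the pulls are −1, 0, and 1, giving χ² = 2 with 2 degrees of freedom.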
Tool Categories
Core Tools (Always Available)
| Tool | Description | Mode |
|---|---|---|
| | List ROOT files in resource | Core |
| | Get file structure and metadata | Core |
| | List branches in TTree | Core |
| | Check file integrity | Core |
| | Read branch data | Core |
| | Compute basic statistics | Core |
| | Export to JSON/CSV/Parquet | Core |
| | Switch between core/extended | Core |
| | Get server capabilities | Core |
Extended Tools (Extended Mode Only)
| Tool | Description | Requires |
|---|---|---|
| | Create 1D histogram with fitting | Extended |
| | Create 2D histogram | Extended |
| | Fit model to histogram | Extended |
| | Calculate invariant mass | Extended |
| | Correlation analysis | Extended |
| | Histogram arithmetic operations | Extended |
| | Create 1D plot | Extended |
| | Create 2D plot | Extended |
Configuration
Merge Priority
Settings are applied from four sources in ascending priority (later wins):
1. Built-in Pydantic defaults (always lowest)
2. YAML config file (ROOT_MCP_CONFIG / --config / auto-discovery)
3. ROOT_MCP_* environment variables (override everything from YAML)
4. CLI flags (always highest)
This means every field in config.yaml can be overridden in a container simply by setting an
environment variable, and a user can always override that with a CLI flag. YAML users
see no behaviour change unless they also set conflicting env vars.
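The four-layer priority can be pictured as successive deep merges in which each later layer overrides the earlier ones. The sketch below uses plain dictionaries and hypothetical function names; the actual code works on a Pydantic config object rather than dicts.

```python
from typing import Any, Dict

Settings = Dict[str, Any]


def deep_merge(base: Settings, override: Settings) -> Settings:
    """Return a new dict where values from override win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_settings(defaults: Settings, yaml_cfg: Settings,
                   env_cfg: Settings, cli_cfg: Settings) -> Settings:
    """Apply the layers in ascending priority: defaults < YAML < env < CLI."""
    result: Settings = {}
    for layer in (defaults, yaml_cfg, env_cfg, cli_cfg):
        result = deep_merge(result, layer)
    return result
```

For instance, if the YAML layer sets `server.mode` to "extended", the environment layer sets `features.enable_root` to true, and the CLI layer sets `server.mode` to "core", the merged result has mode "core" and enable_root true.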
Zero-Config Quick Start
No config file is required. Pass a data directory at invocation time:
root-mcp --data-path /path/to/data
Or via environment variables (fully env-driven — useful for containers):
export ROOT_MCP_DATA_PATH=/data
export ROOT_MCP_MODE=extended
export ROOT_MCP_EXPORT_PATH=/exports
export ROOT_MCP_ENABLE_ROOT=1
root-mcp
Generate a starter config pre-filled with the current directory:
root-mcp init --permissive
apply_env_overrides / apply_cli_overrides
Two functions in config.py implement the merge:
apply_env_overrides(config) # reads all ROOT_MCP_* vars, mutates config in-place
apply_cli_overrides(config, args) # applies parsed argparse.Namespace, in-place
Both are called in main() after load_config() and apply_data_paths() — env vars
first, then CLI flags, so CLI always wins. Log level is the exception: it is applied
before load_config() so that config-loading messages also respect the requested verbosity.
Full Configuration Structure
# QUICK START
server:
  mode: "extended"          # "core" or "extended"
resources:
  - name: "my_data"
    uri: "file:///path/to/data"
    allowed_patterns: ["*.root"]
features:
  enable_root: false        # set true to unlock run_root_code / run_rdataframe / run_root_macro
  enable_export: true
# ADVANCED
core:
  cache:
    enabled: true
    file_cache_size: 50
  limits:
    max_rows_per_call: 1_000_000
    max_export_rows: 10_000_000
extended:
  histogram:
    max_bins_1d: 10_000
    max_bins_2d: 1_000
  plotting:
    figure_width: 10.0
    figure_height: 6.0
    dpi: 100
  fitting_max_iterations: 10_000
security:
  allowed_roots: []         # empty = permissive (any local path); add explicit paths to restrict
  allowed_protocols: ["file", "root", "http", "https"]
output:
  export_base_path: "/path/to/exports"
  allowed_formats: ["json", "csv", "parquet"]
root_native:                # only relevant when enable_root: true
  execution_timeout: 60
  working_directory: "/tmp/root_mcp_native"
Security Model
Path Validation
Allowed Roots: When security.allowed_roots is non-empty, all file paths must be under those roots. An empty list (the default) permits access to any OS-readable path, which is the permissive mode intended for local development.
Protocol Validation: Only allowed protocols can be used. Protocols are auto-elevated when a resources[].uri declares a remote scheme (e.g. root://, https://).
Path Traversal Prevention: .. segments and symlinks are validated
Write Protection: Input and output paths must differ
Code Sandbox: Native ROOT code execution is validated via AST analysis before subprocess launch; this blocks dangerous imports, builtins, and attribute access
Resource Limits
Row Limits: Maximum rows per read operation
Export Limits: Maximum rows for export operations
Bin Limits: Maximum histogram bins (1D/2D)
Cache Limits: Maximum open file handles
Audit Trail
All write operations logged
Mode switches logged
Failed validation attempts logged
Performance Considerations
Memory Management
Lazy Loading: Extended components loaded only when needed
File Caching: LRU cache for file handles (configurable)
Streaming: Chunked reading for large files
Column Pruning: Read only requested branches
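Lazy loading of optional components can be sketched with importlib: a module is imported the first time it is requested and reused afterwards. The class name and the component-to-module mapping are hypothetical and stand in for whatever registry the server actually uses.

```python
import importlib
from types import ModuleType
from typing import Dict, List


class LazyComponents:
    """Hypothetical registry that imports component modules on first use."""

    def __init__(self, module_paths: Dict[str, str]) -> None:
        # component name -> dotted module path, e.g. "fitting" -> "some_pkg.fitting"
        self._paths = dict(module_paths)
        self._loaded: Dict[str, ModuleType] = {}

    def get(self, name: str) -> ModuleType:
        if name not in self._paths:
            raise KeyError(f"unknown component {name!r}")
        if name not in self._loaded:
            # The import cost (and memory) is paid only when the component is needed.
            self._loaded[name] = importlib.import_module(self._paths[name])
        return self._loaded[name]

    def loaded(self) -> List[str]:
        return sorted(self._loaded)

    def unload(self, name: str) -> None:
        # Drops this registry's reference only; Python keeps the module in sys.modules.
        self._loaded.pop(name, None)
```

A server started in core mode would never call `get` for an extended component, so heavy dependencies such as plotting libraries are not imported until a mode switch or an extended tool call requires them.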
Optimization Strategies
Predicate Pushdown: Apply selections during read
Chunk Size Tuning: Configurable chunk sizes for streaming
Connection Pooling: Reuse connections for remote files
Compression: Support for compressed exports
Error Handling
Error Types
ROOTMCPError: Base exception
SecurityError: Security constraint violations
ValidationError: Input validation failures
FileOperationError: File I/O errors
AnalysisError: Analysis operation failures
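A typed hierarchy like this lets callers catch one specific failure class or everything raised by the server through the base class. The sketch below uses the exception names listed above; that each subclass derives directly from ROOTMCPError, and the helper function, are assumptions.

```python
class ROOTMCPError(Exception):
    """Base exception for all server errors."""


class SecurityError(ROOTMCPError):
    """Security constraint violations."""


class ValidationError(ROOTMCPError):
    """Input validation failures."""


class FileOperationError(ROOTMCPError):
    """File I/O errors."""


class AnalysisError(ROOTMCPError):
    """Analysis operation failures."""


def describe(error: ROOTMCPError) -> str:
    """Hypothetical helper: render an error as 'TypeName: message'."""
    return f"{type(error).__name__}: {error}"
```

A handler can then use `except SecurityError` for access problems and a final `except ROOTMCPError` as a catch-all, without swallowing unrelated Python exceptions.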
Graceful Degradation
Extended mode falls back to core if dependencies missing
Clear error messages with hints for resolution
Mode-specific error handling
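The fallback from extended to core mode can be sketched by probing for the optional dependencies before committing to a mode. The function names and the hint wording are hypothetical; the dependency names come from the Extended Mode list earlier in this document.

```python
import importlib.util
from typing import List, Optional, Sequence, Tuple

# Extra dependencies of extended mode, per the Architecture Tiers section.
EXTENDED_DEPENDENCIES = ("scipy", "matplotlib")


def missing_modules(names: Sequence[str]) -> List[str]:
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def resolve_mode(requested: str,
                 required: Sequence[str] = EXTENDED_DEPENDENCIES) -> Tuple[str, Optional[str]]:
    """Return (effective_mode, hint); hint is None when nothing was downgraded."""
    if requested != "extended":
        return requested, None
    missing = missing_modules(required)
    if not missing:
        return "extended", None
    hint = (
        f"Extended mode requires {', '.join(missing)}; falling back to core mode. "
        "Install the missing packages to enable extended mode."
    )
    return "core", hint
```

Using `find_spec` avoids importing heavy packages just to check for their presence, and returning the hint alongside the mode keeps the resolution message available for logs or for the response sent back to the client.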
Future Considerations
Potential Enhancements
Caching Layer: Redis/memcached for distributed caching
Parallel Processing: Multi-file parallel operations
Query Optimization: Advanced query planning
Additional Formats: ROOT file writing (new files only)
Plugin System: Extensible analysis modules
Non-Goals
Modification of existing ROOT files
GUI/web interface
Full RooFit model library (templates cover the common cases)