A First Account of the Impact of Ion Electromagnetic Dissociation on Event Exclusivity in Ultraperipheral LHC Collisions

▶

Low-Multiplicity Jets as Probes of GeV-Scale Light-Quark-Coupled Particles

▶

Probing the Tau Anomalous Magnetic Moment at Colliders: From Ultra-Peripheral Collisions to the Precision Frontier

Authors

Natascia Vignaroli

Abstract

The anomalous magnetic moment of the tau lepton, $a_τ$, represents a fundamental test of the Standard Model (SM) and a high-sensitivity probe for New Physics in the third generation of leptons. Due to the tau's extremely short lifetime, traditional spin-precession measurements remain inaccessible, necessitating innovative experimental strategies at high-energy colliders. This review provides a comprehensive overview of the current experimental landscape, highlighting the recent paradigm shift from LEP-era constraints to the unprecedented precision reached at the LHC. We emphasize the importance of Ultra-Peripheral Heavy-Ion Collisions (UPCs), which act as a ``photon-photon collider'' of extreme intensity. By leveraging the $Z^4$ enhancement of the coherent photon flux in Lead-Lead ($PbPb$) interactions, these collisions provide a theoretically robust ``quasi-static'' environment. These results are critically compared with the latest measurements from proton-proton collisions, including the recent CMS observation of the $γγ\to ττ$ process and the ATLAS constraints from the high-mass Drell-Yan tail. We evaluate their complementarity and the challenges related to Effective Field Theory validity at the TeV scale. Finally, we outline the future prospects for $a_τ$ at Belle II and the Future Circular Collider (FCC) stages. While FCC-hh in $PbPb$ mode provides a theoretically clean environment, its sensitivity remains limited to $\mathcal{O}(10^{-2})$. Conversely, the next generation of lepton facilities, specifically Belle II and FCC-ee, aims for the $\mathcal{O}(10^{-5})$ level, required to probe SM electroweak loop corrections. Long-term projections for a high-energy Muon Collider suggest a potential reach of $\mathcal{O}(10^{-6})$.

Comments: invited contribution to the MDPI special issue "Symmetry and Relativistic Heavy-Ion Collisions"

Submitted: 2026-04-21 ArXiv ID: 2604.19665v1
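
As a rough numerical illustration of the $Z^4$ scaling quoted in the abstract above: in the standard equivalent-photon picture, the coherent photon flux of each ion scales as $Z^2$, so the two-photon rate scales as $Z^4$. Taking $Z=82$ for lead (an input not stated in the abstract), and ignoring differences in beam luminosity and in the photon energy spectrum, the enhancement relative to proton-proton is approximately:

```latex
\frac{\sigma(PbPb \to Pb\,(\gamma\gamma \to \tau\tau)\,Pb)}{\sigma(pp \to p\,(\gamma\gamma \to \tau\tau)\,p)}
  \sim Z^4 = 82^4 = 45\,212\,176 \approx 4.5 \times 10^{7}
```

This is an order-of-magnitude estimate only; nuclear form factors and the much lower heavy-ion luminosity reduce the practical gain.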

▶

Radon-induced backgrounds in the NEXT-100 experiment

Authors

NEXT Collaboration, C. Cortes-Parra, P. Novella, et al.

Abstract

The NEXT-100 detector at the LSC aims at the first competitive search for the $ββ0ν$ decay using a high-pressure $^{136}$Xe electroluminescent time projection chamber. The first low-background run of NEXT-100 at 3.95 bar has been devoted to the measurement of the radon-induced backgrounds impacting this search. The contributions from both the internal and external airborne radon have been evaluated. The internal $^{222}$Rn activity is found to be (0.95$\pm$0.04(stat)$\pm$0.09(sys)) Bq/m$^3$, while no traces of $^{220}$Rn have been observed. Most of the $^{222}$Rn progeny plate out on the surface of the cathode of the detector, leading to a rate of Rn-induced $^{214}$Bi of (0.97$\pm$0.05(stat)$\pm$0.10(sys)) Hz for visible energies above 400 keV. The corresponding background index in the $ββ0ν$ region of interest is evaluated as (7.3$\pm$1.5(stat)$\pm$0.8(sys))$\times10^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) after selection of the fully contained events. This background index is reduced to $\sim$4$\times10^{-5}$ counts/(keV$\cdot$kg$\cdot$yr) by applying a topological selection requiring only one double-electron-like track in the events. This value is one order of magnitude below the total radiogenic background expectation in NEXT-100. By analyzing the correlation of the airborne radon activity and the measured rate of events in NEXT-100, it is concluded that the detector operates in a virtually radon-free environment thanks to the radon abatement system of the LSC.

Comments: 21 pages, 13 figures

Submitted: 2026-04-21 ArXiv ID: 2604.19616v2

▶

QCD-factorization amplitudes from flavour symmetries: beyond the $SU(3)$ symmetric case

▶

Search for quantum black holes in lepton+jet final states using proton-proton collisions at $\sqrt{s}=13.6$ TeV with the ATLAS detector

▶

Neural posterior estimation of the neutrino direction in IceCube using transformer-encoded normalizing flows on the sphere

RL-ABC: Reinforcement Learning for Accelerator Beamline Control

Authors

Anwar Ibrahim, Fedor Ratnikov, Maxim Kaledin, et al.

Abstract

Particle accelerator beamline optimization is a high-dimensional control problem traditionally requiring significant expert intervention. We present RLABC (Reinforcement Learning for Accelerator Beamline Control), an open-source Python framework that automatically transforms standard Elegant beamline configurations into reinforcement learning environments. RLABC integrates with the widely-used Elegant beam dynamics simulation code via SDDS-based interfaces, enabling researchers to apply modern RL algorithms to beamline optimization with minimal RL-specific development. The main contribution is a general methodology for formulating beamline tuning as a Markov decision process: RLABC automatically preprocesses lattice files to insert diagnostic watch points before each tunable element, constructs a 57-dimensional state representation from beam statistics, covariance information, and aperture constraints, and provides a configurable reward function for transmission optimization. The framework supports multiple RL algorithms through Stable-Baselines3 compatibility and implements stage learning strategies for improved training efficiency. Validation on a test beamline derived from the VEPP-5 injection complex (37 control parameters across 11 quadrupoles and 4 dipoles) demonstrates that the framework successfully enables RL-based optimization, with a Deep Deterministic Policy Gradient agent achieving 70.3\% particle transmission -- performance matching established methods such as differential evolution. The framework's stage learning capability allows decomposition of complex optimization problems into manageable subproblems, improving training efficiency. The complete framework, including configuration files and example notebooks, is available as open-source software to facilitate adoption and further research.

Submitted: 2026-04-21 ArXiv ID: 2604.19146v1

▶

Three-dimensional recoil-electron reconstruction using combined optical imaging and waveform readout for electron-tracking Compton cameras

Authors

Tomonori Ikeda, Tatsuya Sawano, Naomi Tsuji, et al.

Abstract

Accurate reconstruction of recoil-electron directions is critical for enhancing the point-spread function of electron-tracking Compton cameras (ETCCs) in gamma-ray imaging. Although full three-dimensional (3D) readout systems achieve high-precision reconstruction, they are impractical for large-area detectors because of the enormous data volume. This study proposes and demonstrates a practical alternative for inferring the 3D recoil-electron direction in Compton scattering. This method combines a high-resolution two-dimensional optical image, a one-dimensional waveform signal, and a deep-learning-based method through simulations. The proposed method achieved an angular resolution of approximately $44^\circ$ for the recoil-electron direction in the 40-50 keV range, corresponding to an improvement of a factor of about 1.3 compared with our previous strip-readout approach using pseudo-experimental data generated by Geant4 and MAGBOLTZ simulations for an argon-based gas time projection chamber. In addition, the starting-point resolution of the electron track was improved over the previous method across the 5-50 keV electron energy range. These results demonstrate that complementary information from the transverse image and longitudinal waveform can effectively recover the 3D track topology without requiring full 3D readout. The proposed approach provides a realistic pathway for improving ETCC imaging performance.

Comments: 23 pages, 15 figures

Submitted: 2026-04-21 ArXiv ID: 2604.19051v1

▶

High Energy Physics - Phenomenology

▶

Purely Quadratic Non-Gaussianity from Tachyonic Instability: Primordial Black Holes and Scalar-Induced Gravitational Waves

Authors

He-Xu Zhang, Mei Huang

Abstract

We investigate primordial black hole (PBH) formation in a cosmological scenario where curvature perturbations follow purely quadratic non-Gaussianity, $ζ= A(φ^2-\langleφ^2\rangle)$, arising from tachyonic instability in multi-component inflationary models. Within an extended Press-Schechter framework based on the compaction function, we derive the probability distribution of the linear compaction function and its asymptotic exponential tail, demonstrating that PBH abundance is exponentially sensitive not only to the amplitude of perturbations but also to the correlation coefficient $ρ$ between the smoothed field and its radial gradient. We further find that, for $A<0$, the spectral width of the curvature power spectrum plays a decisive role in avoiding PBH overproduction: broad spectra yield mildly negative $ρ$ and fail to suppress PBH formation, while sufficiently narrow spectra drive $ρ\to -1$, resulting in an exponential suppression while maintaining a sizable gravitational-wave signal. Thermal inflation provides a useful benchmark scenario with asteroid-mass PBH dark matter and high-frequency scalar-induced gravitational waves potentially detectable by future space-based interferometers, but its typically broad spectra make it challenging to reconcile PTA observations with PBH constraints.

▶

Finite-density equation of state of hot QCD using the complex Langevin equation

Authors

George K. Leontaris, Pramod Shukla

Abstract

Fibre inflation is one of the most attractive models realized in type IIB orientifold compactifications. It is embedded in the framework of L(arge) V(olume) S(cenarios) using a class of compactifying Calabi-Yau (CY) threefolds having K3-fibration. The standard single-field fibre inflation is driven by a fibre modulus which needs to travel a trans-Planckian distance of the order of ${\cal O}(5-8)$M$_p$ in the effective moduli space. The global embedding attempts using concrete CY orientifold setups have shown that Kähler cone conditions can generically induce significantly tight bounds on the inflaton range, especially in the presence of a Swiss-Cheese structure via an exceptional rigid divisor in the CY threefold. Such field range bounds usually obstruct the inflationary plateau, leading to an insufficient number of e-folds during the inflationary dynamics. In this context, we review our recent work on the possibility of using multiple fibre moduli, such that the burden of traveling the required trans-Planckian distance is shared by multiple fields and successful inflation can be realized before the fields hit (or come too close to) their respective individual Kähler cone boundaries.

Comments: 24 pages, Contribution to the Proceedings of the "School and Workshops on Elementary Particle Physics and Gravity" Corfu 2025

Submitted: 2026-04-21 ArXiv ID: 2604.19250v1

▶

Progress on the soft anomalous dimension in QCD

▶

Cosmological constraints on TeV-scale dark matter subcomponents decaying between recombination and reionisation

▶

CP-violating multi-field phase transitions and gravitational waves in a hidden NJL sector

Authors

Chang-Xin Liu

Abstract

We investigate the dynamics of a cosmological first-order phase transition (FOPT) and the associated stochastic gravitational wave background (SGWB) in a hidden strongly coupled sector described by an extended Nambu--Jona-Lasinio (NJL) model with $N_f = 3$ fermion flavors. The model incorporates a CP-violating six-fermion 't Hooft interaction, an explicit chiral symmetry breaking mass term, and chirally symmetric eight-fermion operators that stabilize the vacuum. We perform a multi-field analysis of the tunneling dynamics, going beyond conventional single-field approximations. The interplay between explicit symmetry breaking and CP violation induces a vacuum misalignment, resulting in a curved tunneling path and a spatially varying CP-violating background across the bubble wall. Furthermore, the intrinsically rapid transition rate characteristic of the NJL framework ($β/H \sim \mathcal{O}(10^4)$ in the parameter regions considered) leads to a strong suppression of gravitational wave production. As a result, the predicted SGWB remains well below the projected sensitivities of future space-based interferometers. Finally, the explicit symmetry breaking mass introduces a crucial energy bias between competing vacua, triggering the prompt collapse of transient domain wall configurations and thereby ensuring the cosmological viability of the model.

Submitted: 2026-04-21 ArXiv ID: 2604.19197v1

▶

Implications of the First JUNO Results for Dirac Neutrino Texture Zeros

▶

Vortex structures in electron-positron pair production by two-colored fields

Authors

Adiljan Sawut, Ying-Jun Li, Hong-Hao Fan, et al.

Abstract

We investigate the spin-resolved vortex properties of electron-positron pairs created from vacuum in time-delayed, two-color electromagnetic fields. By treating the temporal delay $G$ as a continuous tuning parameter, we reveal a dynamic transition from interference-dominated domain patterns at $G=0$ to the nucleation of quantized vortex lattices at $G=0.5$. These topological structures exhibit a staggered arrangement analogous to von Kármán vortex streets in fluid dynamics. We demonstrate that the momentum-space morphology is strictly governed by spin-orbit selection rules, i.e., parallel spin configurations enforce a dipole-like connectivity, while anti-parallel configurations resolve into distinct quadrupole structures. This difference originates from the conservation of total angular momentum $J_z$, where the spin projection determines the required orbital angular momentum $L_z$ of the created pairs. At large delays ($G>1$), macroscopic vortex coherence dissolves into a chaotic phase landscape due to multi-channel interference, yet the spin-dependent nodal geometries remain robust. Our findings suggest that these topological signatures provide a high-fidelity diagnostic for the quantum dynamics of vacuum excitations in strong-field QED.

Comments: 16 pages, 6 figures

Submitted: 2026-04-21 ArXiv ID: 2604.19002v1

▶

Fine-Tuning Small Reasoning Models for Quantum Field Theory

Authors

Nathaniel S. Woodward, Zhiqi Gao, Yurii Kvasiuk, et al.

Abstract

Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because open-source verifiable training data required to train such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains of thought before and after fine-tuning to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and $\sim$200M tokens of QFT reasoning traces.

Submitted: 2026-04-21 ArXiv ID: 2604.18936v1

▶

Machine Learning - Statistics

▶

Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints

Authors

Mihailo Stojnic

Abstract

In [97,99,100], an fl-RDT framework is introduced to characterize \emph{statistical computational gaps} (SCGs). Studying \emph{symmetric binary perceptrons} (SBPs), [100] obtained an \emph{algorithmic} threshold estimate $α_a\approx α_c^{(7)}\approx 1.6093$ at the 7th lifting level (for $κ=1$ margin), closely approaching the $1.58$ local entropy (LE) prediction [18]. In this paper, we further connect parametric RDT to overlap gap properties (OGPs), another key geometric feature of the solution space. Specifically, for any positive integer $s$, we consider $s$-level ultrametric OGPs ($ult_s$-OGPs) and rigorously upper-bound the associated constraint densities $α_{ult_s}$. To achieve this, we develop an analytical union-bounding program consisting of combinatorial and probabilistic components. By casting the combinatorial part as a convex problem and the probabilistic part as a nested integration, we conduct numerical evaluations and obtain that the tightest bounds at the first two levels, $\barα_{ult_1} \approx 1.6578$ and $\barα_{ult_2} \approx 1.6219$, closely approach the 3rd and 4th lifting level parametric RDT estimates, $α_c^{(3)} \approx 1.6576$ and $α_c^{(4)} \approx 1.6218$. We also observe excellent agreement across other key parameters, including overlap values and the relative sizes of ultrametric clusters. Based on these observations, we propose several conjectures linking $ult$-OGP and parametric RDT. Specifically, we conjecture that the algorithmic threshold $α_a=\lim_{s\rightarrow\infty} α_{ult_s} = \lim_{s\rightarrow\infty} \barα_{ult_s} = \lim_{r\rightarrow\infty} α_{c}^{(r)}$, and $α_{ult_s} \leq α_{c}^{(s+2)}$ (with possible equality for some (maybe even all) $s$). Finally, we discuss the potential existence of a full isomorphism connecting all key parameters of $ult$-OGP and parametric RDT.

Submitted: 2026-04-21 ArXiv ID: 2604.19712v1

▶

Budgeted Online Influence Maximization

▶

Separating Geometry from Probability in the Analysis of Generalization

▶

Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

▶

Heterogeneity-Aware Personalized Federated Learning for Industrial Predictive Analytics

Authors

Yuhan Hu, Xiaolei Fang

Abstract

Federated prognostics enable clients (e.g., companies, factories, and production lines) to collaboratively develop a failure time prediction model while keeping each client's data local and confidential. However, traditional federated models often assume homogeneity in the degradation processes across clients, an assumption that may not hold in many industrial settings. To overcome this, this paper proposes a personalized federated prognostic model designed to accommodate clients with heterogeneous degradation processes, allowing them to build tailored prognostic models. The prognostic model iteratively facilitates the underlying pairwise collaborations between clients with similar degradation patterns, which enhances the performance of personalized federated learning. To estimate parameters jointly using decentralized datasets, we develop a federated parameter estimation algorithm based on proximal gradient descent. The proposed approach addresses the limitations of existing federated prognostic models by simultaneously achieving model personalization, preserving data privacy, and providing comprehensive failure time distributions. The superiority of the proposed model is validated through extensive simulation studies and a case study using the turbofan engine degradation dataset from the NASA repository.

Submitted: 2026-04-21 ArXiv ID: 2604.19451v1

▶

Analytical Extraction of Conditional Sobol' Indices via Basis Decomposition of Polynomial Chaos Expansions

Authors

Shijie Zhong, Jiangfeng Fu

Abstract

In uncertainty quantification, evaluating sensitivity measures under specific conditions (i.e., conditional Sobol' indices) is essential for systems with parameterized responses, such as spatial fields or varying operating conditions. Traditional approaches often rely on point-wise modeling, which is computationally expensive and may lack consistency across the parameter space. This paper demonstrates that for a pre-trained global Polynomial Chaos Expansion (PCE) model, the analytical conditional Sobol' indices are inherently embedded within its basis functions. By leveraging the tensor-product property of PCE bases, we reformulate the global expansion into a set of analytical coefficient fields that depend on the conditioning variables. Based on the preservation of orthogonality under conditional probability measures, we derive closed-form expressions for conditional variances and Sobol' indices. This framework bypasses the need for repetitive modeling or additional sampling, transforming conditional sensitivity analysis into a purely algebraic post-processing step. Numerical benchmarks indicate that the proposed method ensures physical coherence and offers superior numerical robustness and computational efficiency compared to conventional point-wise approaches.

Comments: 11 pages, 2 figures

Submitted: 2026-04-21 ArXiv ID: 2604.19165v1
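
To make the "purely algebraic post-processing" idea in the abstract above concrete, the following is a minimal sketch of the standard unconditional case only: for an orthonormal PCE basis, the output variance is the sum of squared non-constant coefficients, and Sobol' indices follow by grouping terms according to which inputs they involve. The function name, the multi-index layout and the toy coefficients are illustrative assumptions; the conditional closed forms derived in the paper are not reproduced here.

```python
def sobol_from_pce(multi_indices, coeffs):
    """First-order and total Sobol' indices from an orthonormal PCE.

    multi_indices[k] is a tuple of polynomial degrees, one per input variable
    (hypothetical layout); coeffs[k] is the coefficient of that basis term.
    """
    dim = len(multi_indices[0])
    # Orthonormal basis: variance = sum of squared non-constant coefficients.
    variance = sum(c * c for a, c in zip(multi_indices, coeffs) if any(a))
    first = [0.0] * dim
    total = [0.0] * dim
    for alpha, c in zip(multi_indices, coeffs):
        active = [i for i, deg in enumerate(alpha) if deg > 0]
        for i in active:
            total[i] += c * c          # every term that involves input i
        if len(active) == 1:
            first[active[0]] += c * c  # terms that involve input i alone
    return [f / variance for f in first], [t / variance for t in total]


# Toy two-input expansion: constant, x1, x2 and one interaction term.
indices = [(0, 0), (1, 0), (0, 1), (1, 1)]
coefficients = [1.0, 2.0, 1.0, 1.0]
first_order, total_order = sobol_from_pce(indices, coefficients)
```

In this toy case the interaction term contributes to both total indices but to neither first-order index, which is why the total indices sum to more than one.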

▶

Fast estimation of Gaussian mixture components via centering and singular value thresholding

▶

S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

Authors

Xuelin Zhang, Hong Chen, Yingjie Wang, et al.

Abstract

Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data and its corresponding graph Laplacian matrix. However, the graph Laplacian matrix depends heavily on the prespecified similarity metric and may lead to inappropriate penalties when dealing with redundant or noisy input variables. To address the above issues, this paper proposes a new \textit{Semi-Supervised Meta Additive Model} (S$^2$MAM) based on a bilevel optimization scheme that automatically identifies informative variables, updates the similarity matrix, and simultaneously achieves interpretable predictions. Theoretical guarantees are provided for S$^2$MAM, including the computing convergence and the statistical generalization bound. Experimental assessments across 4 synthetic and 12 real-world datasets, with varying levels and categories of corruption, validate the robustness and interpretability of the proposed approach.

Submitted: 2026-04-21 ArXiv ID: 2604.19072v1

▶

Last-Iterate Guarantees for Learning in Co-coercive Games

▶

Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

Authors

Julian Skifstad, Xinyue Annie Yang, Glen Chou

Abstract

Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure of transformer blocks, layer-wise dynamics across multiple LLM architectures and scales are well-approximated by locally-linear models. Exploiting this property, we model LLM inference as a linear time-varying dynamical system and adapt the classical linear quadratic regulator to compute feedback controllers using layer-wise Jacobians, steering activations toward desired semantic setpoints in closed-loop with minimal computational overhead and no offline training. We also derive theoretical bounds on setpoint tracking error, enabling formal guarantees on steering performance. Using a novel adaptive semantic feature setpoint signal, our method yields robust, fine-grained behavior control across models, scales, and tasks, including state-of-the-art modulation of toxicity, truthfulness, refusal, and arbitrary concepts, surpassing baseline steering methods. Our code is available at: https://github.com/trustworthyrobotics/lqr-activation-steering

Comments: Under review
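
The linear quadratic regulator mentioned in the abstract above can be sketched in its simplest form. The example below is a scalar toy, not the authors' implementation: the paper works with layer-wise Jacobians, which are matrices, whereas here each layer's dynamics are replaced by hypothetical scalars `a[t]` and `b[t]`, and the state is read as the deviation from the desired setpoint. All names are illustrative.

```python
def ltv_lqr_gains(a, b, q, r, q_final):
    """Backward Riccati recursion for x[t+1] = a[t]*x[t] + b[t]*u[t].

    Minimises sum_t (q*x[t]**2 + r*u[t]**2) + q_final*x[T]**2 and returns
    the gains k[t] of the optimal feedback law u[t] = -k[t]*x[t].
    """
    p = q_final  # cost-to-go weight at the final step
    gains = [0.0] * len(a)
    for t in reversed(range(len(a))):
        k = (b[t] * p * a[t]) / (r + b[t] * p * b[t])
        p = q + a[t] * p * a[t] - a[t] * p * b[t] * k
        gains[t] = k
    return gains


def rollout(a, b, gains, x0):
    """Apply the feedback law step by step and return the state trajectory."""
    xs = [x0]
    for t in range(len(a)):
        u = -gains[t] * xs[-1]
        xs.append(a[t] * xs[-1] + b[t] * u)
    return xs


# Five "layers" with unit dynamics and unit cost weights (toy values).
a = [1.0] * 5
b = [1.0] * 5
gains = ltv_lqr_gains(a, b, q=1.0, r=1.0, q_final=1.0)
trajectory = rollout(a, b, gains, x0=1.0)
```

Because the gains are computed backward from the last layer, each intervention accounts for how its effect propagates through the remaining layers, which is the anticipative, closed-loop behaviour the abstract contrasts with open-loop steering.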

Authors

Chengyu Huang, Sheng-Yen Chou, Zhengxin Zhang, et al.

Abstract

Self-play has recently emerged as a promising paradigm to train Large Language Models (LLMs). In self-play, the target LLM creates the task input (e.g., ask a question), which it then addresses itself by producing a task output (e.g., give an answer). A reward model evaluates the output, and the rewards are then used to train the LLM, typically via Reinforcement Learning (RL). Self-play incurs minimal supervision costs, and this is especially helpful for post-training LLMs, which require high-quality input-output pairs that traditionally have to be written by humans or expensive proprietary models. However, existing work explores self-play only for verifiable tasks such as math and coding. Instead, we seek to extend it to more realistic open-ended tasks. In particular, we propose POP, a self-play framework that uses the same LLM to synthesize evaluation rubrics, along with input-output pairs, for each example. The rubric is then used to evaluate outputs and train the model. We further ground the framework on a content-rich pretraining corpus to (1) ensure a generation-verification gap and reduce reward hacking, and (2) prevent mode collapse. On Qwen-2.5-7B, POP increases performance of both pretrained and instruction-tuned models, across different tasks ranging from long-form Healthcare QA to creative writing and instruction following.

Submitted: 2026-04-21 ArXiv ID: 2604.20051v1

▶

Separable Pathways for Causal Reasoning: How Architectural Scaffolding Enables Hypothesis-Space Restructuring in LLM Agents

▶

Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints

Authors

Konstantinos Ziliaskopoulos, Alexander Vinel

Abstract

Submitted: 2026-04-21 ArXiv ID: 2604.20031v1

▶

Replicable Bandits with UCB based Exploration

Authors

Rohan Deb, Udaya Ghai, Karan Singh, et al.

Abstract

We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $ρ$-replicable if two executions using shared internal randomness but independent reward realizations produce the same action sequence with probability at least $1-ρ$. Prior work is primarily elimination-based and, in linear bandits with infinitely many actions, relies on discretization, leading to suboptimal dependence on the dimension $d$ and $ρ$. We develop optimistic alternatives for both settings. For stochastic multi-armed bandits, we propose RepUCB, a replicable batched UCB algorithm, and show that it attains a regret $O\!\left(\frac{K^2\log^2 T}{ρ^2}\sum_{a:Δ_a>0}\left(Δ_a+\frac{\log(KT\log T)}{Δ_a}\right)\right)$. For stochastic linear bandits, we first introduce RepRidge, a replicable ridge regression estimator that satisfies both a confidence guarantee and a $ρ$-replicability guarantee. Beyond its role in our bandit algorithm, this estimator and its guarantees may also be of independent interest in other statistical estimation settings. We then use RepRidge to design RepLinUCB, a replicable optimistic algorithm for stochastic linear bandits, and show that its regret is bounded by $\widetilde{O}\!\big(\big(d+\frac{d^3}ρ\big)\sqrt{T}\big)$. This improves the best prior regret guarantee by a factor of $O(d/ρ)$, showing that our optimistic algorithm can substantially reduce the price of replicability.

Submitted: 2026-04-21 ArXiv ID: 2604.20024v1
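
For readers unfamiliar with UCB-based exploration, the sketch below implements the textbook UCB1 rule on Bernoulli arms and returns the action sequence, which is the object the replicability definition in the abstract above compares across runs. This is plain UCB1, not RepUCB or RepLinUCB; the function name, the Bernoulli reward model and the `reward_seed` parameter are assumptions made for illustration.

```python
import math
import random


def ucb1(means, horizon, reward_seed):
    """Run textbook UCB1 on Bernoulli arms; return the sequence of pulled arms."""
    rng = random.Random(reward_seed)  # drives the reward realizations only
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    actions = []
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Optimism: empirical mean plus an exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        actions.append(arm)
    return actions


actions = ucb1([0.1, 0.9], horizon=2000, reward_seed=0)
```

UCB1 has no internal randomness, so its action sequence is a function of the rewards alone. Replicability asks that two runs with different `reward_seed` values still agree with probability at least $1-ρ$, a guarantee that plain UCB1 does not provide.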

▶

Statistics, Not Scale: Modular Medical Dialogue with Bayesian Belief Engine

Authors

Yusuf Kesmen, Fay Elhassan, Jiayi Ma, et al.

Abstract

Large language models are increasingly deployed as autonomous diagnostic agents, yet they conflate two fundamentally different capabilities: natural-language communication and probabilistic reasoning. We argue that this conflation is an architectural flaw, not an engineering shortcoming. We introduce BMBE (Bayesian Medical Belief Engine), a modular diagnostic dialogue framework that enforces a strict separation between language and reasoning: an LLM serves only as a sensor, parsing patient utterances into structured evidence and verbalising questions, while all diagnostic inference resides in a deterministic, auditable Bayesian engine. Because patient data never enters the LLM, the architecture is private by construction; because the statistical backend is a standalone module, it can be replaced per target population without retraining. This separation yields three properties no autonomous LLM can offer: calibrated selective diagnosis with a continuously adjustable accuracy-coverage tradeoff, a statistical separation gap where even a cheap sensor paired with the engine outperforms a frontier standalone model from the same family at a fraction of the cost, and robustness to adversarial patient communication styles that cause standalone doctors to collapse. We validate across empirical and LLM-generated knowledge bases against frontier LLMs, confirming the advantage is architectural, not informational.

Comments: 12 figures, 17 tables

Submitted: 2026-04-21 ArXiv ID: 2604.20022v1

▶

Continuous Semantic Caching for Low-Cost LLM Serving

Authors

Baran Atalar, Xutong Liu, Jinhang Zuo, et al.

Abstract

As Large Language Models (LLMs) become increasingly popular, caching responses so that they can be reused by users with semantically similar queries has become a vital strategy for reducing inference costs and latency. Existing caching frameworks have proposed to decide which query responses to cache by assuming a finite, known universe of discrete queries and learning their serving costs and arrival probabilities. As LLMs' pool of users and queries expands, however, such an assumption becomes increasingly untenable: real-world LLM queries reside in an infinite, continuous embedding space. In this paper, we establish the first rigorous theoretical framework for semantic LLM response caching in continuous query space under uncertainty. To bridge the gap between discrete optimization and continuous representation spaces, we introduce dynamic $ε$-net discretization coupled with Kernel Ridge Regression. This design enables the system to formally quantify estimation uncertainty and generalize partial feedback on LLM query costs across continuous semantic query neighborhoods. We develop both offline learning and online adaptive algorithms optimized to reduce switching costs incurred by changing the cached responses. We prove that our online algorithm achieves a sublinear regret bound against an optimal continuous oracle, which reduces to existing bounds for discrete query models. Extensive empirical evaluations demonstrate that our framework approximates the continuous optimal cache well while also reducing computational and switching overhead compared to existing methods.

Submitted: 2026-04-21 ArXiv ID: 2604.20021v1

▶

Multi-Objective Reinforcement Learning for Generating Covalent Inhibitor Candidates

Authors

Renee Gil

Abstract

Rational design of covalent inhibitors requires simultaneously optimizing multiple properties, such as binding affinity, target selectivity, or electrophilic reactivity. This presents a multi-objective problem not easily addressed by screening alone. Here we present a machine learning pipeline for generating covalent inhibitor candidates using multi-objective reinforcement learning (RL), applied to two targets: epidermal growth factor receptor (EGFR) and acetylcholinesterase (ACHE). A SMILES-based pretrained LSTM serves as the generative model, optimized via policy gradient RL with Pareto crowding distance to balance competing scoring functions including synthetic accessibility, predicted covalent activity, residue affinity, and an approximated docking score. The pipeline rediscovers known covalent inhibitors at rates of up to 0.50% (EGFR) and 0.74% (ACHE) in 10,000-structure runs, with candidate structures achieving warhead-to-residue distances as short as 5.5 angstrom (EGFR) and 3.2 angstrom (ACHE) after further docking-based screening. More notably, the pipeline spontaneously generates structures bearing warhead motifs absent from the training data - including allenes, 3-oxo-$β$-sultams, and $α$-methylene-$β$-lactones - all of which have independent literature support as covalent warheads. These results suggest that RL-guided generation can explore covalent chemical space beyond its training distribution, and may be useful as a tool for medicinal chemists working on covalent drug discovery.

Submitted: 2026-04-21 ArXiv ID: 2604.20019v1
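
The abstract above mentions Pareto crowding distance as the mechanism for balancing competing scoring functions. One common form of that measure is the NSGA-II crowding distance, sketched below under the assumption that this is the variant intended; the function name and the toy score vectors are hypothetical, and the paper's actual reward shaping is not shown.

```python
def crowding_distance(scores):
    """NSGA-II-style crowding distance for a list of objective vectors."""
    n = len(scores)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(scores[0])):
        order = sorted(range(n), key=lambda i: scores[i][m])
        lo, hi = scores[order[0]][m], scores[order[-1]][m]
        # Boundary solutions are always preferred: give them infinite distance.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all candidates tie on this objective
        for k in range(1, n - 1):
            gap = scores[order[k + 1]][m] - scores[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist

# Hypothetical two-objective scores for four generated molecules.
candidates = [(0.0, 1.0), (0.5, 0.5), (0.6, 0.4), (1.0, 0.0)]
d = crowding_distance(candidates)
```

A larger distance marks a candidate in a sparsely populated part of the front, so favoring it encourages diversity across objectives.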

▶

scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics

▶

Algorithm and Hardware Co-Design for Efficient Complex-Valued Uncertainty Estimation

Authors

Zehuan Zhang, Mark Chen, He Li, et al.

Abstract

Complex-Valued Neural Networks (CVNNs) have significant advantages in handling tasks that involve complex numbers. However, existing CVNNs are unable to quantify predictive uncertainty. We propose, for the first time, dropout-based Bayesian Complex-Valued Neural Networks (BayesCVNNs) to enable uncertainty quantification for complex-valued applications, exhibiting broad applicability and efficiency for hardware implementation due to modularity. Furthermore, as the dual-part nature of complex values significantly broadens the design space and enables novel configurations based on layer-mixing and part-mixing, we introduce an automated search approach to effectively identify optimal configurations for both real and imaginary components. To facilitate deployment, we present a framework that generates customized FPGA-based accelerators for BayesCVNNs, leveraging a set of optimized building blocks. Experiments demonstrate the best configuration can be effectively found via the automated search, attaining higher performance with lower hardware costs compared with manually crafted models. The optimized accelerators achieve approximately 4.5x and 13x speedups on different models with less than 10% power consumption compared to GPU implementations, and outperform existing work in both algorithm and hardware aspects. Our code is publicly available at: https://github.com/zehuanzhang/BayesCVNN.git.

Comments: Accepted to 63rd ACM/IEEE Design Automation Conference (DAC '26). 7 pages, 6 figures

Submitted: 2026-04-21 ArXiv ID: 2604.19993v1

▶

Fast Amortized Fitting of Scientific Signals Across Time and Ensembles via Transferable Neural Fields

▶

Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders

Authors

Het Patel, Tiejin Chen, Hua Wei, et al.

Abstract

Large language models can be uncertain yet correct, or confident yet wrong, raising the question of whether their output-level uncertainty and their actual correctness are driven by the same internal mechanisms or by distinct feature populations. We introduce a 2x2 framework that partitions model predictions along correctness and confidence axes, and uses sparse autoencoders to identify features associated with each dimension independently. Applying this to Llama-3.1-8B and Gemma-2-9B, we identify three feature populations that play fundamentally different functional roles. Pure uncertainty features are functionally essential: suppressing them severely degrades accuracy. Pure incorrectness features are functionally inert: despite showing statistically significant activation differences between correct and incorrect predictions, the majority produce near-zero change in accuracy when suppressed. Confounded features that encode both signals are detrimental to output quality, and targeted suppression of them yields a 1.1% accuracy improvement and a 75% entropy reduction, with effects transferring across the ARC-Challenge and RACE benchmarks. The feature categories are also informationally distinct: the activations of just 3 confounded features from a single mid-network layer predict model correctness (AUROC ~0.79), enabling selective abstention that raises accuracy from 62% to 81% at 53% coverage. The results demonstrate that uncertainty and correctness are distinct internal phenomena, with implications for interpretability and targeted inference-time intervention.

Submitted: 2026-04-21 ArXiv ID: 2604.19974v1
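
The selective-abstention result in the abstract above trades coverage for accuracy. A minimal sketch of that bookkeeping follows, assuming a generic per-example confidence score and a fixed threshold; the function name, the threshold, and the toy data are hypothetical and unrelated to the paper's sparse-autoencoder features.

```python
def selective_accuracy(confidences, correct, threshold):
    """Answer only when confidence >= threshold; return (accuracy, coverage)."""
    kept = [c for conf, c in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(correct)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return accuracy, coverage

# Toy data: predicted-correctness scores and 0/1 correctness labels.
conf = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
ok = [1, 1, 0, 1, 1, 0]
base_acc, base_cov = selective_accuracy(conf, ok, threshold=0.0)
sel_acc, cov = selective_accuracy(conf, ok, threshold=0.5)
```

Raising the threshold lowers coverage; accuracy on the answered subset improves only if the confidence score is actually predictive of correctness.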

▶

DistortBench: Benchmarking Vision Language Models on Image Distortion Identification

Authors

Divyanshu Goyal, Akhil Eppa, Vanya Bannihatti Kumar

Abstract

Vision-language models (VLMs) are increasingly used in settings where sensitivity to low-level image degradations matters, including content moderation, image restoration, and quality monitoring. Yet their ability to recognize distortion type and severity remains poorly understood. We present DistortBench, a diagnostic benchmark for no-reference distortion perception in VLMs. DistortBench contains 13,500 four-choice questions covering 27 distortion types, six perceptual categories, and five severity levels: 25 distortions inherit KADID-10k calibrations, while two added rotation distortions use monotonic angle-based levels. We evaluate 18 VLMs, including 17 open-weight models from five families and one proprietary model. Despite strong performance on high-level vision-language tasks, the best model reaches only 61.9% accuracy, just below the human majority-vote baseline of 65.7% (average individual: 60.2%), indicating that low-level perceptual understanding remains a major weakness of current VLMs. Our analysis further reveals weak and non-monotonic scaling with model size, performance drops in most base--thinking pairs, and distinct severity-response patterns across model families. We hope DistortBench will serve as a useful benchmark for measuring and improving low-level visual perception in VLMs.

Submitted: 2026-04-21 ArXiv ID: 2604.19966v1

▶

Generalization and Membership Inference Attack a Practical Perspective

▶

Physics-Guided Dimension Reduction for Simulation-Free Operator Learning of Stiff Differential-Algebraic Systems

Authors

Huy Hoang Le, Haoguang Wang, Christian Moya, et al.

Abstract

Neural surrogates for stiff differential-algebraic equations (DAEs) face two key challenges: soft-constraint methods leave algebraic residuals that stiffness amplifies into large errors, while hard-constraint methods require trajectory data from computationally expensive stiff integrators. We introduce an extended Newton implicit layer that enforces algebraic consistency and quasi-steady-state reduction within a single differentiable solve. Given slow-state predictions from a physics-informed DeepONet, the proposed layer recovers fast and algebraic states, eliminates the stiffness-amplification pathway within each time window, and reduces the output dimension to the slow states alone. Gradients derived via the implicit function theorem capture a stiffness-scaled coupling term that is absent in penalty-based approaches. Cascaded implicit layers further extend the framework to multi-component systems with provable convergence. On a grid-forming inverter DAE (21 states), the proposed method (7 outputs, 1.42 percent error) significantly outperforms penalty methods (39.3 percent), standard Newton approaches (57.0 percent), and augmented Lagrangian or feedback linearization baselines, which fail to converge. Two independently trained models compose into a 44-state system without retraining, achieving 0.72 to 1.16 percent error with zero algebraic residual. Conformal prediction further provides 90 percent coverage in-distribution and enables automatic out-of-distribution detection.

Submitted: 2026-04-21 ArXiv ID: 2604.19930v1

▶

A Multi-Plant Machine Learning Framework for Emission Prediction, Forecasting, and Control in Cement Manufacturing

Authors

SLAM Labs, Oleksiy Ostapenko, et al.

Abstract

We release Super Apriel, a 15B-parameter supernet in which every decoder layer provides four trained mixer choices -- Full Attention (FA), Sliding Window Attention (SWA), Kimi Delta Attention (KDA), and Gated DeltaNet (GDN). A placement selects one mixer per layer; placements can be switched between requests at serving time without reloading weights, enabling multiple speed presets from a single checkpoint. The shared checkpoint also enables speculative decoding without a separate draft model. The all-FA preset matches the Apriel 1.6 teacher on all reported benchmarks; recommended hybrid presets span $2.9\times$ to $10.7\times$ decode throughput at 96% to 77% quality retention, with throughput advantages that compound at longer context lengths. With four mixer types across 48 layers, the configuration space is vast. A surrogate that predicts placement quality from the per-layer mixer assignment makes the speed-quality landscape tractable and identifies the best tradeoffs at each speed level. We investigate whether the best configurations at each speed level can be identified early in training or only after convergence. Rankings stabilize quickly at 0.5B scale, but the most efficient configurations exhibit higher instability at 15B, cautioning against extrapolation from smaller models. Super Apriel is trained by stochastic distillation from a frozen Apriel 1.6 teacher, followed by supervised fine-tuning. We release the supernet weights, Fast-LLM training code, vLLM serving code, and a placement optimization toolkit.

Comments: Models: https://huggingface.co/ServiceNow-AI/SuperApriel-15B-Base and https://huggingface.co/ServiceNow-AI/SuperApriel-15B-Instruct . Dev model: https://huggingface.co/ServiceNow-AI/SuperApriel-0.5B-Base . Training code: https://github.com/ServiceNow/Fast-LLM . Async RL: https://github.com/ServiceNow/pipeline-rl . Training logs: https://wandb.ai/servicenow-team/Super_Apriel

Submitted: 2026-04-21 ArXiv ID: 2604.19877v1

▶

Generalization at the Edge of Stability

▶

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Authors

Venus Team, Sunhao Dai, Yong Deng, et al.

Abstract

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent with limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open data, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.

Comments: Technical Report of DR-Venus

Submitted: 2026-04-21 ArXiv ID: 2604.19859v1

Personalized Federated Learning (PFL) aims to learn multiple task-specific models rather than a single global model across heterogeneous data distributions. Existing PFL approaches typically rely on iterative optimization, such as model update trajectories, to cluster together users that need to accomplish the same tasks. However, these learning-dynamics-based methods are inherently vulnerable to low-quality data and noisy labels, as corrupted updates distort clustering decisions and degrade personalization performance. To tackle this, we propose FB-NLL, a feature-centric framework that decouples user clustering from iterative training dynamics. By exploiting the intrinsic heterogeneity of local feature spaces, FB-NLL characterizes each user through the spectral structure of the covariances of their feature representations and leverages subspace similarity to identify task-consistent user groupings. This geometry-aware clustering is label-agnostic and is performed in a one-shot manner prior to training, significantly reducing communication overhead and computational costs compared to iterative baselines. Complementing this, we introduce a feature-consistency-based detection and correction strategy to address noisy labels within clusters. By leveraging directional alignment in the learned feature space and assigning labels based on class-specific feature subspaces, our method mitigates corrupted supervision without requiring estimation of stochastic noise transition matrices. In addition, FB-NLL is model-independent and integrates seamlessly with existing noise-robust training techniques. Extensive experiments across diverse datasets and noise regimes demonstrate that our framework consistently outperforms state-of-the-art baselines in terms of average accuracy and performance stability.

Comments: Submitted for journal publication

Submitted: 2026-04-21 ArXiv ID: 2604.19729v1

▶

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

Authors

Jean Mercat, Sedrick Keh, Kushal Arora, et al.

Abstract

We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize in the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, from language pretraining to action-expert fine-tuning. VLA Foundry supports both from-scratch training and pretrained backbones from Hugging Face. To demonstrate the utility of our framework, we train and release two types of models: the first trained fully from scratch through our LLM -> VLM -> VLA pipeline and the second built on the pretrained Qwen3-VL backbone. We evaluate closed-loop policy performance of both models on LBM Eval, an open-data, open-source simulator. We also contribute usability improvements to the simulator and the STEP analysis tools for easier public use. In the nominal evaluation setting, our fully open from-scratch model is on par with our prior closed-source work, and substituting in the Qwen3-VL backbone leads to a strong multi-task tabletop manipulation policy that outperforms our baseline by a wide margin. The VLA Foundry codebase is available at https://github.com/TRI-ML/vla_foundry and all multi-task model weights are released on https://huggingface.co/collections/TRI-ML/vla-foundry. Additional qualitative videos are available on the project website https://tri-ml.github.io/vla_foundry.

Comments: 32 pages, 16 figures, technical report

Submitted: 2026-04-21 ArXiv ID: 2604.19728v1

▶

Benign Overfitting in Adversarial Training for Vision Transformers

▶

Adaptive MSD-Splitting: Enhancing C4.5 and Random Forests for Skewed Continuous Attributes

Authors

Jake Lee

Abstract

The discretization of continuous numerical attributes remains a persistent computational bottleneck in the induction of decision trees, particularly as dataset dimensions scale. Building upon the recently proposed MSD-Splitting technique -- which bins continuous data using the empirical mean and standard deviation to dramatically improve the efficiency and accuracy of the C4.5 algorithm -- we introduce Adaptive MSD-Splitting (AMSD). While standard MSD-Splitting is highly effective for approximately symmetric distributions, its rigid adherence to fixed one-standard-deviation cutoffs can lead to catastrophic information loss in highly skewed data, a common artifact in real-world biomedical and financial datasets. AMSD addresses this by dynamically adjusting the standard deviation multiplier based on feature skewness, narrowing intervals in dense regions to preserve discriminative resolution. Furthermore, we integrate AMSD into ensemble methods, specifically presenting the Random Forest-AMSD (RF-AMSD) framework. Empirical evaluations on the Census Income, Heart Disease, Breast Cancer, and Forest Covertype datasets demonstrate that AMSD yields a 2-4% accuracy improvement over standard MSD-Splitting, while maintaining near-identical O(N) time complexity reductions compared to the O(N log N) exhaustive search. Our Random Forest extension achieves state-of-the-art accuracy at a fraction of standard computational costs, confirming the viability of adaptive statistical binning in large-scale ensemble learning architectures.

Submitted: 2026-04-21 ArXiv ID: 2604.19722v1
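
To make the binning idea in the abstract above concrete, the sketch below splits a continuous attribute into three intervals at mean ± k·std and shrinks k as skewness grows. The shrinkage rule `k = 1 / (1 + |skewness|)` and the function names are assumptions invented for this illustration; the abstract does not state the actual multiplier schedule used by AMSD.

```python
import statistics

def skewness(xs):
    """Population (Fisher-Pearson) skewness of a sequence of numbers."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return 0.0
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)

def adaptive_msd_bins(xs):
    """Assign each value to bin 0/1/2 using cut points mean -/+ k * std.

    ASSUMPTION: k = 1 / (1 + |skewness|) is a made-up shrinkage rule,
    not the multiplier schedule from the paper.
    """
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    k = 1.0 / (1.0 + abs(skewness(xs)))
    lo, hi = mu - k * sd, mu + k * sd
    return [0 if x < lo else (2 if x > hi else 1) for x in xs], k

symmetric = [1, 2, 3, 4, 5]          # skewness near 0, so k stays near 1
skewed = [1, 1, 1, 2, 2, 3, 50]      # heavy right tail, so k shrinks
bins_sym, k_sym = adaptive_msd_bins(symmetric)
bins_skw, k_skw = adaptive_msd_bins(skewed)
```

With k near 1 this reduces to fixed one-standard-deviation cutoffs; for the skewed sample the cut points move closer to the mean. Computing the mean, standard deviation, and skewness each takes a single pass over the data, which is consistent with the O(N) cost the abstract cites.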

▶

Ultrametric OGP - parametric RDT symmetric binary perceptron connection

Authors

Mihailo Stojnic

Abstract

PREF-XAI: Preference-Based Personalized Rule Explanations of Black-Box Machine Learning Models

Authors

Salvatore Greco, Jacek Karolczak, Roman Słowiński, et al.

Abstract

Explainable artificial intelligence (XAI) has predominantly focused on generating model-centric explanations that approximate the behavior of black-box models. However, such explanations often overlook a fundamental aspect of interpretability: different users require different explanations depending on their goals, preferences, and cognitive constraints. Although recent work has explored user-centric and personalized explanations, most existing approaches rely on heuristic adaptations or implicit user modeling, lacking a principled framework for representing and learning individual preferences. In this paper, we consider Preference-Based Explainable Artificial Intelligence (PREF-XAI), a novel perspective that reframes explanation as a preference-driven decision problem. Within PREF-XAI, explanations are not treated as fixed outputs, but as alternatives to be evaluated and selected according to user-specific criteria. Within this perspective, we propose a methodology that combines rule-based explanations with formal preference learning. User preferences are elicited through a ranking of a small set of candidate explanations and modeled via an additive utility function inferred using robust ordinal regression. Experimental results on real-world datasets show that PREF-XAI can accurately reconstruct user preferences from limited feedback, identify highly relevant explanations, and discover novel explanatory rules not initially considered by the user. Beyond the proposed methodology, this work establishes a connection between XAI and preference learning, opening new directions for interactive and adaptive explanation systems.

Submitted: 2026-04-21 ArXiv ID: 2604.19684v1

▶

Learning Hybrid-Control Policies for High-Precision In-Contact Manipulation Under Uncertainty

Authors

Hunter L. Brown, Geoffrey Hollinger, Stefan Lee

Abstract

Reinforcement learning-based control policies have been frequently demonstrated to be more effective than analytical techniques for many manipulation tasks. Commonly, these methods learn neural control policies that predict end-effector pose changes directly from observed state information. For tasks like inserting delicate connectors which induce force constraints, pose-based policies have limited explicit control over force and rely on carefully tuned low-level controllers to avoid executing damaging actions. In this work, we present hybrid position-force control policies that learn to dynamically select when to use force or position control in each control dimension. To improve learning efficiency of these policies, we introduce Mode-Aware Training for Contact Handling (MATCH) which adjusts policy action probabilities to explicitly mirror the mode selection behavior in hybrid control. We validate MATCH's learned policy effectiveness using fragile peg-in-hole tasks under extreme localization uncertainty. We find MATCH substantially outperforms pose-control policies -- solving these tasks with up to 10% higher success rates and 5x fewer peg breaks than pose-only policies under common types of state estimation error. MATCH also demonstrates data efficiency equal to pose-control policies, despite learning in a larger and more complex action space. In over 1600 sim-to-real experiments, we find MATCH succeeds twice as often as pose policies in high noise settings (33% vs. 68%) and applies ~30% less force on average compared to variable impedance policies on a Franka FR3 in laboratory conditions.

Submitted: 2026-04-21 ArXiv ID: 2604.19677v1

▶

Budgeted Online Influence Maximization

▶

HardNet++: Nonlinear Constraint Enforcement in Neural Networks

Authors

Andrea Goertzen, Kaveh Alim, Navid Azizan

Abstract

Enforcing constraint satisfaction in neural network outputs is critical for safety, reliability, and physical fidelity in many control and decision-making applications. While soft-constrained methods penalize constraint violations during training, they do not guarantee constraint adherence during inference. Other approaches guarantee constraint satisfaction via specific parameterizations or a projection layer, but are tailored to specific forms (e.g., linear constraints), limiting their utility in other general problem settings. Many real-world problems of interest are nonlinear, motivating the development of methods that can enforce general nonlinear constraints. To this end, we introduce HardNet++, a constraint-enforcement method that simultaneously satisfies linear and nonlinear equality and inequality constraints. Our approach iteratively adjusts the network output via damped local linearizations. Each iteration is differentiable, admitting an end-to-end training framework, where the constraint satisfaction layer is active during training. We show that under certain regularity conditions, this procedure can enforce nonlinear constraint satisfaction to arbitrary tolerance. Finally, we demonstrate tight constraint adherence without loss of optimality in a learning-for-optimization context, where we apply this method to a model predictive control problem with nonlinear state constraints.

Submitted: 2026-04-21 ArXiv ID: 2604.19669v1
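
The abstract above describes iteratively adjusting a network output via damped local linearizations. The sketch below shows the general flavor for a single scalar equality constraint only: each step linearizes the constraint at the current point and applies a damped minimum-norm correction. It is an illustrative toy under those assumptions, not the HardNet++ algorithm itself; the function name, the damping value, and the unit-circle constraint are hypothetical, and inequality constraints are not handled.

```python
def enforce_constraint(y, h, grad_h, damping=0.5, tol=1e-10, max_iter=200):
    """Drive a scalar constraint h(y) = 0 to tolerance with damped linearized steps.

    Each step linearizes h around the current y and moves along grad_h(y)
    (the minimum-norm correction for the linearized constraint), scaled by damping.
    """
    y = list(y)
    for _ in range(max_iter):
        r = h(y)
        if abs(r) <= tol:
            break
        g = grad_h(y)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            break  # linearization is degenerate; stop rather than divide by zero
        step = damping * r / gg
        y = [yi - step * gi for yi, gi in zip(y, g)]
    return y

# Hypothetical constraint: the output must lie on the unit circle.
h = lambda y: y[0] ** 2 + y[1] ** 2 - 1.0
grad_h = lambda y: [2.0 * y[0], 2.0 * y[1]]
y_fixed = enforce_constraint([2.0, 1.0], h, grad_h)
```

For this constraint the correction is radial, so the raw output is pulled onto the circle along its own direction. Each step uses only arithmetic on `y`, which is what would allow gradients to flow through the iterations in an automatic-differentiation framework.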

▶

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

Authors

Yi Zhong, Buqiang Xu, Yijun Wang, et al.

Abstract

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve, making development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic framework to mitigate recurrent execution errors. Chat2Workflow is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be transformed and directly deployed to practical workflow platforms such as Dify and Coze. Experimental results show that while state-of-the-art language models can often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially under complex or changing requirements. Although our agentic framework yields up to 5.34% resolve rate gains, the remaining real-world gap positions Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at https://github.com/zjunlp/Chat2Workflow.

Comments: Work in progress

Submitted: 2026-04-21 ArXiv ID: 2604.19667v1

▶

From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems

Authors

Quang-Huy Nguyen, Thanh-Hai Nguyen, Khac-Manh Thai, et al.

Abstract

Counterfactual explanations (CEs) provide an intuitive way to understand recommender systems by identifying minimal modifications to user-item interactions that alter recommendation outcomes. Existing CE methods for recommender systems, however, have been evaluated under heterogeneous protocols, using different datasets, recommenders, metrics, and even explanation formats, which hampers reproducibility and fair comparison. Our paper systematically reproduces, re-implements, and re-evaluates eleven state-of-the-art CE methods for recommender systems, covering both native explainers (e.g., LIME-RS, SHAP, PRINCE, ACCENT, LXR, GREASE) and specific graph-based explainers originally proposed for GNNs. We propose a unified benchmarking framework to assess explainers along three dimensions: explanation format (implicit vs. explicit), evaluation level (item-level vs. list-level), and perturbation scope (user interaction vectors vs. user-item interaction graphs). Our evaluation protocol includes effectiveness, sparsity, and computational complexity metrics, and extends existing item-level assessments to top-K list-level explanations. Through extensive experiments on three real-world datasets and six representative recommender models, we analyze how well previously reported strengths of CE methods generalize across diverse setups. We observe that the trade-off between effectiveness and sparsity depends strongly on the specific method and evaluation setting, particularly under the explicit format; in addition, explainer performance remains largely consistent across item-level and list-level evaluations, and several graph-based explainers exhibit notable scalability limitations on large recommender graphs. Our results refine and challenge earlier conclusions about the robustness and practicality of CE generation methods in recommender systems: https://github.com/L2R-UET/CFExpRec.

Submitted: 2026-04-21 ArXiv ID: 2604.19663v1

▶

Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification

Authors

Xudong Jian, Charikleia Stoura, Simon Scandella, et al.

Abstract

Damage identification is a core task in structural health monitoring. In practice, however, its reliability is often compromised by confounding non-damage effects, such as variations in excitation and environmental conditions, which can induce changes comparable to or larger than those caused by structural damage. To address this challenge, this study proposes a self-supervised label-free disentangled representation learning framework for robust vibration-based structural damage identification. The proposed framework employs an autoencoder with two latent representations to learn directly from raw vibration acceleration signals. A self-supervised invariance regularization, implemented via Variance-Invariance-Covariance Regularization (VICReg), is imposed on one latent representation using baseline data where structural damage is assumed constant but operational and environmental conditions vary. In addition, a frequency-domain constraint is introduced to enforce agreement between the power spectral density reconstructed from the latent representation and that computed from the corresponding input time series. Together, these mechanisms promote disentanglement, enabling the learned representation to be sensitive to damage-related characteristics while remaining invariant to nuisance variability. The framework is trained in a fully end-to-end and label-free manner, requiring no prior information on damage, excitation, or environmental conditions, making it well-suited for real-world applications. Its effectiveness is validated on two distinct real-world vibration datasets, including a bridge and a gearbox. The results demonstrate robustness to operational variability, strong generalization capability, and good performance in both damage detection and quantification.

Submitted: 2026-04-21 ArXiv ID: 2604.19658v1

▶

SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference under Hard Uplink Budgets

▶

RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian

Authors

Mircea Timpuriu, Mihaela-Claudia Cercel, Dumitru-Clementin Cercel

Abstract

The importance of clear and correct text in legal documents cannot be overstated. Consequently, a grammatical error correction tool meant to assist a legal professional must be able to understand possible errors in the context of a legal environment and correct them accordingly, which implies that it needs to be trained in the same environment, using realistic legal data. However, the manually annotated data required by such a process is in short supply for languages such as Romanian, let alone for a niche domain. The most common approach is the synthetic generation of parallel data; however, it requires a structured understanding of Romanian grammar. In this paper, we introduce, to our knowledge, the first Romanian-language parallel dataset for the detection and correction of grammatical errors in the legal domain, RoLegalGEC, which aggregates 350,000 examples of errors in legal passages, along with error annotations. Moreover, we evaluate several neural network models that transform the dataset into a valuable tool for both detecting and correcting grammatical errors, including knowledge-distillation Transformers, sequence tagging architectures for detection, and a variety of pre-trained text-to-text Transformer models for correction. We consider that the set of models, together with the novel RoLegalGEC dataset, will enrich the resource base for further research on Romanian.

Submitted: 2026-04-21 ArXiv ID: 2604.19593v2

▶

An Efficient Black-Box Reduction from Online Learning to Multicalibration, and a New Route to $Φ$-Regret Minimization

Authors

Gabriele Farina, Juan Carlos Perdomo

Abstract

We give a Gordon-Greenwald-Marks (GGM) style black-box reduction from online learning to online multicalibration. Concretely, we show that to achieve high-dimensional multicalibration with respect to a class of functions H, it suffices to combine any no-regret learner over H with an expected variational inequality (EVI) solver. We also prove a converse statement showing that efficient multicalibration implies efficient EVI solving, highlighting how EVIs in multicalibration mirror the role of fixed points in the GGM result for $Φ$-regret. This first set of results resolves the main open question in Garg, Jung, Reingold, and Roth (SODA '24), showing that oracle-efficient online multicalibration with $\sqrt{T}$-type guarantees is possible in full generality. Furthermore, our GGM-style reduction unifies the analyses of existing online multicalibration algorithms, enables new algorithms for challenging environments with delayed observations or censored outcomes, and yields the first efficient black-box reduction between online learning and multiclass omniprediction. Our second main result is a fine-grained reduction from high-dimensional online multicalibration to (contextual) $Φ$-regret minimization. Together with our first result, this establishes a new route from external regret to $Φ$-regret that bypasses sophisticated fixed-point or semi-separation machinery, dramatically simplifies a result of Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25) while improving rates, and yields new algorithms that are robust to richer deviation classes, such as those belonging to any reproducing kernel Hilbert space.

Submitted: 2026-04-21 ArXiv ID: 2604.19592v1

▶

Lyapunov-Certified Direct Switching Theory for Q-Learning

▶

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

Authors

Jonas Waldendorf, Bashar Awwad Shiekh Hasan, Evgenii Tsymbalov

Abstract

Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on gold-standard outputs that are costly or impractical to obtain. Moreover, hallucination detection methods developed for text-based LLMs do not directly capture audio-specific signals. We investigate four attention-derived metrics: AUDIORATIO, AUDIOCONSISTENCY, AUDIOENTROPY, and TEXTENTROPY, designed to capture pathological attention patterns associated with hallucination, and train lightweight logistic regression classifiers on these features for efficient inference-time detection. Across automatic speech recognition and speech-to-text translation tasks, evaluations on Qwen-2-Audio and Voxtral-3B show that our approach outperforms uncertainty-based and prior attention-based baselines on in-domain data, achieving improvements of up to +0.23 PR-AUC, and generalises to out-of-domain ASR settings. We further find that strong performance can be achieved with approximately 100 attention heads, improving out-of-domain generalisation compared to using all heads. While effectiveness is model-dependent and task-specific training is required, our results demonstrate that attention patterns provide a valuable tool for hallucination detection in SpeechLLMs.

Comments: Accepted to Findings of ACL 2026

Submitted: 2026-04-21 ArXiv ID: 2604.19565v1

We propose a fundamental shift in the search for beyond the Standard Model long-lived particles (LLPs) at high-luminosity hadron colliders by prioritizing physical background suppression over traditional inner tracking. We introduce $\textsf{DELIGHT-SHIELD}$, a detector design for a 100 TeV Future Circular Collider, placed at an interaction point dedicated to LLP searches. By replacing the inner parts of the detector with a multi-layered composite shield, followed by tracking volumes, we analytically estimate a suppression of Standard Model hadronic and electromagnetic backgrounds by up to seven orders of magnitude. Full Geant4 simulations validate the effectiveness of this design. Although the achieved suppression is somewhat lower than the analytical estimate, primarily due to secondary particle production within the shield, the residual background remains at a level that is manageable for LLP analyses. It can be further mitigated by applying energy thresholds, as well as vertexing and timing cuts in the downstream detector. Benchmarking against a dark scalar model, we show that this shielding-based detector concept achieves sensitivity to branching ratios as low as $\mathcal{O}(10^{-9})$ for the $h\rightarrow φφ$ process under a zero-background condition, outperforming general-purpose detector baselines. This strategy not only expands the discovery reach for neutral LLPs but also provides a rigorous experimental handle to distinguish new physics from Standard Model punch-through backgrounds. We further discuss a phased implementation at the High-Luminosity LHC as a critical testbed for this novel detection concept.

Comments: 30 pages, 3 figures, 5 tables

Submitted: 2026-04-20 ArXiv ID: 2604.18693v1

▶

Optomechanical Detection of Individual Gas Collisions

▶

Two-body charmed anti-charmed baryonic $B$ decays

Authors

Chun-Khiang Chua

Abstract

We study the rates of two-body charmed anti-charmed baryonic $\overline B\to {\bf B}_c \overline {\bf B}_c$ decays using the topological amplitude approach. All amplitudes of $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$, ${\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf { 3_f})$, ${\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf {\bar 6_f})$ and ${\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf {\bar 6_f})$ decays are decomposed topologically. SU(3) breaking effects on these amplitudes, depending on the position of the $s$-quark line, are modeled. Using existing data as inputs, we obtain the following results. (i) In the low-lying $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$ decays, we find that the exchange diagram is sizable. Furthermore, there is a large cancellation between internal $W$-tree and exchange $W$-tree amplitudes. The SU(3) breaking is sizable: 35% SU(3) breaking effects are needed, and they work differently in different amplitudes. The rates of $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$ decays with some excited ${\bf B}_c(\bf {\bar 3_f})$ are also studied. (ii) The $\overline B\to {\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf { 3_f})$ decays, with low-lying $ \overline {\bf B}_c(\bf { 3_f})$ and low-lying and some excited ${\bf B}_c(\bf 6_f)$ baryons, are studied, and some predictions on rates are obtained. (iii) The $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf {\bar 6_f})$ decays with low-lying charmed anti-charmed baryons are studied, and some predictions on rates are obtained. (iv) Uncertainties in most predicted rates are large, reflecting our current poor understanding of the related SU(3) breaking effects. Measuring these rates can provide very useful information about these effects.

Comments: 28 pages, 2 figures, 16 tables, wording slightly changed, references added

Submitted: 2026-04-20 ArXiv ID: 2604.18366v2

▶

Extracting Dark-Matter Mass from Angular Scanning

▶

A new approach to long-lived particle detection at hadron colliders: the $\textsf{DELIGHT-SHIELD}$ concept

Nicolas Grimbaum Yamamoto, Thomas Hambye

Abstract

High-energy neutrinos can be injected in the early Universe from the decay or annihilation of long-lived primordial relics. We analyse the possibility that the ultrahigh-energy neutrino event recently observed by the KM3NeT neutrino telescope could have such an origin. This possibility has the advantage of leading to a sharp spectral feature, so that the neutrino flux can be small at all energies except at the KM3NeT event energy. Thus, in this scenario the tension with null results from other experiments is reduced with respect to the usual power-law case analysed by the KM3NeT and IceCube experiments. At such energies and for an emission around the recombination time, interactions of these neutrinos with background neutrinos prove to be relevant and must be determined with a dedicated code. These interactions, as well as final-state radiation processes, modify the spectrum. Interestingly, it turns out that the scenario can also leave an imprint in the CMB that could be probed in the near future. In addition, this scenario does not predict an associated $γ$-ray flux beyond observation. All in all, we find that the high-energy neutrino could be a primordial high-energy neutrino, provided it has been produced around the recombination time or later.

Comments: 18 pages, 7 figures

Submitted: 2026-04-20 ArXiv ID: 2604.18677v1

▶

Sexaquarks and $H$ dibaryons in the $uuddss$ system: a comparison within a constituent quark model

Comments: 21 pages, 13 figures. arXiv admin note: text overlap with arXiv:2405.09240 by other authors

Submitted: 2026-04-20 ArXiv ID: 2604.17764v1

▶

Polarization, Maximal Concurrence, and Pure States in High-Energy Collisions

▶

Searching for dark photons in $J/ψ$ decays

Aoran Zhang, Tianyao Wei, Maria J. Guerrero, et al.

Abstract

Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which are widely used in ecological research. Such networks are often sparse and inherently imperfect in their detection. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled and scale is unbalanced, leading to oversparse or poorly rescaled estimates that degrade structural recovery. To address these issues, we propose a framework for structured sparse nonnegative low-rank factorization with detection probability estimation. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity in within-group similarity and cross-group connectivity with better relative scale. The resulting optimization problem is nonconvex and nonsmooth. To solve it, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization and establish its asymptotic feasibility and the KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.

Comments: 13 pages, 4 figures

Submitted: 2026-04-20 ArXiv ID: 2604.18820v1

▶

Discrete Tilt Matching

▶

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Authors

Vin Bhaskara, Haicheng Wang

Abstract

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.

Comments: 17 pages, 6 figures, 1 table

Submitted: 2026-04-20 ArXiv ID: 2604.18701v1

▶

Revisiting Active Sequential Prediction-Powered Mean Estimation

Authors

Maria-Eleni Sfyraki, Jun-Kun Wang

Abstract

In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. If the label is not queried, the prediction from a machine learning model is used instead. Prior work proposed an elegant scheme that determines the query probability by combining an uncertainty-based suggestion with a constant probability that encodes a soft constraint on the query probability. We explored different values of the mixing parameter and observed an intriguing empirical pattern: the smallest confidence width tends to occur when the weight on the constant probability is close to one, thereby reducing the influence of the uncertainty-based component. Motivated by this observation, we develop a non-asymptotic analysis of the estimator and establish a data-dependent bound on its confidence interval. Our analysis further suggests that when a no-regret learning approach is used to determine the query probability and control this bound, the query probability converges to the maximum value allowed by the constraint when it is chosen obliviously to the current covariates. We also conduct simulations that corroborate these theoretical findings.

Comments: Published as a conference paper at ICLR 2026

Submitted: 2026-04-20 ArXiv ID: 2604.18569v1

▶

FUSE: Ensembling Verifiers with Zero Labeled Data

▶

Bayesian experimental design: grouped geometric pooled posterior via ensemble Kalman methods

Authors

Huchen Yang, Xinghao Dong, Jinlong Wu

Abstract

Bayesian experimental design (BED) for complex physical systems is often limited by the nested inference required to estimate the expected information gain (EIG) or its gradients. Each outer sample induces a different posterior, creating a large and heterogeneous set of inference targets. Existing methods have to sacrifice either accuracy or efficiency: they either perform per-outer-sample posterior inference, which yields higher fidelity but at prohibitive computational cost, or amortize the inner inference across all outer samples for computational reuse, at the risk of degraded accuracy under posterior heterogeneity. To improve accuracy and maintain cost at the amortized level, we propose a grouped geometric pooled posterior framework that partitions outer samples into groups and constructs a pooled proposal for each group. While such a grouping strategy would normally require generating separate proposal samples for different groups, our tailored ensemble Kalman inversion (EKI) formulation generates these samples without extra forward-model evaluation cost. We also introduce a conservative diagnostic that assesses importance-sampling quality and guides the grouping. This grouping strategy improves within-group proposal-target alignment, yielding more accurate and stable estimators while keeping the cost comparable to amortized approaches. We evaluate the performance of our method on both Gaussian-linear and high-dimensional network-based model discrepancy calibration problems.

Submitted: 2026-04-20 ArXiv ID: 2604.18505v1

▶

Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario

Jonas Arruda, Sophie Chervet, Paula Staudt, et al.

Abstract

Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.

Submitted: 2026-04-20 ArXiv ID: 2604.18319v1

▶

Symmetry Guarantees Statistic Recovery in Variational Inference

Authors

Daniel Marks, Dario Paccagnan, Mark van der Wilk

Abstract

Variational inference (VI) is a central tool in modern machine learning, used to approximate an intractable target density by optimising over a tractable family of distributions. As the variational family cannot typically represent the target exactly, guarantees on the quality of the resulting approximation are crucial for understanding which of its properties VI can faithfully capture. Recent work has identified instances in which symmetries of the target and the variational family enable the recovery of certain statistics, even under model misspecification. However, these guarantees are inherently problem-specific and offer little insight into the fundamental mechanism by which symmetry forces statistic recovery. In this paper, we overcome this limitation by developing a general theory of symmetry-induced statistic recovery in variational inference. First, we characterise when variational minimisers inherit the symmetries of the target and establish conditions under which these pin down identifiable statistics. Second, we unify existing results by showing that previously known statistic recovery guarantees in location-scale families arise as special cases of our theory. Third, we apply our framework to distributions on the sphere to obtain novel guarantees for directional statistics in von Mises-Fisher families. Together, these results provide a modular blueprint for deriving new recovery guarantees for VI in a broad range of symmetry settings.

Comments: 19 pages, 2 figures

Submitted: 2026-04-20 ArXiv ID: 2604.18310v1

▶

Horospherical Depth and Busemann Median on Hadamard Manifolds

Authors

Yangdi Jiang, Xiaotian Chang, Cyrus Mostajeran

Abstract

We introduce the horospherical depth, an intrinsic notion of statistical depth on Hadamard manifolds, and define the Busemann median as the set of its maximizers. The construction exploits the fact that the linear functionals appearing in Tukey's half-space depth are themselves limits of renormalized distance functions; on a Hadamard manifold the same limiting procedure produces Busemann functions, whose sublevel sets are horoballs, the intrinsic replacements for halfspaces. The resulting depth is parametrized by the visual boundary, is isometry-equivariant, and requires neither tangent-space linearization nor a chosen base point. For arbitrary Hadamard manifolds, we prove that the depth regions are nested and geodesically convex, that a centerpoint of depth at least $1/(d+1)$ exists, and hence that the Busemann median exists for every Borel probability measure. Under strictly negative sectional curvature and mild regularity assumptions, the depth is strictly quasi-concave and the median is unique. We also establish robustness: the depth is stable under total-variation perturbations, and under contamination escaping to infinity the limiting median depends on the escape direction but not on how far the contaminating mass has moved along the geodesic ray, in contrast with the Fréchet mean. Finally, we establish uniform consistency of the sample depth and convergence of sample depth regions and sample Busemann medians; on symmetric spaces of noncompact type, the argument proceeds through a VC analysis of upper horospherical halfspaces, while on general Hadamard manifolds it follows from a compactness argument under a mild non-atomicity assumption.

Comments: 52 pages, 10 figures

Submitted: 2026-04-20 ArXiv ID: 2604.18242v1

▶

mlr3torch: A Deep Learning Framework in R based on mlr3 and torch

▶

Distributional Off-Policy Evaluation with Deep Quantile Process Regression

Authors

Qi Kuang, Chao Wang, Yuling Jiao, et al.

Abstract

This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a quantile-based approach for OPE using deep quantile process regression, presenting a novel algorithm called Deep Quantile Process regression-based Off-Policy Evaluation (DQPOPE). We provide new theoretical insights into the deep quantile process regression technique, extending existing approaches that estimate discrete quantiles to estimate a continuous quantile function. A key contribution of our work is the rigorous sample complexity analysis for distributional OPE with deep neural networks, bridging theoretical analysis with practical algorithmic implementations. We show that DQPOPE achieves statistical advantages by estimating the full return distribution using the same sample size required to estimate a single policy value using conventional methods. Empirical studies further show that DQPOPE provides significantly more precise and robust policy value estimates than standard methods, thereby enhancing the practical applicability and effectiveness of distributional reinforcement learning approaches.

Submitted: 2026-04-20 ArXiv ID: 2604.18143v1

▶

Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles

▶

Boltzmann Machine Learning with a Parallel, Persistent Markov chain Monte Carlo method for Estimating Evolutionary Fields and Couplings from a Protein Multiple Sequence Alignment

Authors

Sanzo Miyazawa

Abstract

The inverse Potts problem, which estimates evolutionary single-site fields and pairwise couplings in homologous protein sequences from the single-site and pairwise amino acid frequencies observed in their multiple sequence alignment, remains one of the useful methods in studies of protein structure and evolution. Since the reproducibility of fields and couplings is most important, the Boltzmann machine method is employed here, although it is computationally intensive. To reduce the computational time required for the Boltzmann machine, a parallel, persistent Markov chain Monte Carlo method is employed to estimate the single-site and pairwise marginal distributions in each learning step. Stochastic gradient descent methods are also used to reduce the computational time of each learning step. Another problem is how to adjust the values of the hyperparameters; there are two regularization parameters, for evolutionary fields and couplings. The precision of contact residue pair prediction is often used to adjust the hyperparameters. However, it is not sensitive to these regularization parameters. Here, they are adjusted so that the fields and couplings satisfy a specific condition that is appropriate for protein conformations. This method has been applied to eight protein families.

▶

From Particles to Perils: SVGD-Based Hazardous Scenario Generation for Autonomous Driving Systems Testing

▶

MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

▶

Collaborative Contextual Bayesian Optimization

Authors

Chih-Yu Chang, Qiyuan Chen, Tianhan Gao, et al.

We present a systematic empirical study of prompt engineering for formal mathematical reasoning in the context of the SAIR Equational Theories Stage 1 competition. The task requires deciding whether one equational law implies another over all magmas -- a problem that is undecidable in general but decidable for FALSE via finite model search. Over five weeks, we designed, tested, and analyzed more than 40 prompt variants, ranging from 0 to 4,878 bytes, across four evaluation splits and three language models (gpt-oss-120b, Llama 3.3 70B, Gemma 4 31B). Our central finding is a single-prompt ceiling: despite substantial engineering effort, balanced hard accuracy plateaus in an empirical saturation region of approximately 60--79% for gpt-oss-120b, compared to a 59.75% no-cheatsheet baseline. We identify three mechanisms underlying this ceiling: (1) the mathematical undecidability of the TRUE case limits what any finite prompt can encode; (2) complex rule systems decrease performance on weaker models (Llama 3.3 70B collapses to 0% TRUE recall with prompts exceeding 2KB); and (3) prompt ordering effects interact with model attention in fragile, non-monotonic ways. Our best submission (AN45c, 2,252 bytes) achieves 79.25% accuracy on hard3 (n=400; 95% CI: [75.0%, 82.9%]), with TRUE recall of 95.9% and FALSE recall of 63.4%, representing a +19.5 percentage-point improvement over the no-cheatsheet baseline (59.75%). We release all prompt variants, evaluation scripts, and results at https://github.com/israelcazares/sair-prompt-engineering

Comments: Companion repository: https://github.com/israelcazares/sair-prompt-engineering | Zenodo DOI: 10.5281/zenodo.19598433 | v15: final Contributor Network data (n=52, competition close April 20, 2026)
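The headline numbers in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes that the reported 95% CI is a Wilson score interval and that 79.25% of n=400 corresponds to 317 correct answers; neither assumption is stated in the abstract, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 79.25% accuracy on n=400 means 317 correct answers.
low, high = wilson_interval(317, 400)

# Gain over the no-cheatsheet baseline, in percentage points.
gain_pp = (317 / 400 - 0.5975) * 100

print(f"95% CI: [{low:.1%}, {high:.1%}], gain: {gain_pp:.1f} pp")
```

Under these assumptions the interval agrees with the reported [75.0%, 82.9%] to one decimal place, and the gain matches the stated +19.5 percentage points.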

Authors

Gordon Ma, Xiufan Li

Abstract

Barren-plateau results have established exponential gradient suppression as a widely cited obstacle to the scalability of variational quantum algorithms. When and whether these results extend to a given objective has been addressed through loss-specific arguments, but a general structural characterization has remained open. We show that the objective itself admits a fixed-observable representation if and only if the loss is affine in the measured statistics, thereby identifying the exact boundary of the standard concentration-based proof template. Existing transfer results for non-affine losses achieve this reduction under additional assumptions; our characterization implies that such a reduction is not structurally available for a class of non-affine objectives, placing them outside the automatic reach of the existing proof template. Beyond the affine regime, a chain-rule decomposition reveals three governing factors -- model responsivity, loss-side signal, and transmittance -- and induces a loss-class dichotomy: bounded-gradient losses inherit suppression, while amplification-capable losses can in principle counteract it. In the exponentially wide setting, both classes fail, but for different structural reasons. When the interface is instead designed at polynomial width -- exposing coarse-grained statistics rather than individual bitstring probabilities -- the exponential-dimensional obstruction is relaxed and the dichotomy plays a genuine role. In a numerical demonstration on a charge-conserving quantum system, the amplification-capable objective produces resolved gradients several orders of magnitude larger than affine and inheriting baselines at comparable shot budgets. Over the tested interval, its scaling trend is statistically distinguished from the exponential trend of both alternatives. The boundary is affine; what lies beyond it is a representation-design problem.

Comments: 28 pages, 6 figures

Submitted: 2026-04-20 ArXiv ID: 2604.18846v1

▶

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

Authors

Chris Cameron, Wangzheng Wang, Nikita Ivanov, et al.

Abstract

Looped transformers scale computational depth without increasing parameter count by repeatedly applying a shared transformer block and can be used for iterative refinement, where each loop rewrites a full fixed-size prediction in parallel. On difficult problems, such as those that require search-like computation, reaching a highly structured solution starting from noise can require long refinement trajectories. Learning such trajectories is challenging when training specifies only the target solution and provides no supervision over the intermediate refinement path. Diffusion models tackle this issue by corrupting data with varying magnitudes of noise and training the model to reverse it in a \textit{single step}. However, this process misaligns training and testing behaviour. We introduce Denoising Recursion Models, a method that similarly corrupts data with noise but trains the model to reverse the corruption over \textit{multiple} recursive steps. This strategy provides a tractable curriculum of intermediate states, while better aligning training with testing and incentivizing non-greedy, forward-looking generation. Through extensive experiments, we show this approach outperforms the Tiny Recursion Model (TRM) on ARC-AGI, where it recently achieved breakthrough performance.

Submitted: 2026-04-20 ArXiv ID: 2604.18839v1

▶

Benchmarking Quantum Kernel Support Vector Machines Against Classical Baselines on Tabular Data: A Rigorous Empirical Study with Hardware Validation

Authors

Siavash Kakavand, Christoph Strohmeyer, Michael Schlotter

Abstract

Quantum kernel methods have been proposed as a promising approach for leveraging near-term quantum computers for supervised learning, yet rigorous benchmarks against strong classical baselines remain scarce. We present a comprehensive empirical study of quantum kernel support vector machines (QSVMs) across nine binary classification datasets, four quantum feature maps, three classical kernels, and multiple noise models, totalling 970 experiments with strict nested cross-validation. Our analysis spans four phases: (i) statistical significance testing, revealing that none of 29 pairwise quantum-classical comparisons reach significance at $α= 0.05$; (ii) learning curve analysis over six training fractions, showing steeper quantum slopes on six of eight datasets that nonetheless fail to close the gap to the best classical baseline; (iii) hardware validation on IBM ibm_fez (Heron r2), demonstrating kernel fidelity $r \geq 0.976$ across six experiments; and (iv) seed sensitivity analysis confirming reproducibility (mean CV 1.4%). A Kruskal-Wallis factorial analysis reveals that dataset choice dominates performance variance ($\varepsilon^2 = 0.73$), while kernel type accounts for only 9%. Spectral analysis offers a mechanistic explanation: current quantum feature maps produce eigenspectra that are either too flat or too concentrated, missing the intermediate profile of the best classical kernel, the radial basis function (RBF). Quantum kernel training (QKT) via kernel-target alignment yields the single competitive result -- balanced accuracy 0.968 on breast cancer -- but with ~2,000x computational overhead. Our findings provide actionable guidelines for quantum kernel research. The complete benchmark suite is publicly available to facilitate reproduction and extension.

Comments: Code and data: https://doi.org/10.5281/zenodo.19197916

Submitted: 2026-04-20 ArXiv ID: 2604.18837v1

▶

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

Authors

Sinan G. Aksoy, Alexandra A. Sabrio, Erik VonKaenel, et al.

Abstract

Principal Component Analysis (PCA) is a fundamental tool for representation learning, but its global linear formulation fails to capture the structure of data supported on curved manifolds. In contrast, manifold learning methods model nonlinearity but often sacrifice the spectral structure and stability of PCA. We propose \emph{Geodesic Tangent Space Aggregation PCA (GTSA-PCA)}, a geometric extension of PCA that integrates curvature awareness and geodesic consistency within a unified spectral framework. Our approach replaces the global covariance operator with curvature-weighted local covariance operators defined over a $k$-nearest neighbor graph, yielding local tangent subspaces that adapt to the manifold while suppressing high-curvature distortions. We then introduce a geodesic alignment operator that combines intrinsic graph distances with subspace affinities to globally synchronize these local representations. The resulting operator admits a spectral decomposition whose leading components define a geometry-aware embedding. We further incorporate semi-supervised information to guide the alignment, improving discriminative structure with minimal supervision. Experiments on real datasets show consistent improvements over PCA, Kernel PCA, Supervised PCA and strong graph-based baselines such as UMAP, particularly in small sample size and high-curvature regimes. Our results position GTSA-PCA as a principled bridge between statistical and geometric approaches to dimensionality reduction.

Comments: 30 pages, 8 figures and 7 tables

Submitted: 2026-04-20 ArXiv ID: 2604.18816v1

▶

Rethinking Dataset Distillation: Hard Truths about Soft Labels

Authors

Priyam Dey, Aditya Sahdev, Sunny Bhati, et al.

Abstract

Despite the perceived success of large-scale dataset distillation (DD) methods, recent evidence finds that simple random image baselines perform on par with state-of-the-art DD methods like SRe2L due to the use of soft labels during downstream model training. This is in contrast with the findings in coreset literature, where high-quality coresets consistently outperform random subsets in the hard-label (HL) setting. To understand this discrepancy, we perform a detailed scalability analysis to examine the role of data quality under different label regimes, ranging from abundant soft labels (termed as SL+KD regime) to fixed soft labels (SL) and hard labels (HL). Our analysis reveals that high-quality coresets fail to convincingly outperform the random baseline in both SL and SL+KD regimes. In the SL+KD setting, performance further approaches near-optimal levels relative to the full dataset, regardless of subset size or quality, for a given compute budget. This performance saturation calls into question the widespread practice of using soft labels for model evaluation, where unlike the HL setting, subset quality has negligible influence. A subsequent systematic evaluation of five large-scale and four small-scale DD methods in the HL setting reveals that only RDED reliably outperforms random baselines on ImageNet-1K, but can still lag behind strong coreset methods due to its over-reliance on easy sample patches. Based on this, we introduce CAD-Prune, a compute-aware pruning metric that efficiently identifies samples of optimal difficulty for a given compute budget, and use it to develop CA2D, a compute-aligned DD method, outperforming current DD methods on ImageNet-1K at various IPC settings. Together, our findings uncover many insights into current DD research and establish useful tools to advance data-efficient learning for both coresets and DD.

Comments: CVPR 2026 (Oral). First two authors contributed equally

Submitted: 2026-04-20 ArXiv ID: 2604.18811v1

Vision-Language-Action (VLA) models fail systematically on long-horizon manipulation tasks despite strong short-horizon performance. We show that this failure is not resolved by extending context length alone in the current reactive execution setting; instead, it stems from three recurring execution-loop deficiencies: the memory gap, the verification gap, and the recovery gap. We present HELM, a model-agnostic framework that addresses these deficiencies with three components: an Episodic Memory Module (EMM) that retrieves key task history via CLIP-indexed keyframes, a learned State Verifier (SV) that predicts action failure before execution from observation, action, subgoal, and memory-conditioned context, and a Harness Controller (HC) that performs rollback and replanning. The SV is the core learning contribution: it consistently outperforms rule-based feasibility checks and ensemble uncertainty baselines, and its effectiveness depends critically on access to episodic memory. On LIBERO-LONG, HELM improves task success rate by 23.1 percentage points over OpenVLA (58.4% to 81.5%), while extending the context window to H=32 yields only a 5.4-point gain and same-budget LoRA adaptation remains 12.2 points below HELM. HELM also improves long-horizon performance on CALVIN and substantially boosts recovery success under controlled perturbations. Ablations and mechanism analyses isolate the contribution of each component, and we release LIBERO-Recovery as a perturbation-injection protocol for evaluating failure recovery in long-horizon manipulation.

Comments: 9 pages, 2 figures

Submitted: 2026-04-20 ArXiv ID: 2604.18791v1

▶

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

Authors

Jiacheng Liang, Yao Ma, Tharindu Kumarage, et al.

Abstract

Fault detection and diagnosis are critical for the optimal and safe operation of industrial processes. The correlations among sensors often display non-Euclidean structures, for which graph neural networks (GNNs) are widely used. However, in large-scale systems, local, global, and dynamic relations coexist among sensors, and traditional GNNs often overlook such complex, multi-level structures in tasks including fault diagnosis. To address this issue, we propose a structure-aware multi-level temporal graph network with local-global feature fusion for industrial fault diagnosis. First, a correlation graph is dynamically constructed using Pearson correlation coefficients to capture relationships among process variables. Then, temporal features are extracted through a long short-term memory (LSTM)-based encoder, whereas the spatial dependencies among sensors are learned by graph convolution layers. A multi-level pooling mechanism is used to gradually coarsen the graph and learn meaningful structures, capturing higher-level patterns while keeping important fault-related details. Finally, a fusion step combines detailed local features with overall global patterns before the final prediction. Experimental evaluations on the Tennessee Eastman process (TEP) demonstrate that the proposed model achieves superior fault diagnosis performance, particularly for complex fault scenarios, outperforming various baseline methods.

Submitted: 2026-04-20 ArXiv ID: 2604.18765v1
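
The first step described in the abstract above, building a correlation graph from Pearson coefficients, can be illustrated with a minimal sketch. The function names, the toy sensor window, and the 0.8 edge threshold are assumptions made for illustration; the abstract does not state how the authors threshold or window the data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant signal has no defined correlation; treat as 0
    return cov / (norm_x * norm_y)

def correlation_graph(signals, threshold=0.8):
    """Connect two variables when |r| >= threshold (threshold is an assumption)."""
    names = sorted(signals)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(signals[a], signals[b])) >= threshold:
                edges.add((a, b))
    return edges

# Hypothetical window of three process variables.
window = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "pressure": [2.0, 4.1, 5.9, 8.0],
    "flow": [1.0, -1.0, 1.0, -1.0],
}
edges = correlation_graph(window)
print(edges)
```

Recomputing the edge set on each new window is one simple way to make the graph dynamic; the temporal encoder and graph convolution layers would then operate on that graph.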

▶

Towards Understanding the Robustness of Sparse Autoencoders

▶

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

Authors

Andrew Wang, Ellie Pavlick, Ritambhara Singh

Abstract

An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. As clinical datasets are inherently temporal and sparse in terms of modality presence, capturing the underlying predictive signal via diagnostic multimodal ML models while retaining model explainability remains an ongoing challenge. In this work, we address this by re-framing clinical diagnosis as an autoregressive sequence modeling task, utilizing causal decoders from large language models (LLMs) to model a patient's multimodal trajectory. We first introduce a missingness-aware contrastive pre-training objective that integrates multiple modalities in datasets with missingness in a shared latent space. We then show that autoregressive sequence modeling with transformer-based architectures outperforms baselines on the MIMIC-IV and eICU fine-tuning benchmarks. Finally, we use interpretability techniques to move beyond performance boosts and find that across various patient stays, removing modalities leads to divergent behavior that our contrastive pre-training mitigates. By abstracting clinical diagnosis as sequence modeling and interpreting patient stay trajectories, we develop a framework to profile and handle missing modalities while addressing the canonical desideratum of safe, transparent clinical AI.

Submitted: 2026-04-20 ArXiv ID: 2604.18753v1

▶

Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models

Jonas Sander, Anja Rabich, Nick Mahling, et al.

Abstract

Today, machine learning is widely applied in sensitive, security-related, and financially lucrative applications. Model extraction attacks undermine current business models where a model owner sells model access, e.g., via MLaaS APIs. Additionally, stolen models can enable powerful white-box attacks, facilitating privacy attacks on sensitive training data, and model evasion. In this paper, we focus on Decision Trees (DT), which are widely deployed in practice. Existing black-box extraction attacks for DTs are either query-intensive, make strong assumptions about the DT structure, or rely on rich API information. To limit attacks to the black-box setting, CPU vendors introduced Trusted Execution Environments (TEE) that use hardware-mechanisms to isolate workloads from external parties, e.g., MLaaS providers. We introduce TrEEStealer, a high-fidelity extraction attack for stealing TEE-protected DTs. TrEEStealer exploits TEE-specific side-channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. The extraction efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. We use two primitives to acquire CFI: for AMD SEV, we follow previous work using the SEV-Step framework and performance counters. For Intel SGX, we reproduce prior findings on current Xeon 6 CPUs and construct a new primitive to efficiently extract the branch history of inference runs through the Branch-History-Register. We found corresponding vulnerabilities in three popular libraries: OpenCV, mlpack, and emlearn. We show that TrEEStealer achieves superior efficiency and extraction fidelity compared to prior attacks. Our work establishes a new state-of-the-art for DT extraction and confirms that TEEs fail to protect against control-flow leakage.

Submitted: 2026-04-20 ArXiv ID: 2604.18716v1

▶

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Authors

Vin Bhaskara, Haicheng Wang

Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of supervision. We conduct a systematic empirical study across diverse model families and reasoning domains under three weak supervision settings: scarce data, noisy rewards, and self-supervised proxy rewards. We find that generalization is governed by training reward saturation dynamics: models that generalize exhibit a prolonged pre-saturation phase during which training reward and downstream performance climb together, while models that saturate rapidly memorize rather than learn. We identify reasoning faithfulness, defined as the extent to which intermediate steps logically support the final answer, as the pre-RL property that predicts which regime a model falls into, while output diversity alone is uninformative. Motivated by these findings, we disentangle the contributions of continual pre-training and supervised fine-tuning, finding that SFT on explicit reasoning traces is necessary for generalization under weak supervision, while continual pre-training on domain data amplifies the effect. Applied together to Llama3.2-3B-Base, these interventions enable generalization across all three settings where the base model previously failed.

Submitted: 2026-04-20 ArXiv ID: 2604.18574v1

▶

Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

Authors

A. Sophia Koepke, Daniil Zverev, Shiry Ginosar, et al.

Abstract

Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce $\textbf{Latent Phase-Shift Rollback}$ (LPSR): at each generation step, we monitor the residual stream at a critical layer $l_{\text{crit}}$, detect abrupt directional reversals (phase shifts) via a cosine-similarity $+$ entropy dual gate, and respond by rolling back the KV-cache and injecting a pre-computed steering vector. No fine-tuning, gradient computation, or additional forward passes are required. LPSR achieves $\mathbf{44.0\%}$ on MATH-500 with an 8B model versus $28.8\%$ for standard AR ($+15.2$ pp; McNemar $χ^2 = 66.96$, $p < 10^{-15}$). Critically, prompted self-correction, the most natural inference-time baseline, scores only $19.8\%$, below standard AR; LPSR exceeds it by $+24.2$ pp ($χ^2 = 89.4$, $p \approx 0$). LPSR also outperforms Best-of-16 ($+7.8$ pp) at $5.4\times$ lower token cost, and surpasses a standard 70B model ($35.2\%$) with $8.75\times$ fewer parameters at ${\sim}3\times$ the token budget. A 32-layer sweep reveals a novel \textbf{detection-correction dissociation}: error-detection AUC peaks at layer~14 ($0.718$) but task accuracy peaks at layer~16 ($44.0\%$ vs.\ $29.2\%$), demonstrating that optimal monitoring depth differs for detection and correction.

Comments: Under Review

Submitted: 2026-04-20 ArXiv ID: 2604.18567v1
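
The dual gate mentioned in the abstract above can be pictured with a small sketch. Everything concrete here is an assumption: the abstract does not give the thresholds, the direction of the entropy test, or the data layout, so the code treats a phase shift as a negative cosine similarity between consecutive hidden states combined with high next-token entropy. The rollback and steering steps are not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def phase_shift(prev_hidden, curr_hidden, next_token_probs,
                cos_max=0.0, ent_min=1.0):
    """Fire only when both gates agree (hypothetical thresholds).

    Gate 1: the hidden state reversed direction (cosine below cos_max).
    Gate 2: the model is uncertain about the next token (entropy above ent_min).
    """
    reversed_direction = cosine(prev_hidden, curr_hidden) < cos_max
    uncertain = entropy(next_token_probs) > ent_min
    return reversed_direction and uncertain

# Reversal plus a flat distribution over four tokens triggers the gate.
print(phase_shift([1.0, 0.0], [-1.0, 0.1], [0.25, 0.25, 0.25, 0.25]))
```

Requiring both conditions keeps a single noisy signal from triggering a rollback, which is the usual motivation for a conjunctive gate.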

▶

Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion

Authors

Terry Leitch

Abstract

We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics AI assistance: the \textbf{CLD Leaderboard} (53 tests, structured causal loop diagram extraction) and the \textbf{Discussion Leaderboard} (interactive model discussion, feedback explanation, and model building coaching). On CLD extraction, cloud models achieve 77--89\% overall pass rates; the best local model reaches 77\% (Kimi~K2.5~GGUF~Q3, zero-shot engine), matching mid-tier cloud performance. On Discussion, the best local models achieve 50--100\% on model building steps and 47--75\% on feedback explanation, but only 0--50\% on error fixing -- a category dominated by long-context prompts that expose memory limits in local deployments. A central contribution of this paper is a systematic analysis of \textit{model type effects} on performance: we compare reasoning vs.\ instruction-tuned architectures, GGUF (llama.cpp) vs.\ MLX (mlx\_lm) backends, and quantization levels (Q3 / Q4\_K\_M / MLX-3bit / MLX-4bit / MLX-6bit) across the same underlying model families. We find that backend choice has larger practical impact than quantization level: mlx\_lm does not enforce JSON schema constraints, requiring explicit prompt-level JSON instructions, while llama.cpp grammar-constrained sampling handles JSON reliably but causes indefinite generation on long-context prompts for dense models. We document the full parameter sweep ($t$, $p$, $k$) for all local models, cleaned timing data (stuck requests excluded), and a practitioner guide for running 671B--123B parameter models on Apple~Silicon.

Submitted: 2026-04-20 ArXiv ID: 2604.18566v2

Authors

Renato Campanini

Abstract

Comments: 18 pages, 8 figures

Submitted: 2026-04-19 ArXiv ID: 2604.17657v1

▶

Crystallography, Lorentz violation, and the Standard-Model Extension

▶

Isospin Decomposition of Vector and Axial Two-Body Currents via Polarized Electron--Deuteron and Electron--$^3$He Scattering at the Electron-Ion Collider

Yubing Wang, Quan-feng Wu, Xun-Jie Xu

Abstract

We present an effective numerical method that can be used to straightforwardly calculate the full spectrum of primordial gravitational waves produced during inflation and reheating. Our method is based on the Bogoliubov approach with several key improvements to overcome its shortcomings such as numerical instabilities at high frequencies and issues with tachyonic modes. We also present a few useful analytical examples from which one can gain crucial insights into the numerical instabilities. The improved method allows us to demonstrate that anharmonicity of inflaton oscillations can leave interesting fingerprints on the high-frequency part of the GW spectrum. Our numerical code is publicly available on GitHub at https://github.com/xunjiexu/Unified-Bogoliubov.git.

Comments: 28 pages, 7 figures, code available at https://github.com/xunjiexu/Unified-Bogoliubov.git

Submitted: 2026-04-19 ArXiv ID: 2604.17478v1

▶

Testing $α$-attractor P-model of inflation by Cosmic Microwave Background radiation

Authors

Zhuo Ouyang, Jixian Liu, Enrique Mallada

Abstract

Inductive bias refers to restrictions on the hypothesis class that enable a learning method to generalize effectively from limited data. A canonical example in control is linearity, which underpins low sample-complexity guarantees for stabilization and optimal control. For general nonlinear dynamics, by contrast, guarantees often rely on smoothness assumptions (e.g., Lipschitz continuity) which, when combined with covering arguments, can lead to data requirements that grow exponentially with the ambient dimension. In this paper we argue that data-efficient nonlinear control demands exploiting inductive bias embedded in nature itself, namely, structure imposed by physical laws. Focusing on Hamiltonian systems, we leverage symplectic geometry and intrinsic recurrence on energy level sets to solve target reachability problems. Our approach combines the recurrence property with a recently proposed class of policies, called chain policies, which composes locally certified trajectory segments extracted from demonstrations to achieve target reachability. We provide sufficient conditions for reachability under this construction and show that the resulting data requirements depend on explicit geometric and recurrence properties of the Hamiltonian rather than the state dimension.

Submitted: 2026-04-19 ArXiv ID: 2604.17213v1

▶

Forecast Sports Outcomes under Efficient Market Hypothesis: Theoretical and Experimental Analysis of Odds-Only and Generalised Linear Models

Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, et al.

Abstract

Constitution-conditioned post-training can be analysed as a structured perturbation of a model's learned representational geometry. We introduce ATLAS, a geometry-first program that traces constitution-induced hidden-state structure across charts, models, and substrates. Instead of treating the relevant unit as a single behaviour, neuron, vector, or patch, ATLAS tests a local chart whose tangent structure, occupancy distribution, and behavioural coupling can be measured under system change. On Gemma, the anchored source-local chart captures 310 / 320 reviewed source rows and all 84 / 84 reviewed score-flip rows, but compact exact-patch sufficiency does not close, so the exportable unit is the broader source-defined family. Freezing that family, we re-identify a target-local realisation in an unadapted Phi model, where the fully adjudicated confirmatory contrast separates with AUC 0.984 and mean gap 5.50. In held-out ALM8 mouse frontal-cortex perturbation data, the same source-defined family receives support across 5/5 folds, with mean held-out AUC 0.72 and mean fold gap 4.50. A multiple-choice analysis provides the main boundary: nearby target-local signals can appear without source-faithful closure. The resulting correspondence is not coordinate identity, site identity, or a target-side mediation theorem. It is geometric recurrence under redistribution: written constitutions can induce recoverable latent geometry whose organisation remains detectable across model and substrate changes while its local coordinates, occupancy, and behavioural expression shift.

Comments: 49 pages, 7 figures

Submitted: 2026-04-19 ArXiv ID: 2604.17663v1

▶

Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation

Authors

Vaibhavi Lokegaonkar, Aryan Vijay Bhosale, Vishnu Raj, et al.

LLM-based agents are assumed to integrate environmental observations into their reasoning: discovering highly relevant but unexpected information should naturally lead to a model exploiting its own discoveries. We show that this assumption is false for current LLM-based agents, which struggle to reflect on or react to unexpected information. Across three benchmarks (Terminal-Bench, SWE-Bench, AppWorld), we inject complete task solutions into the agent environments to deliberately expose a task's solution to a model. While agents discover these solutions on Terminal-Bench in 79-81% of runs, they interact with, or exploit, them in only 37-50% of cases. This gap is starkest in AppWorld: agents see documentation stating that a command "returns the complete solution to this task" in over 90% of attempts but exploit this in fewer than 7% of trials. We show that agents lack what we call environmental curiosity: the capability to recognize and investigate unexpected but relevant observations in response to environmental stimuli. We identify three main factors influencing environmental curiosity: available tools in the agent scaffold, test-time compute, and training data distribution. Our findings show that configurations that maximize curiosity also achieve the best performance on the unmodified benchmarks. Yet even jointly optimized agents still ignore discovered solutions in the majority of trials: current agents use the environment to fetch expected information, but not to revise their strategy or maximally exploit useful stimuli.

Submitted: 2026-04-19 ArXiv ID: 2604.17609v1

▶

DGSSM: Diffusion guided state-space models for multimodal salient object detection

Authors

Suklav Ghosh, Arijit Sur, Pinaki Mitra

Abstract

Salient object detection (SOD) requires modeling both long-range contextual dependencies and fine-grained structural details, which remains challenging for convolutional, transformer-based, and Mamba-based state space models. While recent Mamba-based state space approaches enable efficient global reasoning, they often struggle to recover precise object boundaries. In contrast, diffusion models capture strong structural priors through iterative denoising, but their use in discriminative dense prediction is still limited due to computational cost and integration challenges. In this work, we propose DGSSM, a diffusion-guided state space (Mamba) framework that formulates multimodal salient object detection as a progressive denoising process. The framework integrates diffusion structural priors with multi-scale state space encoding, adaptive saliency prompting, and an iterative Mamba diffusion refinement mechanism to improve boundary accuracy. A boundary-aware refinement head and self-distillation strategy further enhance spatial coherence and feature consistency. Extensive experiments on 13 public benchmarks across RGB, RGB-D, and RGB-T settings demonstrate that DGSSM consistently outperforms state-of-the-art methods across multiple evaluation metrics while maintaining a compact model size. These results suggest that diffusion-guided state space modeling is an effective and generalizable paradigm for multimodal dense prediction tasks.

Comments: Accepted at ICPR 2026. Diffusion-guided Mamba framework for multimodal salient object detection. Evaluated on 13 benchmarks (RGB, RGB-D, RGB-T)

Submitted: 2026-04-19 ArXiv ID: 2604.17585v1

▶

How Much Data is Enough? The Zeta Law of Discoverability in Biomedical Data, featuring the enigmatic Riemann zeta function

Authors

Paul M. Thompson

Abstract

How much data is enough to make a scientific discovery? As biomedical datasets scale to millions of samples and AI models grow in capacity, progress increasingly depends on predicting when additional data will substantially improve performance. In practice, model development often relies on empirical scaling curves measured across architectures, modalities, and dataset sizes, with limited theoretical guidance on when performance should improve, saturate, or exhibit cross-over behavior. We propose a scaling-law framework for cross-modal discoverability based on spectral structure of data covariance operators, task-aligned signal projections, and learned representations. Many performance metrics, including AUC, can be expressed in terms of cumulative signal-to-noise energy accumulated across identifiable spectral modes of an encoder and cross-modal operator. Under mild assumptions, this accumulation follows a zeta-like scaling law governed by power-law decay of covariance spectra and aligned signal energy, leading naturally to the appearance of the Riemann zeta function. Representation learning methods such as sparse models, low-rank embeddings, and multimodal contrastive objectives improve sample efficiency by concentrating useful signal into earlier stable modes, effectively steepening spectral decay and shifting scaling curves. The framework predicts cross-over regimes in which simpler models perform best at small sample sizes, while higher-capacity or multimodal encoders outperform them once sufficient data stabilizes additional degrees of freedom. Applications include multimodal disease classification, imaging genetics, functional MRI, and topological data analysis. The resulting zeta law provides a principled way to anticipate when scaling data, improving representations, or adding modalities is most likely to accelerate discovery.

Comments: 25 pages, 5 figures

Submitted: 2026-04-19 ArXiv ID: 2604.17581v1

▶

Recovery Guarantees for Continual Learning of Dependent Tasks: Memory, Data-Dependent Regularization, and Data-Dependent Weights

▶

Diverse Dictionary Learning

Authors

Yujia Zheng, Zijian Li, Shunxing Fan, et al.

Abstract

Comments: ICLR 2026

Submitted: 2026-04-19 ArXiv ID: 2604.17568v1

▶

Target Parameterization in Diffusion Models for Nonlinear Spatiotemporal System Identification

Authors

Achraf El Messaoudi, Noureddine Khaous, Karim Cherifi

Abstract

Machine learning is becoming increasingly important for nonlinear system identification, including dynamical systems with spatially distributed outputs. However, classical identification and forecasting approaches become markedly less reliable in turbulent-flow regimes, where the dynamics are high-dimensional, strongly nonlinear, and highly sensitive to compounding rollout errors. Diffusion-based models have recently shown improved robustness in this setting and offer probabilistic inference capabilities, but many current implementations inherit target parameterizations from image generation, most commonly noise or velocity prediction. In this work, we revisit this design choice in the context of nonlinear spatiotemporal system identification. We consider a simple, self-contained patch-based transformer that operates directly on physical fields and use turbulent flow simulation as a representative testbed. Our results show that clean-state prediction consistently improves rollout stability and reduces long-horizon error relative to velocity- and noise-based objectives, with the advantage becoming more pronounced as the per-token dimensionality increases. These findings identify target parameterization as a key modeling choice in diffusion-based identification of nonlinear systems with spatial outputs in turbulent regimes.

Submitted: 2026-04-19 ArXiv ID: 2604.17566v1

▶

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

▶

Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells

Authors

Mattie Ji, Indradyumna Roy, Vikas Garg

Abstract

Comments: 31 pages, 6 figures, 4 algorithms, 2 tables. Accepted at ICLR 2026

Submitted: 2026-04-19 ArXiv ID: 2604.17548v1

▶

ONTO: A Token-Efficient Columnar Notation for LLM Input Optimization

Authors

Harshavardhanan Deekeswar

Abstract

Serialization formats designed for document interchange impose structural overhead that becomes prohibitive when large language models consume operational data at scale. A modest dataset of 1,000 IoT sensor readings serialized as JSON requires approximately 80,000 tokens - the majority spent on repeated field names, nested braces, and structural punctuation rather than semantic content. We present ONTO (Object Notation for Token Optimization), a columnar notation that declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks on Qwen2.5-7B show corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided. Ablation analysis reveals that key repetition accounts for the majority of JSON overhead, with indentation costs in nested structures explaining the 4-percentage-point gap between flat and hierarchical data. ONTO occupies a previously unfilled position in the serialization landscape: columnar efficiency with hierarchical structure, optimized for LLM context windows rather than document interchange. Code and specification are available at https://github.com/harsh-aranga/onto.

Comments: 8 pages, 5 tables, 1 figure. Code, benchmarks, and specification at https://github.com/harsh-aranga/onto
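
The schema-once, data-many idea in the abstract above can be illustrated with a short sketch. This is not the ONTO specification: the header syntax (`reading[2]: ...`), the two-space indent, and the function name are hypothetical, and character counts are used only as a rough stand-in for token counts.

```python
import json

def to_columnar(entity, records):
    """Declare field names once, then emit one pipe-delimited row per record.

    Illustrative format only; assumes every record has the same flat fields.
    """
    fields = list(records[0])
    lines = [f"{entity}[{len(records)}]: " + "|".join(fields)]
    for record in records:
        lines.append("  " + "|".join(str(record[f]) for f in fields))
    return "\n".join(lines)

# Hypothetical sensor readings.
readings = [
    {"id": 1, "sensor": "t-01", "value": 21.5},
    {"id": 2, "sensor": "t-02", "value": 19.0},
]

columnar = to_columnar("reading", readings)
as_json = json.dumps(readings)

print(columnar)
print(len(columnar), "characters vs", len(as_json), "for JSON")
```

In the columnar form each key appears once regardless of record count, whereas JSON repeats every key per record, which is the overhead the abstract attributes most of the savings to.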

Machine Learning Hamiltonian Dynamical Systems with Sparse and Noisy Data

Authors

Vedanta Thapar, Abhinav Gupta

Abstract

Machine learning has become a powerful tool for discovering governing laws of dynamical systems from data. However, most existing approaches degrade severely when observations are sparse, noisy, or irregularly sampled. In this work, we address the problem of learning symbolic representations of nonlinear Hamiltonian dynamical systems under extreme data scarcity by explicitly incorporating physical structure into the learning architecture. We introduce Adaptable Symplectic Recurrent Neural Networks (ASRNNs), a parameter-cognizant, structure-preserving model that combines Hamiltonian learning with symplectic recurrent integration, avoiding time derivative estimation, and enabling stable learning under noise. We demonstrate that ASRNNs can accurately predict long-term dynamics even when each training trajectory consists of only two irregularly spaced time points, possibly corrupted by correlated noise. Leveraging ASRNNs as structure-preserving data generators, we further enable symbolic discovery using independent regression methods (SINDy and PySR), recovering exact symbolic equations for polynomial systems and consistent polynomial approximations for non-polynomial Hamiltonians. Our results show that such architectures can provide a robust pathway to interpretable discovery of Hamiltonian dynamics from sparse and noisy data.

Submitted: 2026-04-19 ArXiv ID: 2604.17470v1

▶

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

▶

Neural Adjoint Method for Meta-optics: Accelerating Volumetric Inverse Design via Fourier Neural Operators

Authors

Chanik Kang, Hyewon Suk, Haejun Chung

Abstract

Meta-optics promises compact, high-performance imaging and color routing. However, designing high-performance structures is a high-dimensional optimization problem: mapping a desired optical output back to a physical 3D structure requires solving computationally expensive Maxwell's equations iteratively. Even with adjoint optimization, broadband design can require thousands of Maxwell solves, making industrial-scale optimization slow and costly. To overcome this challenge, we propose the Neural Adjoint Method, a solver-supervised surrogate that predicts 3D adjoint gradient fields from a voxelized permittivity volume using a Fourier Neural Operator (FNO). By learning the dense, per-voxel sensitivity field that drives gradient-based updates, our method can replace per-iteration adjoint solves with fast predictions, greatly reducing the computational cost of full-wave simulations required during iterative refinement. To better preserve sensitivity peaks, we introduce a stage-wise FNO that progressively refines residual errors with increasing emphasis on higher-frequency components. We curate a meta-optics dataset from paired forward/adjoint FDTD simulations and evaluate it across three tasks: spectral sorting (color routers), achromatic focusing (metalenses), and waveguide mode conversion. Our method reduces design time from hours to seconds. These results suggest a practical route toward fast, large-scale volumetric meta-optical design enabled by AI-accelerated scientific computing.

Comments: 10 pages, 6 figures, 3 tables

Submitted: 2026-04-19 ArXiv ID: 2604.17425v1

▶

A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo

▶

TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering

Authors

Keyang Chen, Mingxuan Jiang, Yongsheng Zhao, et al.

Abstract

Money laundering poses severe risks to global financial systems, driving the widespread adoption of machine learning for transaction monitoring. However, progress remains stifled by the lack of realistic benchmarks. Existing transaction-graph datasets suffer from two pervasive limitations: (i) they provide sparse node-level semantics beyond anonymized identifiers, and (ii) they rely on template-driven anomaly injection, which biases benchmarks toward static structural motifs and yields overly optimistic assessments of model robustness. We propose TransXion, a benchmark ecosystem for Anti-Money Laundering (AML) research that integrates profile-aware simulation of normal activity with stochastic, non-template synthesis of illicit subgraphs. TransXion jointly models persistent entity profiles and conditional transaction behavior, enabling evaluation of "out-of-character" anomalies where observed activity contradicts an entity's socio-economic context. The resulting dataset comprises approximately 3 million transactions among 50,000 entities, each endowed with rich demographic and behavioral attributes. Empirical analyses show that TransXion reproduces key structural properties of payment networks, including heavy-tailed activity distributions and localized subgraph structure. Across a diverse array of detection models spanning multiple algorithmic paradigms, TransXion yields substantially lower detection performance than widely used benchmarks, demonstrating increased difficulty and realism. TransXion provides a more faithful testbed for developing context-aware and robust AML detection methods. The dataset and code are publicly available at https://github.com/chaos-max/TransXion.

Submitted: 2026-04-19 ArXiv ID: 2604.17420v1

▶

ARMove: Learning to Predict Human Mobility through Agentic Reasoning

Authors

Chuyue Wang, Jie Feng, Yuxi Wu, et al.

Abstract

Human mobility prediction is a critical task but remains challenging due to its complexity and variability across populations and regions. Recently, large language models (LLMs) have made progress in zero-shot prediction, but existing methods suffer from limited interpretability (due to black-box reasoning), lack of iterative learning from new data, and poor transferability. In this paper, we introduce \textbf{ARMove}, a fully transferable framework for predicting human mobility through agentic reasoning. To address these limitations, ARMove employs standardized feature management with iterative optimization and user-specific customization: four major feature pools for foundational knowledge, user profiles for segmentation, and an automated generation mechanism integrating LLM knowledge. Robust generalization is achieved via agentic decision-making that adjusts feature weights to maximize accuracy while providing interpretable decision paths. Finally, large-small model synergy distills strategies from large LLMs (e.g., 72B) to smaller ones (e.g., 7B), reducing costs and enhancing performance ceilings. Extensive experiments on four global datasets show ARMove outperforms state-of-the-art baselines on 6 out of 12 metrics (gains of 0.78\% to 10.47\%). The other 4 items also achieved suboptimal results. Transferability tests confirm its robustness across regions, user groups, and model scales, while interpretability analysis highlights its transparency in decision-making. Our code is available at: https://anonymous.4open.science/r/ARMove-F847.

Submitted: 2026-04-19 ArXiv ID: 2604.17419v1

▶

Reward Score Matching: Unifying Reward-based Fine-tuning for Flow and Diffusion Models

▶

On the Generalization Bounds of Symbolic Regression with Genetic Programming

Authors

Masahiro Nomura, Ryoki Hamano, Isao Ono

Abstract

Symbolic regression (SR) with genetic programming (GP) aims to discover interpretable mathematical expressions directly from data. Despite its strong empirical success, the theoretical understanding of why GP-based SR generalizes beyond the training data remains limited. In this work, we provide a learning-theoretic analysis of SR models represented as expression trees. We derive a generalization bound for GP-style SR under constraints on tree size, depth, and learnable constants. Our result decomposes the generalization gap into two interpretable components: a structure-selection term, reflecting the combinatorial complexity of choosing an expression-tree structure, and a constant-fitting term, capturing the complexity of optimizing numerical constants within a fixed structure. This decomposition provides a theoretical perspective on several widely used practices in GP, including parsimony pressure, depth limits, numerically stable operators, and interval arithmetic. In particular, our analysis shows how structural restrictions reduce hypothesis-class growth while stability mechanisms control the sensitivity of predictions to parameter perturbations. By linking these practical design choices to explicit complexity terms in the generalization bound, our work offers a principled explanation for commonly observed empirical behaviors in GP-based SR and contributes towards a more rigorous understanding of its generalization properties.
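
To make the constrained quantities concrete, the following minimal Python sketch represents an expression as a nested tuple and measures its size and depth. The tuple encoding, the small operator table, and the helper names (`tree_size`, `tree_depth`, `evaluate`) are assumptions made for this illustration; they are not the paper's notation or code.

```python
import math

# Hypothetical operator table for this sketch: name -> (arity, function).
OPS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
}

def tree_size(node):
    """Number of nodes (operators, variables, constants) in the tree."""
    if isinstance(node, tuple):
        return 1 + sum(tree_size(child) for child in node[1:])
    return 1

def tree_depth(node):
    """Number of nodes on the longest root-to-leaf path."""
    if isinstance(node, tuple):
        return 1 + max(tree_depth(child) for child in node[1:])
    return 1

def evaluate(node, x):
    """Evaluate the expression at a scalar input x."""
    if isinstance(node, tuple):
        arity, fn = OPS[node[0]]
        args = [evaluate(child, x) for child in node[1:]]
        assert len(args) == arity, "wrong number of children for operator"
        return fn(*args)
    # Leaves are either the variable "x" or a numeric constant.
    return x if node == "x" else float(node)

# 2.0 * x + sin(x): the constant 2.0 is the only learnable constant here.
expr = ("add", ("mul", 2.0, "x"), ("sin", "x"))
```

In this encoding, a parsimony or depth limit amounts to rejecting any candidate whose `tree_size` or `tree_depth` exceeds a chosen cap, which bounds the number of admissible structures.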

Submitted: 2026-04-19 ArXiv ID: 2604.17402v1

▶

RISC-V Functional Safety for Autonomous Automotive Systems: An Analytical Framework and Research Roadmap for ML-Assisted Certification

Authors

Nick Andreasyan, Mikhail Struve, Alexey Popov, et al.

Abstract

RISC-V is emerging as a viable platform for automotive-grade embedded computing, with recent ISO 26262 ASIL-D certifications demonstrating readiness for safety-critical deployment in autonomous driving systems. However, functional safety in automotive systems is fundamentally a certification problem rather than a processor problem. The dominant costs arise from diagnostic coverage analysis, toolchain qualification, fault injection campaigns, safety-case generation, and compliance with ISO 26262, ISO 21448 (SOTIF), and ISO/SAE 21434. This paper analyzes the role of RISC-V in automotive functional safety, focusing on ISA openness, formal verifiability, custom extension control, debug transparency, and vendor-independent qualification. We examine autonomous driving safety requirements and map them to RISC-V architectural challenges such as lockstep execution, safety islands, mixed-criticality isolation, and secure debug. Rather than proposing a single algorithmic breakthrough, we present an analytical framework and research roadmap centered on certification economics as the primary optimization objective. We also discuss how selected ML methods, including LLM-assisted FMEDA generation, knowledge-graph-based safety case automation, reinforcement learning for fault injection, and graph neural networks for diagnostic coverage, can support certification workflows. We argue that the strongest outcome is not a faster core, but an ASIL-D-ready certifiable RISC-V platform.

Comments: 11 pages, 3 figures, 4 tables. Analytical perspective paper on automotive-grade RISC-V functional safety, certification economics, and ML-assisted certification for autonomous driving systems

This paper investigates communication-efficient neural network transmission by exploiting structured symmetry constraints in convolutional kernels. Instead of transmitting all model parameters, we propose a degrees-of-freedom (DoF) based codec that sends only the unique coefficients implied by a chosen symmetry group, enabling deterministic reconstruction of the full weight tensor at the receiver. The proposed framework is evaluated under quantization and noisy channel conditions across multiple symmetry patterns, signal-to-noise ratios, and bit-widths. To improve robustness against transmission impairments, a projection step is further applied at the receiver to enforce consistency with the symmetry-invariant subspace, effectively denoising corrupted parameters. Experimental results on MNIST and CIFAR-10 using a DeepCNN architecture demonstrate that DoF-based transmission achieves substantial bandwidth reduction while preserving significantly higher accuracy than pruning-based baselines, which often suffer catastrophic degradation. Among the tested symmetries, \textit{central-skew symmetry} consistently provides the best accuracy-compression tradeoff, confirming that structured redundancy can be leveraged for reliable and efficient neural model delivery over constrained links.
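
The encode, reconstruct, and project steps described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the paper's codec: central-skew symmetry is taken here to mean that a 2D kernel equals the negative of its 180-degree rotation, and the function names `encode`, `decode`, and `project_central_skew` are hypothetical.

```python
import numpy as np

def project_central_skew(kernel):
    """Orthogonal projection onto {K : K == -rot180(K)} (assumed definition)."""
    return 0.5 * (kernel - np.rot90(kernel, 2))

def encode(kernel):
    """Keep only the free coefficients: the first half of the flattened kernel."""
    flat = kernel.ravel()
    return flat[: flat.size // 2].copy()

def decode(coeffs, shape):
    """Rebuild the full kernel from the transmitted coefficients."""
    n = int(np.prod(shape))
    half = n // 2
    flat = np.zeros(n)
    flat[:half] = coeffs
    # A 180-degree rotation maps flat index k to n - 1 - k, so the second
    # half is the negated, reversed first half; an odd-sized center stays 0.
    flat[n - half:] = -coeffs[::-1]
    return flat.reshape(shape)

rng = np.random.default_rng(0)
kernel = project_central_skew(rng.normal(size=(3, 3)))  # symmetric by construction
sent = encode(kernel)                                    # 4 of 9 values transmitted
received = decode(sent, kernel.shape)
```

For a 3x3 kernel this sends 4 coefficients instead of 9. If full noisy weights arrive instead, applying `project_central_skew` at the receiver averages each entry with the negative of its mirrored partner, which is one way to realize the denoising projection the abstract mentions.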

Submitted: 2026-04-19 ArXiv ID: 2604.17371v1

▶

SPaRSe-TIME: Saliency-Projected Low-Rank Temporal Modeling for Efficient and Interpretable Time Series Prediction

Authors

K. A. Shahriar

Abstract

Time series forecasting is traditionally dominated by sequence-based architectures such as recurrent neural networks and attention mechanisms, which process all time steps uniformly and often incur substantial computational cost. However, real-world temporal signals typically exhibit heterogeneous structure, where informative patterns are sparsely distributed and interspersed with redundant observations. This work introduces \textbf{SPaRSe-TIME}, a structured and computationally efficient framework that models time series through a decomposition into three complementary components: saliency, memory, and trend. The proposed approach reformulates temporal modeling as a projection onto informative subspaces, where saliency acts as a data-dependent sparsification operator, memory captures dominant low-rank temporal patterns, and trend encodes low-frequency dynamics. These components are integrated through a lightweight, adaptive mapping that enables simplified, selective, and interpretable temporal reasoning. Extensive experiments on diverse real-world datasets demonstrate that SPaRSe-TIME achieves competitive predictive performance compared to recurrent and attention-based architectures, while significantly reducing computational complexity. The model is particularly effective in structured time series with clear temporal components and provides explicit interpretability through component-wise contributions. Furthermore, analysis reveals both the strengths and limitations of decomposition-based modeling, highlighting challenges in highly stochastic and complex multivariate settings. Overall, SPaRSe-TIME offers a principled alternative to monolithic sequence models, bridging efficiency, interpretability, and performance, and providing a scalable framework for time series learning.

Authors

Sajjad Ghiasvand, Mark Beliaev, Mahnoosh Alizadeh, et al.

Abstract

Deeper analysis of Fermi-LAT unassociated 4FGL J2112.5-3043 for possible identification

Authors

Federica Giacchino, Cristina Fernández-Suárez, Miguel Á Sánchez-Conde, et al.

Abstract

In the 4FGL-DR4 point-source catalog of the Large Area Telescope (LAT) onboard NASA's Fermi Gamma-ray Observatory (Fermi-LAT), around a third of the sources are still unidentified (unIDs). In this work, we perform a detailed study of one of them, namely 4FGL J2112.5-3043. Only gamma-ray emission has been detected from this unidentified source, with no counterpart observed at any other wavelength as of today. Together with its high detection significance, this makes 4FGL J2112.5-3043 a particularly compelling target for further investigation. The results of our spectral and spatial analyses show that the source photon spectrum is better described with a subexponential cutoff power-law spectral model, with no significant flux variability over time, and a morphology consistent with being a point-like source. We investigate and discuss the characterized emission within the context of both conventional and exotic astrophysics, namely a pulsar origin or potential dark matter (DM) annihilations in a nearby Galactic subhalo. Although our results are inconclusive and neither confirm a DM origin nor firmly establish an astrophysical nature, we find a spectral preference for the $b\bar{b}$ and $c\bar{c}$ DM annihilation channels over a pulsar origin, thus making this unID a particularly intriguing candidate for future multiwavelength observations.

Comments: 11 pages, 7 figures

Submitted: 2026-04-16 ArXiv ID: 2604.14794v1

▶

Multiboson and VBS measurements in ATLAS and CMS

▶

Dalitz decay of $K^*(892) \rightarrow K \ell^+\ell^-$: A New Probe for Hadronic Structure and Dark Photon Searches

▶

Charmed baryon decays at BESIII

▶

High Energy Physics - Phenomenology

▶

Neutrino self-interactions in post-reionization era: Lyman-$α$, 21-cm and cross-spectra

Authors

Sourav Pal, Supratik Pal

Abstract

Neutrino self-interactions delay the onset of free-streaming in the early universe, leaving distinct, scale-dependent signatures on the matter power spectrum. We investigate these signatures in post-reionization 21-cm intensity mapping and the Lyman-$α$ (Ly$α$) forest at redshifts $z \sim 2$--$3.5$, and forecast the constraints achievable with upcoming surveys using Fisher matrix analysis. Modeling neutrino self-interactions through an effective four-fermion parameterization with coupling $G_{\rm eff}$, we compute modifications to the Ly$α$ and 21-cm auto- and cross-power spectra for both strongly interacting (SI$_ν$, $\log_{10}G_{\mathrm{eff}} = -1.77$) and moderately interacting (MI$_ν$, $\log_{10}G_{\mathrm{eff}} = -5$) scenarios. We then combine these with forecasts for a representative next-generation cosmic microwave background (CMB) mission to evaluate the capabilities of SKA1-Mid and PUMA. We find that the Ly$α$--21-cm cross-correlation provides a systematics-resilient probe of the interaction signal, and decisively breaks the degeneracy between the primordial scalar power spectrum amplitude ($A_s$) and $G_{\rm eff}$ that limits CMB-only analyses, particularly for the SI$_ν$ mode. Furthermore, the CMB+PUMA combination emerges as the optimal survey configuration for both regimes, reaching 1$σ$ constraints of $\mathcal{O}(10^{-3})$ on $σ(\log_{10}G_{\rm eff})$ for the SI$_ν$ mode and $\mathcal{O}(10^{-2})$ for the MI$_ν$ mode. Compared to the CMB-only baseline, this represents an improvement of approximately one order of magnitude for the SI$_ν$ mode, and nearly two orders of magnitude for the MI$_ν$ mode. We show that this conclusion holds uniformly over the full range of coupling strengths from $\log_{10}G_{\rm eff} = -6$ to $-1.77$.

Comments: 40 pages, 15 figures, 3 tables. Comments are welcome

Submitted: 2026-04-16 ArXiv ID: 2604.15287v1

▶

Charmonium radiative transitions to dileptons from lattice QCD: The case of $h_c \to η_c \ell^+\ell^-$ and $χ_{c1} \to J/ψ\,\ell^+\ell^-$

Authors

D. Bečirević, R. Di Palma, R. Frezzotti, et al.

Abstract

We present a lattice QCD study of dilepton production in charmonium transitions, specifically focusing on the $1^{+-} \to 0^{-+}$ and $1^{++} \to 1^{--}$ processes: $h_c \to η_c \ell^+ \ell^-$ and $χ_{c1} \to J/ψ\ell^+ \ell^-$, where $\ell = e, μ$. The relevant hadronic matrix elements are computed using gauge field configurations generated by the Extended Twisted Mass Collaboration with $N_f = 2+1+1$ dynamical Wilson--Clover twisted-mass fermions at four lattice spacings. Simulations are performed at physical dynamical $u$, $d$, $s$, and $c$ quark masses, except for the coarsest lattice, where the lightest sea quark mass corresponds to a slightly heavier pion mass. A controlled continuum extrapolation is carried out. In the continuum limit for the $h_c$ decays, we obtain $Γ(h_c \to η_c e^+ e^-) = 5.45(19)~\mathrm{keV}$, and $Γ(h_c \to η_c μ^+ μ^-) = 0.635(22)~\mathrm{keV}$. For the $χ_{c1}$ decays, we find: $Γ(χ_{c1} \to J/ψe^+ e^-)= 2.869(90)~\mathrm{keV}$, and $Γ(χ_{c1} \to J/ψμ^+ μ^-) = 0.1993(72)~\mathrm{keV}$. Our results for the $χ_{c1}$ decays show good compatibility with experimental data. However, our prediction for the $h_c \to η_c e^+ e^- $ decay rate is approximately $3σ$ larger than the BESIII result. We also present predictions for the differential decay widths as functions of the dilepton invariant mass, $q^2$, and for angular observables sensitive to longitudinal transition form factors, which are inaccessible in radiative decays with real photon emission. These results constitute the first fully dynamical lattice QCD predictions for dilepton decay rates in $h_c$ and $χ_{c1}$ charmonium transitions, including their differential distributions and angular observables. They provide benchmark predictions for future experimental studies.

Comments: 29 pages, 17 figures

▶

Status of the hadronic light-by-light contribution to the muon $g-2$ and holographic QCD predictions

▶

Microscopic primordial black holes as macroscopic dark matter from large extra dimensions

Authors

Giuseppe Filiberto Vitale, Gaetano Lambiase, Tanmay Kumar Poddar, et al.

Abstract

We study the coupled cosmological evolution of primordial black holes (PBHs) and radiation in the Arkani-Hamed-Dimopoulos-Dvali (ADD) framework with $n$ large extra dimensions and a fundamental gravity scale $M_\star$ at the TeV scale. For PBHs with horizon radius smaller than the compactification scale, the higher-dimensional geometry implies a larger horizon size at fixed mass and therefore a suppressed Hawking temperature. As a result, radiation accretion can overcome evaporation in the early Universe and drive a ``runaway'' phase of rapid mass growth. By numerically solving the coupled mass and energy-density evolution equations, we show that for $n \geq 2$ initially microscopic PBHs with initial mass $M_i \gtrsim 10^{12}\,$g can grow by many orders of magnitude and potentially reach macroscopic, even solar-mass, scales by matter-radiation equality. We determine the critical initial abundance $β_{\rm crit}$ required for PBHs to account for the observed dark matter density and find that extra dimensions dramatically lower this threshold, allowing viable scenarios with $β_{\rm crit}\sim 10^{-44}$. This identifies a previously unexplored region of parameter space in which the dark matter abundance is achieved through dynamical mass growth rather than large initial collapse fractions.

Comments: 14 pages, 10 figures, 2 tables, comments are welcome

Submitted: 2026-04-16 ArXiv ID: 2604.14871v1

▶

Rescattering effects in near-threshold $J/ψ$ photoproduction

▶

Deeper analysis of Fermi-LAT unassociated 4FGL J2112.5-3043 for possible identification

▶

An efficient Wavelet-Based Hamiltonian Formulation of Quantum Field Theories using Flow-Equations

Authors

Mrinmoy Basak, Debsubhra Chakraborty, Nilmani Mathur

Abstract

We propose an effective Hamiltonian formulation of quantum field theories using a Daubechies wavelet basis in position space. Combined with flow-equation methods of the similarity renormalization group (SRG), this approach provides an efficient framework for analyzing quantum field theories by reducing the dimensionality of the Hamiltonian and systematically decoupling degrees of freedom across scales. As an application, the free scalar field theory has been reformulated within this framework to calculate the low-lying energy spectrum of the theory. These basis elements are known to transform the free scalar field theory into a theory of coupled localized oscillators, each of which is labeled by a location and a resolution index. In this representation, the Hamiltonian is naturally organized into fixed-resolution blocks, alongside blocks associated with the interactions between different resolutions. To decouple the different resolution modes and obtain a block diagonalized Hamiltonian with each block associated with a fixed resolution, the flow equation approach of SRG is applied. Finally, we demonstrate that with increasing resolution, the low-energy spectrum can be extracted from the effective lowest-resolution block of the Hamiltonian, leading to a significant reduction in computational cost.

Comments: 17 pages, 6 figures

Submitted: 2026-04-16 ArXiv ID: 2604.14594v1

▶

Loop integrals in de Sitter spacetime: The parity-split IBP system and $\mathrm{d}\log$-form differential equations

▶

Machine Learning - Statistics

▶

Structural interpretability in SVMs with truncated orthogonal polynomial kernels

▶

Amortized Optimal Transport from Sliced Potentials

Authors

Minh-Phuc Truong, Khai Nguyen

Abstract

We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich potentials derived from sliced OT. We introduce two amortization strategies: regression-based amortization (RA-OT) and objective-based amortization (OA-OT). In RA-OT, we formulate a functional regression model that treats Kantorovich potentials from the original OT problem as responses and those obtained from sliced OT as predictors, and estimate these models via least-squares methods. In OA-OT, we estimate the parameters of the functional model by optimizing the Kantorovich dual objective. In both approaches, the predicted OT plan is subsequently recovered from the estimated potentials. As amortized OT methods, both RA-OT and OA-OT enable efficient solutions to repeated OT problems across different measure pairs by reusing information learned from prior instances to rapidly approximate new solutions. Moreover, by exploiting the structure provided by sliced OT, the proposed models are more parsimonious, independent of specific structures of the measures, such as the number of atoms in the discrete case, while achieving high accuracy. We demonstrate the effectiveness of our approaches on tasks including MNIST digit transport, color transfer, supply-demand transportation on spherical data, and mini-batch OT conditional flow matching.
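
For readers unfamiliar with sliced OT, the sketch below shows the basic building block: projecting two point clouds onto random directions and solving each one-dimensional OT problem by sorting. It only computes a Monte Carlo estimate of the squared sliced 2-Wasserstein distance for equal-size, uniformly weighted samples. It does not extract Kantorovich potentials and is not the RA-OT or OA-OT procedure. The function names and the number of projections are assumptions for this example.

```python
import numpy as np

def wasserstein2_sq_1d(u, v):
    """Squared 2-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal plan matches sorted points in order.
    """
    return float(np.mean((np.sort(u) - np.sort(v)) ** 2))

def sliced_wasserstein2_sq(x, y, n_projections=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n_points, dim) with the same n_points.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Random unit directions: normalized Gaussian vectors.
    directions = rng.normal(size=(n_projections, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    total = 0.0
    for theta in directions:
        total += wasserstein2_sq_1d(x @ theta, y @ theta)
    return total / n_projections
```

Each projection costs only a sort, which is why sliced quantities are cheap enough to serve as predictors for the more expensive full OT problem.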

Comments: 26 pages, 11 figures, 10 tables

Submitted: 2026-04-16 ArXiv ID: 2604.15114v1

▶

MinShap: A Modified Shapley Value Approach for Feature Selection

Emre Özyıldırım, Barış Yaycı, Umut Eren Akturk, et al.

Abstract

We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $τ_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $τ_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with a satisficing objective: when $τ_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $τ_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $τ_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.

Authors

Jiamei Wu, Ce Zhang, Zhipeng Cai, et al.

Abstract

Conformal prediction (CP) has attracted broad attention as a simple and flexible framework for uncertainty quantification through prediction sets. In this work, we study how to deploy CP under differential privacy (DP) in a statistically efficient manner. We first introduce differential CP, a non-splitting conformal procedure that avoids the efficiency loss caused by data splitting and serves as a bridge between oracle CP and private conformal inference. By exploiting the stability properties of DP mechanisms, differential CP establishes a direct connection to oracle CP and inherits corresponding validity behavior. Building on this idea, we develop Differentially Private Conformal Prediction (DPCP), a fully private procedure that combines DP model training with a private quantile mechanism for calibration. We establish the end-to-end privacy guarantee of DPCP and investigate its coverage properties under additional regularity conditions. We further study the efficiency of both differential CP and DPCP under empirical risk minimization and general regression models, showing that DPCP can produce tighter prediction sets than existing private split conformal approaches under the same privacy budget. Numerical experiments on synthetic and real datasets demonstrate the practical effectiveness of the proposed methods.
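
As background for the split conformal baselines mentioned above, the sketch below shows the standard non-private split conformal calibration step for regression with absolute-residual scores. It is not differential CP or DPCP, and it contains no privacy mechanism. The function names are assumptions for this example.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile of calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when that rank exceeds n (too few calibration points for the level).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(np.sort(scores)[k - 1])

def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction with half-width q."""
    return (point_prediction - q, point_prediction + q)
```

Here `scores` would be the absolute residuals of a fitted model on a held-out calibration split. Holding out that split is the source of the efficiency loss that non-splitting procedures aim to avoid.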

Submitted: 2026-04-16 ArXiv ID: 2604.14621v1

▶

CLion: Efficient Cautious Lion Optimizer with Enhanced Generalization

Authors

Feihu Huang, Guanyi Zhang, Songcan Chen

Abstract

The Lion optimizer is a popular learning-based optimization algorithm in machine learning that shows impressive performance in training many deep learning models. Although the convergence properties of the Lion optimizer have been studied, its generalization analysis is still missing. To fill this gap, we study the generalization properties of Lion via algorithmic stability, using mathematical induction. Specifically, we prove that Lion has a generalization error of $O(\frac{1}{Nτ^T})$, where $N$ is the training sample size, $τ>0$ denotes the smallest absolute value of the non-zero elements of the gradient estimator, and $T$ is the total number of iterations. In addition, we obtain an interesting byproduct: the SignSGD algorithm has the same generalization error as Lion. To enhance the generalization of Lion, we design a novel, efficient Cautious Lion (CLion) optimizer that uses the sign function cautiously. Moreover, we prove that CLion has a lower generalization error of $O(\frac{1}{N})$, compared with $O(\frac{1}{Nτ^T})$ for Lion, since the parameter $τ$ is generally very small. We also study the convergence properties of CLion and prove that it has a fast convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ under the $\ell_1$-norm of the gradient for nonconvex stochastic optimization, where $d$ denotes the model dimension. Extensive numerical experiments demonstrate the effectiveness of our CLion optimizer.
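
For context, the sketch below writes out one step of the standard Lion update and of SignSGD in NumPy, the two baselines the abstract compares. The abstract does not specify the CLion update rule, so it is not shown. The function names and default hyperparameters are assumptions for this illustration.

```python
import numpy as np

def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One step of the standard Lion update (not CLion).

    The update direction is the sign of an interpolation between the
    momentum and the current gradient; the momentum is then refreshed
    with a second interpolation coefficient.
    """
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    theta_new = theta - lr * (update + weight_decay * theta)
    m_new = beta2 * m + (1.0 - beta2) * grad
    return theta_new, m_new

def signsgd_step(theta, grad, lr=1e-4):
    """One step of SignSGD: move each coordinate by lr against the gradient sign."""
    return theta - lr * np.sign(grad)
```

Because `np.sign` discards gradient magnitudes, every nonzero coordinate moves by exactly `lr` per step. Arbitrarily small gradient entries therefore still trigger full-size updates, which gives some intuition for why the smallest nonzero magnitude $τ$ appears in the stated bound.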

Comments: 30 pages

Submitted: 2026-04-16 ArXiv ID: 2604.14587v1

▶

Generative Augmented Inference

Authors

Cheng Lu, Mengxin Wang, Dennis J. Zhang, et al.

Abstract

Data-driven operations management often relies on parameters estimated from costly human-generated labels. Recent advances in large language models (LLMs) and other AI systems offer inexpensive auxiliary data, but introduce a new challenge: AI outputs are not direct observations of the target outcomes, but could involve high-dimensional representations with complex and unknown relationships to human labels. Conventional methods leverage AI predictions as direct proxies for true labels, which can be inefficient or unreliable when this relationship is weak or misspecified. We propose Generative Augmented Inference (GAI), a general framework that incorporates AI-generated outputs as informative features for estimating models of human-labeled outcomes. GAI uses an orthogonal moment construction that enables consistent estimation and valid inference with a flexible, nonparametric relationship between LLM-generated outputs and human labels. We establish asymptotic normality and show a "safe default" property: relative to human-data-only estimators, GAI weakly improves estimation efficiency under arbitrary auxiliary signals and yields strict gains whenever the auxiliary information is predictive. Empirically, GAI outperforms benchmarks across diverse settings. In conjoint analysis with weak auxiliary signals, GAI reduces estimation error by about 50% and lowers human labeling requirements by over 75%. In retail pricing, where all methods access the same auxiliary inputs, GAI consistently outperforms alternative estimators, highlighting the value of its construction rather than differences in information. In health insurance choice, it cuts labeling requirements by over 90% while maintaining decision accuracy. Across applications, GAI improves confidence interval coverage without inflating width. Overall, GAI provides a principled and scalable approach to integrating AI-generated information.

Submitted: 2026-04-16 ArXiv ID: 2604.14575v1

▶

Improving Machine Learning Performance with Synthetic Augmentation

▶

How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations

Authors

Nouhaila Innan, Antonello Rosato, Alberto Marchisio, et al.

Abstract

Node embeddings act as the information interface for graph neural networks, yet their empirical impact is often reported under mismatched backbones, splits, and training budgets. This paper provides a controlled benchmark of embedding choices for graph classification, comparing classical baselines with quantum-oriented node representations under a unified pipeline. We evaluate two classical baselines alongside quantum-oriented alternatives, including a circuit-defined variational embedding and quantum-inspired embeddings computed via graph operators and linear-algebraic constructions. All variants are trained and tested with the same backbone, stratified splits, identical optimization and early stopping, and consistent metrics. Experiments on five different TU datasets and on QM9 converted to classification via target binning show clear dataset dependence: quantum-oriented embeddings yield the most consistent gains on structure-driven benchmarks, while social graphs with limited node attributes remain well served by classical baselines. The study highlights practical trade-offs between inductive bias, trainability, and stability under a fixed training budget, and offers a reproducible reference point for selecting quantum-oriented embeddings in graph learning.

Comments: 6 pages. Accepted at IJCNN 2026

Submitted: 2026-04-16 ArXiv ID: 2604.15273v1

▶

Prism: Symbolic Superoptimization of Tensor Programs

▶

SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation

Authors

Tianhao Fu, Austin Wang, Charles Chen, et al.

Abstract

Reliable uncertainty estimation is critical for medical image segmentation, where automated contours feed downstream quantification and clinical decision support. Many strong uncertainty methods require repeated inference, while efficient single-forward-pass alternatives often provide weaker failure ranking or rely on restrictive feature-space assumptions. We present $\textbf{SegWithU}$, a post-hoc framework that augments a frozen pretrained segmentation backbone with a lightweight uncertainty head. SegWithU taps intermediate backbone features and models uncertainty as perturbation energy in a compact probe space using rank-1 posterior probes. It produces two voxel-wise uncertainty maps: a calibration-oriented map for probability tempering and a ranking-oriented map for error detection and selective prediction. Across ACDC, BraTS2024, and LiTS, SegWithU is the strongest and most consistent single-forward-pass baseline, achieving AUROC/AURC of $0.9838/2.4885$, $0.9946/0.2660$, and $0.9925/0.8193$, respectively, while preserving segmentation quality. These results suggest that perturbation-based uncertainty modeling is an effective and practical route to reliability-aware medical segmentation. Source code is available at https://github.com/ProjectNeura/SegWithU.

Submitted: 2026-04-16 ArXiv ID: 2604.15271v1

▶

Cloning is as Hard as Learning for Stabilizer States

Authors

Nikhil Bansal, Matthias C. Caro, Gaurav Mahajan

Abstract

The impossibility of simultaneously cloning non-orthogonal states lies at the foundations of quantum theory. Even when allowing for approximation errors, cloning an arbitrary unknown pure state requires as many initial copies as needed to fully learn the state. Rather than arbitrary unknown states, modern quantum learning theory often considers structured classes of states and exploits such structure to develop learning algorithms that outperform general-state tomography. This raises the question: How do the sample complexities of learning and cloning relate for such structured classes? We answer this question for an important class of states. Namely, for $n$-qubit stabilizer states, we show that the optimal sample complexity of cloning is $Θ(n)$. Thus, also for this structured class of states, cloning is as hard as learning. To prove these results, we use representation-theoretic tools in the recently proposed Abelian State Hidden Subgroup framework and a new structured version of the recently introduced random purification channel to relate stabilizer state cloning to a variant of the sample amplification problem for probability distributions that was recently introduced in classical learning theory. This allows us to obtain our cloning lower bounds by proving new sample amplification lower bounds for classes of distributions with an underlying linear structure. Our results provide a more fine-grained perspective on No-Cloning theorems, opening up connections from foundations to quantum learning theory and quantum cryptography.

Comments: 10 + 33 + 8 pages

Submitted: 2026-04-16 ArXiv ID: 2604.15269v1

▶

Stability and Generalization in Looped Transformers

Authors

Asher Labovich

Abstract

Looped transformers promise test-time compute scaling by spending more iterations on harder problems, but it remains unclear which architectural choices let them extrapolate to harder problems at test time rather than memorize training-specific solutions. We introduce a fixed-point based framework for analyzing looped architectures along three axes of stability -- reachability, input-dependence, and geometry -- and use it to characterize when fixed-point iteration yields meaningful predictions. Theoretically, we prove that looped networks without recall have countable fixed points and cannot achieve strong input-dependence at any spectral regime, while recall combined with outer normalization reliably produces a regime in which fixed points are simultaneously reachable, locally smooth in the input, and supported by stable backpropagation. Empirically, we train single-layer looped transformers on chess, sudoku, and prefix-sums and find that downstream performance tracks the framework's predictions across tasks and architectural configurations. We additionally introduce internal recall, a novel recall placement variant, and show that it becomes competitive with -- and on sudoku, substantially better than -- standard recall placement once outer normalization is applied.

Comments: 11 main pages, 27 total

An Analysis of Regularization and Fokker-Planck Residuals in Diffusion Models for Image Generation

Authors

Onno Niemann, Gonzalo Martínez Muñoz, Alberto Suárez Gonzalez

Abstract

Recent work has shown that diffusion models trained with the denoising score matching (DSM) objective often violate the Fokker--Planck (FP) equation that governs the evolution of the true data density. Directly penalizing these deviations in the objective function reduces their magnitude but introduces a significant computational overhead. It is also observed that enforcing strict adherence to the FP equation does not necessarily lead to improvements in the quality of the generated samples, as often the best results are obtained with weaker FP regularization. In this paper, we investigate whether simpler penalty terms can provide similar benefits. We empirically analyze several lightweight regularizers, study their effect on FP residuals and generation quality, and show that the benefits of FP regularization are available at substantially lower computational cost. Our code is available at https://github.com/OnnoNiemann/fp_diffusion_analysis.

Comments: Accepted at IJCNN 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Submitted: 2026-04-16 ArXiv ID: 2604.15171v1

▶

Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data

▶

When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence

Authors

Marcus Armstrong

Abstract

Post-training quantization (PTQ) assumes that a well-converged model is a quantization-ready model. We show this assumption fails in a structured, measurable, and previously uncharacterized way. Using a calibration-free per-group INT4 probe applied to all 154 publicly available Pythia-160m training checkpoints, we identify a three-phase divergence structure: a rapid-learning phase where both FP32 perplexity and quantization robustness improve together, a meta-stable plateau lasting roughly 70,000 steps where FP32 perplexity stagnates but INT4 gap remains bounded, and an explosive divergence phase where the INT4 gap compounds from 11% to 517% while FP32 perplexity barely moves. Critically, this divergence begins not when the learning rate starts decaying, but precisely when FP32 perplexity converges, a finer-grained onset predictor that implies post-convergence weight updates, rather than decay magnitude alone, are the proximate cause. We further show that INT8 quantization is entirely immune throughout all three phases, constraining the mechanism to the coarseness of the 16-level INT4 grid specifically, and rule out weight outlier accumulation as the mechanism via direct kurtosis measurement. Finally, we conduct a controlled fork experiment from the pre-divergence checkpoint comparing three learning rate schedules (cosine continuation, SGDR warm restarts, and our proposed Oscillatory Lock-In) across nine independent runs. SGDR uniformly accelerates divergence (0/9 pairwise wins against cosine), while OLI's settled cool phases reduce the INT4 gap by 2.2 percentage points on average (t = -5.46, p < 0.0001), demonstrating that schedule amplitude calibration, not oscillation alone, determines whether perturbation helps or hurts. Our code, probe implementation, and all 154-checkpoint audit results are released publicly.
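To make the contrast between the 16-level INT4 grid and INT8 concrete, the following is a minimal sketch of per-group round-to-nearest quantization. It is not the authors' probe: the symmetric absmax scaling, the group size of 32, the synthetic Gaussian weights, and all function names are assumptions chosen for illustration, and it measures weight reconstruction error rather than the perplexity-based INT4 gap reported in the abstract.

```python
import random


def quantize_group(values, bits):
    """Symmetric round-to-nearest quantization of one group (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1      # 7 for INT4, 127 for INT8
    qmin = -(2 ** (bits - 1))       # -8 for INT4, -128 for INT8
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        return list(values)         # an all-zero group is already exact
    scale = absmax / qmax           # one scale per group, no calibration data
    dequantized = []
    for v in values:
        q = max(qmin, min(qmax, round(v / scale)))
        dequantized.append(q * scale)
    return dequantized


def quantize_per_group(weights, bits, group_size=32):
    """Quantize a flat weight list group by group and return dequantized values."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size], bits))
    return out


def relative_error(weights, dequantized):
    """Relative L2 reconstruction error between original and dequantized weights."""
    num = sum((w - d) ** 2 for w, d in zip(weights, dequantized))
    den = sum(w ** 2 for w in weights)
    return (num / den) ** 0.5


# Hypothetical stand-in for one weight tensor of a checkpoint.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(1024)]
err_int4 = relative_error(weights, quantize_per_group(weights, bits=4))
err_int8 = relative_error(weights, quantize_per_group(weights, bits=8))
print(f"INT4 relative error: {err_int4:.4f}, INT8 relative error: {err_int8:.4f}")
```

Running the same function over successive checkpoints would give a per-checkpoint error curve, although only a perplexity evaluation of the quantized model would correspond to the gap the abstract describes.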

Submitted: 2026-04-16 ArXiv ID: 2604.15167v1

▶

Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

Authors

Arman Hatami, Romina Aalishah, Ilya E. Monosov

Abstract

Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.

Comments: Accepted to the CVPR 2026 Workshop on Machine Unlearning for Vision (MUV)

Submitted: 2026-04-16 ArXiv ID: 2604.15166v1

▶

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

Minh-Phuc Truong, Khai Nguyen

Abstract

Comments: 26 pages, 11 figures, 10 tables

Submitted: 2026-04-16 ArXiv ID: 2604.15114v1

▶

IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation

▶

MinShap: A Modified Shapley Value Approach for Feature Selection

Abstract

The evaluation of fairness in machine learning systems has become a central concern in high-stakes applications, including biometric recognition, healthcare decision-making, and automated risk assessment. Existing approaches typically rely on a small number of fairness metrics to assess model behaviour across group partitions, implicitly assuming that these metrics provide consistent and reliable conclusions. However, different fairness metrics capture distinct statistical properties of model performance and may therefore produce conflicting assessments when applied to the same system. In this work, we investigate the consistency of fairness evaluation by conducting a systematic multi-metric analysis of demographic bias in machine learning models. Using face recognition as a controlled experimental setting, we evaluate model performance across multiple group partitions under a range of commonly used fairness metrics, including error-rate disparities and performance-based measures. Our results demonstrate that fairness assessments can vary significantly depending on the choice of metrics, leading to contradictory conclusions regarding model bias. To quantify this phenomenon, we introduce the Fairness Disagreement Index (FDI), a measure designed to capture the degree of inconsistency across fairness metrics. We further show that disagreement remains high across thresholds and model configurations. These findings highlight a critical limitation in current fairness evaluation practices and suggest that single-metric reporting is insufficient for reliable bias assessment.

Comments: 15 pages, 4 figures, 5 tables

Submitted: 2026-04-16 ArXiv ID: 2604.15038v1

▶

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

▶

DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models

Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits

Authors

Maksim Pershin, Ivan Golovanov, Pavel Baltabaev, et al.

Abstract

Contextual bandit algorithms suffer from high regret during cold-start, when the learner has insufficient data to distinguish good arms from bad. We propose augmenting Disjoint LinUCB with LLM pseudo-observations: after each round, a large language model predicts counterfactual rewards for the unplayed arms, and these predictions are injected into the learner as weighted pseudo-observations. The injection weight is controlled by a calibration-gated decay schedule that tracks the LLM's prediction accuracy on played arms via an exponential moving average; high calibration error suppresses the LLM's influence, while accurate predictions receive higher weight during the critical early rounds. We evaluate on two contextual bandit environments - UCI Mushroom (2-arm, asymmetric rewards) and MIND-small (5-arm news recommendation) - and find that when equipped with a task-specific prompt, LLM pseudo-observations reduce cumulative regret by 19% on MIND relative to pure LinUCB. However, generic counterfactual prompt framing increases regret on both environments, demonstrating that prompt design is the dominant factor, more important than the choice of decay schedule or calibration gating parameters. We analyze the failure modes of calibration gating on domains with small prediction errors and provide a theoretical motivation for the bias-variance trade-off governing pseudo-observation weight.
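The mechanism described above, an injection weight that decays over rounds and is suppressed when the LLM's predictions on played arms are poorly calibrated, can be sketched as follows. The abstract does not give the functional form, so the exponential gate, the half-life decay, the default parameter values, and the names `CalibrationGate` and `weighted_update` are all assumptions made for illustration. The update shown is the standard weighted ridge-regression accumulation that a per-arm LinUCB model maintains.

```python
import math


class CalibrationGate:
    """Tracks an exponential moving average of LLM error on played arms (assumed form)."""

    def __init__(self, ema_rate=0.1, error_scale=0.25, base_weight=1.0, half_life=200.0):
        self.ema_rate = ema_rate
        self.error_scale = error_scale
        self.base_weight = base_weight
        self.half_life = half_life
        self.ema_error = 0.0

    def observe(self, predicted_reward, actual_reward):
        """Update the calibration estimate using an arm that was actually played."""
        error = abs(predicted_reward - actual_reward)
        self.ema_error = (1.0 - self.ema_rate) * self.ema_error + self.ema_rate * error

    def weight(self, round_index):
        """Pseudo-observation weight: decays with rounds, shrinks with calibration error."""
        decay = 0.5 ** (round_index / self.half_life)
        gate = math.exp(-self.ema_error / self.error_scale)
        return self.base_weight * decay * gate


def weighted_update(A, b, x, reward, weight):
    """Add one (pseudo-)observation to an arm's ridge statistics, in place.

    A accumulates weight * x x^T and b accumulates weight * reward * x.
    A real observation uses weight 1.0; a pseudo-observation uses the gated weight.
    """
    d = len(x)
    for i in range(d):
        b[i] += weight * reward * x[i]
        for j in range(d):
            A[i][j] += weight * x[i] * x[j]


# Usage: one round with a 2-dimensional context and one unplayed arm.
gate = CalibrationGate()
gate.observe(predicted_reward=0.8, actual_reward=1.0)   # LLM checked on the played arm
A_unplayed = [[1.0, 0.0], [0.0, 1.0]]                   # identity ridge prior
b_unplayed = [0.0, 0.0]
weighted_update(A_unplayed, b_unplayed, x=[1.0, 2.0], reward=0.6, weight=gate.weight(5))
```

Keeping the gate separate from the bandit statistics means the decay schedule can be swapped without touching the learner.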

Submitted: 2026-04-16 ArXiv ID: 2604.14961v1

▶

MLDAS: Machine Learning Dynamic Algorithm Selection for Software-Defined Networking Security

Authors

Pablo Benlloch, Oscar Romero, Antonio Leon, et al.

Abstract

Network security is a critical concern in the digital landscape of today, with users demanding secure browsing experiences and protection of their personal data. This study explores the dynamic integration of Machine Learning (ML) algorithms with Software-Defined Networking (SDN) controllers to enhance network security through adaptive decision mechanisms. The proposed approach enables the system to dynamically choose the most suitable ML algorithm based on the characteristics of the observed network traffic. This work examines the role of Intrusion Detection Systems (IDS) as a fundamental component of secure communication networks and discusses the limitations of SDN-based attack detection mechanisms. The proposed framework uses adaptive model selection to maintain reliable intrusion detection under varying network conditions. The study highlights the importance of analyzing traffic-type-based metrics to define effective classification rules and enhance the performance of ML models. Additionally, it addresses the risks of overfitting and underfitting, underscoring the critical role of hyperparameter tuning in optimizing model accuracy and generalization. The central contribution of this work is an automated mechanism that adaptively selects the most suitable ML algorithm according to real-time network conditions, prioritizing detection robustness and operational feasibility within SDN environments.

Comments: 22 pages, 15 figures, 12 tables

Submitted: 2026-04-16 ArXiv ID: 2604.14957v1

Emre Özyıldırım, Barış Yaycı, Umut Eren Akturk, et al.

Abstract

Submitted: 2026-04-16 ArXiv ID: 2604.14908v1

▶

Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task

Authors

Evaldas Vaiciukynas, Paulius Danenas, Linas Ablonskis, et al.

Abstract

Online hate speech and abusive language pose a growing challenge for content moderation, especially in multilingual settings and for low-resource languages such as Lithuanian. This paper investigates to what extent modern multilingual sentence embedding models can support accurate hate speech detection in Lithuanian, Russian, and English, and how their performance depends on downstream modeling choices and feature dimensionality. We introduce LtHate, a new Lithuanian hate speech corpus derived from news portals and social networks, and benchmark six modern multilingual encoders (potion, gemma, bge, snow, jina, e5) on LtHate, RuToxic, and EnSuperset using a unified Python pipeline. For each embedding, we train both a one class HBOS anomaly detector and a two class CatBoost classifier, with and without principal component analysis (PCA) compression to 64-dimensional feature vectors. Across all datasets, two class supervised models consistently and substantially outperform one class anomaly detection, with the best configurations achieving up to 80.96% accuracy and AUC ROC of 0.887 in Lithuanian (jina), 92.19% accuracy and AUC ROC of 0.978 in Russian (e5), and 77.21% accuracy and AUC ROC of 0.859 in English (e5 with PCA). PCA compression preserves almost all discriminative power in the supervised setting, while showing some negative impact for the unsupervised anomaly detection case. These results demonstrate how modern multilingual sentence embeddings combined with gradient boosted decision trees provide robust soft-computing solutions for multilingual hate speech detection applications.

Comments: Submitted to Applied Soft Computing (Status: Decision in Process)

Submitted: 2026-04-16 ArXiv ID: 2604.14907v1

▶

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

Authors

Mariia Ivonina, Jakub Rydzewski

Abstract

The SARS-CoV-2 RNA pseudoknot is a promising target for antiviral intervention, as it regulates the efficiency of $-$1 programmed ribosomal frameshifting ($-$1 PRF), a mechanism that is essential for viral protein synthesis. The pseudoknot represents a viral RNA sequence composed of helical stems that adopts two long-lived topologies, threaded and unthreaded. Ligand-induced distortion of this fold is thought to underlie the susceptibility of $-$1 PRF to small-molecule inhibitors. Resolving these distortions from unbiased molecular dynamics (MD) requires collective variables (CVs) that isolate the slowest dynamic modes of the RNA--ligand system from the high-frequency fluctuations. Here, we use spectral map (SM), a thermodynamics-driven machine-learning method, to learn such CVs directly from MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-$1 PRF inhibitor merafloxacin and two related analogs. We examine both threaded and unthreaded pseudoknot topologies and consider the neutral and ionized ligand forms relevant at physiological pH. Free-energy landscapes show that ligand-induced destabilization is topology-selective: merafloxacin and its analogs destabilize the S2 stem in the threaded pseudoknot, whereas in the unthreaded pseudoknot, destabilization shifts to the S1 and S3 stems. We find that the zwitterionic form of merafloxacin uniquely imposes slow dynamics on the otherwise featureless unthreaded pseudoknot. Furthermore, the neutral and zwitterionic forms of merafloxacin differ qualitatively in their mechanisms within the same RNA topology. Overall, these results clarify how pseudoknot topology, ligand type, and protonation state shape the slow conformational dynamics of viral RNA and establish physiological protonation as an essential factor for modeling RNA-targeted drug action.

Submitted: 2026-04-16 ArXiv ID: 2604.14906v1

▶

Beyond Importance Sampling: Rejection-Gated Policy Optimization

Submitted: 2026-04-15 ArXiv ID: 2604.14359v1

▶

Probing $τ$ lepton dipole moments at future Lepton Colliders

▶

AI-assisted modeling and Bayesian inference of unpolarized quark transverse momentum distributions from Drell-Yan data

Authors

Zhong-Bo Kang, Luke Sellers, Congyue Zhang, et al.

Abstract

We present an extraction of unpolarized quark transverse-momentum-dependent parton distribution functions (TMD PDFs) from Drell-Yan data within a Bayesian inference framework, incorporating artificial intelligence at multiple stages of the analysis. Our analysis is performed at ${\rm N^3LO}$ in perturbative QCD combined with ${\rm N^4LL}$ resummation accuracy. We first employ an AI-driven iterative procedure to explore and rank candidate functional forms for the nonperturbative contributions to TMD PDFs at the initial scale, as well as for the Collins-Soper evolution kernel, using $χ^2$ fits and physics constraints. To enable efficient Bayesian inference, we construct a surrogate model for TMD cross sections by training a machine-learning emulator over the parameter space, replacing computationally expensive repeated evaluations and allowing scalable sampling with an affine-invariant Markov Chain Monte Carlo (MCMC) ensemble. Using this framework, we perform a global analysis of Drell-Yan data from fixed-target, RHIC, and LHC experiments and extract TMD PDFs with quantified uncertainties. We compare the results with those obtained using the replica method and highlight differences in the resulting uncertainty estimates.

Comments: 48 pages, 14 figures

Submitted: 2026-04-15 ArXiv ID: 2604.14133v1

Authors

A. Senol, B. S. Ozaltay, M. Tekin, et al.

Abstract

We investigate flavor-changing neutral current (FCNC) interactions of the top quark at a future muon collider with a center-of-mass energy of $\sqrt{s} = 10~\mathrm{TeV}$. The process $μ^{+}μ^{-} \to ν_μ\,μ^+\,b\,j$ and its corresponding charge conjugate are considered as a probe of anomalous $tqZ$ and $tqγ$ couplings, parametrized within an effective field theory framework in terms of $κ_{tqZ}$ and $λ_{tqγ}$. Signal and background events are simulated using Monte Carlo techniques, including parton showering and hadronization with \texttt{Pythia} and a fast detector simulation based on \texttt{Delphes} with a dedicated 10~TeV muon collider setup. A multivariate analysis based on boosted decision trees is employed to enhance the signal discrimination. Assuming an integrated luminosity of $10~\mathrm{ab}^{-1}$, we obtain projected sensitivities to the anomalous couplings at the $\mathcal{O}(10^{-3})$ level, corresponding to branching ratio limits of $\mathcal{O}(10^{-6})$ for the rare $t \to qZ$ and $t \to qγ$ decays. These results significantly improve upon the current bounds from the CMS and ATLAS collaborations, extending the sensitivity by more than one order of magnitude. Our findings demonstrate that a multi-TeV muon collider provides a powerful and complementary platform for probing rare top-quark interactions, offering a unique opportunity to explore physics beyond the Standard Model through FCNC processes.

Comments: 24 pages, 6 figures

Submitted: 2026-04-15 ArXiv ID: 2604.13562v1

▶

Enhancing Event Reconstruction in Hyper-Kamiokande with Machine Learning: A ResNet Implementation

Authors

Andrew Atta, Nick Prouse, Shuoyu Chen, et al.

Abstract

The forthcoming Hyper-Kamiokande experiment requires substantially larger Monte Carlo datasets than previous experiments to satisfy stringent systematic-uncertainty requirements. While traditional maximum-likelihood reconstruction provides high-quality results, its per-event computational cost makes processing these large samples increasingly impractical. We demonstrate a neural-network-based reconstruction approach for the Hyper-Kamiokande far detector using simulated data. Single-particle events with kinetic energies from the Cherenkov threshold up to 2 GeV are propagated through the detector, with PMT charge and timing information mapped to $190\times189$ two-channel images serving as inputs to ResNet models in the WatChMaL framework. These models (i) classify events into four particle hypotheses ($e$, $μ$, $γ$, $π^{0}$) and (ii) regress the vertex, direction, and momentum of electrons and muons. Averaged over the full kinematic range, the regression models achieve momentum resolutions of $1.35\%$ and $2.39\%$, angular resolutions of $1.25^\circ$ and $1.94^\circ$, and vertex resolutions of $28.2$ cm and $25.4$ cm, for muons and electrons respectively, broadly consistent with traditional methods. The classifier improves $e$-$μ$, $e$-$γ$, and $e$-$π^{0}$ separation, with ROC curve areas of $0.9999992$, $0.633$, and $0.9526$. Crucially, our networks achieve inference times of 1-2 ms per event on a single GPU, yielding speed-ups of $3.2\times10^{4}$-$5.2\times10^{4}$ relative to likelihood-based reconstruction, highlighting deep learning as a scalable alternative for Hyper-Kamiokande event reconstruction.

Submitted: 2026-04-15 ArXiv ID: 2604.13503v1

▶

High Energy Physics - Phenomenology

▶

Entropy considerations in Many-Body Gravity and General Relativity, and the impact on cosmic inflation

Authors

S Ganesh

Abstract

Many body gravity (MBG) is a novel modified theory of gravity formulated in a 5-D space-time-temperature framework, in which the variation in temperature is recast as a variation in the 5-D metric. Previous work on MBG has shown that it can reproduce galaxy rotation curves, radial acceleration relation and the weak gravitational lensing of the bullet cluster, without the inclusion of dark matter. In this work we show that MBG can reproduce cosmic inflation, and in the process, analyze fundamental relations between interaction, time and gravity. To analyze cosmic inflation using interacting massless scalar fields, we first analyze theoretically a hypothetical universe with a single massive particle, or a collection of non-interacting massive particles. A quantitative relation between time and interaction is developed using Quantum Field Theory (QFT), which suggests that the notion of time becomes ill-defined for such a universe. The mass terms in MBG and General Relativity cause a discrepancy with the QFT results. An interacting massless scalar field then becomes a necessity to resolve the issue at the onset of inflation. However, the entropic terms in the MBG field equations are seen to be consistent with the QFT results and further accelerate inflation. The slow-roll condition is shown to be a natural consequence of the Euler-Lagrange equations of motion governing the massless scalar field in 5-D space-time-temperature, during the early phase of inflation. Finally, the MBG field equations are solved in the context of a Friedmann metric, leading to inflation. The matter era is also investigated.

Comments: 22 pages, 6 figures

Submitted: 2026-04-15 ArXiv ID: 2604.14481v1

▶

Wave-envelope dark matter beyond the monochromatic paradigm

▶

Astrophysical bounds on the high-energy evolution of neutrino mixing

▶

Gravitational Sommerfeld Effects: Formalism, Renormalization, and Perturbation to $O(G^{10})$

Authors

Chih-Hao Chang, Chia-Hsien Shen, Zihan Zhou

Abstract

In the effective field theory (EFT) description of binary inspirals, the radiated gravitational waveform receives universal corrections from the curved background, the ``tail effects'', that resum into the so-called ``Sommerfeld factor''. We develop a systematic framework for computing this gravitational Sommerfeld factor for scalar perturbations in the presence of tidal effects on the system. Using the worldline EFT, we recast the diagrammatic resummation as a solution to the $d$-dimensional wave equation with a localized source, and derive a closed-form expression for the Sommerfeld factor in terms of the EFT connection matrix. We prove that the phase of the Sommerfeld factor is exactly the same as the elastic Compton scattering phase shift when there is no tidal dissipation. By combining the renormalization techniques in EFT with the Mano--Suzuki--Takasugi method in black hole perturbation theory, we analytically solve the Sommerfeld factor for both the magnitude and phase to $O(G^{10})$ for the $\ell = 0, 1, 2$ partial waves. We further establish a new renormalization group equation for the radiative multipole moments, whose exact solution yields an improved resummation of the waveform beyond the universal tail logarithms. These high-precision data and exact relations pave the way for future resummation models of the waveform.

Comments: 23 pages, 2 figures

Submitted: 2026-04-15 ArXiv ID: 2604.14112v1

▶

A dynamical implementation of colour coherence for quenched jets in JEWEL

Authors

Korinna Zapp

Abstract

Colour coherence affects the radiation pattern of hard partons both in vacuum and in a dense coloured background formed in heavy ion collisions. In vacuum evolution it leads to the well-known phenomenon of angular ordering, and in heavy ion collisions the appearance of a medium resolution scale strongly affects the way in which a fragmenting hard parton interacts with the background medium. In this paper I present the implementation of colour coherence in the JEWEL event generator for jet evolution in a dense medium. In each interaction between a hard parton and the medium it is checked whether the momentum transfer of the scattering is sufficient to resolve the colour dipole. In this way it is dynamically decided which structures stay coherent. Importantly, scatterings that resolve an individual parton disrupt the colour coherence, which affects the next splitting via the loss of angular ordering. This leads to a suppression of hard radiation, and consequently a reduction in overall scattering rate, which is the dominant source of effects of colour coherence observable in reconstructed jets. I discuss these modifications using the examples of nuclear modification factor, jet fragmentation function and jet-hadron correlations.

Comments: 44 pages, 30 figures, code available at jewel.hepforge.org

Submitted: 2026-04-15 ArXiv ID: 2604.13932v1

▶

Refining two-loop corrections to trilinear Higgs couplings in the Two-Higgs-Doublet Model

Authors

Johannes Braathen, Felix Egle, Alain Verduras Schaeidt

Abstract

The precise determination of the Higgs self-couplings is an essential task for understanding electroweak symmetry breaking and probing physics beyond the Standard Model (SM). The calculation of two-loop corrections to scalar couplings is important as it provides a critical test of the perturbative stability of the theoretical predictions, especially in scenarios with extended scalar sectors where large one-loop corrections can occur. Moreover, two-loop corrections need to be taken into account for the future perspective of precisely measuring the trilinear Higgs self-coupling. We present new results for the leading two-loop corrections to trilinear Higgs couplings in the Two-Higgs-Doublet Model (2HDM). We focus in particular on the couplings $λ_{hhh}$ and $λ_{hhH}$, which are relevant for Higgs pair production at the (HL-)LHC or at future linear colliders. We address the renormalisation of the alignment limit in the Higgs basis and give some insights into technical details of the calculation. Finally, we discuss the phenomenological impact of our results on di-Higgs production differential distributions.

Comments: 15 pages, 3 figures. Contribution to the proceedings of the International Workshop on Future Linear Colliders (LCWS2025), 20-24 October 2025. Valencia, Spain (C25-10-20.1)

Submitted: 2026-04-15 ArXiv ID: 2604.13922v1

▶

Correlation between Ultra-High-Energy Neutrino KM3-230213A and Gamma-Ray Bursts

▶

Robust parameter inference for Taiji via time-frequency contrastive learning and normalizing flows

Authors

Tian-Yang Sun, Bo Liang, Ji-Yu Song, et al.

Abstract

Transient noise artifacts, commonly referred to as glitches, pose a major challenge to parameter inference for space-based gravitational-wave (GW) observations. We develop a glitch-robust amortized inference framework for massive black hole binaries in the Taiji detector configuration by combining conditional normalizing flows, a time-frequency multimodal fusion encoder, and contrastive learning. To enable large-scale training on contaminated data, we further introduce a neural glitch generator that produces high-fidelity synthetic transients at substantially reduced computational cost. Systematic experiments show that, under glitch contamination, the proposed method yields more accurate and better-calibrated posteriors than a conventional Markov Chain Monte Carlo baseline. In ablation studies, the full time-frequency model with contrastive learning performs best overall and remains robust to variations in glitch duration and merger-relative timing. We further show that standard coverage diagnostics alone are insufficient to fully assess posterior fidelity. We therefore complement them with the continuous ranked probability score, which provides a stricter assessment of global distributional agreement in non-ideal GW data. Taken together, these results establish deep-learning-based amortized inference as a promising framework for fast and robust Bayesian parameter estimation in future space-based GW observations.
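For readers unfamiliar with the continuous ranked probability score mentioned above, here is a minimal sketch of its standard sample-based estimator for a one-dimensional posterior, CRPS = E|X - y| - (1/2) E|X - X'|, where X and X' are independent posterior draws and y is the true parameter value. Whether the authors use this particular estimator is not stated in the abstract; the function name and the toy inputs are illustrative assumptions.

```python
def crps_from_samples(samples, truth):
    """Sample-based CRPS for a 1-D posterior; lower values indicate better agreement."""
    n = len(samples)
    # Accuracy term: mean absolute distance of the posterior draws from the truth.
    accuracy = sum(abs(x - truth) for x in samples) / n
    # Spread term: half the mean absolute distance between all pairs of draws.
    spread = sum(abs(xi - xj) for xi in samples for xj in samples) / (2.0 * n * n)
    return accuracy - spread


# Toy posteriors for a parameter whose true value is 1.0.
centered = crps_from_samples([0.0, 2.0], truth=1.0)   # covers the truth, but broad
biased = crps_from_samples([3.0, 3.0], truth=1.0)     # narrow, but offset
print(centered, biased)
```

Unlike a coverage check at fixed credible levels, the score penalizes both bias and excess width in a single number, which is why it can serve as a stricter complement to coverage diagnostics.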

Comments: 15 pages, 8 figures

Submitted: 2026-04-15 ArXiv ID: 2604.13867v1

▶

Dark energy, spatial curvature, and star formation efficiency from JWST photometric and spectroscopic high-redshift galaxies

Authors

Leonardo Comini, Sunny Vagnozzi, Abraham Loeb

Abstract

Early observations from the James Webb Space Telescope (JWST) have revealed an overabundance of massive high-redshift galaxies, raising the question of whether this points to new physics beyond $Λ$CDM, or an enhanced formation efficiency of massive stars. We revisit this issue going beyond earlier analyses based on direct comparisons to theoretical bounds at a fixed cosmology, by performing a full Bayesian analysis of the most extreme galaxies in the CEERS imaging and FRESCO spectroscopic samples, jointly constraining cosmological parameters and the baryon-to-star conversion efficiency $ε$. We do so not only within the spatially flat $Λ$CDM model, but also in models where the dark energy equation of state $w$ and/or the spatial curvature parameter $Ω_K$ are allowed to vary, carefully discussing the impact of both $w$ and $Ω_K$ on the cumulative comoving stellar mass density. Within the flat $Λ$CDM model, once cosmological parameters are marginalized over, the CEERS sample provides a weak $2σ$ lower limit of $ε\gtrsim 0.07$, compatible with astrophysical expectations. In contrast, the FRESCO sample requires $ε\gtrsim 0.5$ at $2σ$, with values $ε\lesssim 0.2$ disfavored at $>5σ$. These results do not qualitatively change when we allow $w$ and/or $Ω_K$ to vary, with no evidence for deviations from $w=-1$ or $Ω_K=0$. Our results therefore suggest that the origin of the ``JWST tension'' is unlikely to be cosmological, but lies in the astrophysics of galaxy formation.

Comments: 18 pages, 8 sub-figures arranged into 6 figures, key figure with clear visual summary of results is Fig. 6

Submitted: 2026-04-15 ArXiv ID: 2604.13866v1

▶

Fast Neutrino-Flavor Conversion with Attenuation and Global Lepton Gradient

Authors

Masamichi Zaizen, Hiroki Nagakura

Abstract

Fast neutrino-flavor conversion (FFC) can nontrivially alter the neutrino radiation field in core-collapse supernovae (CCSN) and binary neutron-star merger (BNSM) remnants. However, its interplay with global geometry remains poorly understood because microscopic flavor conversion scales are much shorter than global transport scales. We perform global quantum kinetic neutrino transport simulations in spherical geometry with neutrino and matter backgrounds, using an attenuated oscillation Hamiltonian. We find that steep radial lepton gradients can suppress FFC, but the suppression is highly sensitive to the adopted attenuation parameter. This behavior is explained by an adiabatic condition: flavor coherence can grow sufficiently only while the flavor wave remains on the unstable branch in the local dispersion relation during propagation. Background variation shifts the unstable branch, while attenuation lengthens the growth timescale, making it harder for the flavor coherence to follow the branch. We provide an approximate formula for the adiabaticity that can be used directly in CCSN and BNSM models developed by classical neutrino transport simulations. Our results show that attenuation artificially leads to an overestimation of the impact of background variation and should therefore be applied with caution in global simulations of neutrino flavor conversion.

Comments: 14 pages, 8 figures; Submitted to PRD

Submitted: 2026-04-15 ArXiv ID: 2604.13617v1

▶

Sensitivity to top-quark FCNC interactions at future muon colliders

Dustin Keller

Abstract

Collinear factorization and the leading-twist operator product expansion (OPE) in perturbative QCD express suitably inclusive observables in scale-separated kinematics as composites of perturbative short-distance coefficients with universal long-distance non-perturbative correlators such as parton distribution functions (PDFs), up to controlled power corrections. A persistent structural feature is \emph{presentation non-uniqueness}: coefficients and correlators are not individually physical, but are defined only up to finite factorization-scheme redefinitions induced by collinear subtractions and renormalized-operator mixing. We formalize this redundancy categorically by introducing an \emph{interface algebra object} encoding admissible finite collinear counterterms/mixing kernels and by organizing coefficient data and hadronic data as right/left modules over this algebra in a symmetric monoidal category encoding the chosen recomposition calculus. Our main result, the \emph{Core Representation Theorem}, identifies the universal scheme-invariant carrier: the functor of balanced (scheme-invariant) pairings is represented by the relative tensor product $C\otimes_A f$, which is terminal among all quotients of the naive composite $C\otimes f$ that preserve scheme-invariant semantics. Finally, we show how standard physics inputs (symmetry constraints, locality/OPE, and a stated accuracy truncation) canonically induce the interface algebra and module structures, and we prove a minimal closure principle for completing a generating set of long-distance operators/correlators to an $A$-stable sector.

Comments: accepted for publication in Theoretical and Mathematical Physics

Submitted: 2026-04-15 ArXiv ID: 2604.13439v2

▶

Machine Learning - Statistics

▶

Early-stopped aggregation: Adaptive inference with computational efficiency

BOAT: Navigating the Sea of In Silico Predictors for Antibody Design via Multi-Objective Bayesian Optimization

Authors

Jackie Rao, Ferran Gonzalez Hernandez, Leon Gerard, et al.

Abstract

Antibody lead optimization is inherently a multi-objective challenge in drug discovery. Achieving a balance between different drug-like properties is crucial for the development of viable candidates, and this search becomes exponentially challenging as desired properties grow. The ever-growing zoo of sophisticated in silico tools for predicting antibody properties calls for an efficient joint optimization procedure to overcome resource-intensive sequential filtering pipelines. We present BOAT, a versatile Bayesian optimization framework for multi-property antibody engineering. Our `plug-and-play' framework couples uncertainty-aware surrogate modeling with a genetic algorithm to jointly optimize various predicted antibody traits while enabling efficient exploration of sequence space. Through systematic benchmarking against genetic algorithms and newer generative learning approaches, we demonstrate competitive performance with state-of-the-art methods for multi-objective protein optimization. We identify clear regimes where surrogate-driven optimization outperforms expensive generative approaches and establish practical limits imposed by sequence dimensionality and oracle costs.

Comments: Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026

Submitted: 2026-04-15 ArXiv ID: 2604.13980v1

▶

Sandpile Economics: Theory, Identification, and Evidence

Authors

Diego Vallarino

Abstract

Why do capitalist economies recurrently generate crises whose severity is disproportionate to the size of the triggering shock? This paper proposes a structural answer grounded in the evolutionary geometry of production networks. As economies evolve through specialization, integration, and competitive selection, their inter-sectoral linkages drift toward configurations of increasing geometric fragility, eventually crossing a threshold beyond which small disturbances generate disproportionately large cascades. We introduce Sandpile Economics, a formal framework that interprets macroeconomic instability as an emergent property of disequilibrium production networks. The key state variable is the Forman--Ricci curvature of the input--output graph, capturing local substitution possibilities when supply chains are disrupted. We show that when curvature falls below an endogenous threshold, the distribution of cascade sizes follows a power law with tail index $α\in (1,2)$, implying a regime of unbounded amplification. The underlying mechanism is evolutionary: specialization reduces input substitutability, pushing the economy toward criticality, while crisis episodes induce endogenous network reconfiguration and path dependence. These dynamics are inherently non-ergodic and cannot be captured by representative-agent frameworks. Empirically, using global input--output data, we document that production networks operate in persistently negative curvature regimes and that curvature robustly predicts medium-run output dynamics. A one-standard-deviation increase in curvature is associated with higher cumulative growth over three-year horizons, and curvature systematically outperforms standard network metrics in explaining cross-country differences in resilience.

Submitted: 2026-04-15 ArXiv ID: 2604.13890v1

▶

Forecasting Multivariate Time Series under Predictive Heterogeneity: A Validation-Driven Clustering Framework

Michael Leznik

Abstract

We introduce Metric-Aware Principal Component Analysis (MAPCA), a unified framework for scale-invariant representation learning based on the generalised eigenproblem max Tr(W^T Sigma W) subject to W^T M W = I, where M is a symmetric positive definite metric matrix. The choice of M determines the representation geometry. The canonical beta-family M(beta) = Sigma^beta, beta in [0,1], provides continuous spectral bias control between standard PCA (beta=0) and output whitening (beta=1), with condition number kappa(beta) = (lambda_1/lambda_p)^(1-beta) decreasing monotonically to isotropy. The diagonal metric M = D = diag(Sigma) recovers Invariant PCA (IPCA), a method rooted in Frisch (1928) diagonal regression, as a distinct member of the broader framework. We prove that scale invariance holds if and only if the metric transforms as M_tilde = CMC under rescaling C, a condition satisfied exactly by IPCA but not by the general beta-family at intermediate values. Beyond its classical interpretation, MAPCA provides a geometric language that unifies several self-supervised learning objectives. Barlow Twins and ZCA whitening correspond to beta=1 (output whitening); VICReg's variance term corresponds to the diagonal metric. A key finding is that W-MSE, despite being described as a whitening-based method, corresponds to M = Sigma^{-1} (beta = -1), outside the spectral compression range entirely and in the opposite spectral direction to Barlow Twins. This distinction between input and output whitening is invisible at the level of loss functions and becomes precise only within the MAPCA framework.

Comments: 12 pages, one figure

Submitted: 2026-04-15 ArXiv ID: 2604.14249v1
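
The MAPCA abstract above states the generalised eigenproblem max Tr(W^T Sigma W) subject to W^T M W = I, with the beta-family M(beta) = Sigma^beta and condition number kappa(beta) = (lambda_1/lambda_p)^(1-beta). The following NumPy sketch checks both statements numerically on synthetic data. The helper names `sym_power` and `mapca`, the toy covariance, and the reduction to an ordinary eigenproblem via M^{-1/2} are assumptions made for illustration; this is not the author's implementation.

```python
import numpy as np

def sym_power(S, p):
    """Power S**p of a symmetric positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**p) @ vecs.T

def mapca(Sigma, M, k):
    """Solve max Tr(W^T Sigma W) s.t. W^T M W = I; return all eigenvalues and top-k W."""
    M_inv_half = sym_power(M, -0.5)
    # Change variables W = M^{-1/2} V to get an ordinary symmetric eigenproblem.
    A = M_inv_half @ Sigma @ M_inv_half
    A = (A + A.T) / 2.0  # symmetrize against round-off
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]  # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    W = M_inv_half @ vecs[:, :k]
    return vals, W

# Synthetic data with very different feature scales (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) * np.array([10.0, 3.0, 1.0, 0.3])
Sigma = np.cov(X, rowvar=False)
lam = np.linalg.eigvalsh(Sigma)  # ascending: lam[0] = lambda_p, lam[-1] = lambda_1

results = {}
for beta in (0.0, 0.5, 1.0):
    M = sym_power(Sigma, beta)
    vals, W = mapca(Sigma, M, k=2)
    results[beta] = {
        "kappa": vals[0] / vals[-1],
        "predicted": (lam[-1] / lam[0]) ** (1.0 - beta),
        "constraint_error": np.abs(W.T @ M @ W - np.eye(2)).max(),
    }
    print(beta, results[beta])
```

With M = Sigma^beta the generalised eigenvalues are lambda_i^(1-beta), so the measured ratio should match the closed form at each beta and fall to 1 at beta = 1 (output whitening).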

▶

Robust Low-Rank Tensor Completion based on M-product with Weighted Correlated Total Variation and Sparse Regularization

Authors

Biswarup Karmakar, Ratikanta Behera

Abstract

The robust low-rank tensor completion problem addresses the challenge of recovering corrupted high-dimensional tensor data with missing entries, outliers, and sparse noise commonly found in real-world applications. Existing methodologies have encountered fundamental limitations due to their reliance on uniform regularization schemes, particularly the tensor nuclear norm and $\ell_1$ norm regularization approaches, which indiscriminately apply equal shrinkage to all singular values and sparse components, thereby compromising the preservation of critical tensor structures. The proposed tensor weighted correlated total variation (TWCTV) regularizer addresses these shortcomings through an $M$-product framework that combines a weighted Schatten-$p$ norm on gradient tensors for low-rankness with smoothness enforcement and weighted sparse components for noise suppression. The proposed weighting scheme adaptively reduces the thresholding level to preserve both dominant singular values and sparse components, thus improving the reconstruction of critical structural elements and nuanced details in the recovered signal. Through a systematic algorithmic approach, we introduce an enhanced alternating direction method of multipliers (ADMM) that offers both computational efficiency and theoretical substantiation, with convergence properties comprehensively analyzed within the $M$-product framework. Comprehensive numerical evaluations across image completion, denoising, and background subtraction tasks validate the superior performance of this approach relative to established benchmark methods.

Comments: 32 pages

Submitted: 2026-04-15 ArXiv ID: 2604.13525v1

▶

Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization

Authors

Sida Liu, Yangzi Guo, Mingyuan Wang

Abstract

Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g. linear projection or a neural network) and cluster the data based on the resulting features (e.g. under a Gaussian Mixture Model framework). The framework searches for the dimension reduction parameters and the optimal clusters by traversing a manifold, using Gradient Manifold Optimization. The proposed framework is exemplified with a Gaussian Mixture Model as one simple but efficient example, in a process that is somewhat similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised training of simulated data as well as a benchmark image dataset (i.e. MNIST). The experimental results indicate that our algorithm has better performance than popular clustering algorithms from the literature.

Submitted: 2026-04-15 ArXiv ID: 2604.13484v1

▶

Universality of Gaussian-Mixture Reverse Kernels in Conditional Diffusion

▶

Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making

Authors

Pramudita Satria Palar, Paul Saves, Muhammad Daffa Robani, et al.

Abstract

The simulation of complex systems increasingly relies on sophisticated but fundamentally opaque computational black-box simulators. Surrogate models play a central role in reducing the computational cost of complex systems simulations across a wide range of scientific and engineering domains. Notwithstanding, they inevitably inherit and often exacerbate this black-box nature, obscuring how input variables drive physical responses. Conversely, Explainable Artificial Intelligence (XAI) offers powerful tools to unpack these models. Yet, XAI methods struggle with engineering-specific constraints, such as highly correlated inputs, dynamical systems, and rigorous reliability requirements. Consequently, surrogate modeling and XAI have largely evolved as distinct fields of research, despite their strong complementarity. To reconnect these approaches, this state-of-the-art survey provides a structured perspective that maps existing XAI techniques onto the various stages of surrogate modeling workflows for design and exploration. To ground this synthesis, we draw upon illustrative applications across both equation-based simulations and agent-based modeling. We survey a broad spectrum of techniques, highlighting their strengths for revealing interactions and supporting human comprehension. Finally, we identify pressing open challenges, including the explainability of dynamical systems and the handling of mixed-variable systems, and propose a research agenda to make explainability a core, embedded element of simulation-driven workflows from model construction through decision-making. By transforming opaque emulators into explainable tools, this agenda empowers practitioners to move beyond accelerating simulations to extracting actionable insights from complex system behaviors.

Comments: Accepted for publication in Archives of Computational Methods in Engineering, 2026, ID d9d36aab-3723-4a70-b2ce-166435179528

Submitted: 2026-04-15 ArXiv ID: 2604.14240v1

▶

Authors

Stavros Kassinos

Abstract

Physics-informed neural networks (PINNs) are often selected by a single scalar loss even when the quantity of interest is more specific. We study a hybrid design in which the governing PDE residual remains automatic-differentiation (AD) based, while finite differences (FD) appear only in a weak auxiliary term that penalizes gradients of the sampled residual field. The FD term regularizes the residual field without replacing the PDE residual itself. We examine this idea in two stages. Stage 1 is a controlled Poisson benchmark comparing a baseline PINN, the FD residual-gradient regularizer, and a matched AD residual-gradient baseline. Stage 2 transfers the same logic to a three-dimensional annular heat-conduction benchmark (PINN3D), where baseline errors concentrate near a wavy outer wall and the auxiliary grid is implemented as a body-fitted shell adjacent to the wall. In Stage 1, the FD regularizer reproduces the main effect of residual-gradient control while exposing a trade-off between field accuracy and residual cleanliness. In Stage 2, the shell regularizer improves the application-facing quantities, namely outer-wall flux and boundary-condition behavior. Across seeds 0-5 and 100k epochs, the most reliable tested configuration is a fixed shell weight of 5e-4 under the Kourkoutas-beta optimizer regime: relative to a matched run without the shell term, it reduces the mean outer-wall BC RMSE from 1.22e-2 to 9.29e-4 and the mean wall-flux RMSE from 9.21e-3 to 9.63e-4. Adam with beta2=0.999 becomes usable when the initial learning rate is reduced to 1e-3, although its shell benefit is less robust than under Kourkoutas-beta. Overall, the results support a targeted view of hybrid PINNs: an auxiliary-only FD regularizer is most valuable when it is aligned with the physical quantity of interest, here the outer-wall flux.

Comments: 18 pages, 5 figures, 10 tables

Submitted: 2026-04-15 ArXiv ID: 2604.14472v1

▶

Bias in Surface Electromyography Features across a Demographically Diverse Cohort

Authors

Aditi Agrawal, Celine John Philip, Giancarlo K. Sagastume, et al.

Abstract

Neuromotor decoding from upper-limb surface electromyography (sEMG) can enhance human-machine interfaces and offer a more natural means of controlling prosthetic limbs, virtual reality, and household electronics. Unfortunately, current sEMG technology does not always perform consistently across users because individual differences such as age and body mass index, among many others, can substantially alter signal quality. This variability makes sEMG characteristics highly idiosyncratic, often necessitating laborious personalization and iterative tuning to achieve reliable performance. This variability has particular import for sEMG-based assistive devices and neural interfaces, where demographic biases in sEMG features could undermine broad and fair deployment. In this study, we explore how demographic differences affect the sEMG signals produced and their implications for machine learning-based gesture decoding. We analyze the data set provided by, in which we derive 147 common sEMG features extracted from 81 demographically diverse individuals performing discrete hand gestures. Using mixed-effects linear models and partial least squares (PLS) analysis, which take into consideration demographic variables (including age, sex, height, weight, skin properties, subcutaneous fat, and hair density), we identify that 33\% (49 of 147) of commonly used sEMG features show significant associations with demographic characteristics. These results may help guide the development of fair and unbiased sEMG-based neural interfaces across a diverse population.

Comments: 17 pages, 4 Figures

Submitted: 2026-04-15 ArXiv ID: 2604.14460v1

▶

Asynchronous Probability Ensembling for Federated Disaster Detection

▶

Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers

Authors

Felipe Parodi, Jordan Matelsky, Melanie Segado

Abstract

Zero-ablation -- replacing token activations with zero vectors -- is widely used to probe token function in vision transformers. Register zeroing in DINOv2+registers and DINOv3 produces large drops (up to $-36.6$\,pp classification, $-30.9$\,pp segmentation), suggesting registers are functionally indispensable. However, three replacement controls -- mean-substitution, noise-substitution, and cross-image register-shuffling -- preserve performance across classification, correspondence, and segmentation, remaining within ${\sim}1$\,pp of the unmodified baseline. Per-patch cosine similarity shows these replacements genuinely perturb internal representations, while zeroing causes disproportionately large perturbations, consistent with why it alone degrades tasks. We conclude that zero-ablation overstates dependence on exact register content. In the frozen-feature evaluations we test, performance depends on plausible register-like activations rather than on exact image-specific values. Registers nevertheless buffer dense features from \texttt{[CLS]} dependence and are associated with compressed patch geometry. These findings, including the replacement-control results, replicate at ViT-B scale.

Comments: 12 pages, 10 figures, to be published in CVPR 2026 HOW Vision Interpretability Workshop Proceedings

Submitted: 2026-04-15 ArXiv ID: 2604.14433v1

▶

Three-Phase Transformer

Authors

Mohammad R. Abu Ayyash

Abstract

Online A/B testing at scale relies on proxy metrics -- short-term, easily-measured signals used in place of slow-moving long-term outcomes. When the proxy-outcome relationship is heterogeneous across user segments, aggregate correlation can mask directional failures akin to Simpson's Paradox, leading to costly ship/no-ship errors. We introduce PROXIMA (Proxy Metric Validation Framework for Online Experiments), a lightweight diagnostic framework that scores proxy reliability through a composite of three complementary dimensions: normalised effect correlation, directional accuracy, and segment-level fragility rate. Unlike surrogate-index approaches that predict long-term treatment effects, PROXIMA directly audits whether a candidate proxy leads to correct launch decisions and flags the user segments where it fails. We validate PROXIMA on two public datasets -- the Criteo Uplift corpus (14M observations, advertising) and KuaiRec (7K users, video recommendation) -- using 80 simulated A/B tests. Early engagement metrics achieve a composite reliability of 0.80 on Criteo and 0.62 on KuaiRec, yielding 98.4% average decision agreement with an oracle policy. Fragility analysis reveals that recommendation domains exhibit substantially higher segment-level heterogeneity (68% fragility) than advertising (13%), yet directional accuracy remains above 96% in both cases. A sensitivity analysis over the weight space confirms that no single component suffices and that the composite provides substantially better discrimination between reliable and unreliable proxies than correlation alone. Code and reproduction scripts are available at: https://github.com/Avinash-Amudala/PROXIMA

Comments: 14 pages. Sole-author submission. Independent research. Companion code at https://github.com/Avinash-Amudala/PROXIMA. Zenodo archive: 10.5281/zenodo.15483241. Related US provisional patent application: 63/974,569 (filed Feb 3, 2026)

Submitted: 2026-04-15 ArXiv ID: 2604.14352v1

▶

Tight Sample Complexity Bounds for Best-Arm Identification Under Bounded Systematic Bias

▶

Path-Sampled Integrated Gradients

▶

When Missing Becomes Structure: Intent-Preserving Policy Completion from Financial KOL Discourse

▶

Thermodynamic Diffusion Inference with Minimal Digital Conditioning

Austin Talbot, Alex V. Kotlar, Yue Ke

Abstract

Targeted amplicon panels are widely used in oncology diagnostics, but providing per-gene performance guarantees for copy number variant (CNV) detection remains challenging due to amplification artifacts, process-mismatch heterogeneity, and limited validation sample sizes. While Bayesian CNV callers naturally quantify per-sample uncertainty, translating this into the frequentist population-level guarantees required for clinical validation (coverage rates, false-positive bounds, and minimum detectable copy-number changes) is a fundamentally different inferential problem. We show empirically that even robust Bayesian credible intervals, including coarsened posteriors and sandwich-adjusted intervals, are severely miscalibrated on panels with small amplicon counts per gene. To address this, we propose a hybrid framework that evaluates Bayesian posterior functionals on validation samples and models the resulting squared losses with a Gamma distribution, yielding tolerance intervals with valid frequentist coverage. Three components make the method practical under real-world constraints: (1) imputation that removes the influence of true CNV-positive samples without requiring known ground truth, (2) regularization to address small sample variability, and (3) evidence-based stratification on the log model evidence to accommodate non-exchangeable noise profiles arising from process mismatch. Evaluated on two targeted amplicon panels using leave-one-out cross-validation, the proposed method achieves single-digit mean absolute coverage error across all genes under both process-matched and unmatched conditions, whereas Bayesian comparators exhibit mean absolute errors exceeding 60\% on clinically relevant genes such as ERBB2.

Submitted: 2026-04-15 ArXiv ID: 2604.14305v1

▶

Quantum-inspired tensor networks in machine learning models

▶

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

Authors

Yuqiao Tan, Minzheng Wang, Bo Liu, et al.

Abstract

While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its potential is fundamentally bounded by the base model's existing output distribution. Optimizing the marginal distribution P(y) in the Pre-train Space addresses this bottleneck by encoding reasoning ability and preserving broad exploration capacity. Yet, conventional pre-training relies on static corpora for passive learning, leading to a distribution shift that hinders targeted reasoning enhancement. In this paper, we introduce PreRL (Pre-train Space RL), which applies reward-driven online updates directly to P(y). We theoretically and empirically validate the strong gradient alignment between log P(y) and log P(y|x), establishing PreRL as a viable surrogate for standard RL. Furthermore, we uncover a critical mechanism: Negative Sample Reinforcement (NSR) within PreRL serves as an exceptionally effective driver for reasoning. NSR-PreRL rapidly prunes incorrect reasoning spaces while stimulating endogenous reflective behaviors, increasing transition and reflection thoughts by 14.89x and 6.54x, respectively. Leveraging these insights, we propose Dual Space RL (DSRL), a Policy Reincarnation strategy that initializes models with NSR-PreRL to expand the reasoning horizon before transitioning to standard RL for fine-grained optimization. Extensive experiments demonstrate that DSRL consistently outperforms strong baselines, proving that pre-train space pruning effectively steers the policy toward a refined correct reasoning subspace.

Comments: Preprint. Our code is available at https://github.com/Trae1ounG/Pretrain_Space_RLVR

Submitted: 2026-04-15 ArXiv ID: 2604.14142v1

▶

LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning

▶

From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs

Authors

Itay Itzhak, Eliya Habba, Gabriel Stanovsky, et al.

Abstract

Evaluating LLMs is challenging, as benchmark scores often fail to capture models' real-world usefulness. Instead, users often rely on ``vibe-testing'': informal experience-based evaluation, such as comparing models on coding tasks related to their own workflow. While prevalent, vibe-testing is often too ad hoc and unstructured to analyze or reproduce at scale. In this work, we study how vibe-testing works in practice and then formalize it to support systematic analysis. We first analyze two empirical resources: (1) a survey of user evaluation practices, and (2) a collection of in-the-wild model comparison reports from blogs and social media. Based on these resources, we formalize vibe-testing as a two-part process: users personalize both what they test and how they judge responses. We then introduce a proof-of-concept evaluation pipeline that follows this formulation by generating personalized prompts and comparing model outputs using user-aware subjective criteria. In experiments on coding benchmarks, we find that combining personalized prompts and user-aware evaluation can change which model is preferred, reflecting the role of vibe-testing in practice. These findings suggest that formalized vibe-testing can serve as a useful approach for bridging benchmark scores and real-world experience.

Comments: Under review. 42 pages, 18 figures. Code and data at https://technion-cs-nlp.github.io/vibe-testing-llms

Submitted: 2026-04-15 ArXiv ID: 2604.14137v2

▶

Rhetorical Questions in LLM Representations: A Linear Probing Study

Authors

Louie Hong Yao, Vishesh Anand, Yuan Zhuang, et al.

Abstract

Rhetorical questions are asked not to seek information but to persuade or signal stance. How large language models internally represent them remains unclear. We analyze rhetorical questions in LLM representations using linear probes on two social-media datasets with different discourse contexts, and find that rhetorical signals emerge early and are most stably captured by last-token representations. Rhetorical questions are linearly separable from information-seeking questions within datasets, and remain detectable under cross-dataset transfer, reaching AUROC around 0.7-0.8. However, we demonstrate that transferability does not simply imply a shared representation. Probes trained on different datasets produce different rankings when applied to the same target corpus, with overlap among the top-ranked instances often below 0.2. Qualitative analysis shows that these divergences correspond to distinct rhetorical phenomena: some probes capture discourse-level rhetorical stance embedded in extended argumentation, while others emphasize localized, syntax-driven interrogative acts. Together, these findings suggest that rhetorical questions in LLM representations are encoded by multiple linear directions emphasizing different cues, rather than a single shared direction.

Comments: 18 pages, 15 figures, accepted to ACL 2026

Submitted: 2026-04-15 ArXiv ID: 2604.14128v1

▶

Complex Interpolation of Matrices with an application to Multi-Manifold Learning

▶

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

Authors

Junzhe Wang, Zhiheng Xi, Yajie Yang, et al.

Abstract

Search agents extend Large Language Models (LLMs) beyond static parametric knowledge by enabling access to up-to-date and long-tail information unavailable during pretraining. While reinforcement learning has been widely adopted for training such agents, existing approaches face key limitations: process supervision often suffers from unstable value estimation, whereas outcome supervision struggles with credit assignment due to sparse, trajectory-level rewards. To bridge this gap, we propose Contribution-Weighted GRPO (CW-GRPO), a framework that integrates process supervision into group relative policy optimization. Instead of directly optimizing process rewards, CW-GRPO employs an LLM judge to assess the retrieval utility and reasoning correctness at each search round, producing per-round contribution scores. These scores are used to rescale outcome-based advantages along the trajectory, enabling fine-grained credit assignment without sacrificing optimization stability. Experiments on multiple knowledge-intensive benchmarks show that CW-GRPO outperforms standard GRPO by 5.0\% on Qwen3-8B and 6.3\% on Qwen3-1.7B, leading to more effective search behaviors. Additional analysis reveals that successful trajectories exhibit concentrated contributions across rounds, providing empirical insight into search agent tasks.

Comments: Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), Main Conference

Submitted: 2026-04-15 ArXiv ID: 2604.14267v1
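
The CW-GRPO abstract above describes rescaling outcome-based advantages along a trajectory using per-round contribution scores. The sketch below illustrates that general idea only. The group-relative advantage (reward minus group mean, divided by group standard deviation) is the usual GRPO form, while the specific rescaling rule (dividing each round's score by the trajectory's mean score), the function names, and the toy numbers are hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize outcome rewards within one sampled group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def contribution_weighted_advantages(rewards, contributions, eps=1e-8):
    """Spread each trajectory's advantage over its rounds by contribution score.

    ASSUMPTION: weights are the per-round scores normalized to average 1, so the
    mean per-round advantage equals the original trajectory-level advantage.
    """
    advantages = group_relative_advantages(rewards)
    per_round = []
    for adv, scores in zip(advantages, contributions):
        scores = np.asarray(scores, dtype=float)
        weights = scores / (scores.mean() + eps)
        per_round.append(adv * weights)
    return per_round

# Toy group of four trajectories with outcome rewards and judge scores per round
# (all values are made up for illustration).
rewards = [1.0, 0.0, 1.0, 0.0]
contributions = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.2, 0.8], [0.3, 0.7]]
per_round_adv = contribution_weighted_advantages(rewards, contributions)
for traj in per_round_adv:
    print(np.round(traj, 3))
```

Under this normalization, rounds judged more useful receive a larger share of the trajectory's credit (or blame) while the trajectory-level signal is preserved on average.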

▶

ID and Graph View Contrastive Learning with Multi-View Attention Fusion for Sequential Recommendation

Authors

Xiaofan Zhou, Kyumin Lee

Abstract

Sequential recommendation has become increasingly prominent in both academia and industry, particularly in e-commerce. The primary goal is to extract user preferences from historical interaction sequences and predict items a user is likely to engage with next. Recent advances have leveraged contrastive learning and graph neural networks to learn more expressive representations from interaction histories -- graphs capture relational structure between nodes, while ID-based representations encode item-specific information. However, few studies have explored multi-view contrastive learning between ID and graph perspectives to jointly improve user and item representations, especially in settings where only interaction data is available without auxiliary information. To address this gap, we propose Multi-View Contrastive learning for sequential recommendation (MVCrec), a framework that integrates complementary signals from both sequential (ID-based) and graph-based views. MVCrec incorporates three contrastive objectives: within the sequential view, within the graph view, and across views. To effectively fuse the learned representations, we introduce a multi-view attention fusion module that combines global and local attention mechanisms to estimate the likelihood of a target user purchasing a target item. Comprehensive experiments on five real-world benchmark datasets demonstrate that MVCrec consistently outperforms 11 state-of-the-art baselines, achieving improvements of up to 14.44\% in NDCG@10 and 9.22\% in HitRatio@10 over the strongest baseline. Our code and datasets are available at https://github.com/sword-Lz/MMCrec.

Submitted: 2026-04-15 ArXiv ID: 2604.14114v1

▶

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

▶

Reinforcement Learning via Value Gradient Flow

Authors

Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, et al.

Abstract

We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods either rely on reparameterized policy gradients, which are difficult to scale to large generative models, or on rejection sampling, which can be overly conservative when attempting to move beyond the behavior support. In this paper, we propose Value Gradient Flow (VGF), a scalable new paradigm for behavior-regularized RL. VGF casts behavior-regularized RL as an optimal transport problem that maps the reference distribution to the value-induced optimal policy distribution. We solve this transport problem via discrete gradient flow, where value gradients guide particles initialized from the reference distribution. Our analysis shows that VGF imposes regularization implicitly by controlling the transport budget. VGF eliminates explicit policy parameterization while remaining expressive and flexible; this enables adaptive test-time scaling by adjusting the transport budget. Extensive experiments demonstrate that VGF significantly outperforms prior methods, achieving state-of-the-art results on offline RL benchmarks (D4RL, OGBench) and LLM RL tasks. Code and runs can be found at https://ryanxhr.github.io/vgf.

Comments: ICLR 2026

Submitted: 2026-04-15 ArXiv ID: 2604.14265v1

▶

TIP: Token Importance in On-Policy Distillation

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

Authors

Yaocheng Zhang, Yuanheng Zhu, Wenyue Chong, et al.

Abstract

Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse rewards, weak credit assignment, and limited labeled data. Self-play offers a scalable route to reduce data dependence, but conventional self-play optimizes students only through sparse outcome rewards, leading to low learning efficiency. In this work, we observe that self-play naturally produces a question construction path (QCP) during task generation, an intermediate artifact that captures the reverse solution process. This reveals a new source of privileged information for self-distillation: self-play can itself provide high-quality privileged context for the teacher model in a low-cost and scalable manner, without relying on human feedback or curated privileged information. Leveraging this insight, we propose Privileged Information Self-Play ($π$-Play), a multi-agent self-evolution framework. In $π$-Play, an examiner generates tasks together with their QCPs, and a teacher model leverages QCP as privileged context to densely supervise a student via self-distillation. This design transforms conventional sparse-reward self-play into a dense-feedback self-evolution loop. Extensive experiments show that data-free $π$-Play surpasses fully supervised search agents and improves evolutionary efficiency by 2-3$\times$ over conventional self-play.

Comments: 26 pages, 12 figures

Multimodal Continual Instruction Tuning (MCIT) is essential for sequential task adaptation of Multimodal Large Language Models (MLLMs) but is severely restricted by catastrophic forgetting. While existing literature focuses on the reasoning language backbone, in this work, we expose a critical yet neglected dual-forgetting phenomenon across both perception drift in Cross-modal Projection Space and reasoning collapse in Low-rank Parameter Space. To resolve this, we present MAny (Merge Anything), a framework that merges task-specific knowledge through Cross-modal Projection Merging (CPM) and Low-rank Parameter Merging (LPM). Specifically, CPM recovers perceptual alignment by adaptively merging cross-modal visual representations via visual-prototype guidance, ensuring accurate feature recovery during inference. Simultaneously, LPM eliminates mutual interference among task-specific low-rank modules by recursively merging low-rank weight matrices. By leveraging recursive least squares, LPM provides a closed-form solution that mathematically guarantees an optimal fusion trajectory for reasoning stability. Notably, MAny operates as a training-free paradigm that achieves knowledge merging via efficient CPU-based algebraic operations, eliminating additional gradient-based optimization beyond initial tuning. Our extensive evaluations confirm the superior performance and robustness of MAny across multiple MLLMs and benchmarks. Specifically, on the UCIT benchmark, MAny achieves significant leads of up to 8.57\% and 2.85\% in final average accuracy over state-of-the-art methods across two different MLLMs, respectively.

Submitted: 2026-04-15 ArXiv ID: 2604.14016v1

▶

Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning

▶

GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

▶

Diffusion Language Models for Speech Recognition

▶

Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification

Authors

Mohammad Nooraiepour, Zezhang Song, Wei Li, et al.

Abstract

Accurate methane sorption prediction across heterogeneous coal ranks requires models that combine thermodynamic consistency, efficient knowledge transfer across data-scarce geological systems, and calibrated uncertainty estimates, capabilities that are rarely addressed together in existing frameworks. We present a physics-informed transfer learning framework that adapts a hydrogen sorption PINN to methane sorption prediction via Elastic Weight Consolidation, coal-specific feature engineering, and a three-phase curriculum that progressively balances transfer preservation with thermodynamic fine-tuning. Trained on 993 equilibrium measurements from 114 independent coal experiments spanning lignite to anthracite, the framework achieves $R^2 = 0.932$ on held-out coal samples, a 227% improvement over pressure-only classical isotherms, while hydrogen pre-training delivers 18.9% lower RMSE and 19.4% faster convergence than random initialization. Five Bayesian uncertainty quantification approaches reveal a systematic divergence in performance across physics-constrained architectures. Monte Carlo Dropout achieves well-calibrated uncertainty at minimal overhead, while deep ensembles, regardless of architectural diversity or initialization strategy, exhibit performance degradation because shared physics constraints narrow the admissible solution manifold. SHAP and ALE analyses confirm that learned representations remain physically interpretable and aligned with established coal sorption mechanisms: moisture-volatile interactions are most influential, pressure-temperature coupling captures thermodynamic co-dependence, and features exhibit non-monotonic effects. These results identify Monte Carlo Dropout as the best-performing UQ method in this physics-constrained transfer learning framework, and demonstrate cross-gas transfer learning as a data-efficient strategy for geological material modeling.

Submitted: 2026-04-15 ArXiv ID: 2604.13992v1

▶

Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models

▶

Unsupervised domain transfer: Overcoming signal degradation in sleep monitoring by increasing scoring realism

P. Pedroni, F. Afzal, S. Abt, et al.

Abstract

New data for the total inclusive helicity-dependent cross section for the proton and deuteron were obtained in the photon energy interval 200-1400 MeV. The experiment was performed at the A2 tagged-photon facility of the Mainz Microtron (MAMI) using a circularly polarized photon beam and longitudinally polarized proton and deuteron targets. The reaction products were detected using the large-acceptance Crystal Ball/TAPS calorimeter, which covers 97% of the full solid angle. These new results, obtained with fine energy binning, significantly expand both the quantity and the quality of the available data for these observables and enable a detailed comparison with state-of-the-art theoretical calculations. From the combination of the results for the deuteron and the proton, important information could also be extracted for the free neutron. Based on these data, and using existing models to evaluate the missing contributions from unmeasured photon energy regions, the validity of the Gerasimov-Drell-Hearn (GDH) sum rule has been verified for the proton, the neutron, and the deuteron. These new data provide a precise experimental benchmark for theoretical models used to study nucleons, both in their free state and when embedded in the nuclear medium.
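For reference, the GDH sum rule mentioned in the abstract above is usually written, for a spin-1/2 target such as the nucleon and in natural units, as follows. The symbols are standard conventions assumed here rather than taken from the listing: $\nu$ is the photon energy, $\nu_0$ the photoproduction threshold, $\sigma_{3/2}$ and $\sigma_{1/2}$ the cross sections for parallel and antiparallel photon and target spins, $\kappa$ the anomalous magnetic moment, $M$ the target mass, and $\alpha$ the fine-structure constant.

$$
\int_{\nu_0}^{\infty} \frac{\sigma_{3/2}(\nu) - \sigma_{1/2}(\nu)}{\nu}\, d\nu = \frac{2\pi^2 \alpha}{M^2}\, \kappa^2
$$

The left-hand side is the quantity that the helicity-dependent data constrain over the measured 200-1400 MeV interval; the contributions from outside that interval are the part the abstract says is evaluated with existing models.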

Comments: 16 pages, 9 figures

Submitted: 2026-04-15 ArXiv ID: 2604.14385v1

▶

AI-assisted modeling and Bayesian inference of unpolarized quark transverse momentum distributions from Drell-Yan data

M. Z. Serikow, D. Bazin, M. A. Caprio, et al.

Abstract

The spectroscopy of $^{11}$Be is explored using the $^{10}$Be$(d,p)$$^{11}$Be transfer reaction performed in inverse kinematics at $9.6\,\mathrm{MeV}/u$ using the Active Target Time Projection Chamber (AT-TPC) inside the SOLARIS solenoid. This experiment is the first attempt at coupling the AT-TPC with SOLARIS to perform a high luminosity transfer reaction measurement without compromising excitation energy and scattering angle resolutions. The angular momentum transfers for states up to $3.40\,\mathrm{MeV}$ are determined from distorted-wave Born approximation analysis of the measured angular distributions, from which the corresponding spectroscopic factors are deduced. These factors are compared with those from various shell model interactions, and those for the $3.40\,\mathrm{MeV}$ state are consistent with a positive parity assignment. Recent \textit{ab initio} no-core configuration interaction (NCCI) calculations with various nucleon-nucleon interactions are presented for the low-lying positive parity states of $^{11}$Be. The excitation energies produced using the Daejeon16 interaction are in good agreement with those found from both this experiment and the literature, thus supporting a positive parity assignment. The $3.40\,\mathrm{MeV}$ state, if assigned a tentative $J^π=3/2^+$, would then correspond to the second excited state of the $K^P=1/2^+$ one-neutron halo ground state rotational band also predicted from such NCCI calculations.

Comments: 11 pages, 7 figures

Submitted: 2026-04-15 ArXiv ID: 2604.13766v1

▶

Scattering lengths beyond the nuclear scale and the Efimov effect

▶

Global polarization of $Λ$ hyperons in hot QCD matter at TeV energies

▶

The Quest for Neutrinoless Double Beta Decay: Progress and Prospects

▶

Physics-driven Comparative Analysis of Various Statistical Distance Metrics and Normalizing Functions

▶ 2026-04-14

▶

Proton Structure from Neural Simulation-Based Inference at the LHC

Authors

Ricardo Barrué, Lisa Benato, Ali Kaan Güven, et al.

Abstract

The precise determination of the parton distribution functions (PDFs) of the proton is an essential ingredient for LHC analyses, including those at the upcoming High-Luminosity LHC. So far, PDFs are determined from global fits to binned low-dimensional data obtained from unfolded hard-scattering cross section measurements. In this work we demonstrate for the first time the feasibility of neural simulation-based inference (NSBI) for constraining the proton PDFs using a high-dimensional unbinned data set. Exploiting the full statistical power of unbinned data removes the loss of information incurred by the binning procedure. As a proof-of-concept, we determine the gluon PDF from simulated data of top quark pair production at the LHC with $\sqrt{s}=13$ TeV. Taking into account both experimental and theoretical systematic uncertainties in the detector-level features, we demonstrate how the NSBI pipeline achieves significant improvements in precision compared to existing low-dimensional binned analyses. Our results illustrate the potential of unbinned inference to reduce the reliance on coarse approximations of uncertainties and their correlations entering PDF determinations, hence contributing to a new paradigm of unbinned detector-level ML-assisted measurements at the LHC.

Comments: 57 pages, 24 figures

Submitted: 2026-04-14 ArXiv ID: 2604.13157v1

▶

Z Boson Radiative Decay $Z\to μ^+ μ^- γ$ at the LHC

▶

Mass creation by the strong interaction: Glueballs -- status and perspectives

Projection of purification performance for the RELICS experiment

▶

Deciphering the nature of $P^Σ_{ψs}$ pentaquarks in the light of their electromagnetic multipole moments

Authors

Ulaş Özdem

Abstract

We calculate electromagnetic multipole moments of $Σ$-type strange hidden-charm pentaquarks $P^Σ_{ψs}$ (isospin triplet $Σ^+,Σ^0,Σ^-$) using QCD light-cone sum rules, with six (spin-1/2) and seven (spin-3/2) interpolating currents built from diquark-diquark-antiquark operators. We compute magnetic dipole $μ$ for all channels and, for spin-3/2, electric quadrupole ${\cal Q}$ and magnetic octupole ${\cal O}$ moments (first computation), and give the first quark-flavor decomposition. Scalar diquark currents yield charm-dominated, flavor-insensitive moments ($μ\in[-1.92,-1.21]μ_N$ for spin-1/2, $|μ|\lesssim1.2μ_N$ for spin-3/2), consistent with heavy-quark spin symmetry. Axial-vector diquark currents produce larger, flavor-sensitive moments with sign reversals governed by $e_u/e_d=-2$. For ${\cal Q}$, scalar-diquark currents give oblate deformations ($Q_0\approx-2.0\times10^{-2}{\rm fm}^2$) dominated by charm, while two-axial-vector-diquark currents predict prolate values up to $Q_0=+8.0\times10^{-2}{\rm fm}^2$, with sign reversal for $[su][uc]\bar{c}$ in two currents. Currents with scalar antiquark coupling yield a topology-independent octupole ${\cal O}\approx-0.25\times10^{-3}{\rm fm}^3$, a lattice QCD benchmark. Comparison with constituent quark models identifies four discriminants: $|μ|\gtrsim3μ_N$ in spin-1/2; sign of $μ$ for $[su][uc]\bar{c}$ in spin-3/2; non-zero ${\cal Q}$ (vanishes in $S$-wave molecules); and the ${\cal Q}$-${\cal O}$ sign correlation, probing $1/m_q$ weighting.

Comments: 33 pages, 9 tables and 5 figures

Submitted: 2026-04-14 ArXiv ID: 2604.12533v1

▶

Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$

▶

Cross-Domain Transfer with Particle Physics Foundation Models: From Jets to Neutrino Interactions

▶

High Energy Physics - Phenomenology

▶

Constraints on Vector-Like Top Dipole Interactions from Top-Associated Photon Measurements at the LHC

Authors

Mohammad Sahraei, Yasaman Hosseini, Mojtaba Mohammadi Najafabadi

Abstract

Vector-like top partners with electric charge $+2/3$ are predicted in many extensions of the Standard Model and are actively searched for at the LHC through their electroweak decays $T\to Wb$, $Zt$, and $Ht$. More general scenarios, however, allow dipole interactions that induce radiative decays $T\to tγ$ and $T\to tg$. We reinterpret precision measurements of top-associated photon production to constrain such dipole operators. This approach provides a complementary probe to traditional resonance searches, which rely on direct reconstruction of heavy states, by instead exploiting distortions in precision observables. Using unfolded differential cross sections for $t\bar{t}γ$ production measured by CMS and the fiducial $t\bar{t}γγ$ cross section reported by ATLAS, we derive constraints on the electromagnetic and chromomagnetic dipole couplings of a vector-like $T$ quark within an effective field theory framework. We present limits in terms of the effective couplings $c_{tγ}$ and $c_{tg}$, as well as the corresponding branching fractions $BR(T \to tγ)$ and $BR(T \to tg)$, for masses in the range $500~\mathrm{GeV} \le m_T \le 2.0~\mathrm{TeV}$. For $m_T = 500~\mathrm{GeV}$, the analysis reaches sensitivity to the electromagnetic dipole coupling as small as $c_{tγ} \simeq 0.005~\mathrm{TeV}^{-1}$ in the gluon-dominated scenario $B_γ = 0.1$, while the sensitivity degrades to $O(1)~\mathrm{TeV}^{-1}$ at $m_T = 2.0~\mathrm{TeV}$. We find that the $t\bar{t}γ$ and $t\bar{t}γγ$ measurements provide complementary sensitivity, probing different regions of parameter space and lifting degeneracies between electromagnetic and chromomagnetic dipole interactions. These results demonstrate that precision measurements of top-associated photon final states provide a powerful and complementary probe of vector-like quarks in scenarios where radiative decays dominate.

Submitted: 2026-04-14 ArXiv ID: 2604.13270v1

▶

Heavy baryons with relativistic quarks

▶

Searching for axions with quantum interferometry

Authors

Tanmay Kumar Poddar, Michael Spannowsky

Abstract

Quantum phase measurements offer a complementary route to axion searches. We show that axion-photon interactions can imprint both Aharonov-Bohm (AB) and Berry phases in experimentally motivated quantum setups. For a coherently oscillating axion dark matter background, the induced effective current generates a time-dependent magnetic flux in an rf-SQUID, leading to a measurable voltage signal through the Josephson phase. For representative benchmarks, this AB phase search reaches the minimum axion-photon coupling $g_{aγγ}^{\mathrm{min}}\sim 7.8\times10^{-14}~\mathrm{GeV}^{-1}$ at axion mass $m_a\sim 10^{-10}~\mathrm{eV}$, with projected sensitivity that can improve on existing limits in that parameter space by roughly one to two orders of magnitude. We also identify a geometric phase observable in a Mach-Zehnder interferometer with an adiabatically rotating magnetic field, providing a proof-of-principle phase-based probe of meV-scale axions even when they do not constitute the dark matter, although sensitivity to the coupling remains weaker than current bounds with conservative tabletop benchmarks. Extending the analysis to a three-level photon-axion quasiparticle (AQP)-axion system, with the AQP realized in a topological magnetic insulator, we find a potentially measurable THz Berry phase dominated by the AQP sector, furnishing a nontrivial validation of the formalism in a richer coupled system. These setups establish quantum phase observables as a useful new framework for axion searches, with immediate phenomenological promise in superconducting circuits and longer-term potential in quantum-enhanced interferometry.

Comments: 19 pages, 4 figures, comments are welcome

Submitted: 2026-04-14 ArXiv ID: 2604.13181v1

▶

$N$-Jettiness Soft Functions Made Simple

▶

Proton Structure from Neural Simulation-Based Inference at the LHC

Operator Identification in Charged Lepton-Flavor Violation: Global EFT Analysis with RG Evolution, Polarization Observables, and Bayesian Model Discrimination at Future Colliders

Graviton Production from Inflaton Condensate: Boltzmann vs Bogoliubov

Authors

Chenhuan Wang, Yong Xu, Wenbin Zhao

Abstract

We study graviton production from an oscillating inflaton condensate during reheating by systematically comparing Boltzmann and Bogoliubov descriptions for inflaton potentials of the form $V(φ)\proptoφ^n$ around the minimum. The Bogoliubov framework provides a unified description of graviton production, capturing both perturbative and non-perturbative effects across short and long wavelengths, whereas the Boltzmann approach is restricted to perturbative production at short wavelengths. For the quadratic case ($n=2$), we find that the two approaches yield identical graviton spectra at short wavelengths, indicating that the Boltzmann treatment fully captures perturbative gravitational production in this regime. For steeper potentials ($n>2$), however, we identify a sizable contribution arising from the non-adiabatic transition between inflation and reheating. This component is naturally incorporated in the Bogoliubov formalism but absent in the Boltzmann description, and we show that it is important over a broad range of momenta. We derive analytic approximations within both frameworks that clarify the physical origin and scaling behavior of the spectrum. Our results delineate the regime of validity of Boltzmann approaches and show that, for steeper inflaton potentials, graviton production is governed by non-adiabatic transition dynamics for which the Bogoliubov formalism provides the most appropriate description.

Comments: 29 pages, 7 figures

Submitted: 2026-04-14 ArXiv ID: 2604.12687v1

▶

Next-to-next-to-next-to-leading order QCD corrections to photon-pair production

▶

Open-flavor threshold effects on quarkonium spectrum in the BOEFT

▶

Acoustic instability at shock-wave precursors

Authors

Antonio Capanema, Pasquale Blasi, Emanuele Sobacchi

Abstract

Magnetic field amplification is an integral part of the process of particle acceleration at non-relativistic shocks. It is necessary to reach the maximum energies required by observations, especially in supernova remnants, thought to be sources of the bulk of Galactic cosmic rays. Such amplification can be caused by the acoustic instability that develops when small density perturbations interact with the cosmic-ray pressure gradient in the upstream of a cosmic-ray-modified shock. The vorticity induced by the nonlinear development of the instability may lead to turbulence, which amplifies the pre-existing magnetic fields. To study this phenomenon, we use the PLUTO code to carry out 2D (and some 3D) magnetohydrodynamical simulations of the evolution of small density perturbations in the presence of an assigned cosmic-ray pressure gradient. Adopting more realistic values of Mach number and cosmic-ray acceleration efficiency than previously assumed in the literature, we show that the acoustic instability can transform small density perturbations into large nonlinear structures while the fluid crosses the precursor region of a cosmic-ray-modified shock. We study the power spectrum of turbulent magnetic fluctuations that may be important to scatter particles. We comment on the possible constructive interference between acoustic and non-resonant streaming instabilities. We discuss limitations of previous and current numerical investigations in accessing spatial scales where turbulence is expected to turn nonlinear, and outline perspectives for future investigations.

Comments: 15 pages, 8 figures, accepted for publication in Astronomy & Astrophysics

Farbod Alinezhad, Jianfei Cao, Gary J. Young, et al.

Abstract

Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate uncertainty quantification in existing methods. We introduce the Causal Diffusion Model (CDM), the first denoising diffusion probabilistic approach explicitly designed to generate full probabilistic distributions of counterfactual outcomes under sequential interventions. CDM employs a novel residual denoising architecture with relational self-attention, capturing intricate temporal dependencies and multimodal outcome trajectories without requiring explicit adjustments (e.g., inverse-probability weighting or adversarial balancing) for confounding. In rigorous evaluation on a pharmacokinetic-pharmacodynamic tumor-growth simulator widely adopted in prior work, CDM consistently outperforms state-of-the-art longitudinal causal inference methods, achieving a 15-30% relative improvement in distributional accuracy (1-Wasserstein distance) while maintaining competitive or superior point-estimate accuracy (RMSE) under high-confounding regimes. By unifying uncertainty quantification and robust counterfactual prediction in complex, sequentially confounded settings, without tailored deconfounding, CDM offers a flexible, high-impact tool for decision support in medicine, policy evaluation, and other longitudinal domains.

Submitted: 2026-04-14 ArXiv ID: 2604.12992v1

▶

An Optimal Sauer Lemma Over $k$-ary Alphabets

Authors

Steve Hanneke, Qinglin Meng, Shay Moran, et al.

Abstract

The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet, namely the multiclass setting, the Natarajan dimension has long served as an analogue of VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely--Shalev-Shwartz (DS) dimension, and more generally with its extension, the list-DS dimension -- the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value, replacing the exponential dependence on $\ell$ in the Natarajan-based bound by the optimal polynomial dependence, and improving the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting. This motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al.~(STOC~2023), Hanneke et al.~(COLT~2024), and Brukhim et al.~(NeurIPS~2024).

Comments: 38 pages

Submitted: 2026-04-14 ArXiv ID: 2604.12952v1

▶

Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

Authors

Dorival Leão, Alberto Ohashi, Simone Scotti, et al.

Abstract

This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.

Comments: 74 pages, 3 figures

Large language models (LLMs) can generate survey responses at low cost, but their reliability varies substantially across questions and is unknown before data collection. Deploying LLMs in surveys still requires costly human responses for verification and correction. How should a limited human-labeling budget be allocated across questions in real time? We propose an adaptive allocation algorithm that learns which questions are hardest for the LLM while simultaneously collecting human responses. Each human label serves a dual role: it improves the estimate for that question and reveals how well the LLM predicts human responses on it. The algorithm directs more budget to questions where the LLM is least reliable, without requiring any prior knowledge of question-level LLM accuracy. We prove that the allocation gap relative to the best possible allocation vanishes as the budget grows, and validate the approach on both synthetic data and a real survey dataset with 68 questions and over 2000 respondents. On real survey data, the standard practice of allocating human labels uniformly across questions wastes 10--12% of the budget relative to the optimal; our algorithm reduces this waste to 2--6%, and the advantage grows as questions become more heterogeneous in LLM prediction quality. The algorithm achieves the same estimation quality as traditional uniform sampling with fewer human samples, requires no pilot study, and is backed by formal performance guarantees validated on real survey data. More broadly, the framework applies whenever scarce human oversight must be allocated across tasks where LLM reliability is unknown.

Submitted: 2026-04-14 ArXiv ID: 2604.12497v1

▶

A Bayesian Perspective on the Role of Epistemic Uncertainty for Delayed Generalization in In-Context Learning

▶

Information-Geometric Decomposition of Generalization Error in Unsupervised Learning

Authors

Gilhan Kim

Abstract

We decompose the Kullback--Leibler generalization error (GE) -- the expected KL divergence from the data distribution to the trained model -- of unsupervised learning into three non-negative components: model error, data bias, and variance. The decomposition is exact for any e-flat model class and follows from two identities of information geometry: the generalized Pythagorean theorem and a dual e-mixture variance identity. As an analytically tractable demonstration, we apply the framework to $ε$-PCA, a regularized principal component analysis in which the empirical covariance is truncated at rank $N_K$ and discarded directions are pinned at a fixed noise floor $ε$. Although rank-constrained $ε$-PCA is not itself e-flat, it admits a technical reformulation with the same total GE on isotropic Gaussian data, under which each component of the decomposition takes closed form. The optimal rank emerges as the cutoff $λ_{\mathrm{cut}}^{*} = ε$ -- the model retains exactly those empirical eigenvalues exceeding the noise floor -- with the cutoff reflecting a marginal-rate balance between model-error gain and data-bias cost. A boundary comparison further yields a three-regime phase diagram -- retain-all, interior, and collapse -- separated by the lower Marchenko--Pastur edge and an analytically computable collapse threshold $ε_{*}(α)$, where $α$ is the dimension-to-sample-size ratio. All claims are verified numerically.
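The $ε$-PCA construction described in the abstract above, at its stated optimal cutoff, can be sketched in a few lines. This is an illustrative sketch rather than the author's code: the function name is hypothetical, and the zero-mean data and the $1/n$ covariance normalisation are assumptions made here.

```python
import numpy as np


def eps_pca_covariance(samples, eps):
    """Keep empirical eigenvalues above eps; pin all other directions at eps.

    ASSUMPTIONS: `samples` has shape (n, d), is already zero-mean, and the
    empirical covariance uses 1/n normalisation.
    Returns the model covariance and the number of retained directions.
    """
    X = np.asarray(samples, dtype=float)
    n = X.shape[0]
    emp_cov = X.T @ X / n
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns).
    eigvals, eigvecs = np.linalg.eigh(emp_cov)
    kept = eigvals > eps
    model_eigvals = np.where(kept, eigvals, eps)
    # Rebuild V diag(lambda) V^T with the modified spectrum.
    model_cov = (eigvecs * model_eigvals) @ eigvecs.T
    return model_cov, int(kept.sum())


# Toy data: one strong direction (variance 2) and one weak one (variance 0.005).
toy = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 0.1], [0.0, -0.1]])
model_cov, rank = eps_pca_covariance(toy, eps=0.1)
```

With a noise floor of 0.1, only the strong direction is retained in the toy example and the weak one is replaced by the floor value.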

Comments: 21 pages, 3 figures

Submitted: 2026-04-14 ArXiv ID: 2604.12340v1

▶

Fine-tuning Factor Augmented Neural Lasso for Heterogeneous Environments

Authors

Jinhang Chai, Jianqing Fan, Cheng Gao, et al.

Abstract

Fine-tuning is a widely used strategy for adapting pre-trained models to new tasks, yet its methodology and theoretical properties in high-dimensional nonparametric settings with variable selection have not yet been developed. This paper introduces the fine-tuning factor augmented neural Lasso (FAN-Lasso), a transfer learning framework for high-dimensional nonparametric regression with variable selection that simultaneously handles covariate and posterior shifts. We use a low-rank factor structure to manage high-dimensional dependent covariates and propose a novel residual fine-tuning decomposition in which the target function is expressed as a transformation of a frozen source function and other variables to achieve transfer learning and nonparametric variable selection. This augmented feature from the source predictor allows for the transfer of knowledge to the target domain and reduces model complexity there. We derive minimax-optimal excess risk bounds for the fine-tuning FAN-Lasso, characterizing the precise conditions, in terms of relative sample sizes and function complexities, under which fine-tuning yields statistical acceleration over single-task learning. The proposed framework also provides a theoretical perspective on parameter-efficient fine-tuning methods. Extensive numerical experiments across diverse covariate- and posterior-shift scenarios demonstrate that the fine-tuning FAN-Lasso consistently outperforms standard baselines and achieves near-oracle performance even under severe target sample size constraints, empirically validating the derived rates.

Comments: Authors are listed in alphabetical order

Submitted: 2026-04-14 ArXiv ID: 2604.12288v1

▶

Machine Learning

▶

BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals

Mohsen Nayebi Kerdabadi, Arya Hadizadeh Moghaddam, Chen Chen, et al.

Abstract

In electronic health record (EHR) mining, learning high-quality representations of medical concepts (e.g., standardized diagnosis, medication, and procedure codes) is fundamental for downstream clinical prediction. However, robust concept representation learning is hindered by two key challenges: (i) clinically important cross-type dependencies (e.g., diagnosis-medication and medication-procedure relations) are often missing or incomplete in existing ontology resources, limiting the ability to model complex EHR patterns; and (ii) rich clinical semantics are often missing from structured resources, and even when available as text, are difficult to integrate with KG structure for representation learning. To address these challenges, we present CoMed, an LLM-empowered graph learning framework for medical concept representation. CoMed first builds a global knowledge graph (KG) over medical codes by combining statistically reliable associations mined from EHRs with type-constrained LLM prompting to infer semantic relations. It then utilizes LLMs to enrich the KG into a text-attributed graph by generating node descriptions and edge rationales, providing semantic signals for both concepts and their relationships. Finally, CoMed jointly trains a LoRA-tuned LLaMA text encoder with a heterogeneous GNN, fusing text semantics and graph structure into unified concept embeddings. Extensive experiments on MIMIC-III and MIMIC-IV show that CoMed consistently improves prediction performance and serves as an effective plug-in concept encoder for standard EHR pipelines.

Comments: This paper has been accepted at ACL 2026 main conference

Submitted: 2026-04-14 ArXiv ID: 2604.13331v1

▶

Multi-Task LLM with LoRA Fine-Tuning for Automated Cancer Staging and Biomarker Extraction

Authors

Jiahao Shao, Anam Nawaz Khan, Christopher Brett, et al.

Abstract

Pathology reports serve as the definitive record for breast cancer staging, yet their unstructured format impedes large-scale data curation. While Large Language Models (LLMs) offer semantic reasoning, their deployment is often limited by high computational costs and hallucination risks. This study introduces a parameter-efficient, multi-task framework for automating the extraction of Tumor-Node-Metastasis (TNM) staging, histologic grade, and biomarkers. We fine-tune a Llama-3-8B-Instruct encoder using Low-Rank Adaptation (LoRA) on a curated, expert-verified dataset of 10,677 reports. Unlike generative approaches, our architecture utilizes parallel classification heads to enforce consistent schema adherence. Experimental results demonstrate that the model achieves a Macro F1 score of 0.976, successfully resolving complex contextual ambiguities and heterogeneous reporting formats that challenge traditional extraction methods including rule-based natural language processing (NLP) pipelines, zero-shot LLMs, and single-task LLM baselines. The proposed adapter-efficient, multi-task architecture enables reliable, scalable pathology-derived cancer staging and biomarker profiling, with the potential to enhance clinical decision support and accelerate data-driven oncology research.

Comments: 11 pages, 3 figures and 4 tables in the main manuscript. Additional content, figures and tables are in supplementary material section. 17 pages in total

Submitted: 2026-04-14 ArXiv ID: 2604.13328v1

▶

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

▶

Beyond Uniform Sampling: Synergistic Active Learning and Input Denoising for Robust Neural Operators

Authors

Samrendra Roy, Souvik Chakraborty, Syed Bahauddin Alam

An unsupervised framework for hyperspectral image (HSI) clustering is proposed that incorporates masked deep representation learning with diffusion-based clustering, extending the Spatially-Regularized Superpixel-based Diffusion Learning ($S^2DL$) algorithm. Initially, a denoised latent representation of the original HSI is learned via an unsupervised masked autoencoder (UMAE) model with a Vision Transformer backbone. The UMAE takes spatial context and long-range spectral correlations into account and incorporates an efficient pretraining process via masking that utilizes only a small subset of training pixels. In the next stage, the entropy rate superpixel (ERS) algorithm is used to segment the image into superpixels, and a spatially regularized diffusion graph is constructed using Euclidean and diffusion distances within the compressed latent space instead of the HSI space. The proposed algorithm, Deep Spatially-Regularized Superpixel-based Diffusion Learning ($DS^2DL$), leverages more faithful diffusion distances and subsequent diffusion graph construction that better reflect the intrinsic geometry of the underlying data manifold, improving labeling accuracy and clustering quality. Experiments on Botswana and KSC datasets demonstrate the efficacy of $DS^2DL$.

Comments: To appear in IEEE IGARSS 2026

Submitted: 2026-04-14 ArXiv ID: 2604.13307v1

Yann V. Bellec

Abstract

Aerial object detection in UAV imagery presents unique challenges due to the high prevalence of tiny objects, adverse environmental conditions, and strict computational constraints. Standard YOLO-based detectors fail to address these jointly: their minimum detection stride of 8 pixels renders sub-32px objects nearly undetectable, their CIoU loss produces zero gradients for non-overlapping tiny boxes, and their architectures contain significant filter redundancy. We propose DroneScan-YOLO, a holistic system contribution that addresses these limitations through four coordinated design choices: (1) increased input resolution of 1280x1280 to maximize spatial detail for tiny objects, (2) RPA-Block, a dynamic filter pruning mechanism based on lazy cosine-similarity updates with a 10-epoch warm-up period, (3) MSFD, a lightweight P2 detection branch at stride 4 adding only 114,592 parameters (+1.1%), and (4) SAL-NWD, a hybrid loss combining Normalized Wasserstein Distance with size-adaptive CIoU weighting, integrated into YOLOv8's TaskAligned assignment pipeline. Evaluated on VisDrone2019-DET, DroneScan-YOLO achieves 55.3% mAP@50 and 35.6% mAP@50-95, outperforming the YOLOv8s baseline by +16.6 and +12.3 points respectively, improving recall from 0.374 to 0.518, and maintaining 96.7 FPS inference speed with only +4.1% parameters. Gains are most pronounced on tiny object classes: bicycle AP@50 improves from 0.114 to 0.328 (+187%), and awning-tricycle from 0.156 to 0.237 (+52%).

Comments: 12 pages, 10 figures

Submitted: 2026-04-14 ArXiv ID: 2604.13278v1

▶

Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

▶

Enhancing Confidence Estimation in Telco LLMs via Twin-Pass CoT-Ensembling

Authors

Anton Saenko, Pranshav Gajjar, Abiodun Ganiyu, et al.

Abstract

Large Language Models (LLMs) are increasingly applied to complex telecommunications tasks, including 3GPP specification analysis and O-RAN network troubleshooting. However, a critical limitation remains: LLM-generated confidence scores are often biased and unreliable, frequently exhibiting systematic overconfidence. This lack of trustworthy self-assessment makes it difficult to verify model outputs and safely rely on them in practice. In this paper, we study confidence calibration in telecom-domain LLMs using the representative Gemma-3 model family (4B, 12B, and 27B parameters), evaluated on TeleQnA, ORANBench, and srsRANBench. We show that standard single-pass, verbalized confidence estimates fail to reflect true correctness, often assigning high confidence to incorrect predictions. To address this, we propose a novel Twin-Pass Chain of Thought (CoT)-Ensembling methodology for improving confidence estimation by leveraging multiple independent reasoning evaluations and aggregating their assessments into a calibrated confidence score. Our approach reduces Expected Calibration Error (ECE) by up to 88% across benchmarks, significantly improving the reliability of model self-assessment. These results highlight the limitations of current confidence estimation practices and demonstrate a practical path toward more trustworthy evaluation of LLM outputs in telecommunications.

Submitted: 2026-04-14 ArXiv ID: 2604.13271v1
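
For readers unfamiliar with the metric reported in the abstract above, the following is a minimal sketch of the standard equal-width binned Expected Calibration Error. It is a generic definition, not the paper's Twin-Pass CoT-Ensembling procedure; the function name, the choice of 10 bins, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b+1]]; a confidence of 0 falls into bin 0.
    idx = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # mask.mean() is the fraction of samples in bin b
    return float(ece)

# Toy example: four answers, each stated with confidence 0.9, three of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

Here all samples share one bin with accuracy 0.75 and mean confidence 0.9, so the ECE is the gap between the two, about 0.15, which reflects overconfidence.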

▶

Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation

Authors

Yilang Zhang, Abraham Jaeger Mountain, Bingcong Li, et al.

Abstract

Meta-learning offers a principled framework leveraging task-invariant priors from related tasks, with which task-specific models can be fine-tuned on downstream tasks, even with limited data records. Gradient-based meta-learning (GBML) relies on gradient descent (GD) to adapt the prior to a new task. Albeit effective, these methods incur high computational overhead that scales linearly with the number of GD steps. To enhance efficiency and scalability, existing methods approximate the gradient of prior parameters (meta-gradient) via truncated backpropagation, yet suffer from large approximation errors. Targeting accurate approximation, this work puts forth binomial GBML (BinomGBML), which relies on a truncated binomial expansion for meta-gradient estimation. This novel expansion incorporates more information into the meta-gradient estimation via efficient parallel computation. As a running paradigm applied to model-agnostic meta-learning (MAML), the resultant BinomMAML provably enjoys error bounds that not only improve upon existing approaches, but also decay super-exponentially under mild conditions. Numerical tests corroborate the theoretical analysis and showcase boosted performance with slightly increased computational overhead.

Comments: Accepted as poster at ICLR 2026. Code available at https://github.com/AbrahamJJM/binomgbml

Submitted: 2026-04-14 ArXiv ID: 2604.13263v1

▶

Rethinking Uncertainty in Segmentation: From Estimation to Decision

Authors

Saket Maganti

Abstract

Anomaly detection aims to identify observations that deviate from expected behavior. Because anomalous events are inherently sparse, most frameworks are trained exclusively on normal data to learn a single reference model of normality. This implicitly assumes that normal behavior can be captured by a single, unconditional reference distribution. In practice, however, anomalies are often context-dependent: a specific observation may be normal under one operating condition, yet anomalous under another. As machine learning systems are deployed in dynamic and heterogeneous environments, these fixed-context assumptions introduce structural ambiguity, i.e., the inability to distinguish contextual variation from genuine abnormality under marginal modeling, leading to unstable performance and unreliable anomaly assessments. While modern sensing systems frequently collect multimodal data capturing complementary aspects of both system behavior and operating conditions, existing methods treat all data streams equally, without distinguishing contextual information from anomaly-relevant signals. As a result, abnormality is often evaluated without explicitly conditioning on operating conditions. We argue that multimodal anomaly detection should be reframed as a cross-modal contextual inference problem, in which modalities play asymmetric roles, separating context from observation, to define abnormality conditionally rather than relative to a single global reference. This perspective has implications for model design, evaluation protocols, and benchmark construction, and we outline open research challenges toward robust, context-aware multimodal anomaly detection.

Submitted: 2026-04-14 ArXiv ID: 2604.13252v1

▶

Analog Optical Inference on Million-Record Mortgage Data

▶

A High-Resolution Landscape Dataset for Concept-Based XAI With Application to Species Distribution Models

Authors

Augustin de la Brosse, Damien Garreau, Thomas Houet, et al.

Abstract

Mapping the spatial distribution of species is essential for conservation policy and invasive species management. Species distribution models (SDMs) are the primary tools for this task, serving two purposes: achieving robust predictive performance while providing ecological insights into the driving factors of distribution. However, the increasing complexity of deep learning SDMs has made extracting these insights more challenging. To reconcile these objectives, we propose the first implementation of concept-based Explainable AI (XAI) for SDMs. We leverage the Robust TCAV (Testing with Concept Activation Vectors) methodology to quantify the influence of landscape concepts on model predictions. To enable this, we provide a new open-access landscape concept dataset derived from high-resolution multispectral and LiDAR drone imagery. It includes 653 patches across 15 distinct landscape concepts and 1,450 random reference patches, designed to suit a wide range of species. We demonstrate this approach through a case study of two aquatic insects, Plecoptera and Trichoptera, using two Convolutional Neural Networks and one Vision Transformer. Results show that concept-based XAI helps validate SDMs against expert knowledge while uncovering novel associations that generate new ecological hypotheses. Robust TCAV also provides landscape-level information, useful for policy-making and land management. Code and datasets are publicly available.

Submitted: 2026-04-14 ArXiv ID: 2604.13240v1

▶

Does Dimensionality Reduction via Random Projections Preserve Landscape Features?

Authors

Iván Olarte Rodríguez, Anja Jankovic, Thomas Bäck, et al.

Abstract

Exploratory Landscape Analysis (ELA) provides numerical features for characterizing black-box optimization problems. In high-dimensional settings, however, ELA suffers from sparsity effects, high estimator variance, and the prohibitive cost of computing several feature classes. Dimensionality reduction has therefore been proposed as a way to make ELA applicable in such settings, but it remains unclear whether features computed in reduced spaces still reflect intrinsic properties of the original landscape. In this work, we investigate the robustness of ELA features under dimensionality reduction via Random Gaussian Embeddings (RGEs). Starting from the same sampled points and objective values, we compute ELA features in projected spaces and compare them to those obtained in the original search space across multiple sample budgets and embedding dimensions. Our results show that linear random projections often alter the geometric and topological structure relevant to ELA, yielding feature values that are no longer representative of the original problem. While a small subset of features remains comparatively stable, most are highly sensitive to the embedding. Moreover, robustness under projection does not necessarily imply informativeness, as apparently robust features may still reflect projection-induced artifacts rather than intrinsic landscape characteristics.

Comments: 9 pages, 5 figures. Submitted and accepted to the Proceedings of The Genetic and Evolutionary Computation Conference 2026

Submitted: 2026-04-14 ArXiv ID: 2604.13230v1

▶

KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs

▶

Identifiability of Potentially Degenerate Gaussian Mixture Models With Piecewise Affine Mixing

▶

Rare Event Analysis via Stochastic Optimal Control

Authors

Yuanqi Du, Jiajun He, Dinghuai Zhang, et al.

Abstract

Submitted: 2026-04-14 ArXiv ID: 2604.13213v1

▶

Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

Authors

Chashi Mahiul Islam, Alan Villarreal, Mao Nishino, et al.

Abstract

As Large Language Models (LLMs) are increasingly integrated into agentic workflows, their unpredictability stemming from numerical instability has emerged as a critical reliability issue. While recent studies have demonstrated the significant downstream effects of these instabilities, the root causes and underlying mechanisms remain poorly understood. In this paper, we present a rigorous analysis of how unpredictability is rooted in the finite numerical precision of floating-point representations, tracking how rounding errors propagate, amplify, or dissipate through Transformer computation layers. Specifically, we identify a chaotic "avalanche effect" in the early layers, where minor perturbations trigger binary outcomes: either rapid amplification or complete attenuation. Beyond specific error instances, we demonstrate that LLMs exhibit universal, scale-dependent chaotic behaviors characterized by three distinct regimes: 1) a stable regime, where perturbations fall below an input-dependent threshold and vanish, resulting in constant outputs; 2) a chaotic regime, where rounding errors dominate and drive output divergence; and 3) a signal-dominated regime, where true input variations override numerical noise. We validate these findings extensively across multiple datasets and model architectures.

Comments: 8 pages, 9 figures

Submitted: 2026-04-14 ArXiv ID: 2604.13206v1

▶

Fast Voxelization and Level of Detail for Microgeometry Rendering

Authors

Javier Fabre, Carlos Castillo, Carlos Rodriguez-Pardo, et al.

Abstract

Many materials show anisotropic light scattering patterns due to the shape and local alignment of their underlying microstructures: surfaces with small elements such as fibers, or the ridges of a brushed metal, are very sparse and require a high spatial resolution to be properly represented as a volume. The acquisition of voxel data from such objects is a time- and memory-intensive task, and most rendering approaches require an additional Level-of-Detail (LoD) data structure to aggregate the visual appearance, as observed from multiple distances, in order to reduce the number of samples computed per pixel (e.g., MIP mapping). In this work we introduce, first, an efficient parallel voxelization method designed to facilitate fast data aggregation at multiple resolution levels and, second, a novel representation based on hierarchical SGGX clustering that provides better accuracy than baseline methods. We validate our approach with a CUDA-based implementation of the voxelizer, tested both on triangle meshes and volumetric fabrics modeled with explicit fibers. Finally, we show the results generated with a path tracer based on the proposed LoD rendering model.

Comments: Accepted for publication in The Visual Computer. 16 pages, 7 figures, 3 tables. Supplementary material: https://javierfabre.com/projects/voxel-lod/supp.pdf

Submitted: 2026-04-14 ArXiv ID: 2604.13191v1

▶

HUANet: Hard-Constrained Unrolled ADMM for Constrained Convex Optimization

▶

Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

Authors

Aadyot Bhatnagar, Peter Mørch Groth, Ali Madani

Abstract

Large language models can be aligned with human preferences through offline reinforcement learning (RL) on small labeled datasets. While single-objective alignment is well-studied, many real-world applications demand the simultaneous optimization of multiple conflicting rewards, e.g. optimizing both catalytic activity and specificity in protein engineering, or helpfulness and harmlessness for chatbots. Prior work has largely relied on linear reward scalarization, but this approach provably fails to recover non-convex regions of the Pareto front. In this paper, instead of scalarizing the rewards directly, we frame multi-objective RL itself as an optimization problem to be scalarized via smooth Tchebysheff scalarization, a recent technique that overcomes the shortcomings of linear scalarization. We use this formulation to derive Smooth Tchebysheff Optimization of Multi-Objective Preferences (STOMP), a novel offline RL algorithm that extends direct preference optimization to the multi-objective setting in a principled way by standardizing the individual rewards based on their observed distributions. We empirically validate STOMP on a range of protein engineering tasks by aligning three autoregressive protein language models on three laboratory datasets of protein fitness. Compared to state-of-the-art baselines, STOMP achieves the highest hypervolumes in eight of nine settings according to both offline off-policy and generative evaluations. We thus demonstrate that STOMP is a powerful, robust multi-objective alignment algorithm that can meaningfully improve post-trained models for multi-attribute protein optimization and beyond.

Submitted: 2026-04-14 ArXiv ID: 2604.13175v1
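
The scalarization named in the abstract above can be illustrated with a generic sketch of smooth Tchebysheff scalarization: the hard maximum over weighted objective gaps is replaced by a log-sum-exp with smoothing parameter `mu`. This is not the STOMP algorithm. It assumes objectives expressed as losses to be minimized against an ideal point, and the function names, weights, and toy numbers are hypothetical.

```python
import numpy as np

def tchebysheff(losses, weights, ideal):
    """Classical Tchebysheff scalarization: the largest weighted gap to the ideal point."""
    gaps = np.asarray(weights, float) * (np.asarray(losses, float) - np.asarray(ideal, float))
    return float(gaps.max())

def smooth_tchebysheff(losses, weights, ideal, mu=0.1):
    """Smooth variant: mu * log(sum_i exp(gap_i / mu)), computed stably."""
    gaps = np.asarray(weights, float) * (np.asarray(losses, float) - np.asarray(ideal, float))
    shift = gaps.max()                      # subtract the max to avoid overflow
    return float(shift + mu * np.log(np.exp((gaps - shift) / mu).sum()))

# Two objectives with equal weights and an ideal point at the origin.
losses, weights, ideal = [1.0, 2.0], [0.5, 0.5], [0.0, 0.0]
hard = tchebysheff(losses, weights, ideal)
soft = smooth_tchebysheff(losses, weights, ideal, mu=0.1)
```

The smooth value always lies between the hard maximum and the hard maximum plus `mu * log(m)` for `m` objectives, so it approaches the classical scalarization as `mu` shrinks while remaining differentiable everywhere.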

▶

PatchPoison: Poisoning Multi-View Datasets to Degrade 3D Reconstruction

Authors

Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar, et al.

Abstract

3D Gaussian Splatting (3DGS) has recently enabled highly photorealistic 3D reconstruction from casually captured multi-view images. However, this accessibility raises a privacy concern: publicly available images or videos can be exploited to reconstruct detailed 3D models of scenes or objects without the owner's consent. We present PatchPoison, a lightweight dataset-poisoning method that prevents unauthorized 3D reconstruction. Unlike global perturbations, PatchPoison injects a small high-frequency adversarial patch, a structured checkerboard, into the periphery of each image in a multi-view dataset. The patch is designed to corrupt the feature-matching stage of Structure-from-Motion (SfM) pipelines such as COLMAP by introducing spurious correspondences that systematically misalign estimated camera poses. Consequently, downstream 3DGS optimization diverges from the correct scene geometry. On the NeRF-Synthetic benchmark, inserting a 12 × 12 pixel patch increases reconstruction error by 6.8x in LPIPS, while the poisoned images remain unobtrusive to human viewers. PatchPoison requires no pipeline modifications, offering a practical, "drop-in" preprocessing step for content creators to protect their multi-view data.

Comments: CVPR Workshop on Security, Privacy, and Adversarial Robustness in 3D Generative Vision Models (SPAR-3D), 2026

Authors

Yecheng Wu, Song Han, Hai Cai

Abstract

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting in substantial infrastructure overhead. In this work, we investigate whether on-policy distillation can be performed offline. A natural approach is to precompute teacher log-probabilities once over SFT rollouts and reuse them during training. In practice, however, this offline variant fails to reliably match the performance of standard OPD. To understand this discrepancy, we identify a previously overlooked condition that is critical for any OPD pipeline, which we term teacher consistency. This condition requires that the same teacher model be used for both supervised fine-tuning and OPD. We show that violating teacher consistency introduces an irreducible gradient bias, causing both offline and online OPD to converge to a suboptimal fixed point regardless of training duration. Building on this insight, we propose Lightning OPD, an offline on-policy distillation framework that enforces teacher consistency by precomputing teacher log-probabilities over SFT rollouts. This design eliminates the need for a live teacher server entirely. We further show that, under teacher consistency, Lightning OPD shares the same optimum as standard OPD, with bounded gradient discrepancy and an implicit regularization effect that helps prevent policy drift. Extensive experiments on mathematical reasoning and code generation demonstrate that Lightning OPD achieves state-of-the-art performance with significantly improved efficiency. Starting from an SFT-initialized Qwen3-8B-Base model, Lightning OPD reaches 69.9% on AIME 2024 in just 30 GPU hours, achieving a 4.0x speedup over standard OPD and substantially lowering the barrier to entry for academic research on LLM post-training.

Submitted: 2026-04-14 ArXiv ID: 2604.13010v1

▶

Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data

Authors

Farbod Alinezhad, Jianfei Cao, Gary J. Young, et al.

The most cited calibration result in deep learning -- post-temperature-scaling ECE of 0.012 on CIFAR-100 (Guo et al., 2017) -- is below the statistical noise floor. We prove this is not a failure of the experiment but a law: the minimax rate for estimating calibration error with model error rate $ε$ is $\Theta((Lε/m)^{1/3})$, and no estimator can beat it. This "verification tax" implies that as AI models improve, verifying their calibration becomes fundamentally harder -- with the same exponent in opposite directions. We establish four results that contradict standard evaluation practice: (1) self-evaluation without labels provides exactly zero information about calibration, bounded by a constant independent of compute; (2) a sharp phase transition at $mε \approx 1$ below which miscalibration is undetectable; (3) active querying eliminates the Lipschitz constant, collapsing estimation to detection; (4) verification cost grows exponentially with pipeline depth at rate $L^K$. We validate across five benchmarks (MMLU, TruthfulQA, ARC-Challenge, HellaSwag, WinoGrande; ~27,000 items) with 6 LLMs from 5 families (8B-405B parameters, 27 benchmark-model pairs with logprob-based confidence), 95% bootstrap CIs, and permutation tests. Self-evaluation non-significance holds in 80% of pairs. Across frontier models, 23% of pairwise comparisons are indistinguishable from noise, implying that credible calibration claims must report verification floors and prioritize active querying once gains approach benchmark resolution.

Comments: 25 pages, 16 figures, 6 tables. Code and data at https://github.com/Jason-Wang313/verification-tax

Submitted: 2026-04-14 ArXiv ID: 2604.12951v1

▶

Parcae: Scaling Laws For Stable Looped Language Models

Authors

Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, et al.

Abstract

Traditional fixed-depth architectures scale quality by increasing training FLOPs, typically through increased parameterization, at the expense of a higher memory footprint, or data. A potential alternative is looped architectures, which instead increase FLOPs by sending activations through a block of layers in a loop. While promising, existing recipes for training looped architectures can be unstable, suffering from residual explosion and loss spikes. We address these challenges by recasting looping as a nonlinear time-variant dynamical system over the residual stream. Via a linear approximation to this system, we find that instability occurs in existing looped architectures as a result of large spectral norms in their injection parameters. To address these instability issues, we propose Parcae, a novel stable, looped architecture that constrains the spectral norm of the injection parameters via discretization of a negative diagonal parameterization. As a result, Parcae achieves up to 6.3% lower validation perplexity over prior large-scale looped models. Using our stable looped architecture, we investigate the scaling properties of looping as a medium to improve quality by increasing FLOPs in training and test-time. For training, we derive predictable power laws to scale FLOPs while keeping parameter count fixed. Our initial scaling laws suggest that looping and data should be increased in tandem, given a fixed FLOP budget. At test-time, we find that Parcae can use looping to scale compute, following a predictable, saturating exponential decay. When scaled up to 1.3B parameters, we find that Parcae improves CORE and Core-Extended quality by 2.99 and 1.18 points when compared to strong Transformer baselines under a fixed parameter and data budget, achieving a relative quality of up to 87.5% of a Transformer twice the size.

Submitted: 2026-04-14 ArXiv ID: 2604.12946v1

▶

Adaptive Data Dropout: Towards Self-Regulated Learning in Deep Neural Networks

▶

Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

Authors

Dorival Leão, Alberto Ohashi, Simone Scotti, et al.

Abstract

Comments: 74 pages, 3 figures

▶

Characterization of the 20-inch Photomultiplier Tubes for RENE Detector

▶

Implementation and commissioning of an experimental system towards sub-eV axion-like particle searches with 0.1 PW laser at ELI-NP

▶

Filtering hits for speeding up online track reconstruction at hadron colliders

Authors

Andrea Coccaro, Carlo Schiavi, Alessandro Zaio

Abstract

Collider experiments are equipped with trigger systems that rapidly inspect the physics content emerging from collisions to decide whether the resulting products are worth saving for later analysis. One crucial aspect for analyzing the final states originating from the collisions is to process the information produced by charged particles in the innermost detectors to reconstruct the corresponding trajectories. This task is a challenge for the experiments running at the Large Hadron Collider (LHC) at CERN because of the large number of secondary collisions per bunch crossing, the so-called pile-up vertices, giving rise to extremely high hit occupancies in the detector layers close to the beam line. Reconstructing tracks is a combinatorial problem and its processing time strongly depends on the average pile-up per event. The future accelerator-complex upgrade to the High-Luminosity LHC, implying even higher detector occupancies, will result in a considerable growth of the computational cost of the current trigger strategies. To face this issue, a new technique for assisting track reconstruction by filtering out unnecessary detector information is presented and characterized in this work. The algorithm is based on a convolutional-neural-network architecture which can be easily deployed on accelerator cards. The impact of this approach is assessed and future prospects are also discussed.

Submitted: 2026-04-13 ArXiv ID: 2604.11648v1

▶

All-charm tetraquarks at hadron colliders: A high-precision fragmentation perspective

Authors

Francesco Giovanni Celiberto

Abstract

We present the TQ4Q2.0 fragmentation functions for the production of all-heavy (fully heavy) $S$-wave tetraquarks ($T_{4Q}$) with scalar ($0^{++}$), axial-vector ($1^{+-}$), and tensor ($2^{++}$) quantum numbers in high-energy hadronic collisions. This work extends the previous TQ4Q1.1 framework by incorporating nonconstituent heavy-quark contributions and introducing a replica-based uncertainty-quantification strategy derived from multi-scale variations (MHOUs). The construction follows a nonrelativistic QCD factorization approach, combining gluon- and heavy-quark-initiated fragmentation channels at leading power. Initial-scale inputs are modeled through updated potential-inspired wave functions, while the subsequent DGLAP evolution is performed via the threshold-aware HF-NRevo scheme. A comprehensive systematic analysis of uncertainties is carried out, with contributions from color-composite long-distance matrix elements (LDMEs) and perturbative multiscale inputs. The resulting TQ4Q2.0 grids, publicly released in LHAPDF6 format, provide the first complete phenomenological set for all-heavy exotics, enabling precise studies of all-charm tetraquark production and jet-associated observables within the JETHAD environment. This article completes the high-energy resummation-driven generation of the TQ4Q program and establishes a definitive baseline for future collider-oriented analyses of all-heavy multiquark dynamics.

Comments: 49 pages, 13 figures, 5 tables. Six NLO collinear FF sets for fully heavy tetraquarks (TQ4Q2.0), covering scalar, axial, and tensor states. Includes MHOU replicas, LDME variations, and DGLAP evolution, released in LHAPDF format at https://github.com/FGCeliberto/Collinear_FFs. Supplemental Mathematica notebook with all short-distance coefficients

Authors

Kiminad A. Mamo

Abstract

Holographic QCD reproduces the leading short-distance vector-current two-point function in vacuum, fixing the bulk gauge coupling by matching the logarithmic $Q^2$ dependence of the boundary current correlator. We show that this vacuum matching extends to the off-forward hadronic current-current correlator relevant for DDVCS/DVCS. Starting from the fixed-$j$ $t$-channel Witten diagram, we derive a factorized holographic Compton amplitude whose ultraviolet photon vertex is universal and model independent, while all infrared sensitivity is isolated in hadronic conformal moments. In the conformal limit this upper vertex depends only on the pure-AdS bulk wave functions of the virtual photons and yields an exact Gauss hypergeometric kernel. In the collinear window and at a single matching scale $Q=μ=μ_0=μ_\ast$, this kernel matches exactly the $\pm$-basis Wilson coefficients of the singlet conformal operator product expansion in perturbative QCD. The channel dictionary is fixed dynamically: the closed-string branch matches the protected $(-)$ eigenchannel, while the open-string branch matches the unprotected $(+)$ eigenchannel, with the first physical even moment $j=2$ and the distinct $\sqrt{j-2}$ versus $\sqrt{j-1}$ branch points providing the sharpest anchor. The result is therefore an exact fixed-scale matching statement for the hadronic current-current correlator in the fixed-$j$ channel. It identifies the holographic DDVCS/DVCS amplitude as a hadronic generalization of the familiar vacuum current-correlator matching.

Comments: 4 pages, 1 figure

Submitted: 2026-04-13 ArXiv ID: 2604.12037v1

▶

Baryogenesis and Dark Matter from non-thermally produced WIMPs

▶

Asymptotic Theorems and Averaging in Scalar Field Cosmology

▶

Operator structure of power corrections and anomalous scaling in energy correlators

▶

On the effective restoration of $U(1)_A$ symmetry at finite temperature

▶

Novel ringdown tests of general relativity with black hole greybody factors

Authors

Romeo Felice Rosato, Francesco Crescimbeni, Sophia Yi, et al.

Abstract

We present GreyRing, a new model for the post-merger signal in black-hole binary coalescences based on the greybody factor of the remnant. The model accurately reproduces the full frequency-domain ringdown signal of a large set of comparable-mass, aligned-spin numerical relativity waveforms, achieving mismatches of order ${\cal O}(10^{-6})$ for the dominant $(\ell,m)=(2,2)$ mode, and typically outperforming state-of-the-art time-domain models. Building on this model, we introduce a novel consistency test of strong gravity based on the greybody factor: the remnant mass and spin inferred from GreyRing can be compared with those obtained through standard black hole spectroscopy. This agnostic test relies exclusively on the post-merger signal and does not require the inclusion of overtones or the choice of very early ringdown starting times, combining the advantages of inspiral-merger-ringdown consistency tests and traditional black hole spectroscopy. We apply the test to GW250114 and find that the remnant mass and spin inferred from GreyRing are consistent with those measured from the full signal. Remarkably, the inferred parameters can be measured with a precision comparable to, or slightly better than, that achieved with standard black-hole spectroscopy. Our greybody-factor waveform model allows for new precision tests of strong gravity using the ringdown signal.

Submitted: 2026-04-13 ArXiv ID: 2604.11895v1

▶

ALP production in Lepton Flavour Violating meson, tau and gauge boson decays

Francesco Giovanni Celiberto

Abstract

Submitted: 2026-04-13 ArXiv ID: 2604.11646v1

▶

Quantum simulating multi-particle processes in high energy nuclear physics: dijet production and color (de)coherence

Authors

João Barata, Meijian Li, Wenyang Qian, et al.

Abstract

Hard scattering events in high-energy collisions produce highly virtual partons that subsequently fragment into collimated hadronic cascades. When such partonic showers evolve in a QCD medium, as in deep-inelastic scattering or heavy-ion collisions, the resulting multi-particle distributions encode information about the surrounding matter. Decades of theoretical developments have led to a consistent and order-by-order improvable perturbative description of the shower. This description needs, however, the non-perturbative input that encodes the structure of the hadronic matter. The determination of such input remains challenging within conventional computational approaches, thereby limiting the applicability of the approach. In this work, we develop a framework that employs quantum simulation techniques to compute multi-particle processes in such environments by mapping partonic cross-sections to quantum circuits. As benchmarks, we analyze dipole formation and the QCD antenna radiation pattern at leading order in the strong coupling constant, comparing the results with analytic estimates in simplified limits. The quantum circuit formulation here introduced naturally extends to higher perturbative orders and enables amplitude-level computations in complex matter backgrounds. This provides a systematic foundation for applying quantum information science methods to study multi-particle dynamics in QCD media.

Comments: 32 pages, 8 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11616v1

▶

Morphological false-vacuum decay in dipolar supersolids

▶

Renormalization of three-quark operators with up to two derivatives at three loops

▶

Searching for apparent baryon number violation in $Λ_c^+$ decays at the Super Tau-Charm Facility

Authors

Zeren Simon Wang, Xin-Ru Tang, Yu Zhang, et al.

Abstract

Comments: 24 pages plus refs, 7 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11329v1

▶

Dirac one-loop seesaw in a non-invertible fusion rule

▶

GeV gamma-ray emission in the field of the shell-type supernova remnant Vela Jr revisited

Authors

Ting-Ting Ge, Qi-Hang Wu, Pak-Hin Thomas Tam, et al.

Abstract

We present an updated analysis of the gigaelectronvolt (GeV) gamma-ray emission from the shell-type supernova remnant (SNR) RX J0852.0-4622 (Vela Jr) using 15 yr of Fermi Large Area Telescope (Fermi-LAT) data. We quantitatively model the GeV morphology and find that it is best described by the masked H.E.S.S. shell template, indicating that the embedded pulsar wind nebula (PWN) contributes little to the GeV flux. The 0.1-500 GeV spectrum is well fitted by a hard power law with a photon index of $1.77 \pm 0.03$ and connects smoothly to the teraelectronvolt (TeV) spectrum, confirming previous results with improved precision. We further construct an independent eROSITA shell template and derive the 1-5 keV X-ray spectral energy distribution (SED) of the whole remnant, which provides new constraints on the synchrotron emission. We model the multi-wavelength (MWL) SED with a pure leptonic model and a hybrid lepton-hadron model. While the pure leptonic model reproduces the overall broadband shape, the hybrid model provides a better statistical description of the same dataset, supporting a mixed-origin picture in which the hadronic contribution is mainly relevant in the GeV band and the TeV emission remains predominantly leptonic.

Comments: 11 pages, 6 figures, accepted for publication in MNRAS

Submitted: 2026-04-13 ArXiv ID: 2604.11293v1

▶

Machine Learning Study on Single Production of a Singlet Vector-like Lepton at the Large Hadron Collider

▶

Study of $χ_{cJ}\to ηηη^\prime$ via intermediate charmed meson loop mechanisms and its implications for non-observation of $η_1(1855)$ in $χ_{cJ}$ decays

▶

Study of doubly heavy baryon lifetimes

Faruk Muritala, Austin Brown, Dhrubajyoti Ghosh, et al.

Abstract

Monitoring binomial proportions across multiple independent streams is a critical challenge in Statistical Process Control (SPC), with applications from manufacturing to cybersecurity. While EWMA charts offer sensitivity to small shifts, existing implementations rely on asymptotic variance approximations that fail during early-phase monitoring. We introduce a Cumulative Standardized Binomial EWMA (CSB-EWMA) chart that overcomes this limitation by deriving the exact time-varying variance of the EWMA statistic for binary multiple-stream data, enabling adaptive control limits that ensure statistical rigor from the first sample. Through extensive simulations, we identify optimal smoothing (λ) and limit (L) parameters to achieve target in-control average run length (ARL0) of 370 and 500. The CSB-EWMA chart demonstrates rapid shift detection across both ARL0 targets, with out-of-control average run length (ARL1) dropping to 3-7 samples for moderate shifts (δ=0.2), and exhibits exceptional robustness across different data distributions, with low ARL1 Coefficients of Variation (CV < 0.10 for small shifts) for both ARL0 = 370 and 500. This work provides practitioners with a distribution-free, sensitive, and theoretically sound tool for early change detection in binomial multiple-stream processes.

Submitted: 2026-04-13 ArXiv ID: 2604.12095v1
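
The abstract above contrasts asymptotic variance approximations with an exact time-varying variance for the EWMA statistic. As a rough illustration only, the sketch below implements the textbook EWMA chart for standardized binomial proportions, using the standard exact variance $\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]$ for independent unit-variance inputs. It is not the authors' CSB-EWMA statistic, whose precise form is not given here; the function name `ewma_chart` and its parameters are hypothetical.

```python
import math

def ewma_chart(proportions, p0, n, lam=0.1, L=3.0):
    """Textbook EWMA chart with exact time-varying control limits.

    Assumption: this is the classical single-stream EWMA, not CSB-EWMA.
    proportions: observed sample proportions, one per time step
    p0: in-control proportion; n: sample size behind each proportion
    lam: smoothing weight in (0, 1]; L: limit width in standard deviations
    Returns a list of (z_t, half_width_t, signal_t) tuples.
    """
    sigma = math.sqrt(p0 * (1.0 - p0) / n)  # in-control std. dev. of one proportion
    z = 0.0  # EWMA of standardized observations starts at the in-control mean
    results = []
    for t, p_hat in enumerate(proportions, start=1):
        x = (p_hat - p0) / sigma              # standardize the observation
        z = lam * x + (1.0 - lam) * z         # EWMA recursion
        # Exact variance of z_t for i.i.d. unit-variance inputs and z_0 = 0
        var_t = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        half_width = L * math.sqrt(var_t)
        results.append((z, half_width, abs(z) > half_width))
    return results
```

At $t=1$ the exact variance equals $\lambda^2$, so the early limits are much tighter than the asymptotic value $\lambda/(2-\lambda)$; this is the early-phase effect the abstract refers to.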

▶

On the continuum limit of t-SNE for data visualization

Authors

Jeff Calder, Zhonggan Huang, Ryan Murray, et al.

Abstract

This work is concerned with the continuum limit of a graph-based data visualization technique called the t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications, but is still poorly understood from a theoretical standpoint. The t-SNE algorithm produces visualizations by minimizing the Kullback-Leibler divergence between similarity matrices representing the high-dimensional data and its low-dimensional representation. We prove that, after a natural rescaling and in applicable parameter regimes, as the number of data points $n \to \infty$ while the similarity graph remains sparse, the Kullback-Leibler divergence is consistent with a continuum variational problem that involves a non-convex gradient regularization term and a penalty on the magnitude of the probability density function in the visualization space. These two terms represent the continuum limits of the attraction and repulsion forces in the t-SNE algorithm. Due to the lack of convexity in the continuum variational problem, the question of well-posedness is only partially resolved. We show that when both dimensions are $1$, the problem admits a unique smooth minimizer, along with an infinite number of discontinuous minimizers (interpreted in a relaxed sense). This aligns well with the empirically observed ability of t-SNE to separate data in seemingly arbitrary ways in the visualization. The energy is also very closely related to the famously ill-posed Perona-Malik equation, which is used for denoising and simplifying images. We present numerical results validating the continuum limit, provide some preliminary results about the delicate nature of the limiting energetic problem in higher dimensions, and highlight several problems for future work.

Submitted: 2026-04-13 ArXiv ID: 2604.12041v1
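
For readers unfamiliar with the discrete objective whose limit is studied above, the following minimal sketch evaluates the Kullback-Leibler divergence between a Gaussian similarity matrix on the data and a Student-t similarity matrix on the embedding. It is a simplification under stated assumptions: a single fixed `bandwidth` replaces the per-point perplexity calibration used by standard t-SNE, and the rescaling analysed in the paper is not reproduced. All function names are illustrative.

```python
import math

def _sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def _normalize(weights):
    """Scale a matrix of non-negative weights so that all entries sum to 1."""
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def gaussian_similarities(points, bandwidth=1.0):
    """High-dimensional similarities P (assumption: one global bandwidth)."""
    n = len(points)
    w = [[0.0 if i == j
          else math.exp(-_sq_dist(points[i], points[j]) / (2.0 * bandwidth ** 2))
          for j in range(n)] for i in range(n)]
    return _normalize(w)

def student_t_similarities(embedding):
    """Low-dimensional similarities Q from the heavy-tailed Student-t kernel."""
    n = len(embedding)
    w = [[0.0 if i == j
          else 1.0 / (1.0 + _sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    return _normalize(w)

def kl_divergence(P, Q):
    """KL(P || Q) summed over entries with p > 0 (diagonal entries are skipped)."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0.0)
```

t-SNE minimizes `kl_divergence(P, Q)` over the embedding coordinates; the attraction and repulsion forces mentioned in the abstract arise from the gradient of this quantity.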

▶

Convolutional Maximum Mean Discrepancy for Inference in Noisy Data

Paula Arguello, Berk Tinaz, Mohammad Shahab Sepehri, et al.

Abstract

Deep learning underpins a wide range of applications in MRI, including reconstruction, artifact removal, and segmentation. However, progress has been driven largely by public datasets focused on brain and knee imaging, shaping how models are trained and evaluated. As a result, careful studies of the reliability of these models across diverse anatomical settings remain limited. In this work, we introduce MosaicMRI, a large and diverse collection of fully sampled raw musculoskeletal (MSK) MR measurements designed for training and evaluating machine-learning-based methods. MosaicMRI is the largest open-source raw MSK MRI dataset to date, comprising 2,671 volumes and 80,156 slices. The dataset offers substantial diversity in volume orientation (e.g., axial, sagittal), imaging contrasts (e.g., PD, T1, T2), anatomies (e.g., spine, knee, hip, ankle, and others), and numbers of acquisition coils. Using VarNet as a baseline for the accelerated reconstruction task, we perform a comprehensive set of experiments to study scaling behavior with respect to both model capacity and dataset size. Interestingly, models trained on the combined anatomies significantly outperform anatomy-specific models in low-sample regimes, highlighting the benefits of anatomical diversity and the presence of exploitable cross-anatomical correlations. We further evaluate robustness and cross-anatomy generalization by training models on one anatomy (e.g., spine) and testing them on another (e.g., knee). Notably, we identify groups of body parts (e.g., foot and elbow) that generalize well with each other, and highlight that performance under domain shifts depends on training set size, anatomy, and protocol-specific factors.

Comments: 15 pages, 6 figures, preliminary version

Submitted: 2026-04-13 ArXiv ID: 2604.11762v1

▶

Inferring Change Points in Regression via Sample Weighting

▶

Nested Atoms Model with Application to Clustering Big Population-Scale Single-Cell Data

Authors

Arhit Chakrabarti, Yang Ni, Yuchao Jiang, et al.

Abstract

We consider the problem of clustering nested or hierarchical data, where observations are grouped and there are both group-level and observation-level variables. In our motivating OneK1K dataset, observations consist of single-cell RNA-sequencing (scRNA-seq) data from 982 individuals (groups), totaling 1.27 million cells (observations), along with individual-specific genotype data. This type of data would enable the identification of cell types and the investigation of how genetic variations among individuals influence differences in cell-type profiles. Our goal, therefore, is to jointly cluster cells and individuals to capture the heterogeneity across both levels using cell-specific gene expressions as well as individual-specific genotypes. However, existing grouped clustering methods do not incorporate group-level variables, thereby limiting their ability to capture the heterogeneity of genotypes in our motivating application. To address this, we propose the Nested Atoms Model (NAM), a new Bayesian nonparametric approach that enables the desired two-layered clustering, accounting for both group-level and observation-level variables. To scale NAM for high-dimensional data, we develop a fast variational Bayesian inference algorithm. Simulations show that NAM outperforms existing methods that ignore group-level variables. Applied to the OneK1K dataset, NAM identifies clusters of genetically similar individuals with homogeneous cell-type profiles. The resulting cell clusters align with known immune cell types based on differential gene expression, underscoring the ability of NAM to capture nested heterogeneity and provide biologically meaningful insights.

Submitted: 2026-04-13 ArXiv ID: 2604.11731v1

▶

Minimizing classical resources in variational measurement-based quantum computation for generative modeling

Authors

Arunava Majumder, Hendrik Poulsen Nautrup, Hans J. Briegel

Abstract

Measurement-based quantum computation (MBQC) is a framework for quantum information processing in which a computational task is carried out through one-qubit measurements on a highly entangled resource state. Due to the indeterminacy of the outcomes of a quantum measurement, the random outcomes of these operations, if not corrected, yield a variational quantum channel family. Traditionally, this randomness is corrected through classical processing in order to ensure deterministic unitary computations. Recently, variational measurement-based quantum computation (VMBQC) has been introduced to exploit this measurement-induced randomness to gain an advantage in generative modeling. A limitation of this approach is that the corresponding channel model has twice as many parameters compared to the unitary model, scaling as $N \times D$, where $N$ is the number of logical qubits (width) and $D$ is the depth of the VMBQC model. This can often make optimization more difficult and may lead to poorly trainable models. In this paper, we present a restricted VMBQC model that extends the unitary setting to a channel-based one using only a single additional trainable parameter. We show, both numerically and algebraically, that this minimal extension is sufficient to generate probability distributions that cannot be learned by the corresponding unitary model.

Comments: 14 pages

Submitted: 2026-04-13 ArXiv ID: 2604.11578v1

Conformal selection (CS) uses calibration data to identify test inputs whose unobserved outcomes are likely to satisfy a pre-specified minimal quality requirement, while controlling the false discovery rate (FDR). Existing methods fix the target FDR level before observing data, which prevents the user from adapting the balance between the number of selected test inputs and the FDR to downstream needs and constraints based on the available data. For example, in genomics or neuroimaging, researchers often inspect the distribution of test statistics and decide how aggressively to pursue candidates based on observed evidence strength and available follow-up resources. To address this limitation, we introduce post-hoc CS (PH-CS), which generates a path of candidate selection sets, each paired with a data-driven false discovery proportion (FDP) estimate. PH-CS lets the user select any operating point on this path by maximizing a user-specified utility, arbitrarily balancing selection size and FDR. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS is proved to provide a finite-sample post-hoc reliability guarantee whereby the ratio between estimated FDP level and true FDP is, on average, upper bounded by $1$, so that the average estimated FDP is, to first order, a valid upper bound on the true FDR. PH-CS is extended to control quality defined in terms of a general risk. Experiments on synthetic and real-world datasets demonstrate that, unlike CS, PH-CS can consistently satisfy user-imposed utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.

Comments: 32 pages, 29 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11305v1
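
The abstract above builds on the e-Benjamini-Hochberg (e-BH) procedure. As background, here is a minimal sketch of the base e-BH rule at a fixed level $\alpha$: with $n$ e-values sorted in decreasing order, reject the $k^\ast$ largest, where $k^\ast$ is the largest $k$ such that the $k$-th largest e-value is at least $n/(\alpha k)$. This shows only the fixed-level building block; the construction of conformal e-variables and the post-hoc selection path of PH-CS are not reproduced, and the function name `e_bh` is illustrative.

```python
def e_bh(e_values, alpha):
    """Base e-BH procedure at a fixed level alpha.

    e_values: non-negative e-values, one per hypothesis
    alpha: target FDR level in (0, 1)
    Returns the sorted indices of the rejected hypotheses.
    """
    n = len(e_values)
    # Indices ordered from the largest e-value to the smallest
    order = sorted(range(n), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # The k-th largest e-value must clear the threshold n / (alpha * k)
        if e_values[idx] >= n / (alpha * k):
            k_star = k
    return sorted(order[:k_star])
```

For example, with e-values `[50, 1, 30, 0.5, 12]` and `alpha=0.2`, the thresholds for $k=1,2,3$ are 25, 12.5 and about 8.33, all of which are cleared, while the threshold 6.25 for $k=4$ is not; hypotheses 0, 2 and 4 are rejected.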

▶

Trustworthy Feature Importance Avoids Unrestricted Permutations

▶

Regional Explanations: Bridging Local and Global Variable Importance

Authors

Salim I. Amoukou, Nicolas J-B. Brunel

Abstract

We analyze two widely used local attribution methods, Local Shapley Values and LIME, which aim to quantify the contribution of a feature value $x_i$ to a specific prediction $f(x_1, \dots, x_p)$. Despite their widespread use, we identify fundamental limitations in their ability to reliably detect locally important features, even under ideal conditions with exact computations and independent features. We argue that a sound local attribution method should not assign importance to features that neither influence the model output (e.g., features with zero coefficients in a linear model) nor exhibit statistical dependence with functionality-relevant features. We demonstrate that both Local SV and LIME violate this fundamental principle. To address this, we propose R-LOCO (Regional Leave Out COvariates), which bridges the gap between local and global explanations and provides more accurate attributions. R-LOCO segments the input space into regions with similar feature importance characteristics. It then applies global attribution methods within these regions, deriving an instance's feature contributions from its regional membership. This approach delivers more faithful local attributions while avoiding local explanation instability and preserving instance-specific detail often lost in global methods.

Comments: Accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

Submitted: 2026-04-13 ArXiv ID: 2604.11223v1

▶

ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values

▶

Cost-optimal Sequential Testing via Doubly Robust Q-learning

Authors

Doudou Zhou, Yiran Zhang, Dian Jin, et al.

Abstract

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.

Submitted: 2026-04-13 ArXiv ID: 2604.11165v1

▶

Gradient-Variation Regret Bounds for Unconstrained Online Learning

▶

DDO-RM for LLM Preference Optimization: A Minimal Held-Out Benchmark against DPO

Authors

Tiantian Zhang, Jierui Zuo, Wenping Wang

Abstract

This paper reorganizes the current manuscript around the DPO versus DDO-RM preference-optimization project and focuses on two parts: the algorithmic view and the preliminary held-out benchmark. The benchmark asks a narrow question: even in a minimal pairwise chosen-versus-rejected setting, can a reward-guided decision-distribution update outperform a direct pairwise objective? We compare Direct Preference Optimization (DPO) against DDO-RM on EleutherAI/pythia-410m using HuggingFaceH4/ultrafeedback_binarized, evaluate on the held-out test_prefs split, and report results for seeds 42, 13, and 3407. Algorithmically, DDO-RM treats each prompt as a finite decision problem over candidate responses. Instead of optimizing only a binary chosen-rejected relation, it forms a policy distribution over candidates, centers reward-model scores under that distribution, and distills a reward-guided target distribution back into the policy. In the current public benchmark, DDO-RM improves mean pair accuracy from 0.5238 to 0.5602, AUC from 0.5315 to 0.5382, and mean margin from 0.1377 to 0.5353 relative to DPO. These are encouraging but still preliminary results: the study covers one model family, one dataset, one held-out evaluation split, and three seeds.

Comments: 8 pages, 4 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11119v1
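
To make the DPO baseline in the abstract above concrete, the sketch below computes the standard per-pair DPO loss, $-\log\sigma\big(\beta[(\log\pi(y_w)-\log\pi_{\rm ref}(y_w))-(\log\pi(y_l)-\log\pi_{\rm ref}(y_l))]\big)$, from sequence log-probabilities. The `pair_accuracy` helper reflects one plausible reading of "pair accuracy" (the fraction of pairs with a positive margin); that definition, the default `beta`, and both function names are assumptions and are not taken from the paper. DDO-RM itself is not implemented here.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed sequence log-probabilities under the policy and a
    frozen reference model. Returns (loss, margin).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), written in a stable form
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, margin

def pair_accuracy(margins):
    """Fraction of pairs whose margin is strictly positive (assumed definition)."""
    return sum(m > 0.0 for m in margins) / len(margins)
```

When the policy equals the reference model, the margin is zero and the loss is $\log 2$, which is a convenient sanity check for an implementation.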

▶

Distributionally Robust K-Means Clustering

▶

Neural Generalized Mixed-Effects Models

Vitor F. Grizzi, Luke N. Pretzie, Jiayi Xu, et al.

Abstract

We present XANE(3), a physics-based E(3)-equivariant graph neural network for predicting X-ray absorption near-edge structure (XANES) spectra directly from atomic structures. The model combines tensor-product message passing with spherical harmonic edge features, absorber-query attention pooling, custom equivariant layer normalization, adaptive gated residual connections, and a spectral readout based on a multi-scale Gaussian basis with an optional sigmoidal background term. To improve line-shape fidelity, training is performed with a composite objective that includes pointwise spectral reconstruction together with first- and second-derivative matching terms. We evaluate the model on a dataset of 5,941 FDMNES simulations of iron oxide surface facets and obtain a spectrum mean squared error of $1.0 \times 10^{-3}$ on the test set. The model accurately reproduces the main edge structure, relative peak intensities, pre-edge features, and post-edge oscillations. Ablation studies show that the derivative-aware objective, custom equivariant normalization, absorber-conditioned attention pooling, adaptive gated residual mixing, and global background term each improve performance. Interestingly, a capacity-matched scalar-only variant achieves comparable pointwise reconstruction error but reduced derivative-level fidelity, indicating that explicit tensorial channels are not strictly required for low intensity error on this dataset, although they remain beneficial for capturing finer spectral structure. These results establish XANE(3) as an accurate and efficient surrogate for XANES simulation and offer a promising route toward accelerated spectral prediction, ML-assisted spectroscopy, and data-driven materials discovery.

Submitted: 2026-04-13 ArXiv ID: 2604.12140v1

▶

Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

Arun Sharma

Abstract

We introduce compute-grounded reasoning (CGR), a design paradigm for spatial-aware research agents in which every answerable sub-problem is resolved by deterministic computation before a language model is asked to generate. Spatial Atlas instantiates CGR as a single Agent-to-Agent (A2A) server that handles two challenging benchmarks: FieldWorkArena, a multimodal spatial question-answering benchmark spanning factory, warehouse, and retail environments, and MLE-Bench, a suite of 75 Kaggle machine learning competitions requiring end-to-end ML engineering. A structured spatial scene graph engine extracts entities and relations from vision descriptions, computes distances and safety violations deterministically, then feeds computed facts to large language models, thereby avoiding hallucinated spatial reasoning. Entropy-guided action selection maximizes information gain per step and routes queries across a three-tier frontier model stack (OpenAI + Anthropic). A self-healing ML pipeline with strategy-aware code generation, a score-driven iterative refinement loop, and a prompt-based leak audit registry round out the system. We evaluate across both benchmarks and show that CGR yields competitive accuracy while maintaining interpretability through structured intermediate representations and deterministic spatial computations.

Comments: 11 pages. Submitted to NeurIPS 2026. Code: https://github.com/arunshar/spatial-atlas

Submitted: 2026-04-13 ArXiv ID: 2604.12102v1

▶

A Nonparametric Adaptive EWMA Control Chart for Binary Monitoring of Multiple Stream Processes

Authors

Faruk Muritala, Austin Brown, Dhrubajyoti Ghosh, et al.

Abstract

Submitted: 2026-04-13 ArXiv ID: 2604.12095v1

▶

Robust Optimization for Mitigating Reward Hacking with Correlated Proxies

Authors

Zixuan Liu, Xiaolin Sun, Zizhan Zheng

Abstract

Designing robust reinforcement learning (RL) agents in the presence of imperfect reward signals remains a core challenge. In practice, agents are often trained with proxy rewards that only approximate the true objective, leaving them vulnerable to reward hacking, where high proxy returns arise from unintended or exploitative behaviors. Recent work formalizes this issue using r-correlation between proxy and true rewards, but existing methods like occupancy-regularized policy optimization (ORPO) optimize against a fixed proxy and do not provide strong guarantees against broader classes of correlated proxies. In this work, we formulate reward hacking as a robust policy optimization problem over the space of all r-correlated proxy rewards. We derive a tractable max-min formulation, where the agent maximizes performance under the worst-case proxy consistent with the correlation constraint. We further show that when the reward is a linear function of known features, our approach can be adapted to incorporate this prior knowledge, yielding both improved policies and interpretable worst-case rewards. Experiments across several environments show that our algorithms consistently outperform ORPO in worst-case returns, and offer improved robustness and stability across different levels of proxy-true reward correlation. These results show that our approach provides both robustness and transparency in settings where reward design is inherently uncertain. The code is available at https://github.com/ZixuanLiu4869/reward_hacking.

Comments: ICLR 2026

Submitted: 2026-04-13 ArXiv ID: 2604.12086v1

▶

Robust Reasoning and Learning with Brain-Inspired Representations under Hardware-Induced Nonlinearities

Authors

William Youngwoo Chung, Hamza Errahmouni Barkam, Tamoghno Das, et al.

Abstract

Traditional machine learning depends on high-precision arithmetic and near-ideal hardware assumptions, which are increasingly challenged by variability in aggressively scaled semiconductor devices. Compute-in-memory (CIM) architectures alleviate data-movement bottlenecks and improve energy efficiency, yet introduce nonlinear distortions and reliability concerns. We address these issues with a hardware-aware optimization framework based on Hyperdimensional Computing (HDC), systematically compensating for non-ideal similarity computations in CIM. Our approach formulates encoding as an optimization problem, minimizing the Frobenius norm between an ideal kernel and its hardware-constrained counterpart, and employs a joint optimization strategy for end-to-end calibration of hypervector representations. Experimental results demonstrate that our method, when applied to QuantHD, achieves 84% accuracy under severe hardware-induced perturbations, a 48% increase over naive QuantHD under the same conditions. Additionally, our optimization is vital for graph-based HDC reliant on precise variable-binding for interpretable reasoning. Our framework preserves the accuracy of RelHD on the Cora dataset, achieving a 5.4$\times$ accuracy improvement over naive RelHD under nonlinear environments. By preserving HDC's robustness and symbolic properties, our solution enables scalable, energy-efficient intelligent systems capable of classification and reasoning on emerging CIM hardware.

Comments: 8 pages, 7 figures, accepted to Great Lakes Symposium on VLSI (GLSVLSI) 2025

Submitted: 2026-04-13 ArXiv ID: 2604.12079v1

▶

OpenTME: An Open Dataset of AI-powered H&E Tumor Microenvironment Profiles from TCGA

▶

Robust Explanations for User Trust in Enterprise NLP Systems

Authors

Guilin Zhang, Kai Zhao, Jeffrey Friedman, et al.

Abstract

Robust explanations are increasingly required for user trust in enterprise NLP, yet pre-deployment validation is difficult in the common case of black-box deployment (API-only access) where representation-based explainers are infeasible and existing studies provide limited guidance on whether explanations remain stable under real user noise, especially when organizations migrate from encoder classifiers to decoder LLMs. To close this gap, we propose a unified black-box robustness evaluation framework for token-level explanations based on leave-one-out occlusion, and operationalize explanation robustness with top-token flip rate under realistic perturbations (swap, deletion, shuffling, and back-translation) at multiple severity levels. Using this protocol, we conduct a systematic cross-architecture comparison across three benchmark datasets and six models spanning encoder and decoder families (BERT, RoBERTa, Qwen 7B/14B, Llama 8B/70B; 64,800 cases). We find that decoder LLMs produce substantially more stable explanations than encoder baselines (73% lower flip rates on average), and that stability improves with model scale (44% gain from 7B to 70B). Finally, we relate robustness improvements to inference cost, yielding a practical cost-robustness tradeoff curve that supports model and explanation selection prior to deployment in compliance-sensitive applications.

Submitted: 2026-04-13 ArXiv ID: 2604.12069v1

▶

Interpretable DNA Sequence Classification via Dynamic Feature Generation in Decision Trees

▶

LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models

Authors

Haocheng Xi, Harman Singh, Yuezhou Hu, et al.

Abstract

Block-wise diffusion language models (DLMs) generate multiple tokens in any order, offering a promising alternative to the autoregressive decoding pipeline. However, they remain bottlenecked by memory-bound attention in long-context scenarios. Naive sparse attention fails on DLMs due to a KV Inflation problem, where different queries select different prefix positions, making the union of accessed KV pages large. To address this, we observe that between consecutive denoising steps, only a small fraction of active tokens exhibit significant hidden-state changes, while the majority of stable tokens remain nearly constant. Based on this insight, we propose LoSA (Locality-aware Sparse Attention), which reuses cached prefix-attention results for stable tokens and applies sparse attention only to active tokens. This substantially shrinks the number of KV indices that must be loaded, yielding both higher speedup and higher accuracy. Across multiple block-wise DLMs and benchmarks, LoSA preserves near-dense accuracy while significantly improving efficiency, achieving up to +9 points in average accuracy at aggressive sparsity levels while maintaining 1.54x lower attention density. It also achieves up to 4.14x attention speedup on RTX A6000 GPUs, demonstrating the effectiveness of the proposed method.
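The KV Inflation effect described in this abstract can be seen in a toy count of KV pages. The sketch below is an illustration under stated assumptions, not the authors' implementation: the page indices, the query names, and the choice of which token is "active" are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

def union_pages(selections: Iterable[List[int]]) -> Set[int]:
    """KV pages that must be loaded = union of the pages picked by each query."""
    pages: Set[int] = set()
    for selected in selections:
        pages |= set(selected)
    return pages

# Hypothetical top-2 page selections for the four query tokens of one block.
selections: Dict[str, List[int]] = {
    "q0": [0, 5],
    "q1": [1, 7],
    "q2": [2, 5],
    "q3": [3, 9],
}

# Naive sparse attention: every query selects its own pages, so the union grows.
naive = union_pages(selections.values())

# Locality-aware variant (assumed): only tokens whose hidden state changed a lot
# since the previous denoising step redo sparse attention; stable tokens reuse
# cached prefix-attention results and load no pages.
active = ["q1"]
locality_aware = union_pages(selections[q] for q in active)

print(len(naive), len(locality_aware))
```

Each query asks for only two pages, yet the naive union covers seven; restricting selection to the active token brings the load back down to two.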

Comments: 16 pages, 11 figures, 6 tables

Submitted: 2026-04-13 ArXiv ID: 2604.12056v1

▶

VISTA: Validation-Informed Trajectory Adaptation via Self-Distillation

▶

On the continuum limit of t-SNE for data visualization

Authors

Jeff Calder, Zhonggan Huang, Ryan Murray, et al.

Abstract

Submitted: 2026-04-13 ArXiv ID: 2604.12041v1

▶

Constant-Factor Approximation for the Uniform Decision Tree

▶

TriFit: Trimodal Fusion with Protein Dynamics for Mutation Fitness Prediction

Authors

Seungik Cho

Jingzhou Shen, Luis Lago Enamorado, Shiwen Mao, et al.

Abstract

We introduce INDOTABVQA, a benchmark for evaluating cross-lingual Table Visual Question Answering (VQA) on real-world document images in Bahasa Indonesia. The dataset comprises 1,593 document images across three visual styles (bordered, borderless, and colorful), each containing one or more tables, and 1,593 question-answer sets in four languages: Bahasa Indonesia, English, Hindi, and Arabic. This enables evaluation of Vision-Language Models (VLMs) in both monolingual (Bahasa documents with Bahasa questions) and cross-lingual settings (Bahasa documents with questions in other languages). We benchmark leading open-source VLMs (Qwen2.5-VL, Gemma-3, LLaMA-3.2) and GPT-4o and reveal substantial performance gaps, particularly on structurally complex tables and in low-resource languages. Fine-tuning a compact 3B model and a LoRA-finetuned 7B model on our dataset yields 11.6% and 17.8% improvements in accuracy, respectively. Providing explicit table region coordinates as additional input further improves performance by 4-7%, demonstrating the value of spatial priors for table-based reasoning. Our findings underscore the importance of language-diverse, domain-specific datasets and demonstrate that targeted fine-tuning can significantly enhance VLM performance on specialized document understanding tasks. INDOTABVQA provides a valuable resource for advancing research in cross-lingual, structure-aware document understanding, especially in underrepresented regions of the world. The full dataset can be accessed on Hugging Face at: https://huggingface.co/datasets/NusaBharat/INDOTABVQA

Comments: Accepted in ACL 2026 (Findings)

Submitted: 2026-04-13 ArXiv ID: 2604.11970v1

▶

The Linear Centroids Hypothesis: How Deep Network Features Represent Data

Authors

Thomas Walker, Ahmed Imtiaz Humayun, Randall Balestriero, et al.

Abstract

Identifying and understanding the features that a deep network (DN) extracts from its inputs to produce its outputs is a focal point of interpretability research. The Linear Representation Hypothesis (LRH) identifies features in terms of the linear directions formed by the inputs in a DN's latent space. However, the LRH is limited as it abstracts away from individual components (e.g., neurons and layers), is susceptible to identifying spurious features, and cannot be applied across sub-components (e.g., multiple layers). In this paper, we introduce the Linear Centroids Hypothesis (LCH) as a new framework for identifying the features of a DN. The LCH posits that features correspond to linear directions of centroids, which are vector summarizations of the functional behavior of a DN in a local region of its input space. Interpretability studies under the LCH can leverage existing LRH tools, such as sparse autoencoders, by applying them to the DN's centroids rather than to its latent activations. We demonstrate that doing so yields sparser feature dictionaries for DINO vision transformers, which also perform better on downstream tasks. The LCH also inspires novel approaches to interpretability; for example, LCH can readily identify circuits in GPT2-Large. Code for studying the LCH is available at https://github.com/ThomasWalker1/LinearCentroidsHypothesis .

Comments: 20 pages, 17 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11962v1

▶

Agentic LLM Reasoning in a Self-Driving Laboratory for Air-Sensitive Lithium Halide Spinel Conductors

Authors

Yuxing Fei, Bernardus Rendy, Xiaochen Yang, et al.

Abstract

Self-driving laboratories promise to accelerate materials discovery. Yet current automated solid-state synthesis platforms are limited to ambient conditions, thereby precluding their use for air-sensitive materials. Here, we present A-Lab for Glovebox Powder Solid-state Synthesis (A-Lab GPSS), a robotic platform capable of synthesizing and characterizing air-sensitive inorganic materials under strict air-free conditions. By integrating an agentic AI framework into the A-Lab GPSS platform, we structure autonomous experimental design through abductive and inductive reasoning. We deploy this platform to explore the vast compositional space of lithium halide spinel solid-state ionic conductors. Across a synthesis campaign comprising 352 samples with diverse compositions, the system explores a broad chemical space, experimentally realizing 72% of the 171 possible pairwise combinations among the 19 metals considered in this study. Over the course of the campaign, the fraction of compositions exhibiting both good ionic conductivity (> 0.05 mS/cm) and high halide spinel phase purity increases from 1.33% in the first 75 agent-proposed samples to 5.33% in the final 75. Furthermore, by inspecting the AI's reasoning processes, we reveal distinct yet complementary discovery strategies: abductive reasoning interrogates abnormal observations within already explored regions, whereas inductive reasoning expands the search into broader, previously unvisited chemical space. This work establishes a scalable platform for the autonomous discovery of complex, air-sensitive solid-state materials.
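The counts quoted in this abstract can be checked with a few lines of arithmetic. This is only a consistency sketch: the sample counts it derives are inferred from the rounded percentages in the abstract and are not figures stated by the authors.

```python
from math import comb

n_metals = 19
pair_count = comb(n_metals, 2)        # unordered pairs among 19 metals
realized = round(0.72 * pair_count)   # pairs realized, from the rounded 72%

window = 75                           # agent-proposed samples per window
first_hits = round(0.0133 * window)   # 1.33% of the first 75 samples
last_hits = round(0.0533 * window)    # 5.33% of the final 75 samples

print(pair_count, realized, first_hits, last_hits)
```

The pair count reproduces the 171 combinations quoted in the abstract, and the two percentages correspond to roughly one qualifying sample in the first window and four in the last.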

Submitted: 2026-04-13 ArXiv ID: 2604.11957v1

▶

Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores

Authors

Yixian Shen, Chaoyao Shen, Jan Deen, et al.

Abstract

Large Foundation Model (LFM) inference is both memory- and compute-intensive, traditionally relying on GPUs. However, the limited availability and high cost have motivated the adoption of high-performance general-purpose CPUs, especially emerging 3D-stacked Static Non-Uniform Cache Architecture (3D S-NUCA) systems. These architectures offer enhanced bandwidth and locality but suffer from severe thermal challenges and uneven cache latencies due to 3D Networks-on-Chip (NoC). Optimal management of thread migration and V/f scaling is non-trivial due to LFM kernel diversity and system heterogeneity. Existing thermal management approaches often rely on oversimplified analytical models and lack adaptability. We propose AILFM, an Active Imitation Learning (AIL)-based scheduling framework that learns near-optimal thermal-aware scheduling policies from Oracle demonstrations with minimal run-time overhead. AILFM accounts for both core-level performance heterogeneity and kernel-specific behavior in LFMs to maintain thermal safety while maximizing performance. Extensive experiments show that AILFM outperforms state-of-the-art baselines and generalizes well across diverse LFM workloads.

Comments: Accepted for publication at the 63rd ACM/IEEE Design Automation Conference (DAC 2026)

Submitted: 2026-04-13 ArXiv ID: 2604.11948v1

▶

ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism

Authors

Alan Aboudib, Rodrigo Lopez Portillo A., Kalei Brady, et al.

Abstract

Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enabled by data and pipeline parallelism, two techniques that require ultra-high-bandwidth communication. While efficient methods now exist for decentralized data parallelism, pipeline parallelism remains the primary challenge. Recent efforts, such as Subspace Models (SM), have claimed up to 100x activation compression but rely on complex constrained optimization and diverge from true end-to-end training. In this paper, we propose a different approach, based on an architecture designed from the ground up to be native to low-bandwidth communication environments while still applicable to any standard transformer-based architecture. We call this architecture the Residual Bottleneck Model, or ResBM. It introduces a residual encoder-decoder bottleneck module across pipeline boundaries that can be trained end-to-end as part of the model's parameters while preserving an explicit low-rank identity path. We show that ResBMs achieve state-of-the-art 128x activation compression without significant loss in convergence rates and without significant memory or compute overhead.

Submitted: 2026-04-13 ArXiv ID: 2604.11947v1

▶

AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow

Authors

Jiale Liu, Nanzhe Wang

Abstract

High-fidelity numerical simulation of subsurface flow is computationally intensive, especially for many-query tasks such as uncertainty quantification and data assimilation. Deep learning (DL) surrogates can significantly accelerate forward simulations, yet constructing them requires substantial machine learning (ML) expertise - from architecture design to hyperparameter tuning - that most domain scientists do not possess. Furthermore, the process is predominantly manual and relies heavily on heuristic choices. This expertise gap remains a key barrier to the broader adoption of DL surrogate techniques. For this reason, we present AutoSurrogate, a large-language-model-driven multi-agent framework that enables practitioners without ML expertise to build high-quality surrogates for subsurface flow problems through natural-language instructions. Given simulation data and optional preferences, four specialized agents collaboratively execute data profiling, architecture selection from a model zoo, Bayesian hyperparameter optimization, model training, and quality assessment against user-specified thresholds. The system also handles common failure modes autonomously, including restarting training with adjusted configurations when numerical instabilities occur and switching to alternative architectures when predictive accuracy falls short of targets. In our setting, a single natural-language sentence can be sufficient to produce a deployment-ready surrogate model, with minimum human intervention required at any intermediate stage. We demonstrate the utility of AutoSurrogate on a 3D geological carbon storage modeling task, mapping permeability fields to pressure and CO$_2$ saturation fields over 31 timesteps. Without any manual tuning, AutoSurrogate is able to outperform expert-designed baselines and domain-agnostic AutoML methods, demonstrating strong potential for practical deployment.

Submitted: 2026-04-13 ArXiv ID: 2604.11945v1

▶

A unified data format for managing diabetes time-series data: DIAbetes eXchange (DIAX)

▶

ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems

Authors

Daeyeon Son

Abstract

An OS kernel that runs LLM inference internally can read logit distributions before any text is generated -- and act on them as a governance primitive. I present ProbeLogits, a kernel-level operation that performs a single forward pass and reads specific token logits to classify agent actions as safe or dangerous, with zero learned parameters. On a 260-prompt OS action benchmark (9 categories including adversarial attacks), ProbeLogits achieves F1=0.980, Precision=1.000, and Recall=0.960 using a general-purpose 7B model at 4-bit quantization. On ToxicChat (1,000 human-annotated real conversations), it achieves F1=0.790 at default calibration strength $α$=1.0, improving to F1=0.837 at $α$=0.5 -- 89% of Llama Guard 3's F1 of 0.939 with zero learned parameters. A key design contribution is the calibration strength $α$, which serves as a deployment-time policy knob rather than a learned hyperparameter. By adjusting $α$, the OS can enforce strict policies for privileged operations ($α\geq 0.8$, maximizing recall) or relaxed policies for conversational agents ($α$=0.5, maximizing precision). Contextual calibration improves accuracy from 64.8% to 97.3% on the custom benchmark. I implement ProbeLogits within Anima OS, a bare-metal x86_64 OS written in 80,400 lines of Rust. Because agent actions must pass through kernel-mediated host functions, ProbeLogits enforcement operates below the WASM sandbox boundary, making it significantly harder to circumvent than application-layer classifiers. Each classification costs 65ms on 7B -- fast enough for per-action governance. I also show that treating KV cache as process state enables checkpoint, restore, and fork operations analogous to traditional process management. To my knowledge, no prior system exposes LLM logit vectors as OS-level governance primitives.
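One way to picture a logit probe with a tunable calibration strength is sketched below. The abstract does not give the formula, so the scoring rule is an assumption: it subtracts $α$ times the safe/danger bias measured on a content-free prompt from the raw logit margin. The function names, token labels, and numbers are all hypothetical and are not the ProbeLogits implementation.

```python
from typing import Dict

def danger_margin(logits: Dict[str, float], baseline: Dict[str, float], alpha: float) -> float:
    """Calibrated margin: raw (danger - safe) minus alpha times the prior bias."""
    raw = logits["danger"] - logits["safe"]
    bias = baseline["danger"] - baseline["safe"]  # bias seen on a content-free prompt
    return raw - alpha * bias

def classify(logits: Dict[str, float], baseline: Dict[str, float],
             alpha: float, threshold: float = 0.0) -> str:
    """Label an action from two token logits read after a single forward pass."""
    return "dangerous" if danger_margin(logits, baseline, alpha) > threshold else "safe"

# Assumed numbers: the model leans toward "safe" even with no content.
baseline = {"safe": 2.0, "danger": 0.0}
action = {"safe": 1.5, "danger": 0.2}

strict = classify(action, baseline, alpha=0.8)   # stronger correction
relaxed = classify(action, baseline, alpha=0.5)  # weaker correction
print(strict, relaxed)
```

In this toy setup the same borderline action is flagged under the stricter setting and allowed under the relaxed one, which is the sense in which a single scalar can act as a deployment-time policy knob.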

Comments: 13 pages, 9 tables

Submitted: 2026-04-13 ArXiv ID: 2604.11943v1

▶

Fast and principled equation discovery from chaos to climate

Authors

Yuzheng Zhang, Weizhen Li, Rui Carvalho

Abstract

Our ability to predict, control, and ultimately understand complex systems rests on discovering the equations that govern their dynamics. Identifying these equations directly from noisy, limited observations has therefore become a central challenge in data-driven science, yet existing library-based sparse regression methods force a compromise between automation, statistical rigor, and computational efficiency. Here we develop Bayesian-ARGOS, a hybrid framework that reconciles these demands by combining rapid frequentist screening with focused Bayesian inference, enabling automated equation discovery with principled uncertainty quantification at a fraction of the computational cost of existing methods. Tested on seven chaotic systems under varying data scarcity and noise levels, Bayesian-ARGOS outperforms two state-of-the-art methods in most scenarios. It surpasses SINDy in data efficiency for all systems and noise tolerance for six out of the seven, with a two-order-of-magnitude reduction in computational cost compared to bootstrap-based ARGOS. The probabilistic formulation additionally enables a suite of standard statistical diagnostics, including influence analysis and multicollinearity detection that expose failure modes otherwise opaque. When integrated with representation learning (SINDy-SHRED) for high dimensional sea surface temperature reconstruction, Bayesian-ARGOS increases the yield of valid latent equations with significantly improved long horizon stability. Bayesian-ARGOS thus provides a principled, automated, and computationally efficient route from scarce and noisy observations to interpretable governing equations, offering a practical framework for equation discovery across scales, from benchmark chaotic systems to the latent dynamics underlying global climate patterns.

Comments: 34 pages, 8 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11929v1

▶

INTARG: Informed Real-Time Adversarial Attack Generation for Time-Series Regression

▶

FlowBoost Reveals Phase Transitions and Spectral Structure in Finite Free Information Inequalities

Mohammed Ezzaldin Babiker Abdullah

Abstract

The stable operation of autonomous off-grid photovoltaic systems requires solar forecasting algorithms that respect atmospheric thermodynamics. Contemporary deep learning models consistently exhibit critical anomalies, primarily severe temporal phase lags during cloud transients and physically impossible nocturnal power generation. To resolve this divergence between data-driven modeling and deterministic celestial mechanics, this research introduces the Thermodynamic Liquid Manifold Network. The methodology projects 22 meteorological and geometric variables into a Koopman-linearized Riemannian manifold to systematically map complex climatic dynamics. The architecture integrates a Spectral Calibration unit and a multiplicative Thermodynamic Alpha-Gate. This system synthesizes real-time atmospheric opacity with theoretical clear-sky boundary models, structurally enforcing strict celestial geometry compliance. This completely neutralizes phantom nocturnal generation while maintaining zero-lag synchronization during rapid weather shifts. Validated against a rigorous five-year testing horizon in a severe semi-arid climate, the framework achieves an RMSE of 18.31 Wh/m$^2$ and a Pearson correlation of 0.988. The model strictly maintains a zero-magnitude nocturnal error across all 1826 testing days and exhibits a sub-30-minute phase response during high-frequency optical transients. Comprising exactly 63,458 trainable parameters, this ultra-lightweight design establishes a robust, thermodynamically consistent standard for edge-deployable microgrid controllers.

Authors

Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, et al.

Abstract

Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of inference that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points. We believe these findings help translate mechanistic insights into practical guidance for architectural design.

Comments: 39 pages, 63 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11791v1

▶

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Authors

Fei Tang, Zhiqiong Lu, Boxuan Zhang, et al.

Abstract

GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present ClawGUI, an open-source framework addressing these three gaps within a single harness. ClawGUI-RL provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. ClawGUI-Eval enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, ClawGUI-2B achieves 17.1% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.

Submitted: 2026-04-13 ArXiv ID: 2604.11784v1

▶

Autonomous Diffractometry Enabled by Visual Reinforcement Learning

▶

Disposition Distillation at Small Scale: A Three-Arc Negative Result

Authors

Hari Sadasivan

Abstract

We set out to train behavioral dispositions (self-verification, uncertainty acknowledgment, feedback integration) into small language models (0.6B to 2.3B effective parameters) through a four-stage all-MIT distillation pipeline, with follow-on experiments on inference-time attention-head interventions and a frozen-base confidence-gated sidecar. An internal draft reported +33.9-point MCAS and +15.3-point HumanEval gains on a Qwen3-0.6B student; a second-pass sanity check falsified both numbers before publication. The HumanEval delta was a truncation artifact (n_predict=512) that inverted to -8.0 points at n_predict=1024; the MCAS gain disappeared under apples-to-apples scoring. That falsification triggered three subsequent arcs. Across (1) SFT/DPO LoRA on three model families and two domains, (2) inference-time attention-head tempering on o_proj, and (3) a training-free frozen-base sidecar reading the final-token hidden state h_last, we find no operator that moves judge-measured disposition without damaging content or collapsing into stylistic mimicry. The failure is consistent across five models (Qwen3-0.6B, Qwen3-1.7B, Qwen3.5-0.8B, Gemma 4 E2B, and SmolLM2-1.7B-Instruct). A within-distribution cross-validation pass (AUC=0.683) collapsed to chance on fresh prompts (AUC=0.516). We contribute a three-arc negative result with mechanism, a two-failure-mode taxonomy for linear h_last probes, and an honest falsification pipeline that converts the class of false positives we ourselves produced into publishable negatives. As an independent finding, Gemma 4 E2B exhibits near-complete confidence-correctness decoupling on the Chef domain (assertion asymmetry -0.009; the model asserts at 91% regardless of correctness).

Comments: 16 pages, 4 figures

Submitted: 2026-04-13 ArXiv ID: 2604.11867v1

▶

MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI

Authors

Paula Arguello, Berk Tinaz, Mohammad Shahab Sepehri, et al.

Abstract

Comments: 15 pages, 6 figures, preliminary version

Submitted: 2026-04-13 ArXiv ID: 2604.11762v1

▶

LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

All-charm tetraquarks at hadron colliders: A high-precision fragmentation perspective

Authors

Francesco Giovanni Celiberto

Abstract

Submitted: 2026-04-13 ArXiv ID: 2604.11646v1

▶

SemiCharmTag: a tool for Semileptonic Charm tagging

▶

ArXiv Daily Papers

High Energy Physics - Experiment

Prospects for measuring exclusive diffractive $η,η'$ at the LHC

Authors

Abstract

Ultra-High-Energy Tau Neutrinos as Probes of Lorentz Invariance

Authors

Abstract

A First Account of the Impact of Ion Electromagnetic Dissociation on Event Exclusivity in Ultraperipheral LHC Collisions

Authors

Abstract

Low-Multiplicity Jets as Probes of GeV-Scale Light-Quark-Coupled Particles

Authors

Abstract

Probing the Tau Anomalous Magnetic Moment at Colliders: From Ultra-Peripheral Collisions to the Precision Frontier

Authors

Abstract

Radon-induced backgrounds in the NEXT-100 experiment

Authors

Abstract

QCD-factorization amplitudes from flavour symmetries: beyond the $SU(3)$ symmetric case

Authors

Abstract

Search for quantum black holes in lepton+jet final states using proton-proton collisions at $\sqrt{s}=13.6$ TeV with the ATLAS detector

Authors

Abstract

Neural posterior estimation of the neutrino direction in IceCube using transformer-encoded normalizing flows on the sphere

Authors

Abstract

Probing the neutrino trident process using the Scattering and Neutrino Detector at HL-LHC and SHiP

Authors

Abstract

Flavour Physics beyond the LHC

Authors

Abstract

The FASER experiment at the Large Hadron Collider

Authors

Abstract

RL-ABC: Reinforcement Learning for Accelerator Beamline Control

Authors

Abstract

Three-dimensional recoil-electron reconstruction using combined optical imaging and waveform readout for electron-tracking Compton cameras

Authors

Abstract

High Energy Physics - Phenomenology

Purely Quadratic Non-Gaussianity from Tachyonic Instability: Primordial Black Holes and Scalar-Induced Gravitational Waves

Authors

Abstract

Unraveling Chemical Enrichment in Extreme Emission-Line Galaxies: A Multi-Element Bayesian View of Bursty Star Formation and Galaxy Evolution in DESI

Authors

Abstract

Prospects for measuring exclusive diffractive $η,η'$ at the LHC

Authors

Abstract

Introduction to transverse momentum imaging

Authors

Abstract

Sub-GeV dark matter from cosmic ray bremsstrahlung in the atmosphere

Authors

Abstract

Ultra-High-Energy Tau Neutrinos as Probes of Lorentz Invariance

Authors

Abstract

A First Account of the Impact of Ion Electromagnetic Dissociation on Event Exclusivity in Ultraperipheral LHC Collisions

Authors

Abstract

Self-Interaction and Galactic Magnetic Field Bounds on Millicharged Magnetic Monopole Dark Matter

Authors

Abstract

Asymptotic charges as detectors and the memory effect in massive QED and perturbative quantum gravity

Authors

Abstract

Low-Multiplicity Jets as Probes of GeV-Scale Light-Quark-Coupled Particles

Authors

Abstract

Probing the Tau Anomalous Magnetic Moment at Colliders: From Ultra-Peripheral Collisions to the Precision Frontier

Authors

Abstract

Finite-density equation of state of hot QCD using the complex Langevin equation

Authors

Exotic $T^*_{csJ}$ and $T^*_{c\bar{s}J}$ states and coupled-channel scattering at the $SU(3)$ flavour symmetric point from lattice QCD