Central exclusive diffractive production in proton-proton collisions at hadron colliders is characterised by hadronic activity at or close to midrapidity, and by the two forward scattered protons, or their remnants. In such events, no particles are produced between the midrapidity system and the forward beam particles. These events can hence be identified with appropriately placed detectors for measuring the forward scattered protons, or their remnants, and a detector system covering the midrapidity range. At the energies of the LHC, central diffractive production in proton-proton collisions is dominated by pomeron-pomeron fusion. The description of the pomeron within the Regge approach is summarized, and the feasibility of identifying pseudoscalar mesons $η,η'$ in pomeron-pomeron fusion is studied for determining the spin structure of the pomeron.
Neutrino telescopes have detected astrophysical neutrinos with energies up to ${O}(100)$ PeV. Several current and proposed experiments aim to observe neutrinos at even higher energies, with the goal of detecting cosmogenic neutrinos. This increase in neutrino energy makes tests of Lorentz invariance violation (LIV) particularly appealing, since the effects of higher-dimension LIV operators on neutrino propagation grow rapidly with energy. In this work, we investigate the potential of the upcoming experiments GRAND and POEMMA to probe LIV in the neutrino sector through the detection of ultra-high-energy tau neutrinos. We generate the cosmogenic neutrino flux using SimProp and interface it with a calculation of neutrino flavor transition probabilities in the presence of LIV effects. Deviations from standard flavor transition probabilities manifest as changes in the expected tau neutrino event rates at GRAND and POEMMA. We first consider the case with a single nonzero LIV operator of various dimensions, and find that the projected sensitivities exceed existing limits from lower-energy probes by orders of magnitude. We then explore scenarios with multiple nonzero LIV parameters and show that their interplay can significantly modify the sensitivities compared to the single-parameter case. Overall, we find that upcoming observations of ultra-high-energy tau neutrinos will place some of the most stringent constraints on LIV.
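The energy growth mentioned above can be illustrated with a minimal sketch. It assumes the commonly used parameterization in which a dimension-$d$ LIV coefficient enters the effective Hamiltonian as (coefficient) $\times E^{d-3}$, while the standard mass term falls as $Δm^2/(2E)$, so their ratio scales as $E^{d-2}$. The function names and all numerical values are illustrative assumptions and are not taken from this work.

```python
def mass_term_gev(delta_m2_ev2: float, energy_gev: float) -> float:
    """Standard oscillation term dm^2 / (2E), returned in GeV."""
    delta_m2_gev2 = delta_m2_ev2 * 1e-18  # 1 eV^2 = 1e-18 GeV^2
    return delta_m2_gev2 / (2.0 * energy_gev)


def liv_term_gev(coefficient: float, dimension: int, energy_gev: float) -> float:
    """Assumed LIV term: coefficient * E^(d-3), coefficient in GeV^(4-d)."""
    return coefficient * energy_gev ** (dimension - 3)


def liv_to_mass_ratio(coefficient: float, dimension: int,
                      delta_m2_ev2: float, energy_gev: float) -> float:
    """Relative size of the LIV term; scales as E^(d-2)."""
    return (liv_term_gev(coefficient, dimension, energy_gev)
            / mass_term_gev(delta_m2_ev2, energy_gev))


if __name__ == "__main__":
    # Placeholder inputs: a dimension-6 coefficient and an atmospheric-scale dm^2.
    coefficient, dimension, delta_m2 = 1e-40, 6, 2.5e-3
    at_pev = liv_to_mass_ratio(coefficient, dimension, delta_m2, 1e6)  # 1 PeV
    at_eev = liv_to_mass_ratio(coefficient, dimension, delta_m2, 1e9)  # 1 EeV
    print(f"gain from 1 PeV to 1 EeV: {at_eev / at_pev:.3g}")
```

Under this assumed scaling, moving from PeV to EeV energies increases the relative weight of a dimension-6 term by a factor of $1000^{4}$, which is the qualitative reason ultra-high-energy neutrinos are a sensitive probe.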
In this Letter we explore the modelling of hadron production in electromagnetic ion dissociation (EMD) processes in high-energy ultraperipheral collisions at LHC energies. Since EMD can accompany exclusive particle production in these interactions, we demonstrate that the resulting hadrons can break the exclusivity vetoes typically imposed by experiments. As two representative examples, we calculate the impact on existing LHC measurements of exclusive muon pair production ($γγ\toμμ$) and exclusive coherent $J/ψ$ production. We demonstrate that accounting for this effect resolves long-standing tensions between theoretical predictions and experimental measurements.
We propose a search at the LHC for GeV-scale particles coupling predominantly to light quarks based on low-multiplicity jets. The search targets production in association with a hard photon and uses the feature that a light gauge-singlet can only decay into a small number of hadronic channels, yielding jets with anomalously low charged-track multiplicity and mass compared to QCD jets at the same transverse momentum. We determine the sensitivity to scalar and pseudoscalar couplings to up-quarks, and suggest a data-driven estimate that reduces the sensitivity to jet modeling uncertainties. This search extends the reach to hadronically-coupled particles into a previously inaccessible regime.
The anomalous magnetic moment of the tau lepton, $a_τ$, represents a fundamental test of the Standard Model (SM) and a high-sensitivity probe for New Physics in the third generation of leptons. Due to the tau's extremely short lifetime, traditional spin-precession measurements remain inaccessible, necessitating innovative experimental strategies at high-energy colliders. This review provides a comprehensive overview of the current experimental landscape, highlighting the recent paradigm shift from LEP-era constraints to the unprecedented precision reached at the LHC. We emphasize the importance of Ultra-Peripheral Heavy-Ion Collisions (UPCs), which act as a ``photon-photon collider'' of extreme intensity. By leveraging the $Z^4$ enhancement of the coherent photon flux in Lead-Lead ($PbPb$) interactions, these collisions provide a theoretically robust ``quasi-static'' environment. These results are critically compared with the latest measurements from proton-proton collisions, including the recent CMS observation of the $γγ\to ττ$ process and the ATLAS constraints from the high-mass Drell-Yan tail. We evaluate their complementarity and the challenges related to Effective Field Theory validity at the TeV scale. Finally, we outline the future prospects for $a_τ$ at Belle II and the Future Circular Collider (FCC) stages. While FCC-hh in $PbPb$ mode provides a theoretically clean environment, its sensitivity remains limited to $\mathcal{O}(10^{-2})$. Conversely, the next generation of lepton facilities, specifically Belle II and FCC-ee, aims for the $\mathcal{O}(10^{-5})$ level, required to probe SM electroweak loop corrections. Long-term projections for a high-energy Muon Collider suggest a potential reach of $\mathcal{O}(10^{-6})$.
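As a rough worked number for the $Z^4$ enhancement quoted above: the coherent photon flux of each ion scales as $Z^2$, so the two-photon rate scales as $Z^4$ relative to proton-proton collisions. The sketch below only evaluates this charge factor; it deliberately ignores nuclear form factors and the lower heavy-ion beam luminosity, and the function name is an illustrative assumption.

```python
Z_LEAD = 82  # atomic number of lead


def photon_photon_enhancement(z: int) -> int:
    """Charge factor for two-photon processes: Z^2 per ion, Z^4 in total."""
    return z ** 4


if __name__ == "__main__":
    factor = photon_photon_enhancement(Z_LEAD)
    print(f"Z^4 for Pb: {factor} (about {factor:.1e})")  # 45212176
```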
The NEXT-100 detector at the LSC aims at the first competitive search for the \bbnonu decay using a high-pressure \Xe{136} electroluminescent time projection chamber. The first low-background run of NEXT-100 at 3.95 bar has been devoted to the measurement of the radon-induced backgrounds impacting this search. The contributions from both the internal and external airborne radon have been evaluated. The internal \Rn{222} activity is found to be (0.95$\pm$0.04(stat)$\pm$0.09(sys)) Bq/m$^3$, while no traces of \Rn{220} have been observed. Most of the \Rn{222} progeny plate out on the surface of the cathode of the detector, leading to a rate of Rn-induced \Bi{214} of (0.97$\pm$0.05(stat)$\pm$0.10(sys)) Hz for visible energies above 400 keV. The corresponding background index in the \bbnonu region of interest is evaluated as (7.3$\pm$1.5(stat)$\pm$0.8(sys))$\times10^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) after selection of the fully contained events. This background index is reduced to $\sim$4$\times10^{-5}$ counts/(keV$\cdot$kg$\cdot$yr) by applying a topological selection requiring only one double-electron-like track in the events. This value is one order of magnitude below the total radiogenic background expectation in NEXT-100. By analyzing the correlation of the airborne radon activity and the measured rate of events in NEXT-100, it is concluded that the detector operates in a virtually radon-free environment thanks to the radon abatement system of the LSC.
Using experimental information on branching ratios as well as direct and mixing-induced CP asymmetries, we perform a data-driven analysis of charmless non-leptonic $B \to PP$ decays, where $P$ is any of the light pseudoscalar mesons. Implementing flavour-$SU(3)$ breaking at the level of transition form factors, decay constants and phase space factors, we find a good fit to the current experimental data. Our best-fit point materializes in QCD-factorization amplitudes whose central values resemble many features of the dynamical predictions obtained within the QCD factorization framework. Moreover, we do not find any strong indications that the size of annihilation amplitudes is numerically enhanced beyond the naïve $Λ_{\textrm{QCD}}/m_b$ scaling. Subsequently, we address a number of phenomenological applications, among which are various flavour puzzles that have been persisting in non-leptonic $B$ decays for quite some time.
A search for quantum black holes in electron+jet or muon+jet final states with high invariant mass is performed. The analysis uses data from $\sqrt{s}=13.6~\textrm{TeV}$ $pp$ collisions recorded by the ATLAS detector between 2022 and 2024 during Run~3 of the Large Hadron Collider, corresponding to an integrated luminosity of $164~\mathrm{fb}^{-1}$. This search is strongly motivated by a dramatic increase of the production cross-section by up to an order of magnitude for the highest masses considered, thanks to the small increase of $0.6~\textrm{TeV}$ in centre-of-mass energy between Run~2 and Run~3. No significant excess above the Standard Model background is observed, and 95\% CL upper limits are set on the production cross-section times branching ratio in several benchmark models, reaching a mass scale of $9.4~\textrm{TeV}$. These represent the strongest exclusion limits to date on quantum black hole production.
IceCube is a cubic-kilometer-scale neutrino detector located at the geographic South Pole. A precise directional reconstruction of IceCube neutrinos is vital for associations with astronomical objects. In this context, we discuss neural posterior estimation of the neutrino direction via a transformer encoder that maps to a normalizing flow on the 2-sphere. It achieves a new state-of-the-art angular resolution for the two main event morphologies in IceCube (tracks and showers) while being significantly faster than traditional B-spline-based likelihood reconstructions. All-sky scans can be performed within seconds rather than hours, and take constant computation time, regardless of whether the posterior extent is arc-minutes or spans the whole sky. We utilize a combination of $C^2$-smooth rational-quadratic splines, scale transformations and rotations to define a novel spherical normalizing-flow distribution whose parameters are predicted as a whole as the output of the transformer encoder. We test several structural choices deviating from the vanilla transformer architecture. In particular, we find dual residual streams, nonlinear QKV projection and a separate class token with its own cross-attention processing to boost test-time performance. The angular resolution for both showers and tracks improves substantially over the whole trained energy range from 100 GeV to 100 PeV. At 100 TeV deposited energy, for example, the median angular resolution improves by a factor of $1.3$ for throughgoing tracks, by a factor of $1.7$ for showers and by a factor of $2.5$ for starting tracks compared to state-of-the-art likelihood reconstructions based on B-splines. While previous machine-learning (ML) efforts have managed to obtain competitive shower resolutions, this is the first time an ML-based method outperforms likelihood-based muon reconstructions above 100 GeV.
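For readers unfamiliar with the figure of merit, the median angular resolution quoted above is conventionally the median opening angle between true and reconstructed directions. The following is a minimal sketch of that metric only, assuming directions given as (zenith, azimuth) pairs in radians; the helper names are illustrative and are not part of any IceCube software.

```python
import math
from statistics import median


def unit_vector(zenith: float, azimuth: float) -> tuple:
    """Cartesian unit vector for a direction given in radians."""
    return (math.sin(zenith) * math.cos(azimuth),
            math.sin(zenith) * math.sin(azimuth),
            math.cos(zenith))


def opening_angle(dir_a: tuple, dir_b: tuple) -> float:
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(dir_a, dir_b))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot)))


def median_angular_resolution(true_dirs, reco_dirs) -> float:
    """Median opening angle over paired true/reconstructed directions."""
    return median(opening_angle(t, r) for t, r in zip(true_dirs, reco_dirs))


if __name__ == "__main__":
    truth = [unit_vector(0.0, 0.0)] * 3
    reco = [unit_vector(0.01, 0.0), unit_vector(0.02, 1.0), unit_vector(0.05, 2.0)]
    print(math.degrees(median_angular_resolution(truth, reco)))
```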
Neutrino trident scattering is a rare process in the Standard Model characterized by two charged leptons in the final state. In this work, we investigate the possibility of probing the neutrino trident process using the Scattering and Neutrino Detector (SND) at the Large Hadron Collider during its high-luminosity run (HL-LHC). In addition, we present, for the first time, predictions for neutrino trident scattering at the SHiP beam-dump experiment, where a similar detector is expected to be installed. We demonstrate that these two experiments probe the process in complementary energy ranges. Assuming the upgraded detector configuration, we estimate the cross-sections associated with all possible leptonic final states in coherent and incoherent processes. The corresponding numbers of neutrino trident scattering events in the SND at the HL-LHC and at SHiP are presented. Our results indicate that this process can be observed in these forthcoming experiments for some specific combinations of leptons in the final state.
The next 20 years will be the golden age of flavour physics, with the operation of the LHCb and Belle II experiments. After that an $e^+e^-$ collider could further improve the precision with sizeable $Z$, $W^+W^-$ and $t\bar{t}$ runs.
The FASER experiment is located in the Large Hadron Collider (LHC) complex at CERN, 480 m downstream of the ATLAS collision point and aligned with the beam collision axis. The experiment was designed to search for light, weakly-interacting new particles which could be produced in the LHC collisions, and, for the first time, to study high-energy neutrinos of all flavours originating at a particle collider. This review article presents the status of FASER up to early 2026. This includes details of the FASER detector design, operation, performance and physics results, as well as a brief description of upgrades that have been installed since the start of FASER. In addition, future plans for the experiment are detailed.
Particle accelerator beamline optimization is a high-dimensional control problem traditionally requiring significant expert intervention. We present RLABC (Reinforcement Learning for Accelerator Beamline Control), an open-source Python framework that automatically transforms standard Elegant beamline configurations into reinforcement learning environments. RLABC integrates with the widely-used Elegant beam dynamics simulation code via SDDS-based interfaces, enabling researchers to apply modern RL algorithms to beamline optimization with minimal RL-specific development. The main contribution is a general methodology for formulating beamline tuning as a Markov decision process: RLABC automatically preprocesses lattice files to insert diagnostic watch points before each tunable element, constructs a 57-dimensional state representation from beam statistics, covariance information, and aperture constraints, and provides a configurable reward function for transmission optimization. The framework supports multiple RL algorithms through Stable-Baselines3 compatibility and implements stage learning strategies for improved training efficiency. Validation on a test beamline derived from the VEPP-5 injection complex (37 control parameters across 11 quadrupoles and 4 dipoles) demonstrates that the framework successfully enables RL-based optimization, with a Deep Deterministic Policy Gradient agent achieving 70.3\% particle transmission -- performance matching established methods such as differential evolution. The framework's stage learning capability allows decomposition of complex optimization problems into manageable subproblems, improving training efficiency. The complete framework, including configuration files and example notebooks, is available as open-source software to facilitate adoption and further research.
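To make the Markov-decision-process formulation concrete, here is a minimal, dependency-free sketch of an episodic environment in which an agent sets one tunable element per step and is rewarded with the final transmission. It is a toy model under stated assumptions: the class `ToyBeamlineEnv`, its transmission formula, and its two-component observation are hypothetical and are not RLABC's API, its 57-dimensional state, or an Elegant simulation.

```python
import random


class ToyBeamlineEnv:
    """Hypothetical stand-in for a beamline environment; not RLABC's API."""

    def __init__(self, n_elements: int = 3, seed: int = 0):
        self.rng = random.Random(seed)
        self.n_elements = n_elements
        # Hidden "ideal" strengths that the agent must discover.
        self.optimum = [self.rng.uniform(-1.0, 1.0) for _ in range(n_elements)]
        self.settings = []
        self.index = 0

    def _transmission(self) -> float:
        # Toy model: each mis-set element loses a fraction of the beam.
        fraction = 1.0
        for setting, ideal in zip(self.settings, self.optimum):
            fraction *= max(0.0, 1.0 - (setting - ideal) ** 2)
        return fraction

    def _observation(self) -> tuple:
        # State: index of the next tunable element and current transmission.
        return (self.index, self._transmission())

    def reset(self) -> tuple:
        self.settings = []
        self.index = 0
        return self._observation()

    def step(self, action: float) -> tuple:
        # Action: strength for the current element, clipped to [-1, 1].
        self.settings.append(max(-1.0, min(1.0, action)))
        self.index += 1
        done = self.index == self.n_elements
        # Sparse reward: transmission is paid out at the end of the line.
        reward = self._transmission() if done else 0.0
        return self._observation(), reward, done


if __name__ == "__main__":
    env = ToyBeamlineEnv()
    env.reset()
    done, reward = False, 0.0
    while not done:  # random policy as a placeholder for an RL agent
        _, reward, done = env.step(random.uniform(-1.0, 1.0))
    print(f"final transmission: {reward:.3f}")
```

The element-by-element stepping mirrors the idea of placing a diagnostic point before each tunable element; a real setup would replace the toy transmission with a call to the beam dynamics code and wrap the class in whatever interface the chosen RL library expects.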
Accurate reconstruction of recoil-electron directions is critical for enhancing the point-spread function of electron-tracking Compton cameras (ETCCs) in gamma-ray imaging. Although full three-dimensional (3D) readout systems achieve high-precision reconstruction, they are impractical for large-area detectors because of the enormous data volume. This study proposes and demonstrates a practical alternative for inferring the 3D recoil-electron direction in Compton scattering. This method combines a high-resolution two-dimensional optical image, a one-dimensional waveform signal, and a deep-learning-based method through simulations. The proposed method achieved an angular resolution of approximately $44^\circ$ for the recoil-electron direction in the 40-50 keV range, corresponding to an improvement of a factor of about 1.3 compared with our previous strip-readout approach using pseudo-experimental data generated by Geant4 and MAGBOLTZ simulations for an argon-based gas time projection chamber. In addition, the starting-point resolution of the electron track was improved over the previous method across the 5-50 keV electron energy range. These results demonstrate that complementary information from the transverse image and longitudinal waveform can effectively recover the 3D track topology without requiring full 3D readout. The proposed approach provides a realistic pathway for improving ETCC imaging performance.
We investigate primordial black hole (PBH) formation in a cosmological scenario where curvature perturbations follow purely quadratic non-Gaussianity, $ζ= A(φ^2-\langleφ^2\rangle)$, arising from tachyonic instability in multi-component inflationary models. Within an extended Press-Schechter framework based on the compaction function, we derive the probability distribution of the linear compaction function and its asymptotic exponential tail, demonstrating that PBH abundance is exponentially sensitive not only to the amplitude of perturbations but also to the correlation coefficient $ρ$ between the smoothed field and its radial gradient. We further find that, for $A<0$, the spectral width of the curvature power spectrum plays a decisive role in avoiding PBH overproduction: broad spectra yield mildly negative $ρ$ and fail to suppress PBH formation, while sufficiently narrow spectra drive $ρ\to -1$, resulting in an exponential suppression while maintaining a sizable gravitational-wave signal. Thermal inflation provides a useful benchmark scenario with asteroid-mass PBH dark matter and high-frequency scalar-induced gravitational waves potentially detectable by future space-based interferometers, but its typically broad spectra make it challenging to reconcile PTA observations with PBH constraints.
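As a simple orientation for why purely quadratic non-Gaussianity produces exponential rather than Gaussian tails, consider the one-point distribution of $ζ$ itself. This is only an illustrative special case, assuming a zero-mean Gaussian $φ$ with variance $σ^2 = \langleφ^2\rangle$ and $A>0$; it is not the compaction-function distribution derived in this work.

```latex
% Change of variables from Gaussian \phi to \zeta = A(\phi^2 - \sigma^2), A > 0:
P(\zeta) \;=\; \frac{1}{\sqrt{2\pi A \sigma^2 \,(\zeta + A\sigma^2)}}\,
\exp\!\left[-\frac{\zeta + A\sigma^2}{2A\sigma^2}\right],
\qquad \zeta > -A\sigma^2 ,
% so that for large \zeta the tail is exponential:
\qquad
P(\zeta) \;\propto\; \zeta^{-1/2}\, e^{-\zeta/(2A\sigma^2)} .
```

For $A<0$ the same expression applies to $-ζ$ with $A\to|A|$, so $ζ$ is bounded from above by $|A|σ^2$ in this single-field toy case.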
Extreme emission-line galaxies (EELGs) probe chemical enrichment in low-mass, bursty systems where star formation, feedback, and gas accretion are poorly constrained. Using DESI DR1, we select 23 nearby EELGs with detections of 19 ionic species (S/N $\geq$ 4), stellar masses $ M_* \geq 10^7 M_{\odot}$, and extreme H$α$ and [O III] 5007 equivalent widths (EW $\geq$ 500 Angstrom). We infer non-parametric star-formation histories and fit a Bayesian single-zone chemical-evolution model to O, N, Ne, S, and Ar, allowing time-dependent star-formation efficiency, outflow mass loading, and evolving inflow metallicity. We find short depletion timescales and large mass-loading factors, indicating rapid gas cycling in a burst-driven, non-equilibrium regime, with depletion times below Kennicutt-Schmidt expectations. Star-formation efficiency and outflows are well constrained, while inflow metallicity is more weakly constrained due to degeneracies with metal production. Abundance ratios isolate physical drivers: star-formation efficiency sets evolutionary tracks, outflows regulate metal retention and X/O normalization, and inflow metallicity sets baseline enrichment. N/O strongly constrains burst timing and gas flows, Ne/O remains nearly invariant, and S/O and Ar/O show intermediate sensitivity. These results demonstrate that multi-element abundances provide a direct probe of baryon-cycle processes in extreme low-mass starbursts.
This set of notes complements the lectures and recitation sessions discussed in the following graduate schools: HUGS at Jefferson Lab (years 2018, 2019, 2021), the International School and Workshop on Probing Hadron Structure at the Electron-Ion Collider at ICTS (2024), Frontiers in Nuclear and Hadronic Physics at GGI (2025), and the International Workshop and School on Hadron Structure and Strong Interactions at Nanjing University (2025).
We explore the sensitivity of neutrino observatories and direct dark matter detection experiments to boosted sub-GeV dark matter produced by inelastic cosmic ray collisions in the atmosphere. We revisit earlier approaches and extend the sensitivity to higher mass by modeling the proton bremsstrahlung production mode via initial state radiation. For vector-mediated dark matter models, the peak of the cosmic ray flux allows for enhanced DM production for mediator masses near the $ρ/ω$ resonances. We determine and compare the ensuing sensitivity of direct detection experiments LZ and PandaX-4T and the neutrino detectors Borexino and Super-K.
A dark matter sector composed of magnetic monopoles of a dark U(1) symmetry having a small kinetic mixing with the Standard Model photon has a rich and interesting phenomenology. The model in itself is also of theoretical interest. Based on the temperature of the dark sector and scale of spontaneous symmetry breaking for this U(1), three phenomenologically distinct cases for this model of dark matter are discussed. In all cases, constraints on dark matter self-interactions are translated into constraints on the model parameters. As the magnetic monopoles acquire a small visible magnetic charge, the survival of galactic magnetic fields, known as the Parker effect, places further constraints on the mixing between the dark and visible sectors.
It has been shown that there is an infinite set of asymptotic symmetries in quantum gravity and QED, and this has been extended to dressed states in some cases. Here we rederive these statements in terms of detectors in order to clarify, confirm, and generalize these results to include external hard gravitons. Using detectors and including the full $t$ dependence in Faddeev-Kulish dressings allows us to correct discrepancies in the literature and make new statements. We show that Faddeev-Kulish dressings correctly encode the memory effect in the 'in' and 'out' scattering Fock spaces. We find a physical contribution to the memory eigenvalues arising from the dressings in both cases.
We present the results of continuum-extrapolated lattice simulations of quantum chromodynamics (QCD) above the crossover temperature and for unprecedentedly high baryon densities at the physical point, employing the complex Langevin equation. In particular, we determine the QCD equation of state by computing the baryon density as well as the pressure as functions of the baryon chemical potential and the temperature. Potential issues with wrong convergence of complex Langevin dynamics are under control and we indeed find agreement with previous lattice studies working at smaller chemical potentials, as well as with perturbative hard-thermal-loop calculations at high temperatures.
In this paper, we study the balance functions for pions, kaons, and protons in Pb--Pb collisions at $\sqrt{s_{\mathrm{NN}}} = 2.76$ TeV using the PYTHIA 8.3 + Angantyr model. The balance function is evaluated through two-particle azimuthal angular correlations $(Δφ, Δη)$ between particle and antiparticle. Correlations are constructed for $ππ$, $KK$, and $pp$, and their dependence on collision centrality is investigated. The results indicate that the balance function for pions is narrower compared to kaons and protons. Notably, the pion balance function width decreases from peripheral to central collisions, while the widths for kaons and protons remain nearly unchanged. For the Monash 2013 tune used in this study, PYTHIA 8.3 + Angantyr describes peripheral collisions reasonably well but does not quantitatively reproduce central Pb--Pb data. This suggests that an improved description of central Pb--Pb collisions may require a dedicated heavy-ion tuning of the Angantyr framework. We further explore the influence of resonance decays and collective effects by incorporating multi-parton interactions and color reconnection into the analysis. Owing to resonance effects and Bose--Einstein correlations, a dip at $Δη= 0$ and $Δφ= 0$ is observed for pions and kaons.
Motivated by recent experimental observations of the flavour-exotic $T^*_{cs0}(2870)^0$ and $T^*_{c\bar{s}0}(2900)$, we present the first lattice QCD study of coupled-channel scattering of a charm meson with a light meson in the flavour-exotic sectors at the $SU(3)_f$ flavour symmetric point. Utilising five volumes with $m_π\approx 700$ MeV and employing large bases of meson-meson operators, finite-volume spectra are extracted and used to constrain infinite-volume scattering amplitudes with $J^P = \{0, 1, 2, 3, 4\}^+$ via the Lüscher formalism. In the flavour $\mathbf{6}$ sector, each $S$-wave channel considered is found to be attractive with the scattering amplitudes having an associated pole singularity on an unphysical sheet below threshold, giving six flavour-exotic poles in the energy region constrained. In $J^P = 0^+$ there is a virtual bound state and a resonance. The latter is identified with the $T^*_{cs0}(2870)^0$ and $T^*_{c\bar{s}0}(2900)$, appearing as one state in the $SU(3)_f$ flavour symmetric limit, and suggests the existence of an isospin-$\frac{1}{2}$ partner. In $J^P =1^+$ there are three poles, one of which is identified as a $J^P =1^+$ partner of the $T^*_{cs0}(2870)^0$ and $T^*_{c\bar{s}0}(2900)$, and $J^P =2^+$ contains one pole which is identified as their $J^P =2^+$ partner. Only mild interactions and no poles are seen in the $J^P = \{3, 4\}^+$ scattering amplitudes. In the flavour $\overline{\mathbf{15}}$ sector, weak interactions are observed in $J^P = \{0, 1, 2, 3, 4\}^+$ with no well-determined poles in the energy region constrained.
In this paper, we calculate the decay widths and branching fractions for the decays $Z \to J/ψ+Υ(nS)$ ($n=1,2,3$) at a future super $Z$ factory and at the CEPC/FCC-ee, including both the relativistic and QCD corrections within the framework of nonrelativistic QCD. Both the relativistic and QCD corrections are found to be large and negative. Compared to the leading-order results, the decay widths are significantly reduced by the higher-order corrections. Therefore, it is essential to take these corrections into account for a reliable estimation. For a high-luminosity electron-positron collider running around the $Z$-pole, sizable event rates could be produced from these rare decay channels due to the $Z$-boson resonance effect.
In the first part of the talk, I review general properties of $SO(10)$-inspired leptogenesis. This high-scale leptogenesis scenario is based on the simple assumption that the neutrino Dirac mass matrix is not too different from the up quark mass matrix. After showing how this necessarily implies a production of the asymmetry from the next-to-lightest right-handed neutrino decays, so-called $N_2$-leptogenesis, I discuss how this results in important testable constraints on low energy neutrino parameters. In particular, inverted ordering is not viable if strict $SO(10)$-inspired conditions are assumed. This is an important test in view of the expected results from the JUNO experiment. I also discuss how a subset of the $SO(10)$-inspired leptogenesis solutions realises strong thermal leptogenesis, where the final asymmetry is independent of the initial conditions. In this case a signal might be discovered by next generation $0νββ$ decay experiments. In the second part, I present some new results from \cite{DiBari:2025zlv}, where the impact of flavour coupling on $SO(10)$-inspired leptogenesis has been studied in detail.
Neutrino trident scattering is a rare process in the Standard Model characterized by two charged leptons in the final state. In this work, we investigate the possibility of probing the neutrino trident process using the Scattering and Neutrino Detector (SND) at the Large Hadron Collider during its high-luminosity run (HL-LHC). In addition, we present, for the first time, predictions for neutrino trident scattering at the SHiP beam-dump experiment, where a similar detector is expected to be installed. We demonstrate that these two experiments probe the process in complementary energy ranges. Assuming the upgraded detector configuration, we estimate the cross-sections associated with all possible leptonic final states in coherent and incoherent processes. The corresponding numbers of neutrino trident scatterings in the SND at the HL-LHC and at SHiP are presented. Our results indicate that this process can be observed in these forthcoming experiments for some specific combinations of leptons in the final state.
Jet quenching provides a valuable measure of the opacity of the quark-gluon plasma (QGP) produced in high-energy heavy-ion collisions. However, substantial suppression of charged hadron spectra is observed in highly peripheral collisions, despite the expectation of negligible jet-QGP interactions in this regime. To address this, we develop a HIJING-based initial condition model that accounts for the impact parameter dependence of both inelastic nucleon-nucleon (NN) collisions and the number of hard partonic scatterings per inelastic NN collision. This dependence introduces a geometric bias effect on the jet yield within a given centrality class of nucleus-nucleus (AA) collisions, suppressing the high transverse momentum hadron spectrum in peripheral collisions due to dilute nucleon overlap at large AA impact parameters. By combining this improved initial condition model with a linear Boltzmann transport model for jet-QGP interactions, we obtain a satisfactory description of the centrality dependence of charged hadron suppression in Pb+Pb collisions at $\sqrt{s_\mathrm{NN}}=5.02$ TeV.
We derive an exact solitary wave solution for the $\PTb$-symmetric nonlinear Dirac equation with a scalar-scalar interaction. We consider a power-law nonlinearity of the form $|\barΨ\,Ψ|^{k}\,Ψ$ for positive values of $k$. The system's energy is conserved despite the presence of a gain-loss term, which is quantified by the parameter $Λ$. We show that the $\PTb$-transition point is defined by the solution's existence condition and is independent of the nonlinearity exponent $k$. Furthermore, momentum is conserved, although neither the canonical momentum nor the charge is a conserved quantity. A notable result is that the stationary solution, obtained from the continuity equations, exhibits nonzero momentum in its rest frame. We also derive a moving soliton solution, where the gain-loss parameter allows the soliton's velocity to be precisely chosen so that the moving soliton possesses zero momentum. Finally, we establish that the presence of a gain-loss mechanism and higher-order nonlinearity restrict the stability domain of the solutions.
Fibre inflation is one of the most attractive models realized in the type IIB orientifold compactification. It is embedded in the framework of L(arge) V(olume) S(cenarios) using a class of compactifying Calabi-Yau (CY) threefolds having K3-fibration. The standard single-field fibre inflation is driven by a fibre modulus which needs to travel a trans-Planckian distance of the order of ${\cal O}(5-8)$M$_p$ in the effective moduli space. The global embedding attempts using concrete CY orientifold setups have shown that Kähler cone conditions can generically induce some significantly tight bounds on the inflaton range, especially in the presence of a Swiss-Cheese structure via an exceptional rigid divisor in the CY threefold. Such field range bounds usually obstruct the inflationary plateau, leading to an insufficient number of e-folds during the inflationary dynamics. In this context, we review our recent work on the possibility of assisting multiple fibre moduli such that the burden of traveling the required trans-Planckian distance could be shared by multiple fields, and successful inflation could be realized before hitting (or being too close to) their respective individual Kähler cone boundaries.
We review the state-of-the-art knowledge of IR singularities in multileg QCD amplitudes, identifying the key reasons for the remarkable simplicity of the soft anomalous dimension. We then present a novel strategy to compute this quantity using a lightcone expansion of correlators of semi-infinite Wilson lines by the Method of Regions. Recently, this strategy allowed us to determine the three-loop soft anomalous dimension for amplitudes consisting of a single massive coloured particle with any number of massless ones. It opens the way to computing this quantity for amplitudes involving two heavy particles at three loops and potentially going to higher loop orders.
The Dark Ages and the Cosmic Dawn are an untapped well of information about the particle physics properties of dark matter, which may become accessible with future radio telescopes able to probe the 21-cm signal from atomic hydrogen. In this work we study the impact on cosmological observables of a dark matter subcomponent composed of TeV-scale particles that decay into electrons, photons or neutrinos with a lifetime shorter than the age of the universe. We re-evaluate constraints from the Cosmic Microwave Background (CMB) on these scenarios using the most recent data sets and estimate the sensitivity of future detections of the global 21-cm signal. Our main result is that the latter is potentially more sensitive to the effects of decaying dark matter with a lifetime $τ\gtrsim 10^{15} \, \mathrm{s}$. This effect is strongest for the case of decays into neutrinos due to the different spectral distribution of the injected electromagnetic energy. For DM masses well above the TeV-scale, these differences become less pronounced and the sensitivities of both the CMB and the 21-cm signal depend primarily on the total amount of injected electromagnetic energy.
We investigate the dynamics of a cosmological first-order phase transition (FOPT) and the associated stochastic gravitational wave background (SGWB) in a hidden strongly coupled sector described by an extended Nambu--Jona-Lasinio (NJL) model with $N_f = 3$ fermion flavors. The model incorporates a CP-violating six-fermion 't Hooft interaction, an explicit chiral symmetry breaking mass term, and chirally symmetric eight-fermion operators that stabilize the vacuum. We perform a multi-field analysis of the tunneling dynamics, going beyond conventional single-field approximations. The interplay between explicit symmetry breaking and CP violation induces a vacuum misalignment, resulting in a curved tunneling path and a spatially varying CP-violating background across the bubble wall. Furthermore, the intrinsically rapid transition rate characteristic of the NJL framework ($β/H \sim \mathcal{O}(10^4)$ in the parameter regions considered) leads to a strong suppression of gravitational wave production. As a result, the predicted SGWB remains well below the projected sensitivities of future space-based interferometers. Finally, the explicit symmetry breaking mass introduces a crucial energy bias between competing vacua, triggering the prompt collapse of transient domain wall configurations and thereby ensuring the cosmological viability of the model.
Motivated by the first oscillation results from JUNO, we study the phenomenological viability of texture zeros in the Dirac neutrino mass matrix. The improved precision on the solar mixing angle $\sin^2{θ_{12}}$ and the solar mass-squared difference $Δm_{21}^2$ provides a stringent probe for scrutinizing predictive texture zero frameworks. We perform a systematic scan of the allowed parameter space for two-zero textures, identifying sharp correlations among oscillation observables arising from the reduced parameter space. Our analysis reveals that current JUNO measurements impose stringent constraints on the viable texture structures. In particular, although textures $C$, $A_2$, and $A_1$ were previously viable, current JUNO data strongly disfavor $C$, leaving only textures $A_2$ and $A_1$ compatible with the data. These findings underscore the remarkable sensitivity of Dirac texture zero scenarios to the solar sector.
We investigate the spin-resolved vortex properties of electron-positron pairs created from vacuum in time-delayed, two-color electromagnetic fields. By treating the temporal delay $G$ as a continuous tuning parameter, we reveal a dynamic transition from interference-dominated domain patterns at $G=0$ to the nucleation of quantized vortex lattices at $G=0.5$. These topological structures exhibit a staggered arrangement analogous to von Kármán vortex streets in fluid dynamics. We demonstrate that the momentum-space morphology is strictly governed by spin-orbit selection rules, i.e., parallel spin configurations enforce a dipole-like connectivity, while anti-parallel configurations resolve into distinct quadrupole structures. This difference originates from the conservation of total angular momentum $J_z$, where the spin projection determines the required orbital angular momentum $L_z$ of the created pairs. At large delays ($G>1$), macroscopic vortex coherence dissolves into a chaotic phase landscape due to multi-channel interference, yet the spin-dependent nodal geometries remain robust. Our findings suggest that these topological signatures provide a high-fidelity diagnostic for the quantum dynamics of vacuum excitations in strong-field QED.
Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because the open-source verifiable training data required to train such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains-of-thought before and after fine-tuning, to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and $\sim$200M tokens of QFT reasoning traces.
We consider what we refer to as the Decision-Focused Federated Learning (DFFL) framework, i.e., a predict-then-optimize approach employed by a collection of agents, where each agent's predictive model is an input to a downstream linear optimization problem, and no direct exchange of raw data is allowed. Importantly, clients can differ both in objective functions and in feasibility constraints. We build on the well-known SPO+ approach and develop heterogeneity bounds for the SPO+ surrogate loss in this case. This is accomplished by employing a support function representation of the feasible region, separating (i) objective shift via norm distances between the cost vectors and (ii) feasible-set shift via shape distances between the constraint sets. In the case of strongly convex feasible regions, sharper bounds are derived due to the optimizer stability. Building on these results, we define a heuristic local-versus-federated excess risk decision rule which, under SPO+ risk, gives a condition for when federation can be expected to improve decision quality: the heterogeneity penalty must be smaller than the statistical advantage of pooling data. We implement a FedAvg-style set of DFFL experiments on both polyhedral and strongly convex problems and show that federation is broadly robust in the strongly convex setting, while performance in the polyhedral setting degrades primarily with constraint heterogeneity, especially for clients with many samples. In other words, especially for the strongly convex case, an approach following a direct implementation of FedAvg and SPO+ can still yield promising performance even when the downstream optimization problems are noticeably different.
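As a rough illustration of the SPO+ surrogate and its support-function form, the sketch below evaluates the loss for a minimization problem over a polytope given by an explicit vertex list. It assumes the standard SPO+ expression $\ell_{\mathrm{SPO+}}(\hat c, c) = \max_{w \in S}\{(c - 2\hat c)^\top w\} + 2\hat c^\top w^*(c) - z^*(c)$; the vertex enumeration, the function names, and the toy numbers are illustrative assumptions and are not taken from the abstract.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def support(vertices: Sequence[Vector], direction: Vector) -> float:
    """Support function of conv(vertices): max over w in S of direction^T w."""
    return max(dot(direction, v) for v in vertices)


def argmin_vertex(vertices: Sequence[Vector], cost: Vector) -> Vector:
    """Optimal decision w*(cost) of the linear program min cost^T w over S."""
    return min(vertices, key=lambda v: dot(cost, v))


def spo_plus_loss(pred_cost: Vector, true_cost: Vector, vertices: Sequence[Vector]) -> float:
    """SPO+ surrogate, written with the support function of the feasible set."""
    w_star = argmin_vertex(vertices, true_cost)
    z_star = dot(true_cost, w_star)
    shifted = [c - 2.0 * p for c, p in zip(true_cost, pred_cost)]
    return support(vertices, shifted) + 2.0 * dot(pred_cost, w_star) - z_star


def spo_loss(pred_cost: Vector, true_cost: Vector, vertices: Sequence[Vector]) -> float:
    """Decision regret: true cost of acting on the prediction minus the optimum."""
    w_hat = argmin_vertex(vertices, pred_cost)
    w_star = argmin_vertex(vertices, true_cost)
    return dot(true_cost, w_hat) - dot(true_cost, w_star)


# Hypothetical toy client: choose one of two items, true costs (1, 2).
SIMPLEX: Tuple[Vector, ...] = ((1.0, 0.0), (0.0, 1.0))
TRUE_COST = (1.0, 2.0)
BAD_PREDICTION = (3.0, 1.0)  # ranks the items in the wrong order

if __name__ == "__main__":
    print(spo_loss(BAD_PREDICTION, TRUE_COST, SIMPLEX))
    print(spo_plus_loss(BAD_PREDICTION, TRUE_COST, SIMPLEX))
```

Because the feasible set enters only through `support` and `argmin_vertex`, giving two clients different vertex lists changes the loss for the same predicted cost, which is the kind of feasible-set shift the bounds are meant to control.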
Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly understood. In this work, we represent stochastic optimizers as random dynamical systems, which often converge to a fractal attractor set (rather than a point) with a smaller intrinsic dimension. Building on this connection and inspired by Lyapunov dimension theory, we introduce a novel notion of dimension, coined the `sharpness dimension', and prove a generalization bound based on this dimension. Our results show that generalization in the chaotic regime depends on the complete Hessian spectrum and the structure of its partial determinants, highlighting a complexity that cannot be captured by the trace or spectral norm considered in prior work. Experiments across various MLPs and transformers validate our theory while also providing new insights into the recently observed phenomenon of grokking.
We establish central and non-central limit theorems for sequences of functionals of the Gaussian output of an infinitely-wide random neural network on the $d$-dimensional sphere. We show that the asymptotic behaviour of these functionals as the depth of the network increases depends crucially on the fixed points of the covariance function, resulting in three distinct limiting regimes: convergence to the same functional of a limiting Gaussian field, convergence to a Gaussian distribution, and convergence to a distribution in the $Q$th Wiener chaos. Our proofs exploit tools that are now classical (Hermite expansions, Diagram Formula, Stein-Malliavin techniques), but also ideas which have never been used in similar contexts: in particular, the asymptotic behaviour is determined by the fixed-point structure of the iterative operator associated with the covariance, whose nature and stability govern the different limiting regimes.
In [97,99,100], an fl-RDT framework is introduced to characterize \emph{statistical computational gaps} (SCGs). Studying \emph{symmetric binary perceptrons} (SBPs), [100] obtained an \emph{algorithmic} threshold estimate $α_a\approx α_c^{(7)}\approx 1.6093$ at the 7th lifting level (for $κ=1$ margin), closely approaching the $1.58$ local entropy (LE) prediction [18]. In this paper, we further connect parametric RDT to overlap gap properties (OGPs), another key geometric feature of the solution space. Specifically, for any positive integer $s$, we consider $s$-level ultrametric OGPs ($ult_s$-OGPs) and rigorously upper-bound the associated constraint densities $α_{ult_s}$. To achieve this, we develop an analytical union-bounding program consisting of combinatorial and probabilistic components. By casting the combinatorial part as a convex problem and the probabilistic part as a nested integration, we conduct numerical evaluations and find that the tightest bounds at the first two levels, $\barα_{ult_1} \approx 1.6578$ and $\barα_{ult_2} \approx 1.6219$, closely approach the 3rd and 4th lifting level parametric RDT estimates, $α_c^{(3)} \approx 1.6576$ and $α_c^{(4)} \approx 1.6218$. We also observe excellent agreement across other key parameters, including overlap values and the relative sizes of ultrametric clusters. Based on these observations, we propose several conjectures linking $ult$-OGP and parametric RDT. Specifically, we conjecture that the algorithmic threshold satisfies $α_a=\lim_{s\rightarrow\infty} α_{ult_s} = \lim_{s\rightarrow\infty} \barα_{ult_s} = \lim_{r\rightarrow\infty} α_{c}^{(r)}$, and that $α_{ult_s} \leq α_{c}^{(s+2)}$ (with possible equality for some (maybe even all) $s$). Finally, we discuss the potential existence of a full isomorphism connecting all key parameters of $ult$-OGP and parametric RDT.
We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set. Our approach better models the real-world setting where the cost of influencers varies and advertisers want to find the best value for their overall social advertising budget. We propose an algorithm assuming an independent cascade diffusion model and edge-level semi-bandit feedback, and provide both theoretical and experimental results. Our analysis is also valid for the cardinality constraint setting and improves the state-of-the-art regret bound in this case.
The goal of machine learning is to find models that minimize prediction error on data that has not yet been seen. Its operational paradigm assumes access to a dataset $S$ and articulates a scheme for evaluating how well a given model performs on an arbitrary sample. The sample can be $S$ (in which case we speak of ``in-sample'' performance) or some entirely new $S'$ (in which case we speak of ``out-of-sample'' performance). Traditional analysis of generalization assumes that both in- and out-of-sample data are i.i.d.\ draws from an infinite population. However, these probabilistic assumptions cannot be verified even in principle. This paper presents an alternative view of generalization through the lens of sensitivity analysis of solutions of optimization problems to perturbations in the problem data. Under this framework, generalization bounds are obtained by purely deterministic means and take the form of variational principles that relate in-sample and out-of-sample evaluations through an error term that quantifies how close out-of-sample data are to in-sample data. Statistical assumptions can then be used \textit{ex post} to characterize the situations when this error term is small (either on average or with high probability).
Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, and produces predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on two scientific foundation models for weather and timeseries forecasting along with an additional regression task. Across benchmarks against uncertainty-aware baselines, we find that Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.
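One way to read the mechanism described above is sketched below for a single query with scalar values per key. It assumes that the concentration parameter is the number of multinomial trials and that the sampled counts are divided by that number to give weights summing to one; the abstract does not spell out the parameterization, so `concentration`, the function names, and the toy scores are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_attention_weights(probs: Sequence[float], concentration: int,
                             rng: random.Random) -> List[float]:
    """Draw `concentration` multinomial trials and normalize the counts.

    Assumption: larger `concentration` means weights closer to the softmax.
    """
    draws = rng.choices(range(len(probs)), weights=probs, k=concentration)
    counts = [0] * len(probs)
    for index in draws:
        counts[index] += 1
    return [count / concentration for count in counts]


def stochastic_attention(scores: Sequence[float], values: Sequence[float],
                         concentration: int, n_samples: int, seed: int = 0) -> List[float]:
    """Return an ensemble of attention outputs, one per sampled weight vector."""
    rng = random.Random(seed)
    probs = softmax(scores)
    ensemble = []
    for _ in range(n_samples):
        weights = sample_attention_weights(probs, concentration, rng)
        ensemble.append(sum(w * v for w, v in zip(weights, values)))
    return ensemble


if __name__ == "__main__":
    outputs = stochastic_attention([2.0, 1.0, 0.1], [1.0, 2.0, 3.0],
                                   concentration=8, n_samples=5)
    print(outputs)  # spread of the ensemble acts as a predictive-uncertainty proxy
```

In this reading, tuning reduces to a one-dimensional search over `concentration`, in line with the univariate post-hoc tuning problem mentioned in the abstract.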
Federated prognostics enable clients (e.g., companies, factories, and production lines) to collaboratively develop a failure time prediction model while keeping each client's data local and confidential. However, traditional federated models often assume homogeneity in the degradation processes across clients, an assumption that may not hold in many industrial settings. To overcome this, this paper proposes a personalized federated prognostic model designed to accommodate clients with heterogeneous degradation processes, allowing them to build tailored prognostic models. The prognostic model iteratively facilitates the underlying pairwise collaborations between clients with similar degradation patterns, which enhances the performance of personalized federated learning. To estimate parameters jointly using decentralized datasets, we develop a federated parameter estimation algorithm based on proximal gradient descent. The proposed approach addresses the limitations of existing federated prognostic models by simultaneously achieving model personalization, preserving data privacy, and providing comprehensive failure time distributions. The superiority of the proposed model is validated through extensive simulation studies and a case study using the turbofan engine degradation dataset from the NASA repository.
In uncertainty quantification, evaluating sensitivity measures under specific conditions (i.e., conditional Sobol' indices) is essential for systems with parameterized responses, such as spatial fields or varying operating conditions. Traditional approaches often rely on point-wise modeling, which is computationally expensive and may lack consistency across the parameter space. This paper demonstrates that for a pre-trained global Polynomial Chaos Expansion (PCE) model, the analytical conditional Sobol' indices are inherently embedded within its basis functions. By leveraging the tensor-product property of PCE bases, we reformulate the global expansion into a set of analytical coefficient fields that depend on the conditioning variables. Based on the preservation of orthogonality under conditional probability measures, we derive closed-form expressions for conditional variances and Sobol' indices. This framework bypasses the need for repetitive modeling or additional sampling, transforming conditional sensitivity analysis into a purely algebraic post-processing step. Numerical benchmarks indicate that the proposed method ensures physical coherence and offers superior numerical robustness and computational efficiency compared to conventional point-wise approaches.
Estimating the number of components is a fundamental challenge in unsupervised learning, particularly when dealing with high-dimensional data with many components or severely imbalanced component sizes. This paper addresses this challenge for classical Gaussian mixture models. The proposed estimator is simple: center the data, compute the singular values of the centered matrix, and count those above a threshold. No iterative fitting, no likelihood calculation, and no prior knowledge of the number of components are required. We prove that, under a mild separation condition on the component centers, the estimator consistently recovers the true number of components. The result holds in high-dimensional settings where the dimension can be much larger than the sample size. It also holds when the number of components grows to the smaller of the dimension and the sample size, even under severe imbalance among component sizes. Computationally, the method is extremely fast: for example, it processes ten million samples in one hundred dimensions within one minute. Extensive experimental studies confirm its accuracy in challenging settings such as high dimensionality, many components, and severe class imbalance.
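The three steps of the estimator (center, compute singular values, count those above a threshold) can be sketched in plain Python for small dimensions. The threshold $2(\sqrt{n}+\sqrt{d})$ used here is only an illustrative choice for unit-variance noise and is not the paper's threshold; the Jacobi eigenvalue routine and all function names are likewise assumptions made to keep the example dependency-free.

```python
import math
import random


def center_columns(rows):
    """Subtract the column means so every feature has zero sample mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def gram_matrix(rows):
    """Return X^T X, whose eigenvalues are the squared singular values of X."""
    d = len(rows[0])
    return [[sum(row[i] * row[j] for row in rows) for j in range(d)] for i in range(d)]


def symmetric_eigenvalues(matrix, max_rotations=500, tol=1e-9):
    """Eigenvalues of a small symmetric matrix (d >= 2) via classical Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(max_rotations):
        p, q = max(pairs, key=lambda ij: abs(a[ij[0]][ij[1]]))
        if abs(a[p][q]) < tol:
            break
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):  # update columns p and q (A J)
            akp, akq = a[k][p], a[k][q]
            a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
        for k in range(n):  # update rows p and q (J^T A J)
            apk, aqk = a[p][k], a[q][k]
            a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def count_large_singular_values(rows, threshold):
    """Center the data, then count singular values above `threshold`."""
    centered = center_columns(rows)
    eigenvalues = symmetric_eigenvalues(gram_matrix(centered))
    singular_values = [math.sqrt(max(ev, 0.0)) for ev in eigenvalues]
    return sum(1 for sv in singular_values if sv > threshold)


def simulate_mixture(centers, per_component, seed=0):
    """Unit-variance Gaussian samples around each center (toy data)."""
    rng = random.Random(seed)
    return [[mu + rng.gauss(0.0, 1.0) for mu in center]
            for center in centers for _ in range(per_component)]


if __name__ == "__main__":
    data = simulate_mixture([(0, 0, 0), (20, 0, 0), (0, 20, 0)], per_component=100)
    n, d = len(data), len(data[0])
    print(count_large_singular_values(data, 2.0 * (math.sqrt(n) + math.sqrt(d))))
```

Note that centering removes one direction: with $K$ centers in general position, the centered signal has rank at most $K-1$, so the count returned here is $K-1$ for the toy data. How the paper maps the count to the reported number of components is not stated in the abstract.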
Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data and its corresponding graph Laplacian matrix. However, the graph Laplacian matrix depends heavily on the prespecified similarity metric and may lead to inappropriate penalties when dealing with redundant or noisy input variables. To address the above issues, this paper proposes a new \textit{Semi-Supervised Meta Additive Model (S$^2$MAM)} based on a bilevel optimization scheme that automatically identifies informative variables, updates the similarity matrix, and simultaneously achieves interpretable predictions. Theoretical guarantees are provided for S$^2$MAM, including the computing convergence and the statistical generalization bound. Experimental assessments across 4 synthetic and 12 real-world datasets, with varying levels and categories of corruption, validate the robustness and interpretability of the proposed approach.
We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.
Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure of transformer blocks, layer-wise dynamics across multiple LLM architectures and scales are well-approximated by locally-linear models. Exploiting this property, we model LLM inference as a linear time-varying dynamical system and adapt the classical linear quadratic regulator to compute feedback controllers using layer-wise Jacobians, steering activations toward desired semantic setpoints in closed-loop with minimal computational overhead and no offline training. We also derive theoretical bounds on setpoint tracking error, enabling formal guarantees on steering performance. Using a novel adaptive semantic feature setpoint signal, our method yields robust, fine-grained behavior control across models, scales, and tasks, including state-of-the-art modulation of toxicity, truthfulness, refusal, and arbitrary concepts, surpassing baseline steering methods. Our code is available at: https://github.com/trustworthyrobotics/lqr-activation-steering
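The classical building block referred to above, the finite-horizon linear quadratic regulator for a linear time-varying system, can be sketched in the scalar case. The example assumes that the deviation from the setpoint obeys $e_{t+1} = a_t e_t + b_t u_t$ with one scalar "layer gain" $a_t$ per layer; in the method itself these would be layer-wise Jacobians, and the numbers, names, and cost weights below are purely illustrative.

```python
from typing import List, Sequence, Tuple


def lqr_gains(a: Sequence[float], b: Sequence[float], q: float, r: float,
              q_final: float) -> Tuple[List[float], float]:
    """Backward Riccati recursion for e[t+1] = a[t] e[t] + b[t] u[t].

    Minimizes sum_t (q e_t^2 + r u_t^2) + q_final e_T^2.
    Returns the feedback gains k[t] (u_t = -k[t] e_t) and the cost-to-go p_0.
    """
    horizon = len(a)
    p = q_final
    gains = [0.0] * horizon
    for t in reversed(range(horizon)):
        gains[t] = (b[t] * p * a[t]) / (r + b[t] * p * b[t])
        p = q + a[t] * p * a[t] - a[t] * p * b[t] * gains[t]
    return gains, p


def simulate(e0: float, a: Sequence[float], b: Sequence[float], gains: Sequence[float],
             q: float, r: float, q_final: float) -> Tuple[float, float]:
    """Roll the closed loop forward; return the final deviation and total cost."""
    e, cost = e0, 0.0
    for t in range(len(a)):
        u = -gains[t] * e            # closed-loop feedback on the current deviation
        cost += q * e * e + r * u * u
        e = a[t] * e + b[t] * u
    return e, cost + q_final * e * e


# Hypothetical 4-layer system whose uncontrolled deviation grows layer by layer.
A = [1.2, 1.1, 1.3, 1.0]
B = [1.0, 1.0, 1.0, 1.0]

if __name__ == "__main__":
    k, p0 = lqr_gains(A, B, q=1.0, r=0.1, q_final=1.0)
    print(simulate(1.0, A, B, k, 1.0, 0.1, 1.0))
    print(simulate(1.0, A, B, [0.0] * 4, 1.0, 0.1, 1.0))  # open loop, no steering
```

The gains depend on all later layers through the backward pass, which is what makes the controller anticipative, and they act on the measured deviation at each layer, which is what closes the loop.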
We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from one-step recursion is only first-order in the grid width. We estimate the time-dependent generator from multi-step transitions using moment-matching coefficients that cancel lower-order truncation terms, and combine the resulting surrogate with backward regression. The main theory gives an end-to-end decomposition into generator misspecification, projection error, pooling bias, finite-sample error, and start-up error, together with a decision-frequency regime map explaining when higher-order gains should be visible. Across calibration studies, four-scale benchmarks, feature and start-up ablations, and gain-mismatch stress tests, the second-order estimator consistently improves on the Bellman baseline and remains stable in the regime where the theory predicts visible gains. These results position high-order generator regression as an interpretable continuous-time policy-evaluation method with a clear operating region.
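The idea of combining multi-step transitions so that lower-order truncation terms cancel can be illustrated on a time-homogeneous toy problem. The sketch assumes an Ornstein-Uhlenbeck process $dX = -θX\,dt + σ\,dW$ with test function $f(x)=x^2$, uses exact conditional moments in place of regression on sampled trajectories, and uses the standard second-order forward-difference weights $(-3, 4, -1)/(2h)$; the paper's actual coefficients and its time-inhomogeneous treatment are not given in the abstract.

```python
import math


def conditional_second_moment(x: float, h: float, theta: float, sigma: float) -> float:
    """E[X_h^2 | X_0 = x] for the OU process dX = -theta X dt + sigma dW."""
    decay = math.exp(-2.0 * theta * h)
    return x * x * decay + sigma ** 2 / (2.0 * theta) * (1.0 - decay)


def generator_exact(x: float, theta: float, sigma: float) -> float:
    """(L f)(x) for f(x) = x^2: drift term -2 theta x^2 plus diffusion term sigma^2."""
    return -2.0 * theta * x * x + sigma ** 2


def generator_one_step(x: float, h: float, theta: float, sigma: float) -> float:
    """One-step (Bellman-style) estimate; its bias is first order in h."""
    m0 = x * x
    m1 = conditional_second_moment(x, h, theta, sigma)
    return (m1 - m0) / h


def generator_two_step(x: float, h: float, theta: float, sigma: float) -> float:
    """Combine one- and two-step transitions so that the O(h) bias cancels."""
    m0 = x * x
    m1 = conditional_second_moment(x, h, theta, sigma)
    m2 = conditional_second_moment(x, 2.0 * h, theta, sigma)
    return (-3.0 * m0 + 4.0 * m1 - m2) / (2.0 * h)


def abs_errors(x: float, h: float, theta: float, sigma: float):
    """Absolute errors of the one-step and two-step estimates at grid width h."""
    exact = generator_exact(x, theta, sigma)
    return (abs(generator_one_step(x, h, theta, sigma) - exact),
            abs(generator_two_step(x, h, theta, sigma) - exact))


if __name__ == "__main__":
    for h in (0.1, 0.05):
        print(h, abs_errors(1.0, h, theta=1.0, sigma=0.5))
```

Halving the grid width roughly halves the one-step error and roughly quarters the two-step error, which is the first-order versus second-order behaviour the abstract contrasts.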
The rise of IoT devices and the uptake of cloud computing have ushered in a new era of data-driven intelligence. Traditional centralized machine learning models that require a large volume of data to be stored in a single location have therefore become more susceptible to data breaches, privacy violations, and regulatory non-compliance. This report examines the integration of Federated Learning (FL) and blockchain technology in a cloud-edge setting as an effective solution to these concerns. We propose a detailed four-dimensional architectural categorization that assesses the coordination frameworks, consensus algorithms, data storage practices, and trust models relevant to these integrated systems. The manuscript presents a comprehensive comparative examination of two cutting-edge frameworks: the Multi-Objectives Reinforcement Federated Learning Blockchain (MORFLB), which is designed for intelligent transportation systems, and the Federated Blockchain-IoT Framework for Sustainable Healthcare Systems (FBCI-SHS), elucidating their distinctive contributions and inherent limitations. We also evaluate the literature from a comparative perspective on current frameworks to situate this research within existing knowledge. The manuscript concludes by delineating the principal challenges and offering a strategic framework for future research, emphasizing the advancement of adaptive, resilient, and standardized BCFL systems across diverse application domains.
Self-play has recently emerged as a promising paradigm to train Large Language Models (LLMs). In self-play, the target LLM creates the task input (e.g., ask a question), which it then addresses itself by producing a task output (e.g., give an answer). A reward model evaluates the output, and the rewards are then used to train the LLM, typically via Reinforcement Learning (RL). Self-play incurs minimal supervision costs, and this is especially helpful for post-training LLMs, which require high-quality input-output pairs that traditionally have to be written by humans or expensive proprietary models. However, existing work explores self-play only for verifiable tasks such as math and coding. Instead, we seek to extend it to more realistic open-ended tasks. In particular, we propose POP, a self-play framework that uses the same LLM to synthesize evaluation rubrics, along with input-output pairs, for each example. The rubric is then used to evaluate outputs and train the model. We further ground the framework on a content-rich pretraining corpus to (1) ensure a generation-verification gap and reduce reward hacking, and (2) prevent mode collapse. On Qwen-2.5-7B, POP increases performance of both pretrained and instruction-tuned models, across different tasks ranging from long-form Healthcare QA to creative writing and instruction following.
Causal discovery through experimentation and intervention is fundamental to robust problem solving. It requires not just updating beliefs within a fixed framework but revising the hypothesis space itself, a capacity current AI agents lack when evidence demands representations they have not previously constructed. We extend the blicket detector paradigm from developmental science to test this capacity in AI agents equipped with architectural scaffolding that targets hypothesis-space restructuring. Our compositional architecture has two discrete components: context graphs, which structure exploration as typed state machines, and dynamic behaviors, which monitor for evidence that the current hypothesis space is inadequate and expand it at runtime. Across 1,085 experimental trials, these components make orthogonal contributions: context graphs drive reasoning quality within the post-switch hypothesis space, accounting for 94\% of the accuracy gain, while dynamic behaviors drive reasoning eligibility by detecting regime changes and preventing premature commitment to outdated hypotheses.
We consider what we refer to as the Decision-Focused Federated Learning (DFFL) framework, i.e., a predict-then-optimize approach employed by a collection of agents, where each agent's predictive model is an input to a downstream linear optimization problem, and no direct exchange of raw data is allowed. Importantly, clients can differ both in objective functions and in feasibility constraints. We build on the well-known SPO+ approach and develop heterogeneity bounds for the SPO+ surrogate loss in this case. This is accomplished by employing a support function representation of the feasible region, separating (i) objective shift via norm distances between the cost vectors and (ii) feasible-set shift via shape distances between the constraint sets. In the case of strongly convex feasible regions, sharper bounds are derived due to optimizer stability. Building on these results, we define a heuristic local-versus-federated excess risk decision rule which, under SPO+ risk, gives a condition for when federation can be expected to improve decision quality: the heterogeneity penalty must be smaller than the statistical advantage of pooling data. We run a set of FedAvg-style DFFL experiments on both polyhedral and strongly convex problems and show that federation is broadly robust in the strongly convex setting, while performance in the polyhedral setting degrades primarily with constraint heterogeneity, especially for clients with many samples. In other words, especially for the strongly convex case, an approach following a direct implementation of FedAvg and SPO+ can still yield promising performance even when the downstream optimization problems are noticeably different.
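As a rough illustration of the SPO+ surrogate that this abstract builds on, the sketch below evaluates the standard SPO+ loss $\ell(\hat{c}, c) = \max_{w \in S}\{(c - 2\hat{c})^\top w\} + 2\hat{c}^\top w^*(c) - z^*(c)$ for a minimization problem over a polytope. It assumes the feasible region is given as an explicit list of vertices, which is only practical for toy problems; the function names, the vertex list, and the cost vectors are hypothetical and not taken from the paper.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def spo_plus_loss(pred_cost: Vector, true_cost: Vector,
                  vertices: Sequence[Vector]) -> float:
    """SPO+ loss for min_w c^T w over the polytope spanned by `vertices`.

    Assumption: a linear objective attains its optimum at a vertex, so
    enumerating the (hypothetical) vertex list is enough here.
    """
    # Optimal decision w*(c) and optimal value z*(c) under the true cost.
    best = min(vertices, key=lambda w: dot(true_cost, w))
    z_star = dot(true_cost, best)
    # Support-function term: max_w (c - 2 * c_hat)^T w.
    shifted = tuple(c - 2 * p for c, p in zip(true_cost, pred_cost))
    support = max(dot(shifted, w) for w in vertices)
    return support + 2 * dot(pred_cost, best) - z_star


# Toy feasible set: pick exactly one of two items.
vertices = [(1, 0), (0, 1)]
true_cost = (1, 2)

loss_exact = spo_plus_loss(true_cost, true_cost, vertices)  # perfect prediction
loss_wrong = spo_plus_loss((2, 1), true_cost, vertices)     # ranking reversed
```

In this toy case the loss vanishes when the predicted cost equals the true cost and becomes positive when the prediction reverses the ranking of the two decisions. The support-function term is the quantity through which a change of feasible set enters the loss.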
We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $ρ$-replicable if two executions using shared internal randomness but independent reward realizations produce the same action sequence with probability at least $1-ρ$. Prior work is primarily elimination-based and, in linear bandits with infinitely many actions, relies on discretization, leading to suboptimal dependence on the dimension $d$ and $ρ$. We develop optimistic alternatives for both settings. For stochastic multi-armed bandits, we propose RepUCB, a replicable batched UCB algorithm, and show that it attains a regret $O\!\left(\frac{K^2\log^2 T}{ρ^2}\sum_{a:Δ_a>0}\left(Δ_a+\frac{\log(KT\log T)}{Δ_a}\right)\right)$. For stochastic linear bandits, we first introduce RepRidge, a replicable ridge regression estimator that satisfies both a confidence guarantee and a $ρ$-replicability guarantee. Beyond its role in our bandit algorithm, this estimator and its guarantees may also be of independent interest in other statistical estimation settings. We then use RepRidge to design RepLinUCB, a replicable optimistic algorithm for stochastic linear bandits, and show that its regret is bounded by $\widetilde{O}\!\big(\big(d+\frac{d^3}{ρ}\big)\sqrt{T}\big)$. This improves the best prior regret guarantee by a factor of $O(d/ρ)$, showing that our optimistic algorithm can substantially reduce the price of replicability.
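To make the replicability definition concrete, the following sketch shows randomized grid rounding with a shared offset, a common building block for replicable estimation. It is not the RepUCB or RepRidge procedure from the abstract. The idea is that two runs see different reward samples, and hence slightly different empirical means, but round them onto the same randomly shifted grid, so they usually output the same value. All function names, the grid width, and the numerical estimates are hypothetical.

```python
import math
import random


def shared_offset(seed: int, width: float) -> float:
    """Grid offset drawn from the shared internal randomness (the seed)."""
    return random.Random(seed).uniform(0.0, width)


def replicable_round(estimate: float, width: float, offset: float) -> float:
    """Snap an estimate to the centre of its cell on a shifted grid.

    Two estimates that differ by much less than `width` land in the same
    cell unless a grid boundary happens to fall between them, which for a
    uniformly random offset occurs with probability about |difference| / width.
    """
    cell = math.floor((estimate - offset) / width)
    return offset + (cell + 0.5) * width


# Both executions share the seed but observe independent reward samples.
width = 0.1
offset = shared_offset(seed=7, width=width)
run_a = replicable_round(0.501, width, offset)  # empirical mean in run A
run_b = replicable_round(0.503, width, offset)  # empirical mean in run B
```

The price of this agreement is an extra rounding error of at most half the grid width, which is one way to see why replicability tends to cost a factor depending on $ρ$ in the regret.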
Large language models are increasingly deployed as autonomous diagnostic agents, yet they conflate two fundamentally different capabilities: natural-language communication and probabilistic reasoning. We argue that this conflation is an architectural flaw, not an engineering shortcoming. We introduce BMBE (Bayesian Medical Belief Engine), a modular diagnostic dialogue framework that enforces a strict separation between language and reasoning: an LLM serves only as a sensor, parsing patient utterances into structured evidence and verbalising questions, while all diagnostic inference resides in a deterministic, auditable Bayesian engine. Because patient data never enters the LLM, the architecture is private by construction; because the statistical backend is a standalone module, it can be replaced per target population without retraining. This separation yields three properties no autonomous LLM can offer: calibrated selective diagnosis with a continuously adjustable accuracy-coverage tradeoff, a statistical separation gap where even a cheap sensor paired with the engine outperforms a frontier standalone model from the same family at a fraction of the cost, and robustness to adversarial patient communication styles that cause standalone doctors to collapse. We validate across empirical and LLM-generated knowledge bases against frontier LLMs, confirming the advantage is architectural, not informational.
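The separation described above can be pictured with a minimal sketch of the inference side: a discrete Bayesian update over candidate diagnoses, followed by a selective decision that abstains when the posterior is not confident enough. This is only an illustration under stated assumptions, not the BMBE engine itself; the disease names, the likelihood values, and the threshold rule are hypothetical.

```python
from typing import Dict, Optional


def bayes_update(prior: Dict[str, float],
                 likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior over diagnoses after one piece of structured evidence.

    `likelihood[d]` is P(evidence | diagnosis d); values are hypothetical.
    """
    unnormalised = {d: prior[d] * likelihood[d] for d in prior}
    total = sum(unnormalised.values())
    return {d: p / total for d, p in unnormalised.items()}


def selective_diagnosis(posterior: Dict[str, float],
                        threshold: float) -> Optional[str]:
    """Return the top diagnosis, or None (abstain) below the threshold."""
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else None


# Evidence as a language "sensor" might report it: the patient has a fever.
prior = {"flu": 0.5, "cold": 0.5}
fever_likelihood = {"flu": 0.8, "cold": 0.2}
posterior = bayes_update(prior, fever_likelihood)

confident = selective_diagnosis(posterior, threshold=0.7)
cautious = selective_diagnosis(posterior, threshold=0.9)
```

Raising the threshold trades coverage for accuracy, which is one simple way to obtain a continuously adjustable accuracy-coverage tradeoff of the kind the abstract mentions.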
As Large Language Models (LLMs) become increasingly popular, caching responses so that they can be reused by users with semantically similar queries has become a vital strategy for reducing inference costs and latency. Existing caching frameworks have proposed to decide which query responses to cache by assuming a finite, known universe of discrete queries and learning their serving costs and arrival probabilities. As LLMs' pool of users and queries expands, however, such an assumption becomes increasingly untenable: real-world LLM queries reside in an infinite, continuous embedding space. In this paper, we establish the first rigorous theoretical framework for semantic LLM response caching in continuous query space under uncertainty. To bridge the gap between discrete optimization and continuous representation spaces, we introduce dynamic $ε$-net discretization coupled with Kernel Ridge Regression. This design enables the system to formally quantify estimation uncertainty and generalize partial feedback on LLM query costs across continuous semantic query neighborhoods. We develop both offline learning and online adaptive algorithms optimized to reduce switching costs incurred by changing the cached responses. We prove that our online algorithm achieves a sublinear regret bound against an optimal continuous oracle, which reduces to existing bounds for discrete query models. Extensive empirical evaluations demonstrate that our framework approximates the continuous optimal cache well while also reducing computational and switching overhead compared to existing methods.
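The discretization step can be sketched as a greedy $ε$-net over query embeddings: a new embedding becomes a net centre only if it is farther than $ε$ from every existing centre, and later queries are mapped to their nearest centre. This is a minimal static sketch, assuming Euclidean distance and two-dimensional toy embeddings; the abstract's dynamic construction and its Kernel Ridge Regression component are not reproduced, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_epsilon_net(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Pick centres so that every input point lies within epsilon of one."""
    centers: List[Point] = []
    for p in points:
        # Keep p as a new centre only if no existing centre covers it.
        if all(math.dist(p, c) > epsilon for c in centers):
            centers.append(p)
    return centers


def nearest_center(query: Point, centers: Sequence[Point]) -> int:
    """Index of the centre closest to the query embedding."""
    return min(range(len(centers)), key=lambda i: math.dist(query, centers[i]))


# Toy embeddings: two tight clusters of semantically similar queries.
embeddings = [(0.0, 0.0), (0.05, 0.0), (1.0, 0.0), (1.02, 0.01)]
centers = greedy_epsilon_net(embeddings, epsilon=0.1)
bucket = nearest_center((0.9, 0.0), centers)
```

Each centre then acts like one discrete query in the finite-universe formulation, so cost and arrival estimates can be maintained per centre rather than per raw query.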
Rational design of covalent inhibitors requires simultaneously optimizing multiple properties, such as binding affinity, target selectivity, or electrophilic reactivity. This presents a multi-objective problem not easily addressed by screening alone. Here we present a machine learning pipeline for generating covalent inhibitor candidates using multi-objective reinforcement learning (RL), applied to two targets: epidermal growth factor receptor (EGFR) and acetylcholinesterase (ACHE). A SMILES-based pretrained LSTM serves as the generative model, optimized via policy gradient RL with Pareto crowding distance to balance competing scoring functions including synthetic accessibility, predicted covalent activity, residue affinity, and an approximated docking score. The pipeline rediscovers known covalent inhibitors at rates of up to 0.50% (EGFR) and 0.74% (ACHE) in 10,000-structure runs, with candidate structures achieving warhead-to-residue distances as short as 5.5 angstrom (EGFR) and 3.2 angstrom (ACHE) after further docking-based screening. More notably, the pipeline spontaneously generates structures bearing warhead motifs absent from the training data - including allenes, 3-oxo-$β$-sultams, and $α$-methylene-$β$-lactones - all of which have independent literature support as covalent warheads. These results suggest that RL-guided generation can explore covalent chemical space beyond its training distribution, and may be useful as a tool for medicinal chemists working on covalent drug discovery.
The integration of single-cell proteomic data is often hindered by the fragmented nature of targeted antibody panels. To address this limitation, we introduce scpFormer, a transformer-based foundation model designed for single-cell proteomics. Pre-trained on over 390 million cells, scpFormer replaces standard index-based tokenization with a continuous, sequence-anchored approach. By combining Evolutionary Scale Modeling (ESM) with value-aware expression embeddings, it dynamically maps variable panels into a shared semantic space without artificial discretization. We demonstrate that scpFormer generates global cell representations that perform competitively in large-scale batch integration and unsupervised clustering. Moreover, its open-vocabulary architecture facilitates in silico panel expansion, assisting in the reconstruction of biological manifolds in sparse clinical datasets. Finally, this learned protein co-expression logic is transferable to bulk-omics tasks, supporting applications like cancer drug response prediction. scpFormer provides a versatile, panel-agnostic framework to facilitate scalable biomarker discovery and precision oncology.
Complex-Valued Neural Networks (CVNNs) have significant advantages in handling tasks that involve complex numbers. However, existing CVNNs are unable to quantify predictive uncertainty. We propose, for the first time, dropout-based Bayesian Complex-Valued Neural Networks (BayesCVNNs) to enable uncertainty quantification for complex-valued applications, exhibiting broad applicability and efficiency for hardware implementation due to modularity. Furthermore, as the dual-part nature of complex values significantly broadens the design space and enables novel configurations based on layer-mixing and part-mixing, we introduce an automated search approach to effectively identify optimal configurations for both real and imaginary components. To facilitate deployment, we present a framework that generates customized FPGA-based accelerators for BayesCVNNs, leveraging a set of optimized building blocks. Experiments demonstrate the best configuration can be effectively found via the automated search, attaining higher performance with lower hardware costs compared with manually crafted models. The optimized accelerators achieve approximately 4.5x and 13x speedups on different models with less than 10% power consumption compared to GPU implementations, and outperform existing work in both algorithm and hardware aspects. Our code is publicly available at: https://github.com/zehuanzhang/BayesCVNN.git.
Neural fields, also known as implicit neural representations (INRs), offer a powerful framework for modeling continuous geometry, but their effectiveness in high-dimensional scientific settings is limited by slow convergence and scaling challenges. In this study, we extend INR models to handle spatiotemporal and multivariate signals and show how INR features can be transferred across scientific signals to enable efficient and scalable representation across time and ensemble runs in an amortized fashion. Across controlled transformation regimes (e.g., geometric transformations and localized perturbations of synthetic fields) and high-fidelity scientific domains, including turbulent flows, fluid-material impact dynamics, and astrophysical systems, we show that transferable features improve not only signal fidelity but also the accuracy of derived geometric and physical quantities, including density gradients and vorticity. In particular, transferable features reduce iterations to reach target reconstruction quality by up to an order of magnitude, increase early-stage reconstruction quality by multiple dB (with gains exceeding 10 dB in some cases), and consistently improve gradient-based physical accuracy.
Large language models can be uncertain yet correct, or confident yet wrong, raising the question of whether their output-level uncertainty and their actual correctness are driven by the same internal mechanisms or by distinct feature populations. We introduce a 2x2 framework that partitions model predictions along correctness and confidence axes, and uses sparse autoencoders to identify features associated with each dimension independently. Applying this to Llama-3.1-8B and Gemma-2-9B, we identify three feature populations that play fundamentally different functional roles. Pure uncertainty features are functionally essential: suppressing them severely degrades accuracy. Pure incorrectness features are functionally inert: despite showing statistically significant activation differences between correct and incorrect predictions, the majority produce near-zero change in accuracy when suppressed. Confounded features that encode both signals are detrimental to output quality, and targeted suppression of them yields a 1.1% accuracy improvement and a 75% entropy reduction, with effects transferring across the ARC-Challenge and RACE benchmarks. The feature categories are also informationally distinct: the activations of just 3 confounded features from a single mid-network layer predict model correctness (AUROC ~0.79), enabling selective abstention that raises accuracy from 62% to 81% at 53% coverage. The results demonstrate that uncertainty and correctness are distinct internal phenomena, with implications for interpretability and targeted inference-time intervention.
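The selective-abstention result can be read as a point on an accuracy-coverage curve. The sketch below computes such a point from a per-example confidence score and a correctness flag: examples scoring below a threshold are abstained on, and accuracy is measured on the rest. The scores here merely stand in for the output of a probe on feature activations; the data and the function name are hypothetical, not values from the paper.

```python
from typing import Sequence, Tuple


def selective_accuracy(scores: Sequence[float], correct: Sequence[int],
                       threshold: float) -> Tuple[float, float]:
    """Return (accuracy on answered examples, coverage) at a threshold.

    An example is answered when its score is at least `threshold`;
    accuracy is NaN if nothing is answered.
    """
    kept = [c for s, c in zip(scores, correct) if s >= threshold]
    coverage = len(kept) / len(scores)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return accuracy, coverage


# Hypothetical probe scores (higher = predicted more likely correct).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
correct = [1, 1, 0, 1, 0, 0]

full_acc, full_cov = selective_accuracy(scores, correct, threshold=0.0)
sel_acc, sel_cov = selective_accuracy(scores, correct, threshold=0.5)
```

Sweeping the threshold traces the whole curve; the abstract's 62% to 81% improvement at 53% coverage corresponds to one threshold on such a curve.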
Vision-language models (VLMs) are increasingly used in settings where sensitivity to low-level image degradations matters, including content moderation, image restoration, and quality monitoring. Yet their ability to recognize distortion type and severity remains poorly understood. We present DistortBench, a diagnostic benchmark for no-reference distortion perception in VLMs. DistortBench contains 13,500 four-choice questions covering 27 distortion types, six perceptual categories, and five severity levels: 25 distortions inherit KADID-10k calibrations, while two added rotation distortions use monotonic angle-based levels. We evaluate 18 VLMs, including 17 open-weight models from five families and one proprietary model. Despite strong performance on high-level vision-language tasks, the best model reaches only 61.9% accuracy, just below the human majority-vote baseline of 65.7% (average individual: 60.2%), indicating that low-level perceptual understanding remains a major weakness of current VLMs. Our analysis further reveals weak and non-monotonic scaling with model size, performance drops in most base--thinking pairs, and distinct severity-response patterns across model families. We hope DistortBench will serve as a useful benchmark for measuring and improving low-level visual perception in VLMs.
With the emergence of new evaluation metrics and attack methodologies for Membership Inference Attacks (MIA), it becomes essential to reevaluate previously accepted assumptions. In this paper, we revisit the longstanding debate regarding the correlation between MIA success rates and model generalization using an empirical approach. We focused on employing augmentation techniques and early stopping to enhance model generalization and examined their impact on MIA success rates. We found that utilizing advanced generalization techniques can significantly decrease attack performance, potentially by up to 100 times. Moreover, combining these methods not only improves model generalization but also reduces attack effectiveness by introducing randomness during training. Additionally, our study confirmed the direct impact of generalization on MIA performance through an analysis of over 1K models in a controlled environment.
Neural surrogates for stiff differential-algebraic equations (DAEs) face two key challenges: soft-constraint methods leave algebraic residuals that stiffness amplifies into large errors, while hard-constraint methods require trajectory data from computationally expensive stiff integrators. We introduce an extended Newton implicit layer that enforces algebraic consistency and quasi-steady-state reduction within a single differentiable solve. Given slow-state predictions from a physics-informed DeepONet, the proposed layer recovers fast and algebraic states, eliminates the stiffness-amplification pathway within each time window, and reduces the output dimension to the slow states alone. Gradients derived via the implicit function theorem capture a stiffness-scaled coupling term that is absent in penalty-based approaches. Cascaded implicit layers further extend the framework to multi-component systems with provable convergence. On a grid-forming inverter DAE (21 states), the proposed method (7 outputs, 1.42 percent error) significantly outperforms penalty methods (39.3 percent), standard Newton approaches (57.0 percent), and augmented Lagrangian or feedback linearization baselines, which fail to converge. Two independently trained models compose into a 44-state system without retraining, achieving 0.72 to 1.16 percent error with zero algebraic residual. Conformal prediction further provides 90 percent coverage in-distribution and enables automatic out-of-distribution detection.
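A scalar toy version of the implicit-layer idea is sketched below: given a slow state $x$, Newton's method solves an algebraic constraint $g(x, z) = 0$ for the fast state $z$, and the implicit function theorem gives the sensitivity $dz/dx = -(\partial g/\partial z)^{-1}\,\partial g/\partial x$ without differentiating through the iterations. The constraint $g(x, z) = z^3 + z - x$ is a hypothetical stand-in chosen for easy checking; it is not the inverter model or the extended Newton layer from the abstract.

```python
def g(x: float, z: float) -> float:
    """Hypothetical algebraic constraint g(x, z) = 0 linking slow x and fast z."""
    return z ** 3 + z - x


def dg_dz(x: float, z: float) -> float:
    """Partial derivative of g with respect to the fast state z."""
    return 3.0 * z ** 2 + 1.0


def dg_dx(x: float, z: float) -> float:
    """Partial derivative of g with respect to the slow state x."""
    return -1.0


def solve_algebraic_state(x: float, z0: float = 0.0,
                          tol: float = 1e-12, max_iter: int = 50) -> float:
    """Newton iteration for z such that g(x, z) = 0."""
    z = z0
    for _ in range(max_iter):
        step = g(x, z) / dg_dz(x, z)
        z -= step
        if abs(step) < tol:
            break
    return z


def implicit_gradient(x: float, z: float) -> float:
    """dz/dx at a solution, from the implicit function theorem."""
    return -dg_dx(x, z) / dg_dz(x, z)


z_star = solve_algebraic_state(2.0)      # solves z^3 + z = 2
dz_dx = implicit_gradient(2.0, z_star)   # equals 1 / (3 z^2 + 1) here
```

Because the constraint is solved to tolerance rather than penalized, the algebraic residual at the returned state is essentially zero, which is the property a hard-constraint layer relies on.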
Cement production is among the largest contributors to industrial air pollution, emitting ~3 Mt NOx/year. The industry-standard mitigation approach, selective non-catalytic reduction (SNCR), exhibits low NH3 utilization efficiency, resulting in operational inefficiencies and increased reagent costs. Here, we develop a data-driven framework for emission control using large-scale operational data from four cement plants worldwide. Benchmarking nine machine learning architectures, we observe that prediction error varies ~3-5x across plants due to variation in data richness. Incorporating short-term process history nearly triples NOx prediction accuracy, revealing that NOx formation carries substantial process memory, a timescale dependence that is absent in CO and CO2. Further, we develop models that forecast NOx overshoots as early as nine minutes, providing a buffer for operational adjustments. The developed framework controls NOx formation at the source, reducing NH3 consumption in downstream SNCR. Surrogate model projections estimate a ~34-64% reduction in NOx while preserving clinker quality, corresponding to a reduction of ~290 t NOx/year and ~58,000 USD/year in NH3 savings. This work establishes a generalizable framework for data-driven emission control, offering a pathway toward low-emission operation without structural modifications or additional hardware, with potential applicability to other hard-to-abate industries such as steel, glass, and lime.
We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM) to predict semantic visual embeddings via learnable query tokens, which subsequently serve as conditioning signals for a diffusion model. This streamlined design effectively transfers the rich understanding and reasoning capabilities of VLMs into the visual generation process. By obviating the need for deep fusion between autoregressive and diffusion models or training from scratch, MMCORE significantly reduces computational overhead while maintaining high-fidelity synthesis. MMCORE seamlessly integrates text-to-image synthesis with interleaved image generation, demonstrating robust multimodal comprehension in complex scenarios such as spatial reasoning and visual grounding. Comprehensive evaluations indicate that MMCORE consistently outperforms state-of-the-art baselines across a broad spectrum of text-to-image and single/multi-image editing benchmarks.
Post-Training Quantization (PTQ) is critical for the efficient deployment of Large Language Models (LLMs). While 4-bit quantization is widely regarded as an optimal trade-off, reducing the precision to 2-bit usually triggers a catastrophic ``performance cliff.'' It remains unclear whether the underlying mechanisms differ fundamentally. Consequently, we conduct a systematic mechanistic analysis, revealing two qualitatively distinct failure modes: Signal Degradation, where the computational patterns remain intact but information precision is impaired by cumulative error; and Computation Collapse, where key components fail to function, preventing correct information processing and destroying the signal in the early layers. Guided by this diagnosis, we conduct mechanism-aware interventions, demonstrating that targeted, training-free repair can mitigate Signal Degradation, but remains ineffective for Computation Collapse. Our findings provide a systematic diagnostic framework for PTQ failures and suggest that addressing Computation Collapse requires structural reconstruction rather than mere compensation.
We release Super Apriel, a 15B-parameter supernet in which every decoder layer provides four trained mixer choices -- Full Attention (FA), Sliding Window Attention (SWA), Kimi Delta Attention (KDA), and Gated DeltaNet (GDN). A placement selects one mixer per layer; placements can be switched between requests at serving time without reloading weights, enabling multiple speed presets from a single checkpoint. The shared checkpoint also enables speculative decoding without a separate draft model. The all-FA preset matches the Apriel 1.6 teacher on all reported benchmarks; recommended hybrid presets span $2.9\times$ to $10.7\times$ decode throughput at 96% to 77% quality retention, with throughput advantages that compound at longer context lengths. With four mixer types across 48 layers, the configuration space is vast. A surrogate that predicts placement quality from the per-layer mixer assignment makes the speed-quality landscape tractable and identifies the best tradeoffs at each speed level. We investigate whether the best configurations at each speed level can be identified early in training or only after convergence. Rankings stabilize quickly at 0.5B scale, but the most efficient configurations exhibit higher instability at 15B, cautioning against extrapolation from smaller models. Super Apriel is trained by stochastic distillation from a frozen Apriel 1.6 teacher, followed by supervised fine-tuning. We release the supernet weights, Fast-LLM training code, vLLM serving code, and a placement optimization toolkit.
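To give a sense of scale for the configuration space, four mixer choices at each of 48 layers yield $4^{48} = 2^{96} \approx 7.9 \times 10^{28}$ placements. The sketch below represents a placement as one mixer label per layer and counts the space; the helper function and the example hybrid pattern are hypothetical and are not presets from the release.

```python
from typing import Sequence

MIXERS = ("FA", "SWA", "KDA", "GDN")  # mixer abbreviations from the abstract
NUM_LAYERS = 48

# One independent choice per layer.
num_placements = len(MIXERS) ** NUM_LAYERS


def is_valid_placement(placement: Sequence[str]) -> bool:
    """A placement assigns exactly one known mixer to every layer."""
    return len(placement) == NUM_LAYERS and all(m in MIXERS for m in placement)


# The all-FA preset, plus a hypothetical hybrid with FA on every fourth layer.
all_fa = ["FA"] * NUM_LAYERS
hybrid = ["FA" if i % 4 == 0 else "GDN" for i in range(NUM_LAYERS)]
```

Exhaustive evaluation is out of reach at this size, which is why a surrogate that predicts quality from the per-layer assignment is needed to search the space.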
Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly understood. In this work, we represent stochastic optimizers as random dynamical systems, which often converge to a fractal attractor set (rather than a point) with a smaller intrinsic dimension. Building on this connection and inspired by Lyapunov dimension theory, we introduce a novel notion of dimension, coined the `sharpness dimension', and prove a generalization bound based on this dimension. Our results show that generalization in the chaotic regime depends on the complete Hessian spectrum and the structure of its partial determinants, highlighting a complexity that cannot be captured by the trace or spectral norm considered in prior work. Experiments across various MLPs and transformers validate our theory while also providing new insights into the recently observed phenomenon of grokking.
Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open-data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open-data, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
We establish central and non-central limit theorems for sequences of functionals of the Gaussian output of an infinitely-wide random neural network on the d-dimensional sphere. We show that the asymptotic behaviour of these functionals as the depth of the network increases depends crucially on the fixed points of the covariance function, resulting in three distinct limiting regimes: convergence to the same functional of a limiting Gaussian field, convergence to a Gaussian distribution, and convergence to a distribution in the Qth Wiener chaos. Our proofs exploit tools that are now classical (Hermite expansions, Diagram Formula, Stein-Malliavin techniques), but also ideas which have never been used in similar contexts: in particular, the asymptotic behaviour is determined by the fixed-point structure of the iterative operator associated with the covariance, whose nature and stability govern the different limiting regimes.
Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynamics and operating conditions can change unexpectedly. Moreover, RL controllers acting in physical environments must satisfy safety constraints throughout their learning and execution phases, rendering transient violations during adaptation unacceptable. Although continual RL and safe RL have each addressed non-stationarity and safety, respectively, their intersection remains comparatively unexplored, motivating the study of safe continual RL algorithms that can adapt over the system's lifetime while preserving safety. In this work, we systematically investigate safe continual reinforcement learning by introducing three benchmark environments that capture safety-critical continual adaptation and by evaluating representative approaches from safe RL, continual RL, and their combinations. Our empirical results reveal a fundamental tension between maintaining safety constraints and preventing catastrophic forgetting under non-stationary dynamics, with existing methods generally failing to achieve both objectives simultaneously. To address this shortcoming, we examine regularization-based strategies that partially mitigate this trade-off and characterize their benefits and limitations. Finally, we outline key open challenges and research directions toward developing safe, resilient learning-based controllers capable of sustained autonomous operation in changing environments.
Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-time scaling of diffusion-based policies without the computational cost by tracing the performance gain of action samples back to earlier in the denoising process. Our key insight is that we can model the denoising of multiple action candidates and selecting the best one as a Markov Decision Process (MDP) where the goal is to progressively filter action candidates before denoising is complete. With this MDP, we can learn a policy and value function in the denoising space that predicts the downstream value of action candidates in the denoising process and filters them while maximizing returns. The result is a method that is lightweight and can be plugged into existing generative RL algorithms. Across challenging long-horizon manipulation tasks in online and batch-online RL, FASTER consistently improves the underlying policies and achieves the best overall performance among the compared methods. Applied to a pretrained VLA, FASTER achieves the same performance while substantially reducing training and inference compute requirements. Code is available at https://github.com/alexanderswerdlow/faster .
Personalized Federated Learning (PFL) aims to learn multiple task-specific models rather than a single global model across heterogeneous data distributions. Existing PFL approaches typically rely on iterative optimization (such as model update trajectories) to cluster together users that need to accomplish the same tasks. However, these learning-dynamics-based methods are inherently vulnerable to low-quality data and noisy labels, as corrupted updates distort clustering decisions and degrade personalization performance. To tackle this, we propose FB-NLL, a feature-centric framework that decouples user clustering from iterative training dynamics. By exploiting the intrinsic heterogeneity of local feature spaces, FB-NLL characterizes each user through the spectral structure of the covariances of their feature representations and leverages subspace similarity to identify task-consistent user groupings. This geometry-aware clustering is label-agnostic and is performed in a one-shot manner prior to training, significantly reducing communication overhead and computational costs compared to iterative baselines. Complementing this, we introduce a feature-consistency-based detection and correction strategy to address noisy labels within clusters. By leveraging directional alignment in the learned feature space and assigning labels based on class-specific feature subspaces, our method mitigates corrupted supervision without requiring estimation of stochastic noise transition matrices. In addition, FB-NLL is model-independent and integrates seamlessly with existing noise-robust training techniques. Extensive experiments across diverse datasets and noise regimes demonstrate that our framework consistently outperforms state-of-the-art baselines in terms of average accuracy and performance stability.
We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize in the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, from language pretraining to action-expert fine-tuning. VLA Foundry supports both from-scratch training and pretrained backbones from Hugging Face. To demonstrate the utility of our framework, we train and release two types of models: the first trained fully from scratch through our LLM-->VLM-->VLA pipeline and the second built on the pretrained Qwen3-VL backbone. We evaluate closed-loop policy performance of both models on LBM Eval, an open-data, open-source simulator. We also contribute usability improvements to the simulator and the STEP analysis tools for easier public use. In the nominal evaluation setting, our fully open from-scratch model is on par with our prior closed-source work, and substituting in the Qwen3-VL backbone leads to a strong multi-task tabletop manipulation policy that outperforms our baseline by a wide margin. The VLA Foundry codebase is available at https://github.com/TRI-ML/vla_foundry and all multi-task model weights are released on https://huggingface.co/collections/TRI-ML/vla-foundry. Additional qualitative videos are available on the project website https://tri-ml.github.io/vla_foundry.
Despite the remarkable success of Vision Transformers (ViTs) across a wide range of vision tasks, recent studies have revealed that they remain vulnerable to adversarial examples, much like Convolutional Neural Networks (CNNs). A common empirical defense strategy is adversarial training, yet the theoretical underpinnings of its robustness in ViTs remain largely unexplored. In this work, we present the first theoretical analysis of adversarial training under simplified ViT architectures. We show that, when trained under a signal-to-noise ratio that satisfies a certain condition and within a moderate perturbation budget, adversarial training enables ViTs to achieve nearly zero robust training loss and robust generalization error under certain regimes. Remarkably, this leads to strong generalization even in the presence of overfitting, a phenomenon known as \emph{benign overfitting}, previously only observed in CNNs (with adversarial training). Experiments on both synthetic and real-world datasets further validate our theoretical findings.
The discretization of continuous numerical attributes remains a persistent computational bottleneck in the induction of decision trees, particularly as dataset dimensions scale. Building upon the recently proposed MSD-Splitting technique -- which bins continuous data using the empirical mean and standard deviation to dramatically improve the efficiency and accuracy of the C4.5 algorithm -- we introduce Adaptive MSD-Splitting (AMSD). While standard MSD-Splitting is highly effective for approximately symmetric distributions, its rigid adherence to fixed one-standard-deviation cutoffs can lead to catastrophic information loss in highly skewed data, a common artifact in real-world biomedical and financial datasets. AMSD addresses this by dynamically adjusting the standard deviation multiplier based on feature skewness, narrowing intervals in dense regions to preserve discriminative resolution. Furthermore, we integrate AMSD into ensemble methods, specifically presenting the Random Forest-AMSD (RF-AMSD) framework. Empirical evaluations on the Census Income, Heart Disease, Breast Cancer, and Forest Covertype datasets demonstrate that AMSD yields a 2-4% accuracy improvement over standard MSD-Splitting, while maintaining near-identical O(N) time complexity reductions compared to the O(N log N) exhaustive search. Our Random Forest extension achieves state-of-the-art accuracy at a fraction of standard computational costs, confirming the viability of adaptive statistical binning in large-scale ensemble learning architectures.
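As a rough illustration of skewness-adaptive mean/standard-deviation binning, the sketch below places cut points at $μ - kσ$, $μ$, and $μ + kσ$ and shrinks the multiplier $k$ as the absolute sample skewness grows. The three-cut layout, the shrinkage rule `k = 1 / (1 + |skew|)`, and all function names are assumptions made for illustration only; the abstract does not state the actual AMSD rule.

```python
import bisect
import statistics


def skewness(xs):
    """Population skewness (third standardized moment) of a sample."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return 0.0
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


def adaptive_multiplier(xs, base=1.0):
    """Hypothetical rule: shrink the std multiplier as |skewness| grows."""
    return base / (1.0 + abs(skewness(xs)))


def msd_cut_points(xs, k=1.0):
    """Three cut points (mean - k*std, mean, mean + k*std), giving four bins."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [mu - k * sd, mu, mu + k * sd]


def bin_index(x, cuts):
    """Index of the bin (0..len(cuts)) that x falls into."""
    return bisect.bisect_right(cuts, x)


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 50.0]

k_sym = adaptive_multiplier(symmetric)   # close to 1: fixed one-std cutoffs
k_skew = adaptive_multiplier(skewed)     # below 1: narrower intervals
fixed_cuts = msd_cut_points(skewed, k=1.0)
adaptive_cuts = msd_cut_points(skewed, k=k_skew)
```

Both variants need only one pass for the moments and a constant number of cut points per feature, which is consistent with the linear-time behaviour the abstract attributes to MSD-style binning.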
In [97,99,100], an fl-RDT framework is introduced to characterize \emph{statistical computational gaps} (SCGs). Studying \emph{symmetric binary perceptrons} (SBPs), [100] obtained an \emph{algorithmic} threshold estimate $α_a\approx α_c^{(7)}\approx 1.6093$ at the 7th lifting level (for $κ=1$ margin), closely approaching the $1.58$ local entropy (LE) prediction [18]. In this paper, we further connect parametric RDT to overlap gap properties (OGPs), another key geometric feature of the solution space. Specifically, for any positive integer $s$, we consider $s$-level ultrametric OGPs ($ult_s$-OGPs) and rigorously upper-bound the associated constraint densities $α_{ult_s}$. To achieve this, we develop an analytical union-bounding program consisting of combinatorial and probabilistic components. By casting the combinatorial part as a convex problem and the probabilistic part as a nested integration, we conduct numerical evaluations and find that the tightest bounds at the first two levels, $\barα_{ult_1} \approx 1.6578$ and $\barα_{ult_2} \approx 1.6219$, closely approach the 3rd and 4th lifting level parametric RDT estimates, $α_c^{(3)} \approx 1.6576$ and $α_c^{(4)} \approx 1.6218$. We also observe excellent agreement across other key parameters, including overlap values and the relative sizes of ultrametric clusters. Based on these observations, we propose several conjectures linking $ult$-OGP and parametric RDT. Specifically, we conjecture that the algorithmic threshold satisfies $α_a=\lim_{s\rightarrow\infty} α_{ult_s} = \lim_{s\rightarrow\infty} \barα_{ult_s} = \lim_{r\rightarrow\infty} α_{c}^{(r)}$, and that $α_{ult_s} \leq α_{c}^{(s+2)}$ (with possible equality for some, maybe even all, $s$). Finally, we discuss the potential existence of a full isomorphism connecting all key parameters of $ult$-OGP and parametric RDT.
Reinforcement fine-tuning with verifiable rewards (RLVR) has emerged as a powerful paradigm for equipping large vision-language models (LVLMs) with agentic capabilities such as tool use and multi-step reasoning. Despite striking empirical successes, most notably Visual Agentic Reinforcement Fine-Tuning (Visual-ARFT), the theoretical underpinnings of this paradigm remain poorly understood. In particular, two critical questions lack rigorous answers: (i)~how does the composite structure of verifiable rewards (format compliance, answer accuracy, tool executability) affect the convergence of Group Relative Policy Optimization (GRPO), and (ii)~why does training on a small set of tool-augmented tasks transfer to out-of-distribution domains? We address these gaps by introducing the \emph{Tool-Augmented Markov Decision Process} (TA-MDP), a formal framework that models multimodal agentic decision-making with bounded-depth tool calls. Within this framework, we establish three main results. First, we prove that GRPO under composite verifiable rewards converges to a first-order stationary point at rate $O(1/\sqrt{T})$ with explicit dependence on the number of reward components and group size (\textbf{Theorem~1}). Second, we derive a \emph{Reward Decomposition Theorem} that bounds the sub-optimality gap between decomposed per-component optimization and joint optimization, providing a precise characterization of when reward decomposition is beneficial (\textbf{Theorem~2}). Third, we establish a PAC-Bayes generalization bound for tool-augmented policies that explains the strong out-of-distribution transfer observed in Visual-ARFT (\textbf{Theorem~3}).
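To make the objects in question (i) concrete, the following minimal sketch combines three binary verifiable components (format compliance, answer accuracy, tool executability) into one scalar reward and standardizes rewards within a sampled group, which is the usual form of GRPO advantages. The equal weights, the binary encoding, and the function names are assumptions for illustration and are not taken from the paper.

```python
import statistics


def composite_reward(format_ok, answer_ok, tool_ok, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three binary verifiable reward components."""
    w_fmt, w_acc, w_tool = weights
    return w_fmt * float(format_ok) + w_acc * float(answer_ok) + w_tool * float(tool_ok)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style advantages)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One group of four sampled responses to the same prompt.
group = [
    composite_reward(True, True, True),
    composite_reward(True, False, True),
    composite_reward(False, False, False),
    composite_reward(True, False, False),
]
advantages = group_relative_advantages(group)
```

Because the advantages are centred within each group, a group whose responses all receive the same composite reward contributes no learning signal, which is one way the group size and the number of reward components can enter a convergence analysis.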
Large Language Models (LLMs) show promise for generating Register-Transfer Level (RTL) code from natural language specifications, but single-shot generation achieves only 60-65% functional correctness on standard benchmarks. Multi-agent approaches such as MAGE reach 95.9% on VerilogEval yet remain untested on harder industrial benchmarks such as NVIDIA's CVDP, lack synthesis awareness, and incur high API costs. We present ChipCraftBrain, a framework combining symbolic-neural reasoning with adaptive multi-agent orchestration for automated RTL generation. Four innovations drive the system: (1) adaptive orchestration over six specialized agents via a PPO policy over a 168-dim state (an alternative world-model MPC planner is also evaluated); (2) a hybrid symbolic-neural architecture that solves K-map and truth-table problems algorithmically while specialized agents handle waveform timing and general RTL; (3) knowledge-augmented generation from a 321-pattern base plus 971 open-source reference implementations with focus-aware retrieval; and (4) hierarchical specification decomposition into dependency-ordered sub-modules with interface synchronization. On VerilogEval-Human, ChipCraftBrain achieves 97.2% mean pass@1 (range 96.15-98.72% across 7 runs, best 154/156), on par with ChipAgents (97.4%, self-reported) and ahead of MAGE (95.9%). On a 302-problem non-agentic subset of CVDP spanning five task categories, we reach 94.7% mean pass@1 (286/302, averaged over 3 runs), a 36-60 percentage-point lift per category over the published single-shot baseline; we additionally lead three of four categories shared with NVIDIA's ACE-RTL despite using roughly 30x fewer per-problem attempts. A RISC-V SoC case study demonstrates hierarchical decomposition generating 8/8 lint-passing modules (689 LOC) validated on FPGA, where monolithic generation fails entirely.
The standard Monte Carlo estimator $\widehat{I}_N^{\mathrm{MC}}$ of $\int f\,\mathrm{d}ω$ relies on independent samples from $ω$ and has variance of order $1/N$. Replacing the samples with a determinantal point process (DPP), a repulsive distribution, makes the estimator consistent, with variance rates that depend on how the DPP is adapted to $f$ and $ω$. We examine two existing DPP-based estimators. The first, by Bardenet & Hardy (2020), achieves a rate of $\mathcal{O}(N^{-(1+1/d)})$ for smooth $f$ but relies on a fixed DPP. The second, by Ermakov & Zolotukhin (1960), is unbiased with a rate of order $1/N$, like Monte Carlo, but its DPP is tailored to $f$. We revisit these estimators, generalize them to continuous settings, and provide sampling algorithms.
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{O}(1/ε^4)$ for a desired accuracy $ε$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
Explainable artificial intelligence (XAI) has predominantly focused on generating model-centric explanations that approximate the behavior of black-box models. However, such explanations often overlook a fundamental aspect of interpretability: different users require different explanations depending on their goals, preferences, and cognitive constraints. Although recent work has explored user-centric and personalized explanations, most existing approaches rely on heuristic adaptations or implicit user modeling, lacking a principled framework for representing and learning individual preferences. In this paper, we consider Preference-Based Explainable Artificial Intelligence (PREF-XAI), a novel perspective that reframes explanation as a preference-driven decision problem. Within PREF-XAI, explanations are not treated as fixed outputs, but as alternatives to be evaluated and selected according to user-specific criteria. Within this perspective, we propose a methodology that combines rule-based explanations with formal preference learning. User preferences are elicited through a ranking of a small set of candidate explanations and modeled via an additive utility function inferred using robust ordinal regression. Experimental results on real-world datasets show that PREF-XAI can accurately reconstruct user preferences from limited feedback, identify highly relevant explanations, and discover novel explanatory rules not initially considered by the user. Beyond the proposed methodology, this work establishes a connection between XAI and preference learning, opening new directions for interactive and adaptive explanation systems.
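The $1/N$ variance of the independent-sample baseline can be checked numerically. The sketch below is a plain Monte Carlo estimator only (it does not implement any DPP-based method) and assumes, for illustration, that $ω$ is the uniform measure on $[0,1]$ and $f(x)=x^2$, so the exact integral is $1/3$.

```python
import random
import statistics


def mc_estimate(f, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0, 1] (uniform measure)."""
    return sum(f(rng.random()) for _ in range(n)) / n


def estimator_variance(f, n, repeats, rng):
    """Empirical variance of the estimator across independent repetitions."""
    return statistics.pvariance([mc_estimate(f, n, rng) for _ in range(repeats)])


def f(x):
    return x * x  # exact integral over [0, 1] is 1/3


rng = random.Random(0)
estimate = mc_estimate(f, 100_000, rng)
var_small_n = estimator_variance(f, 100, 200, rng)
var_large_n = estimator_variance(f, 1600, 200, rng)
ratio = var_small_n / var_large_n  # roughly 16 if the variance scales as 1/N
```

Increasing $N$ by a factor of 16 should shrink the empirical variance by a factor of roughly 16, up to sampling noise; a faster-than-$1/N$ rate would show up as a larger ratio.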
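The smoothness referred to here can be seen in the standard entropy-regularized backup, where the hard maximum over actions is replaced by a temperature-scaled log-sum-exp. The sketch below illustrates that operator only, not the SmoothCruiser algorithm itself; the temperature name `lam` and the deterministic-transition simplification are assumptions.

```python
import math


def soft_max(values, lam):
    """Entropy-regularized maximum: lam * log(sum(exp(v / lam)))."""
    m = max(values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def soft_bellman_backup(rewards, next_values, gamma, lam):
    """Regularized backup for one state with deterministic transitions.

    rewards[a] is the reward of action a and next_values[a] is the value of
    the state reached by action a (a simplifying assumption for this sketch).
    """
    q = [r + gamma * v for r, v in zip(rewards, next_values)]
    return soft_max(q, lam)


q_values = [1.0, 0.5, -2.0]
hard = max(q_values)
smooth = soft_max(q_values, lam=0.1)
```

The regularized value always lies between the hard maximum and the hard maximum plus `lam * log(number of actions)`, and it is differentiable in the action values, unlike the unregularized maximum.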
Reinforcement learning-based control policies have been frequently demonstrated to be more effective than analytical techniques for many manipulation tasks. Commonly, these methods learn neural control policies that predict end-effector pose changes directly from observed state information. For tasks like inserting delicate connectors which induce force constraints, pose-based policies have limited explicit control over force and rely on carefully tuned low-level controllers to avoid executing damaging actions. In this work, we present hybrid position-force control policies that learn to dynamically select when to use force or position control in each control dimension. To improve learning efficiency of these policies, we introduce Mode-Aware Training for Contact Handling (MATCH) which adjusts policy action probabilities to explicitly mirror the mode selection behavior in hybrid control. We validate MATCH's learned policy effectiveness using fragile peg-in-hole tasks under extreme localization uncertainty. We find MATCH substantially outperforms pose-control policies -- solving these tasks with up to 10% higher success rates and 5x fewer peg breaks than pose-only policies under common types of state estimation error. MATCH also demonstrates data efficiency equal to pose-control policies, despite learning in a larger and more complex action space. In over 1600 sim-to-real experiments, we find MATCH succeeds twice as often as pose policies in high-noise settings (68% vs.~33%) and applies ~30% less force on average compared to variable impedance policies on a Franka FR3 in laboratory conditions.
We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set. Our approach better models the real-world setting where the cost of influencers varies and advertisers want to find the best value for their overall social advertising budget. We propose an algorithm assuming an independent cascade diffusion model and edge-level semi-bandit feedback, and provide both theoretical and experimental results. Our analysis is also valid for the cardinality constraint setting and improves the state-of-the-art regret bound in this case.
Enforcing constraint satisfaction in neural network outputs is critical for safety, reliability, and physical fidelity in many control and decision-making applications. While soft-constrained methods penalize constraint violations during training, they do not guarantee constraint adherence during inference. Other approaches guarantee constraint satisfaction via specific parameterizations or a projection layer, but are tailored to specific forms (e.g., linear constraints), limiting their utility in other general problem settings. Many real-world problems of interest are nonlinear, motivating the development of methods that can enforce general nonlinear constraints. To this end, we introduce HardNet++, a constraint-enforcement method that simultaneously satisfies linear and nonlinear equality and inequality constraints. Our approach iteratively adjusts the network output via damped local linearizations. Each iteration is differentiable, admitting an end-to-end training framework, where the constraint satisfaction layer is active during training. We show that under certain regularity conditions, this procedure can enforce nonlinear constraint satisfaction to arbitrary tolerance. Finally, we demonstrate tight constraint adherence without loss of optimality in a learning-for-optimization context, where we apply this method to a model predictive control problem with nonlinear state constraints.
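The idea of correcting an output through damped local linearizations can be sketched for the simplest case of a single nonlinear equality constraint $h(y)=0$. The code below is a generic damped Newton-style correction written for illustration; it is not the HardNet++ layer, and the function names, the step size, and the unit-circle constraint are all assumptions.

```python
def enforce_equality(y, h, grad_h, step=0.5, tol=1e-8, max_iter=200):
    """Damped Newton-style correction of y toward the set {y : h(y) = 0}.

    Each iteration linearizes h around the current point and moves a fraction
    `step` of the minimum-norm correction that would zero the linearization.
    """
    y = list(y)
    for _ in range(max_iter):
        value = h(y)
        if abs(value) <= tol:
            break
        g = grad_h(y)
        norm_sq = sum(gi * gi for gi in g)
        if norm_sq == 0.0:
            break  # no usable direction; stop rather than divide by zero
        scale = step * value / norm_sq
        y = [yi - scale * gi for yi, gi in zip(y, g)]
    return y


def unit_circle(y):
    return y[0] ** 2 + y[1] ** 2 - 1.0


def unit_circle_grad(y):
    return [2.0 * y[0], 2.0 * y[1]]


raw_output = [2.0, 1.0]  # stand-in for an unconstrained network output
corrected = enforce_equality(raw_output, unit_circle, unit_circle_grad)
```

Every update is a composition of differentiable operations, so an automatic-differentiation framework could backpropagate through a fixed number of such iterations; handling several constraints or inequalities would require a linear solve per step, which this sketch omits.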
At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve, which makes development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic framework to mitigate recurrent execution errors. Chat2Workflow is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be transformed and directly deployed to practical workflow platforms such as Dify and Coze. Experimental results show that while state-of-the-art language models can often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially under complex or changing requirements. Although our agentic framework yields up to 5.34% resolve rate gains, the remaining real-world gap positions Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at https://github.com/zjunlp/Chat2Workflow.
Counterfactual explanations (CEs) provide an intuitive way to understand recommender systems by identifying minimal modifications to user-item interactions that alter recommendation outcomes. Existing CE methods for recommender systems, however, have been evaluated under heterogeneous protocols, using different datasets, recommenders, metrics, and even explanation formats, which hampers reproducibility and fair comparison. Our paper systematically reproduces, re-implements, and re-evaluates eleven state-of-the-art CE methods for recommender systems, covering both native explainers (e.g., LIME-RS, SHAP, PRINCE, ACCENT, LXR, GREASE) and specific graph-based explainers originally proposed for GNNs. We propose a unified benchmarking framework to assess explainers along three dimensions: explanation format (implicit vs. explicit), evaluation level (item-level vs. list-level), and perturbation scope (user interaction vectors vs. user-item interaction graphs). Our evaluation protocol includes effectiveness, sparsity, and computational complexity metrics, and extends existing item-level assessments to top-K list-level explanations. Through extensive experiments on three real-world datasets and six representative recommender models, we analyze how well previously reported strengths of CE methods generalize across diverse setups. We observe that the trade-off between effectiveness and sparsity depends strongly on the specific method and evaluation setting, particularly under the explicit format; in addition, explainer performance remains largely consistent across item-level and list-level evaluations, and several graph-based explainers exhibit notable scalability limitations on large recommender graphs. Our results refine and challenge earlier conclusions about the robustness and practicality of CE generation methods in recommender systems. See https://github.com/L2R-UET/CFExpRec.
Damage identification is a core task in structural health monitoring. In practice, however, its reliability is often compromised by confounding non-damage effects, such as variations in excitation and environmental conditions, which can induce changes comparable to or larger than those caused by structural damage. To address this challenge, this study proposes a self-supervised label-free disentangled representation learning framework for robust vibration-based structural damage identification. The proposed framework employs an autoencoder with two latent representations to learn directly from raw vibration acceleration signals. A self-supervised invariance regularization, implemented via Variance-Invariance-Covariance Regularization (VICReg), is imposed on one latent representation using baseline data where structural damage is assumed constant but operational and environmental conditions vary. In addition, a frequency-domain constraint is introduced to enforce agreement between the power spectral density reconstructed from the latent representation and that computed from the corresponding input time series. Together, these mechanisms promote disentanglement, enabling the learned representation to be sensitive to damage-related characteristics while remaining invariant to nuisance variability. The framework is trained in a fully end-to-end and label-free manner, requiring no prior information on damage, excitation, or environmental conditions, making it well-suited for real-world applications. Its effectiveness is validated on two distinct real-world vibration datasets, including a bridge and a gearbox. The results demonstrate robustness to operational variability, strong generalization capability, and good performance in both damage detection and quantification.
Edge-cloud hybrid inference offloads difficult inputs to a powerful remote model, but the uplink channel imposes hard per-request constraints on the number of bits that can be transmitted. We show that selecting transmitted content based solely on attention-based importance, the standard approach in collaborative inference, is inherently limited under hard budgets. Two findings support this claim. First, replacing high-importance units with low-importance but complementary ones improves server accuracy. This shows that what matters is not individual importance but how well the transmitted set covers diverse aspects of the input. Second, spatially uniform selection without any content information achieves competitive accuracy at moderate budgets. This confirms that spatial coverage alone carries independent value. Based on this analysis, we propose SAGE (Semantic Attention-Guided Evidence), a principled, training-free method that combines importance filtering with embedding-diversity sampling. SAGE achieves 93% of the server ceiling in offloaded accuracy while transmitting fewer than half of the available evidence units on ImageNet-1K, substantially outperforming importance-only composition.
The importance of clear and correct text in legal documents cannot be overstated. Consequently, a grammatical error correction tool meant to assist a legal professional must understand possible errors in the context of a legal environment and correct them accordingly, which implies that it needs to be trained in the same environment, using realistic legal data. However, the manually annotated data required by such a process is in short supply for languages such as Romanian, let alone for a niche domain. The most common approach is the synthetic generation of parallel data; however, it requires a structured understanding of Romanian grammar. In this paper, we introduce, to our knowledge, the first Romanian-language parallel dataset for the detection and correction of grammatical errors in the legal domain, RoLegalGEC, which aggregates 350,000 examples of errors in legal passages, along with error annotations. Moreover, we evaluate several neural network models that transform the dataset into a valuable tool for both detecting and correcting grammatical errors, including knowledge-distillation Transformers, sequence tagging architectures for detection, and a variety of pre-trained text-to-text Transformer models for correction. We consider that the set of models, together with the novel RoLegalGEC dataset, will enrich the resource base for further research on Romanian.
We give a Gordon-Greenwald-Marks (GGM) style black-box reduction from online learning to online multicalibration. Concretely, we show that to achieve high-dimensional multicalibration with respect to a class of functions H, it suffices to combine any no-regret learner over H with an expected variational inequality (EVI) solver. We also prove a converse statement showing that efficient multicalibration implies efficient EVI solving, highlighting how EVIs in multicalibration mirror the role of fixed points in the GGM result for $Φ$-regret. This first set of results resolves the main open question in Garg, Jung, Reingold, and Roth (SODA '24), showing that oracle-efficient online multicalibration with $\sqrt{T}$-type guarantees is possible in full generality. Furthermore, our GGM-style reduction unifies the analyses of existing online multicalibration algorithms, enables new algorithms for challenging environments with delayed observations or censored outcomes, and yields the first efficient black-box reduction between online learning and multiclass omniprediction. Our second main result is a fine-grained reduction from high-dimensional online multicalibration to (contextual) $Φ$-regret minimization. Together with our first result, this establishes a new route from external regret to $Φ$-regret that bypasses sophisticated fixed-point or semi-separation machinery, dramatically simplifies a result of Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25) while improving rates, and yields new algorithms that are robust to richer deviation classes, such as those belonging to any reproducing kernel Hilbert space.
Q-learning is one of the most fundamental algorithms in reinforcement learning. We analyze constant-stepsize Q-learning through a direct stochastic switching system representation. The key observation is that the Bellman maximization error can be represented exactly by a stochastic policy. Therefore, the Q-learning error admits a switched linear conditional-mean recursion with martingale-difference noise. The intrinsic drift rate is the joint spectral radius (JSR) of the direct switching family, which can be strictly smaller than the standard row-sum rate. Using this representation, we derive a finite-time final-iterate bound via a JSR-induced Lyapunov function and then give a computable quadratic-certificate version.
Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on gold-standard outputs that are costly or impractical to obtain. Moreover, hallucination detection methods developed for text-based LLMs do not directly capture audio-specific signals. We investigate four attention-derived metrics: AUDIORATIO, AUDIOCONSISTENCY, AUDIOENTROPY, and TEXTENTROPY, designed to capture pathological attention patterns associated with hallucination, and train lightweight logistic regression classifiers on these features for efficient inference-time detection. Across automatic speech recognition and speech-to-text translation tasks, evaluations on Qwen-2-Audio and Voxtral-3B show that our approach outperforms uncertainty-based and prior attention-based baselines on in-domain data, achieving improvements of up to +0.23 PR-AUC, and generalises to out-of-domain ASR settings. We further find that strong performance can be achieved with approximately 100 attention heads, improving out-of-domain generalisation compared to using all heads. While effectiveness is model-dependent and task-specific training is required, our results demonstrate that attention patterns provide a valuable tool for hallucination detection in SpeechLLMs.
Structure-based drug discovery faces the dual challenge of accurately capturing 3D protein-ligand interactions while navigating ultra-large chemical spaces to identify synthetically accessible candidates. In this work, we present a unified framework that addresses these challenges by combining contrastive 3D structure encoding with autoregressive molecular generation conditioned on commercial compound spaces. First, we introduce an SE(3)-equivariant transformer that encodes ligand and pocket structures into a shared embedding space via contrastive learning, achieving competitive results in zero-shot virtual screening. Second, we integrate these embeddings into a multimodal Chemical Language Model (MCLM). The model generates target-specific molecules conditioned on either pocket or ligand structures, with a learned dataset token that steers the output toward targeted chemical spaces, yielding candidates with favorable predicted binding properties across diverse targets.
The goal of machine learning is to find models that minimize prediction error on data that has not yet been seen. Its operational paradigm assumes access to a dataset $S$ and articulates a scheme for evaluating how well a given model performs on an arbitrary sample. The sample can be $S$ (in which case we speak of ``in-sample'' performance) or some entirely new $S'$ (in which case we speak of ``out-of-sample'' performance). Traditional analysis of generalization assumes that both in- and out-of-sample data are i.i.d.\ draws from an infinite population. However, these probabilistic assumptions cannot be verified even in principle. This paper presents an alternative view of generalization through the lens of sensitivity analysis of solutions of optimization problems to perturbations in the problem data. Under this framework, generalization bounds are obtained by purely deterministic means and take the form of variational principles that relate in-sample and out-of-sample evaluations through an error term that quantifies how close out-of-sample data are to in-sample data. Statistical assumptions can then be used \textit{ex post} to characterize the situations when this error term is small (either on average or with high probability).
Construction workers are highly vulnerable to heat stress, yet tools that translate real-time physiological data into actionable safety intelligence remain scarce. This study addresses this gap by developing and evaluating deep learning models, specifically a baseline Long Short-Term Memory (LSTM) network and an attention-based LSTM, to predict heat stress among 19 workers in Saudi Arabia. Using Garmin Vivosmart 5 smartwatches to monitor metrics such as heart rate, HRV, and oxygen saturation, the attention-based model outperformed the baseline, achieving 95.40% testing accuracy and significantly reducing false positives and negatives. With precision, recall, and F1 scores of 0.982, this approach not only improves predictive performance but also offers interpretable results suitable for integration into IoT-enabled safety systems and BIM dashboards, advancing proactive, informatics-driven safety management in the construction industry.
Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, and produces predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on two scientific foundation models for weather and time-series forecasting along with an additional regression task. Across benchmarks against uncertainty-aware baselines, we find that Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.
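One way to read "normalized multinomial samples" is sketched below: draw `n_draws` indices from the softmax distribution and use the resulting empirical frequencies as attention weights, so that `n_draws` plays the role of the concentration parameter. This reading, the scalar values, and all names are assumptions for illustration; the abstract does not specify the exact parameterization.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_attention_weights(scores, n_draws, rng):
    """Normalized multinomial sample whose mean equals the softmax weights."""
    probs = softmax(scores)
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n_draws):
        counts[idx] += 1
    return [c / n_draws for c in counts]


def attend(weights, values):
    """Weighted sum of scalar values."""
    return sum(w * v for w, v in zip(weights, values))


rng = random.Random(0)
scores = [2.0, 1.0, 0.0, -1.0]
values = [1.0, 2.0, 3.0, 4.0]

# A small predictive ensemble from repeated stochastic passes.
ensemble = [attend(stochastic_attention_weights(scores, 20, rng), values) for _ in range(50)]
deterministic = attend(softmax(scores), values)
```

Under this reading, a small `n_draws` gives a wide ensemble spread and a large `n_draws` recovers deterministic softmax attention, so tuning it is a one-dimensional search, in line with the univariate post-hoc tuning the abstract describes.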
This technical note revisits the relationship between RaBitQ and TurboQuant under a unified comparison framework. We compare the two methods in terms of methodology, theoretical guarantees, and empirical performance, using a reproducible, transparent, and symmetric setup. Our results show that, despite the claimed advantage of TurboQuant, TurboQuant does not provide a consistent improvement over RaBitQ in directly comparable settings; in many tested configurations, it performs worse than RaBitQ. We further find that several reported runtime and recall results in the TurboQuant paper could not be reproduced from the released implementation under the stated configuration. Overall, this note clarifies the shared structure and genuine differences between the two lines of work, while documenting reproducibility issues in the experimental results reported by the TurboQuant paper.
In this Letter we explore the modelling of hadron production in electromagnetic ion dissociation (EMD) processes in high-energy ultraperipheral collisions at LHC energies. Since EMD can accompany exclusive particle production in these interactions, we demonstrate that the resulting hadrons can break the exclusivity vetoes typically imposed by experiments. As two representative examples, we calculate the impact on existing LHC measurements of exclusive muon pair production ($γγ\toμμ$) and exclusive coherent $J/ψ$ production. We demonstrate that accounting for this effect resolves long-standing tensions between theoretical predictions and experimental measurements.
We study the impact of neutron-skin thickness on $J/ψ$ photoproduction in ultra-peripheral $^{208}\mathrm{Pb}+{}^{208}\mathrm{Pb}$ collisions. Within the Color Glass Condensate framework, we calculate coherent and incoherent cross sections and examine their dependence on the momentum transfer $|t|$ for different neutron-skin thicknesses. We find a clear imprint of the neutron skin on the $|t|$ spectra: a larger neutron skin leads to a smoother and more extended color-density profile, suppressing the coherent cross section at large $|t|$ while enhancing the incoherent cross section through increased event-by-event configurational fluctuations in the nuclear periphery. We further show that the ratio of incoherent to coherent integrated cross sections provides a particularly sensitive and robust observable, with reduced theoretical uncertainties. These results establish diffractive vector-meson photoproduction in ultra-peripheral collisions as a powerful tomographic tool to constrain the neutron-skin thickness and the transverse gluon distribution at the LHC and future Electron-Ion Colliders.
Jet quenching provides a valuable measure of the opacity of the quark-gluon plasma (QGP) produced in high-energy heavy-ion collisions. However, substantial suppression of charged hadron spectra is observed in highly peripheral collisions, despite the expectation of negligible jet-QGP interactions in this regime. To address this, we develop a HIJING-based initial condition model that accounts for the impact parameter dependence of both inelastic nucleon-nucleon (NN) collisions and the number of hard partonic scatterings per inelastic NN collision. This dependence introduces a geometric bias effect on the jet yield within a given centrality class of nucleus-nucleus (AA) collisions, suppressing the high transverse momentum hadron spectrum in peripheral collisions due to dilute nucleon overlap at large AA impact parameters. By combining this improved initial condition model with a linear Boltzmann transport model for jet-QGP interactions, we obtain a satisfactory description of the centrality dependence of charged hadron suppression in Pb+Pb collisions at $\sqrt{s_\mathrm{NN}}=5.02$ TeV.
Proton-induced reactions on enriched 118Sn up to 18 MeV have been investigated. Using the stacked-foil activation technique, the excitation functions of the reactions 118Sn(p,n)118Sb, 118Sn(p,2n)117Sb, 118Sn(p,α)115mIn, and 118Sn(p,x)117mSn were measured. The available experimental data show good agreement with our measurements. The cross sections for the 118Sn(p,x)117mSn and 118Sn(p,α)115mIn reactions are reported for the first time. The measured cross sections were compared not only with previously published experimental results, but also with theoretical predictions from the TENDL-2023 (TALYS-based evaluated nuclear data library), TENDL-2025 and JENDL-5 (Japanese Evaluated Nuclear Data Library) libraries. Discrepancies between experimental and theoretical data were observed for reactions involving composite-particle emission, such as alpha particles and deuterons. These differences suggest that while current models adequately describe simple two-nucleon emission channels, further refinements are needed, particularly for modeling composite-particle emission at lower proton energies.
In this work, we investigate the mass spectra, magnetic moments, and Regge trajectories of the triply heavy baryons $Ω_{ccb}$ and $Ω_{cbb}$ within a nonrelativistic constituent quark model based on the quark--diquark approximation, which reduces the three-body problem to an effective two-body system. For each baryon, all three possible diquark clusterings are considered, providing a qualitative indication of the sensitivity of the results to the quark--diquark decomposition. The model parameters are fixed by a fit to the measured $B_c$ meson spectrum, thereby anchoring the baryon predictions to experimentally constrained inputs and establishing a consistent link between the heavy meson and baryon sectors. We obtain ground-state masses of approximately $8.0$~GeV for $Ω_{ccb}$ and $11.0$~GeV for $Ω_{cbb}$, with radial and orbital excitation patterns in good agreement with the results reported in the literature. The computed magnetic moments of the spin-$\tfrac{1}{2}$ and spin-$\tfrac{3}{2}$ states are consistent with the results of various approaches. A radial Regge analysis in the $(n_r, M^2)$ plane reveals approximately linear $P$-wave trajectories and mildly curved $S$-wave trajectories, with slope and intercept parameters that scale systematically with the heavy-quark content of the baryon. These results suggest that the nonrelativistic quark--diquark framework provides a reliable description of triply heavy baryons and serves as a useful reference for future experimental searches, particularly at LHCb.
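The radial Regge analysis mentioned above amounts to fitting a straight line $M^2 = a\,n_r + b$ to the squared masses of successive radial excitations. The following is a minimal sketch of such a fit, not the authors' procedure: the function names, the convention that $n_r$ starts at 0, and the input masses (generated from a hypothetical linear trajectory with $a = 5$ GeV$^2$ and $b = 64$ GeV$^2$) are all illustrative assumptions and are not results of this work.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regge_fit(masses_gev):
    """Fit M^2 = a * n_r + b, assuming masses are ordered n_r = 0, 1, 2, ..."""
    n_r = list(range(len(masses_gev)))
    m_squared = [m * m for m in masses_gev]
    return fit_line(n_r, m_squared)


# Synthetic placeholder masses (GeV) from a hypothetical trajectory
# M^2 = 5 * n_r + 64; these are NOT model predictions.
synthetic_masses = [math.sqrt(5.0 * n + 64.0) for n in range(4)]
slope, intercept = regge_fit(synthetic_masses)
print(f"slope = {slope:.3f} GeV^2, intercept = {intercept:.3f} GeV^2")
```

A mildly curved trajectory, as reported for the $S$-wave states, would show up as systematic residuals around this straight line.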
Reproducing scientific analyses is essential for preserving knowledge, building extensible codebases, and deepening researcher understanding - yet the effort often outweighs its academic recognition. We argue that the reproduction of scientific data analyses is fundamentally a translation task: converting human-readable knowledge (papers, documentation) into machine-readable analysis code. This makes it uniquely well-suited for AI agents. We present SHARP (Scientific Human-Agent Reproduction Pipeline), a structured framework for reproducing scientific analyses through human-agent collaboration. SHARP decomposes a reproduction task into discrete steps, which an AI agent executes autonomously using specialized subagents for code generation, testing, and quality assurance. At defined checkpoints, the researcher reviews progress, provides feedback, and steers the analysis - keeping the human firmly in control of scientific judgment while the agent handles implementation. We demonstrate SHARP by reproducing a jet classification task in particle physics from a published paper. We evaluate the reproduction along three axes: analysis performance against the original results, code quality and faithfulness, and the nature of the human-agent conversation. The latter is evaluated with a novel framework for characterizing human-agent interactions. Our work highlights a practical model for AI-assisted scientific reproduction where the researcher's role shifts from writing code to understanding, evaluating, and directing - elevating human understanding rather than replacing it.
We propose a fundamental shift in the search for beyond the Standard Model long-lived particles (LLPs) at high-luminosity hadron colliders by prioritizing physical background suppression over traditional inner tracking. We introduce $\textsf{DELIGHT-SHIELD}$, a dedicated detector design for a 100 TeV Future Circular Collider at a dedicated interaction point for LLP searches. By replacing the inner parts of the detector with a multi-layered composite shield, followed by tracking volumes, we analytically estimate a suppression of Standard Model hadronic and electromagnetic backgrounds by up to seven orders of magnitude. Full Geant4 simulations validate the effectiveness of this design. Although the achieved suppression is somewhat lower than the analytical estimate, primarily due to secondary particle production within the shield, the residual background remains at a level that is manageable for LLP analyses. It can be further mitigated by applying energy thresholds, as well as vertexing and timing cuts in the downstream detector. Benchmarking against a dark scalar model, we show that this shielding-based detector concept achieves sensitivity to branching ratios as low as $\mathcal{O}(10^{-9})$ for the $h\rightarrowφφ$ process under the zero-background condition, outperforming general-purpose detector baselines. This strategy not only expands the discovery reach for neutral LLPs but also provides a rigorous experimental handle to distinguish new physics from Standard Model punch-through backgrounds. We further discuss a phased implementation at the High-Luminosity LHC as a critical testbed for this novel detection concept.
We experimentally demonstrate the detection of momentum transfers from individual collisions of Kr, Xe, and SF$_6$ with an optically levitated nanoparticle, finding good agreement with theoretical expectations. The observed event rates accurately measure the gas partial pressures, while the spectral shape provides a sensitive probe of the surface properties of the nanoparticle, including its temperature. The reconstruction of impulse signals as small as 200 keV/$c$ further establishes that levitated optomechanical sensors can reach the sensitivity required for precision measurements of fundamental particle interactions, and demonstrates a proof-of-principle for a primary pressure sensor based on the detection of individual gas particle collisions.
We study the rates of two-body charmed anti-charmed baryonic $\overline B\to {\bf B}_c \overline {\bf B}_c$ decays using the topological amplitude approach. All amplitudes of $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$, ${\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf { 3_f})$, ${\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf {\bar 6_f})$ and ${\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf {\bar 6_f})$ decays are decomposed topologically. SU(3) breaking effects on these amplitudes, depending on the position of the $s$-quark line, are modeled. Using existing data as inputs, we obtain the following results. (i) In the low-lying $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$ decays, we find that the exchange diagram is sizable. Furthermore, there is a large cancellation between internal $W$-tree and exchange $W$-tree amplitudes. The SU(3) breaking is sizable: 35% SU(3) breaking effects are needed, and they act differently in different amplitudes. The rates of $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$ decays with some excited ${\bf B}_c(\bf {\bar 3_f})$ are also studied. (ii) The $\overline B\to {\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf { 3_f})$ decays, with low-lying $ \overline {\bf B}_c(\bf { 3_f})$ and low-lying and some excited ${\bf B}_c(\bf 6_f)$ baryons, are studied, and some predictions for the rates are obtained. (iii) The $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf {\bar 6_f})$ decays with low-lying charmed anti-charmed baryons are studied, and some predictions for the rates are obtained. (iv) Uncertainties in most predicted rates are large, reflecting our current poor understanding of the related SU(3) breaking effects. Measuring these rates can provide very useful information about these effects.
The STAR Collaboration reports measurements of the collision energy dependence of hypertriton (${}^{3}_Λ$H) transverse momentum spectra and $p_{\rm T}$-integrated yields at mid-rapidity ($|y|<$0.5) in Au+Au collisions at 11 collision energies between 3.2 and 27\,GeV. The measured ${}^{3}_Λ$H yields and the ${}^{3}_Λ$H/$Λ$ yield ratio in central collisions increase strongly with decreasing collision energy, and are a factor of $\sim$2 lower than thermal model predictions in this energy range. The mean $p_{\rm T}$ of ${}^{3}_Λ$H is lower than the Blast-Wave expectation using the freeze-out parameters from light hadrons. Furthermore, the observed double ratio $({}^{3}_Λ{\rm{H}}/Λ)/(t/p)$ maintains a constant value of $\sim$0.4 across the measured energy range. Within the coalescence framework, this ratio directly reflects the significantly suppressed formation probability of the weakly-bound hypertriton relative to the triton, which results from the weaker hyperon-nucleon interaction compared with the nucleon-nucleon interaction.
The IceCube Neutrino Observatory has opened a new window into the high-energy Universe, providing measurements of neutrinos over a broad energy range. This contribution presents recent results, including a follow-up on the first identification of a steady neutrino source NGC 1068, measurements of the flavor composition of the diffuse astrophysical flux, limits on prompt atmospheric neutrinos, and searches for neutrinos from dark matter annihilation in the Sun. These measurements probe neutrino production mechanisms, fundamental particle interactions, and physics beyond the Standard Model. Looking forward, the recently deployed IceCube Upgrade will enhance sensitivity to lower-energy neutrinos and reduce systematic uncertainties, while the planned IceCube-Gen2 will expand the detector volume, increase the neutrino detection rate, and extend energy reach, enabling more detailed studies of cosmic sources and high-energy particle physics.
We report enhanced evidence for the $X(7200)$ state and significantly improved measurements of the $X(6900)$ resonance parameters through a combined analysis of the di-$J/ψ$ mass spectrum using published data from LHCb, ATLAS, and CMS. By performing simultaneous fits to all three experiments, we observe the $X(6900)$ with overwhelming significance ($>12σ$) and determine its mass and width with improved precision. For the $X(7200)$, we find consistent signals across multiple interference models, with significances ranging from $3.7σ$ to $6.6σ$; the best-fit model (adopting the CMS three-resonance scheme) yields $6.6σ$ significance, providing substantially strengthened evidence for this state. Our results underscore the essential role of interference effects in fully-charmed tetraquark spectroscopy and offer new constraints on their production mechanisms at the LHC.
China's first proton test beam facility, the High-energy Proton-beam Experiment Station (HPES), is currently under construction on the CSNS campus as part of the CSNS-II project. Using protons slowly extracted from the Rapid Cycling Synchrotron of CSNS, HPES will deliver a 1.6 GeV proton beam with an adjustable flux ranging from $10^{3}$ to $10^{8}$ protons per second. The station is composed of two dedicated test terminals designed to support comprehensive beam tests, serving as an advanced platform for particle detector development, radiation hardness studies of aerospace chips, and GeV-proton-induced nuclear data measurements. To characterize the beam, HPES incorporates dedicated flux and profile monitors. For user experiments, the facility is equipped with a high-precision proton telescope offering a position resolution of 10 $μ$m, and a Time-of-Flight (TOF) spectrometer achieving an energy resolution of 1%. Furthermore, a compatible trigger logic unit has been designed to provide precise event tagging, which is essential for data alignment. This paper presents an overview of the detector systems within HPES, discusses their design considerations, and outlines the future prospects of the facility.
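To make the TOF energy measurement concrete, the sketch below converts between proton kinetic energy and time of flight using standard relativistic kinematics. It is an illustration under stated assumptions, not a description of the HPES spectrometer: the 10 m flight path and the function names are hypothetical, and only the 1.6 GeV beam energy and the 1% energy resolution are taken from the text. Differentiating $T=(γ-1)m_p$ at fixed path length gives $δT/T = γ(γ+1)\,δt/t$, which shows how tightly the timing must be controlled.

```python
import math

M_P_MEV = 938.272          # proton rest energy in MeV
C_M_PER_NS = 0.299792458   # speed of light in m/ns


def beta_from_kinetic(t_mev):
    """Velocity beta = v/c of a proton with kinetic energy t_mev."""
    gamma = 1.0 + t_mev / M_P_MEV
    return math.sqrt(1.0 - 1.0 / gamma**2)


def tof_ns(t_mev, path_m):
    """Time of flight in ns over a straight path of length path_m."""
    return path_m / (beta_from_kinetic(t_mev) * C_M_PER_NS)


def kinetic_from_tof(t_ns, path_m):
    """Invert the measurement: kinetic energy in MeV from a time of flight."""
    beta = path_m / (t_ns * C_M_PER_NS)
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma - 1.0) * M_P_MEV


def energy_to_time_factor(t_mev):
    """Factor gamma*(gamma+1) relating dT/T to dt/t at fixed path length."""
    gamma = 1.0 + t_mev / M_P_MEV
    return gamma * (gamma + 1.0)


PATH_M = 10.0  # hypothetical flight path, not an HPES parameter
t = tof_ns(1600.0, PATH_M)
factor = energy_to_time_factor(1600.0)
print(f"beta = {beta_from_kinetic(1600.0):.4f}, TOF = {t:.3f} ns")
print(f"1% energy resolution needs dt/t of about {0.01 / factor:.2e}")
```

At 1.6 GeV the factor $γ(γ+1)$ is close to 10, so a 1% energy resolution corresponds to a relative timing resolution of roughly 0.1% over whatever flight path is used.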
The LHCb detector has proven competitive across a wide range of physics analyses thanks to its forward coverage. These proceedings describe: i) complementary measurements using heavy-flavour jets, ii) electroweak (EW) measurements with the top quark and the W boson, and iii) searches for New Physics states such as axion-like particles (ALPs) and heavy neutral leptons (HNLs), as well as for B-meson decays to multi-muon final states.
The influence of the QCD analogue of the inverse Compton effect on the transverse momentum spectra of particles produced in proton-proton collisions at $\sqrt{s}=14$ TeV has been investigated. The analysis is based on the quark-gluon scattering process $g + q \to g + q$, which is the QCD analogue of Compton scattering of a photon on an electron and can lead to energy redistribution between partons, analogous to the mechanism of the inverse Compton effect. Data were generated numerically with the PYTHIA event generator (version 8.316). A total of $5\times10^{5}$ proton-proton collisions at $\sqrt{s}=14$ TeV were analyzed. Events were classified according to the relative energies of the initial quark and gluon, which allowed us to distinguish Compton scattering (DCE) events from inverse Compton scattering (ICE) events. Particle transverse momentum spectra were obtained in the region $p_{T} < 10$ GeV/$c$. The results show that including inverse Compton scattering events in the analysis leads to a moderate increase in particle yield. The ratio of the spectra for ICE and DCE events remains approximately constant at about 1.1 within statistical errors. No significant broadening of the transverse momentum spectra is observed. These results show that proton-proton collisions can serve as a reliable baseline for studies of energy redistribution mechanisms in a dense QCD medium, such as the quark-gluon plasma.
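A bin-by-bin ratio of two spectra with its statistical error, as quoted above for ICE and DCE events, can be sketched as follows. This is only an illustration, not the analysis code of this work: it assumes independent Poisson counts in each $p_T$ bin and identical normalisation of the two samples, and the function name and the example counts are hypothetical placeholders rather than PYTHIA output.

```python
import math


def spectrum_ratio(num_counts, den_counts):
    """Bin-by-bin ratio num/den with its propagated Poisson error.

    Returns a list of (ratio, error) tuples; bins where either count is
    not positive yield (None, None) because the ratio is undefined there.
    """
    result = []
    for n, d in zip(num_counts, den_counts):
        if n <= 0 or d <= 0:
            result.append((None, None))
            continue
        ratio = n / d
        # Relative errors of independent Poisson counts add in quadrature.
        error = ratio * math.sqrt(1.0 / n + 1.0 / d)
        result.append((ratio, error))
    return result


# Hypothetical counts per p_T bin (placeholders, not simulation results).
ice_counts = [1100, 550, 220, 0]
dce_counts = [1000, 500, 200, 40]
for r, e in spectrum_ratio(ice_counts, dce_counts):
    print("undefined" if r is None else f"{r:.3f} +/- {e:.3f}")
```

If the two event classes contain different numbers of events, each spectrum would first need to be normalised per event before taking the ratio.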
Motivated by the first observation of CP violation in $b$-baryon decays, the search for baryonic decays exhibiting large CP violation will be a primary focus in the coming years. We propose that significant CP-violating effects exist in the decay $Λ_b \to ΛD$, where $D$ denotes a CP eigenstate of the $D^0 - \bar{D}^0$ system. The predicted CP asymmetries for both the CP-even and CP-odd modes can reach magnitudes as large as $50\%$, making these decays promising targets for measurement at the LHCb experiment. Additionally, we predict for the first time several nonzero CP-violating observables associated with angular distribution parameters, providing valuable complementary information in the search for CP violation in baryon decays. Furthermore, we propose a novel strategy to extract the CKM angle $γ$ by combining data on angular distribution parameters and decay rates from the relevant channels. We emphasize that $Λ_b \rightarrow ΛD$ decays are among the most promising candidates for determining $γ$ in the baryon sector. Our findings may offer new insights for future theoretical and experimental investigations.
We establish a quantitative relation between local spin polarization and quantum entanglement in two-qubit systems by deriving an upper bound on the concurrence at fixed local polarization, showing that increasing polarization constrains the maximum achievable entanglement. We further demonstrate that this bound is saturated by pure states in certain cases. As a concrete physical application, we consider the parity-violating process $e^+e^- \to Z^0 \to q\bar{q}$, which generates final-state spin polarization. We show that the maximal concurrence is attained in specific kinematic regions and is significantly reduced relative to the unpolarized case. These results establish a general, process-independent framework connecting local polarization, maximal entanglement, and the role of pure states.
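The trade-off between local polarization and entanglement is easiest to see for pure two-qubit states $|ψ\rangle = a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$, where the concurrence is $C = 2|ad - bc|$ and the Bloch-vector length $p$ of either qubit obeys the textbook identity $C^2 + p^2 = 1$. The sketch below checks this numerically. It illustrates the pure-state case only; the general bound derived in this work is not reproduced here, and the function names are illustrative assumptions.

```python
import math


def _normalize(a, b, c, d):
    """Return the amplitudes as complex numbers with unit total norm."""
    amps = [complex(z) for z in (a, b, c, d)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in amps))
    return [z / norm for z in amps]


def concurrence_pure(a, b, c, d):
    """Concurrence C = 2|ad - bc| of a pure two-qubit state."""
    a, b, c, d = _normalize(a, b, c, d)
    return 2.0 * abs(a * d - b * c)


def local_polarization(a, b, c, d):
    """Bloch-vector length of the first qubit's reduced density matrix."""
    a, b, c, d = _normalize(a, b, c, d)
    rho00 = abs(a) ** 2 + abs(b) ** 2
    rho11 = abs(c) ** 2 + abs(d) ** 2
    rho01 = a * c.conjugate() + b * d.conjugate()
    # |p|^2 = p_x^2 + p_y^2 + p_z^2 with p_x - i p_y = 2 rho01, p_z = rho00 - rho11
    return math.sqrt(4.0 * abs(rho01) ** 2 + (rho00 - rho11) ** 2)


# An arbitrary (unnormalised) example state, chosen only for illustration.
state = (0.6, 0.3j, 0.2, 0.5)
C = concurrence_pure(*state)
p = local_polarization(*state)
print(f"C = {C:.4f}, p = {p:.4f}, C^2 + p^2 = {C**2 + p**2:.6f}")
```

A Bell state has $p = 0$ and $C = 1$, while a product state has $p = 1$ and $C = 0$, which are the two extremes of the identity.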
Multiparticle production in hadron and lepton interactions continues to attract attention. Simulations with Monte Carlo event generators are performed before any experiment is planned, but they often overestimate or underestimate the experimental data. These generators are based on the theory of strong interactions, quantum chromodynamics (QCD), in which calculations are possible only within perturbation theory. Soft processes, which make a significant contribution to high-energy interactions, therefore have to be described by phenomenological models. Of all multiparticle production processes, electron-positron annihilation is the theoretically cleanest, proceeding via an intermediate virtual photon or $Z^0$ boson followed by quark-antiquark pair creation. QCD describes well the development of the quark-gluon ($qg$) cascade as a Markovian branching process, which is called the first stage. The transformation of the quarks and gluons produced in the $qg$ cascade into observable hadrons occurs in the second stage, hadronization, to which perturbation theory is no longer applicable. The choice of a hadronization scheme is based on experimental data. The convolution of the $qg$ cascade and hadronization allowed us to describe the multiplicity in practically all multiparticle production processes in both lepton and hadron high-energy collisions. This model is called the gluon dominance model. Several decades have passed since a series of $e^+e^-$ annihilation experiments were carried out. Now the main interest of high-energy physicists is focused on multiparticle production in proton and heavy-ion collisions. This research has produced many new results in the theory of strong interactions, including on hadronization. It is therefore necessary to analyze the multiplicity in $e^+e^-$ annihilation again.
We propose a novel method to determine the mass scale of ambient dark matter, applicable to (at least effectively) two-dimensional direct detection experiments that allow for directionality observables. Due to the motion of the solar system and Earth relative to the Galactic Center and the Sun, the dark-matter flux exhibits a directional preference. We first demonstrate that dark-matter event rates depend non-trivially on the angle between the detection plane and the overall dark-matter flow, with the curvature of this angular spectrum encoding mass information. As proof of principle, we take the recently proposed Graphene-Josephson-Junction-based superlight dark-matter detector as a concrete example and validate these theoretical expectations through numerical analyses.
Flavor instabilities develop in neutrino plasmas through emission of flavomons, the quanta of flavor waves. We derive the flavomon equations of motion in slowly varying environments, notably the matter gradients of supernovae, and use them to construct a flavomon ray tracing framework. Combined with a quasi-linear description of flavomon growth, we thus develop a new approach to the global evolution of flavor instabilities. As a first application, we show that the growth of neutrino-mass-induced instabilities is slowed down, but not suppressed, by the inevitable matter gradients. Local stability analysis alone cannot gauge the impact of inhomogeneities and instead must be coupled to flavomon ray tracing.
The theory of bremsstrahlung $e \to eγ$ by extremely high energy electrons passing through ordinary matter has been qualitatively incomplete. We revisit the suppression of bremsstrahlung by the Landau-Pomeranchuk-Migdal (LPM) effect, here accounting for quantum disruption of that effect from pair production. Our analysis covers the full range of ultra-relativistic electron and photon energies (subject to a few simplifying approximations).
High-energy neutrinos can be injected in the early Universe by the decay or annihilation of long-lived primordial relics. We analyse the possibility that the ultrahigh-energy neutrino event recently observed by the KM3NeT neutrino telescope has such an origin. This possibility has the advantage of producing a sharp spectral feature, so that the neutrino flux can be small at all energies except at the KM3NeT event energy. In this scenario the tension with null results from other experiments is thus reduced with respect to the usual power-law case analysed by the KM3NeT and IceCube experiments. At such energies, and for emission around the recombination time, interactions of these neutrinos with background neutrinos prove to be relevant and must be computed with a dedicated code. These interactions, as well as final-state radiation processes, modify the spectrum. Interestingly, the scenario can also leave an imprint in the CMB that could be probed in the near future, and it does not predict an associated $γ$-ray flux in excess of observations. All in all, we find that the observed high-energy neutrino could be a primordial high-energy neutrino, provided it was produced around the recombination time or later.
We describe the implementation and usage of `fermionic_amplitudes.m', a Mathematica package for the computation of tree amplitudes involving arbitrary numbers of gauge bosons and arbitrarily-charged massless fermions of (possibly) distinct flavours in pure (non-supersymmetric) gauge theory. These are given in terms of a basis of partial amplitudes involving distinct-flavoured fermions dressed by specific colour tensors. Distinct-flavour partial amplitudes are expressed as linear combinations of those involving only a single flavour, which may be evaluated as component amplitudes of (maximally) supersymmetric Yang-Mills theory. All relevant colour tensors can be realized as explicit, numeric arrays given any choice of charge generators (for any gauge theory -- including $u_1$); from these, all colour contractions relevant to cross sections may be readily computed. The complete package and a notebook demonstrating its primary usage and functionality are included in this work's submission's ancillary files on the arXiv.
We investigate the sensitivities of upcoming MeV gamma-ray telescopes to sterile neutrino dark matter in the mass range $(0.2-100)\,{\rm MeV}$. Sterile neutrinos in this regime can produce observable photon signals through radiative two-body decays and three-body decays with final-state radiation. We perform a Fisher forecasting analysis incorporating realistic astrophysical background modeling and detector response to derive projected constraints on the sterile neutrino decay rate. We find that future MeV instruments can improve existing limits by several orders of magnitude across a wide region of parameter space. Our results highlight the discovery potential of next-generation MeV telescopes in probing sterile neutrino dark matter.
We reevaluate the vacuum polarization functions for electroweak gauge bosons at three loops in QCD, employing state-of-the-art perturbative techniques. We apply these results to determine the ${\mathcal{O}}(αα_s^2)$ corrections to the electroweak radiative parameters $Δρ$, $Δr$ and $Δκ$. We improve the accuracy of the calculation at this perturbative order, compared to the existing literature, and present some phenomenological implications of these results. We find a shift in the prediction of the $W$ boson mass, significant in view of the FCC precision targets. We improve the prediction of the $\overline{\mathrm{MS}}$ electric charge at $q^2=m_Z^2$ with the inclusion of these ${\mathcal{O}}(αα_s^2)$ corrections.
We present the first global phenomenological fit of inclusive $\bar B\to X_c\ell\barν$ observables to all available experimental data allowing for generic dimension-six New Physics interactions in the Weak Effective Theory. The fit includes both New Physics Wilson coefficients and non-perturbative QCD parameters. New Physics contributions are calculated including power corrections up to $\mathcal O(Λ_{\rm QCD}^3/m_b^3)$ and perturbative QCD corrections up to $\mathcal O(α_s)$. We find no significant preference for New Physics and obtain bounds on the size of the relevant Wilson Coefficients competitive with those coming from exclusive modes.
We compare the correlation functions of inflationary perturbations computed either with quantum or classical dynamics. Even if they are enforced to agree at a specific time during inflation, classical and quantum correlations will differ at the end of inflation, provided that interactions are relevant. The difference between the results of the classical and quantum computations is exponentially sensitive to the number of e-folds elapsed from the time of agreement. We illustrate this finding with the tree-level bispectrum of the primordial curvature fluctuation and the one-loop power spectrum of tensor modes. We also show that classical evolution from a finite time does not imply the appearance of poles in the scalar bispectrum.
Axion-like particles $a$ with masses up to 5.5 MeV can be produced in the Sun via the process $p + D \to {}^3{\rm He} +a$. The photons from the subsequent decay $a \to γγ$ can arrive from directions that deviate significantly from that of the Sun, or even from roughly the opposite direction. The nontrivial angular and spectral distributions of such photons enable new methods to detect the {\it lights from the darkness}. In this letter, we consider both space-based detection and terrestrial experiments at the South Pole. As a result of the two-body decay and geometric effects, there exists a critical height for the terrestrial experiments, below which no photons arrive for some regions of the parameter space. With sensitivities of $10^{-16}$ ($10^{-17}$) erg cm$^{-2}$ s$^{-1}$ for MeV-scale photons in future space and terrestrial experiments, the coupling $g_{aγ}$ of $a$ to photons can be probed down to $3\times10^{-12}$ ($1\times10^{-12}$) GeV$^{-1}$, well surpassing the current supernova limits.
We study the rates of two-body charmed anti-charmed baryonic $\overline B\to {\bf B}_c \overline {\bf B}_c$ decays using the topological amplitude approach. All amplitudes of $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$, ${\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf { 3_f})$, ${\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf {\bar 6_f})$ and ${\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf {\bar 6_f})$ decays are decomposed topologically. SU(3) breaking effects on these amplitudes, depending on the position of the $s$-quark line, are modeled. Using existing data as inputs, we obtain the following results. (i) In the low-lying $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$ decays, we find that the exchange diagram is sizable. Furthermore, there is a large cancellation between internal $W$-tree and exchange $W$-tree amplitudes. The SU(3) breaking is sizable: breaking effects of about 35% are needed, and they act differently in different amplitudes. The rates of $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf { 3_f})$ decays with some excited ${\bf B}_c(\bf {\bar 3_f})$ are also studied. (ii) The $\overline B\to {\bf B}_c(\bf 6_f) \overline {\bf B}_c(\bf { 3_f})$ decays, with low-lying $ \overline {\bf B}_c(\bf { 3_f})$ and low-lying and some excited ${\bf B}_c(\bf 6_f)$ baryons, are studied, and predictions for some rates are obtained. (iii) The $\overline B\to {\bf B}_c(\bf {\bar 3_f}) \overline {\bf B}_c(\bf {\bar 6_f})$ decays with low-lying charmed anti-charmed baryons are studied, and predictions for some rates are obtained. (iv) Uncertainties in most predicted rates are large, reflecting our current poor understanding of the related SU(3) breaking effects. Measuring these rates can provide very useful information about these effects.
We compute the generalized parton distributions (GPDs) of valence quarks, sea quarks, and gluons in the proton using light-front wave functions obtained within the basis light-front quantization (BLFQ) framework, providing a realistic description of the nucleon at a low resolution scale. The wave functions are derived from a light-front QCD Hamiltonian without an explicit confining potential and include the three-quark, three-quark-gluon, and three-quark-quark-antiquark Fock sectors. For the first time within BLFQ, we evaluate quark GPDs at nonzero skewness in both the DGLAP and ERBL regions, while gluon GPDs are computed in the DGLAP region. The resulting GPDs exhibit qualitative features similar to those of the GUMP1.0 global extraction of GPDs, which is based on experimental and lattice QCD data at next-to-leading order accuracy, but are smaller in magnitude. We further compute the associated Compton form factors and obtain results consistent with the global analysis.
We reanalyze the lattice spectra for $I=1/2$ $Dπ$ scattering in the $A_1^+$ irreducible representation from [Phys. Rev. D 111, 014503 (2025)] to investigate the impact of chiral and SU(3) flavor symmetries in $S$-wave $Dπ$ scattering and the $D_0^*(2300)$ resonance. By fitting the phase shifts obtained via Lüscher's formula with both traditional and chirally modified effective-range expansion and $K$-matrix parameterizations, we find that the chiral factor shifts the extracted pole mass closer to the threshold (especially for resonances) and substantially reduces the resonance width. These findings are confirmed by unitarized chiral perturbation theory through a direct fit to the lattice spectra with both the single-channel and the $Dπ$-$Dη$-$D_s\bar{K}$ coupled-channel schemes. Once the coupled channels are incorporated, the two-pole structure of the $D_0^*(2300)$ emerges. The trajectories of the two poles are investigated by varying the pion mass.
A recent report on ${^7{\rm Li}}(e,e'K^+)$ electroproduction runs by the A1 collaboration at the Mainz Microtron (MAMI) assigns a sharp pion-momentum line at $p_{π^-}\approx 113.8\pm 0.1$ MeV/c to ${_Λ^3}{\rm H}\toπ^-+{^3{\rm He}}$ weak decay, resulting in an exceptionally large ${_Λ^3}{\rm H}$ binding energy $B_Λ({_Λ^3}{\rm H})=0.523\pm 0.013\pm 0.075$ MeV. Here I suggest an alternative interpretation of the observed sharp line in terms of ${_Λ^7}{\rm He}_{\rm g.s.}\toπ^-+{^7{\rm Li}}(E_{\rm x}=478$ keV) weak decay, discussing also the model dependence of $B_Λ({_Λ^7}{\rm He})$.
We study the $uuddss$ multiquark within a constituent quark model framework, solving the corresponding nonrelativistic Schrödinger equation by means of a diffusion Monte Carlo (DMC) method. The total wave function is written as the product of a radial component and an exact spin-color-flavor state, restricted to isospin $I=0$. For this isospin, all allowed flavor wave functions are included. We explore two distinct constructions of the six-quark system. In the first one, corresponding to a sexaquark, all six quarks are treated as indistinguishable and the wave function is fully antisymmetric with respect to the exchange of any two quarks. In the second one, corresponding to the $H$ dibaryon, the system is partitioned into two sets of three quarks, effectively mimicking a baryon-baryon-like configuration, including hidden-color terms, in which antisymmetry is imposed only within each three-quark cluster. Only when the system is forced into a baryon-baryon-like configuration, and for certain values of the spin, color and flavor quantum numbers, do we obtain states with masses close to, but above, the two-baryon threshold. Those states are characterized by two loosely bound three-quark clusters separated from one another by a distance of $\sim$ 2.5 fm. The remaining structures are compact objects irrespective of their internal wave function.
We study neutrino mass generation within the framework of non-holomorphic modular symmetry proposed by Qu and Ding. In this formalism, neutrino masses are generated via the Type-I seesaw mechanism, where the Yukawa couplings depend on non-holomorphic modular forms. The viability of the model is examined through a $χ^2$ analysis using current neutrino oscillation data. The $χ^2_{min}$ value is found to be $7.06$ for the normal hierarchy (NH). All neutrino oscillation parameters are consistent within their $1σ$ allowed ranges, except the atmospheric mixing angle $\sin^2θ_{23}$, which is predicted to lie in the second octant. The Dirac CP-violating phase ($δ_{CP}$) is constrained to the first and fourth quadrants, indicating relatively weak CP violation. These predictions can be tested in future long-baseline neutrino oscillation experiments. The sum of neutrino masses is compatible with the stringent bound proposed by the DESI experiment. However, the inverted hierarchy (IH) is not viable in this model, as the predicted value of $χ^2_{min}$ exceeds 100, and the mixing angles $\sin^2θ_{12}$ and $\sin^2θ_{23}$ lie outside the $3σ$ allowed ranges.
We review the salient features of next-to-leading-order QCD and electroweak corrections to the scattering of two and the production of three weak gauge bosons at the Large Hadron Collider. Results for the tower of $O(α_s^mα^n)$ corrections are shown for the exemplary processes of like-sign WW scattering and triple-W production, emphasizing the large impact of purely electroweak corrections which generically grow to $\sim-16\%$ and $\sim-7\%$ for these process types, respectively, even for integrated cross sections. Moreover, we discuss the possibility to reproduce the results of full off-shell calculations by the "vector-boson scattering approximation", "leading-pole approximations", and the "effective vector-boson approximation".
We present perturbative quantum chromodynamics (pQCD) predictions for high-momentum particle yield modification in very light ion collisions - ${}^{10}\mathrm{B}+{}^{10}\mathrm{B}$, ${}^{6}\mathrm{Li}+{}^{6}\mathrm{Li}$, ${}^{4}\mathrm{He}+{}^{4}\mathrm{He}$, and ${}^{3}\mathrm{He}+{}^{3}\mathrm{He}$ - with and without medium-induced energy loss. We find non-trivial suppression in symmetric systems from ${}^{208}\mathrm{Pb}+{}^{208}\mathrm{Pb}$ to ${}^{3}\mathrm{He}+{}^{3}\mathrm{He}$ and in asymmetric $A+B$ systems, with the suppression scaling approximately as $R_{AB} \simeq (\sqrt{AB})^{1/3}$. Further, we find that ${}^{3}\mathrm{He}$ and ${}^{6}\mathrm{Li}$ offer particularly clean environments for observing final-state partonic energy loss from quark-gluon plasma (QGP) formation in extremely small systems. Finally, we show that energy loss models generically predict $v_2\{\mathrm{SP}\} \approx 0$ in small systems, indicating that the large measured $v_2 > 0$ in $p+{}^{208}\mathrm{Pb}$ is not due to energy loss.
In a mixed dark matter scenario in which primordial black holes (PBHs) co-exist with thermally produced self-annihilating particles, one expects the former to be surrounded by extremely dense halos made of the latter, built up during radiation domination. Here, as a continuation of previous work, we derive observational limits on such a scenario from a full statistical analysis of cosmic microwave background (CMB) data. We quantify how a tiny fraction $f_{\rm PBH}$ of PBHs could restrict the parameter space available to thermal particle dark matter, limiting the $s$-wave annihilation cross section to values $\lesssim 10^{-30}\,{\rm cm^3/s}\,(m_χ/100\,{\rm GeV})\,(f_{\rm PBH}/10^{-6})^{-3}$ if PBHs are typically heavier than $\sim 10^{-10}\,M_\odot$, which can also be turned into constraints on PBHs in this mass range. In contrast, asteroid-mass or lighter PBHs could live in perfect peace with these particles. Finally, we briefly discuss the implications of the recent tentative interpretation of Subaru-HSC microlensing events as PBHs.
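The quoted scaling of the annihilation cross-section bound with the particle mass and the PBH fraction can be evaluated directly. The following is a minimal sketch that only encodes the order-of-magnitude relation stated in the abstract; the function name `sigma_v_upper_bound` and its argument names are hypothetical, and the sketch assumes the regime of PBHs heavier than about $10^{-10}$ solar masses where the abstract says the bound applies.

```python
def sigma_v_upper_bound(m_chi_gev: float, f_pbh: float) -> float:
    """Approximate upper bound on the s-wave annihilation cross section, in cm^3/s.

    Encodes only the scaling quoted in the abstract:
        1e-30 cm^3/s * (m_chi / 100 GeV) * (f_pbh / 1e-6)**(-3)
    The prefactor is an order-of-magnitude value, not a precise fit result.
    """
    if m_chi_gev <= 0 or f_pbh <= 0:
        raise ValueError("mass and PBH fraction must be positive")
    return 1e-30 * (m_chi_gev / 100.0) * (f_pbh / 1e-6) ** -3


if __name__ == "__main__":
    # Increasing the PBH fraction tightens the bound steeply (inverse cube).
    for f in (1e-7, 1e-6, 1e-5):
        print(f, sigma_v_upper_bound(100.0, f))
```

Because of the inverse-cube dependence, a tenfold increase in the PBH fraction tightens the bound by three orders of magnitude at fixed particle mass.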
Prompted by misconceptions in the recent literature, we review the justifications for naturalness arguments and Occam's razor found in Bayesian statistics. We discuss the automatic Occam's razor that emerges in Bayesian formalism, bringing together points of view from diverse fields, including statistics, social sciences, physics and machine learning. In pedagogical calculations, we demonstrate that this automatic razor disfavors unnatural models in which predictions must be fine-tuned to agree with observation.
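The automatic razor can be illustrated with a toy calculation. This is a sketch under assumptions of our own rather than one of the paper's pedagogical examples: a single measurement `x` with Gaussian noise `sigma` is compared under a model that predicts the mean exactly and a model whose mean has a flat prior on `[-half_width, half_width]`. All function names are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density N(x | mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def evidence_sharp(x: float, sigma: float, mu0: float = 0.0) -> float:
    """Evidence of a model that predicts the mean mu0 with no free parameter."""
    return normal_pdf(x, mu0, sigma)


def evidence_flat_prior(x: float, sigma: float, half_width: float) -> float:
    """Evidence of a model whose mean is uniform on [-half_width, half_width].

    The likelihood is averaged over the prior, so a wide prior dilutes it.
    """
    mass = normal_cdf((half_width - x) / sigma) - normal_cdf((-half_width - x) / sigma)
    return mass / (2.0 * half_width)


def bayes_factor(x: float, sigma: float, half_width: float) -> float:
    """Evidence ratio: sharp-prediction model over flat-prior model."""
    return evidence_sharp(x, sigma) / evidence_flat_prior(x, sigma, half_width)


if __name__ == "__main__":
    # The wider the prior that must be tuned to match x = 0, the larger the penalty.
    for half_width in (1.0, 10.0, 100.0):
        print(half_width, bayes_factor(0.0, 1.0, half_width))
```

In this toy setup the flat-prior model can fit the observation perfectly at its best-fit point, yet its evidence falls roughly as the inverse of the prior width, which is the fine-tuning penalty the abstract refers to.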
The influence of the QCD analogue of the inverse Compton effect on the transverse momentum spectra of particles produced in proton-proton collisions at $\sqrt{s}=14$ TeV has been investigated. The analysis is based on the quark-gluon scattering process $g + q \to g + q$, which is the QCD analogue of Compton scattering of a photon on an electron and can lead to energy redistribution between partons, analogous to the mechanism of the inverse Compton effect. Events were generated with the PYTHIA event generator (version 8.316); a total of $5\times10^5$ proton-proton collisions at $\sqrt{s}=14$ TeV were analyzed. Events were classified based on the relative energies of the initial quark and gluon, which allowed us to distinguish direct Compton scattering (DCE) events from inverse Compton scattering (ICE) events. Particle transverse momentum spectra were obtained in the region $p_{T} < 10$ GeV/$c$. The results showed that including inverse Compton scattering events in the analysis leads to a moderate increase in particle yield. The ratio of the spectra for ICE and DCE events remains approximately constant and is about 1.1 within statistical errors. No significant broadening of the transverse momentum spectra is observed. These results show that proton-proton collisions can serve as a reliable baseline for studies of energy redistribution mechanisms in a dense QCD medium, such as quark-gluon plasma.
Motivated by the first observation of CP violation in $b$-baryon decays, the search for baryonic decays exhibiting large CP violation will be a primary focus in the coming years. We propose that significant CP-violating effects exist in the decay $Λ_b \to ΛD$, where $D$ denotes a CP eigenstate of the $D^0 - \bar{D}^0$ system. The predicted CP asymmetries for both the CP-even and CP-odd modes can reach magnitudes as large as $50\%$, making these decays promising targets for measurement at the LHCb experiment. Additionally, we predict for the first time several nonzero CP-violating observables associated with angular distribution parameters, providing valuable complementary information in the search for CP violation in baryon decays. Furthermore, we propose a novel strategy to extract the CKM angle $γ$ by combining data on angular distribution parameters and decay rates from the relevant channels. We emphasize that $Λ_b \rightarrow ΛD$ decays are among the most promising candidates for determining $γ$ in the baryon sector. Our findings may offer new insights for future theoretical and experimental investigations.
Astrophysical dark matter (DM) particles with masses well below the GeV scale can be difficult to detect using conventional nuclear recoil experiments due to their low velocities in our Milky Way halo. Elastic scattering with high-energy cosmic rays or thermal production inside core-collapse supernovae can accelerate sub-GeV DM to (semi-)relativistic velocities, producing nuclear recoil energies above the keV threshold that paleo-detectors can record over geological timescales. Using olivine as the target with 100$\,$g$\cdot$Gyr exposure, we compute track length distributions from such (semi-)relativistic dark matter fluxes, incorporating all major backgrounds (neutrinos, uranium-chain neutrons, thorium recoils) with a statistical analysis on an Asimov dataset. We derive the projected 95% C.L. sensitivity of paleo-detectors to the DM-nucleon cross section for dark matter masses between a few MeV and hundreds of MeV. Our results show that paleo-detectors are able to probe large parameter regions that are not covered by current and near-future experiments designed to detect dark matter and neutrinos. In particular, paleo-detectors offer a unique ability to record the dark matter flux from Galactic supernova events over geological times. Such cumulative exposure enables sensitivity gains of a few orders of magnitude compared to conventional experiments.
We study how rotation modifies the constraints on MeV-scale axion-like particles (ALPs) coupled to photons derived from SN 1987A. We constrain the ALP parameter space based on both the energy-loss argument and the gamma-ray limits, and examine how these constraints are affected by stellar rotation. Adopting initial angular velocities of $Ω_{0} = 0.0$ and $1.0$ rad s$^{-1}$ in the iron core, we carry out two-dimensional core-collapse supernova simulations for three progenitor models - a $14 + 9M_{\odot}$ binary and $13M_{\odot}$ and $18M_{\odot}$ single stars with solar metallicity - and estimate ALP emission rates through post-processing. We find that rotation suppresses ALP emission by reducing the core temperature via centrifugal support. Rotation also reduces the neutrino luminosity, but the suppression of ALP emission is more effective, leading to relaxed constraints within a simplified criterion based on the energy-loss argument. This relaxation is particularly pronounced in the rotating $18M_{\odot}$ model, where a substantial decrease in the central temperature occurs at $t_{\rm pb} = 0.8$-$1$ s. In this simplified criterion, such rapid temporal variations in temperature indicate that the resulting constraints depend sensitively on both the evaluation time and the underlying supernova model. For a gamma-ray limit from the SN 1987A observation, rotation has a negligible impact on the constraint. This is because the ALP-induced gamma-ray fluence observed at Earth is proportional to the fourth power of the ALP-photon coupling constant, making the constraint relatively insensitive to the rotational suppression of ALP emission.
We investigate the internal structure of near-threshold $s$-wave eigenstates in a two-body system with Coulomb plus short-range interactions. Using a nonrelativistic effective field theory, we derive the expression for the compositeness in terms of the energy derivative of the self-energy, which is applicable to the present system with the non-separable Coulomb interaction. For near-threshold states, the compositeness can be written solely in terms of the Coulomb scattering length, the Coulomb effective range, and the Bohr radius, providing the weak-binding relation in the presence of the Coulomb interaction. We numerically study the pole trajectories and the compositeness and find that the Coulomb interaction qualitatively modifies the threshold behavior of the poles and the internal structure of the eigenstates. We show that when the Coulomb interaction is relatively strong, the enhancement of the compositeness near the threshold is absent, in contrast to purely short-range interactions. On the other hand, for a weak Coulomb interaction, a remnant of short-range universality survives, and near-threshold bound states tend to be composite dominant. Furthermore, even resonances are dominated by the composite component in the presence of the Coulomb interaction, owing to their continuous connection to the bound-state regime. We apply the formalism to realistic systems with near-threshold eigenstates, including exotic hadrons and nuclei.
We investigate the decay $φ\toπ^+π^-π^0$ in an effective-Lagrangian framework that keeps the dominant $ρπ$ mechanism and the direct three-pion term explicitly separated at the amplitude level. The resonant contribution is described with the Gounaris-Sakurai propagator, the neutral channel includes $ρ^0$-$ω$ mixing, and the leading elastic $P$-wave $ππ$ final-state interaction is incorporated through a constant on-shell Omnès factor $C_Ω\equiv|Ω_1(m_ρ^2)|$. The purpose of this approximation is not to provide a full dispersive reconstruction of $φ\to3π$, but to isolate the leading rescattering effect in a transparent phenomenological setting. With this setup, we obtain $Γ_{\rm th}=0.6950$ MeV, about $5\%$ above the empirical estimate $Γ_{\rm exp}\approx0.660\pm0.020$ MeV, while the direct integrated weight is reproduced at a realistic level, $I_{\mathrm{dir}}=8.457\times10^{-3}$. The computed on-shell Omnès factor, $C_Ω=4.794$, is quantitatively sizable, indicating that $ππ$ rescattering provides a nontrivial enhancement in the $ρ$-dominated channel. At the same time, the $x$ and especially the $y$ Dalitz projections still exhibit visible discrepancies near the phase-space boundary, showing that the present treatment should be viewed as an intermediate phenomenological step rather than a precision amplitude analysis. These residual tensions motivate the next stage: a fully $s$-dependent Omnès implementation and a direct fit to the efficiency-corrected KLOE Dalitz-bin data.
We give a systematic account of the soft mode dynamics of the QCD critical point (QCD-CP) and the two-flavor color superconductivity (2SC-CP) based on the 2-flavor Nambu-Jona-Lasinio model, and investigate their effects on electromagnetic observables in relativistic heavy-ion collisions (HIC). We first demonstrate that the collective excitations coupled to the fluctuations of the respective order parameters are the soft modes associated with the respective phase transitions, in the sense that they acquire a prominent spectral strength in the low-energy and low-momentum region above the respective critical temperatures, and the peak energy of the respective spectral functions goes down, i.e., gets softened, and eventually vanishes at the critical point. It is shown that the diquark soft mode of the 2SC gives rise to the pseudogap, i.e., a depression in the density of states of the quark spectra around the Fermi surface above but in the vicinity of the critical temperature. Then, exploiting the ideas that were developed in condensed matter physics for describing the `para-conductivity' in the normal phase of metal superconductors, we show that the soft modes cause an anomalous enhancement of the electric conductivity and the dilepton production rate, and discuss their relevance to HIC.
We establish a quantitative relation between local spin polarization and quantum entanglement in two-qubit systems by deriving an upper bound on the concurrence at fixed local polarization, showing that increasing polarization constrains the maximum achievable entanglement. We further demonstrate that this bound is saturated by pure states in certain cases. As a concrete physical application, we consider the parity-violating process $e^+e^- \to Z^0 \to q\bar{q}$, which generates final-state spin polarization. We show that the maximal concurrence is attained in specific kinematic regions and is significantly reduced relative to the unpolarized case. These results establish a general, process-independent framework connecting local polarization, maximal entanglement, and the role of pure states.
A dark photon is an Abelian gauge boson from a new $U(1)_D$ gauge symmetry, coupled to the Standard Model via kinetic mixing, with $\varepsilon$ inducing an effective coupling to the electromagnetic current and $g_χ$ to a stable dark matter particle $χ$. We study $J/ψ$ two-body and four-body decays via a light-mass dark photon ($m_U < 3.0$ GeV) in the framework of non-relativistic QCD (NRQCD), considering both visible and invisible decays of the dark photon into SM fermions or dark sector particles. Numerical results for the decay ratios $Γ/Γ_{J/ψ}$ and expected event numbers at BESIII are presented, along with the significance $S/\sqrt{B}$ and $p_T$ distributions where applicable. Our results show that, for $m_U < 2m_χ$, the two-body final state decay channels of the $J/ψ$ mediated by a dark photon have event yields $0\sim 37$ with significances of $10^{-5}\sim10^{-6}$, while the four-body final state channel yields about $94\sim172$ events in the very low mass region $m_U < 0.2\ \text{GeV}$. For $m_U \ge 2m_χ$, the invisible two-body final state decay channel yields $0\sim12$ events with better significance $10^{-1}\sim10^{-3}$, and the invisible four-body final state decay channel yields $0\sim129$ events, whereas visible decays are all severely suppressed.
Generalized Additive Models (GAMs) can be used to create non-linear glass-box (i.e. explicitly interpretable) models, where the predictive function is fully observable over the complete input space. However, glass-box interpretability itself does not allow for the incorporation of expert knowledge from the modeller. In this paper, we present ParamBoost, a novel GAM whose shape functions (i.e. mappings from individual input features to the output) are learnt using a Gradient Boosting algorithm that fits cubic polynomial functions at leaf nodes. ParamBoost incorporates several constraints commonly used in parametric analysis to ensure well-refined shape functions. These constraints include: (i) continuity of the shape functions and their derivatives (up to C2); (ii) monotonicity; (iii) convexity; (iv) feature interaction constraints; and (v) model specification constraints. Empirical results show that the unconstrained ParamBoost model consistently outperforms state-of-the-art GAMs across several real-world datasets. We further demonstrate that modellers can selectively impose required constraints at a modest trade-off in predictive performance, allowing the model to be fully tailored to application-specific interpretability and parametric-analysis requirements.
Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which are widely used in ecology research. Such networks are often sparse and inherently imperfectly detected. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled and scale is unbalanced, leading to oversparse or poorly rescaled estimates with degraded structural recovery. To address these issues, we propose a framework for structured sparse nonnegative low-rank factorization with detection probability estimation. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity in within-group similarity and cross-group connectivity with better relative scale. The resulting optimization problem is nonconvex and nonsmooth. To solve it, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization and establish its asymptotic feasibility and KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.
Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.
In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. If the label is not queried, the prediction from a machine learning model is used instead. Prior work proposed an elegant scheme that determines the query probability by combining an uncertainty-based suggestion with a constant probability that encodes a soft constraint on the query probability. We explored different values of the mixing parameter and observed an intriguing empirical pattern: the smallest confidence width tends to occur when the weight on the constant probability is close to one, thereby reducing the influence of the uncertainty-based component. Motivated by this observation, we develop a non-asymptotic analysis of the estimator and establish a data-dependent bound on its confidence interval. Our analysis further suggests that when a no-regret learning approach is used to determine the query probability and control this bound, the query probability converges to the maximum value allowed by the constraint, i.e., to a choice that is oblivious to the current covariates. We also conduct simulations that corroborate these theoretical findings.
Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervised Score Ensembling (FUSE), a method for improving verification quality by ensembling verifiers without access to ground truth correctness labels. The key idea behind FUSE is to control conditional dependencies between verifiers in a manner that improves the unsupervised performance of a class of spectral algorithms from the ensembling literature. Despite requiring zero ground truth labels, FUSE typically matches or improves upon semi-supervised alternatives in test-time scaling experiments with diverse sets of generator models, verifiers, and benchmarks. In particular, we validate our method on both conventional academic benchmarks such as GPQA Diamond and on frontier, unsaturated benchmarks such as Humanity's Last Exam and IMO Shortlist questions.
Bayesian experimental design (BED) for complex physical systems is often limited by the nested inference required to estimate the expected information gain (EIG) or its gradients. Each outer sample induces a different posterior, creating a large and heterogeneous set of inference targets. Existing methods have to sacrifice either accuracy or efficiency: they either perform per-outer-sample posterior inference, which yields higher fidelity but at prohibitive computational cost, or amortize the inner inference across all outer samples for computational reuse, at the risk of degraded accuracy under posterior heterogeneity. To improve accuracy and maintain cost at the amortized level, we propose a grouped geometric pooled posterior framework that partitions outer samples into groups and constructs a pooled proposal for each group. While such a grouping strategy would normally require generating separate proposal samples for different groups, our tailored ensemble Kalman inversion (EKI) formulation generates these samples without extra forward-model evaluation cost. We also introduce a conservative diagnostic that assesses importance-sampling quality to guide grouping. This grouping strategy improves within-group proposal-target alignment, yielding more accurate and stable estimators while keeping the cost comparable to amortized approaches. We evaluate the performance of our method on both Gaussian-linear and high-dimensional network-based model discrepancy calibration problems.
Empirical studies of trained models often report a transient regime in which signal is detectable in a finite gradient descent time window before overfitting dominates. We provide an analytically tractable random-matrix model that reproduces this phenomenon for gradient flow in a linear teacher--student setting. In this framework, learning occurs when an isolated eigenvalue separates from a noisy bulk, before eventually disappearing in the overfitting regime. The key ingredient is anisotropy in the input covariance, which induces fast and slow directions in the learning dynamics. In a two-block covariance model, we derive the full time-dependent bulk spectrum of the symmetrized weight matrix through a $2\times 2$ Dyson equation, and we obtain an explicit outlier condition for a rank-one teacher via a rank-two determinant formula. This yields a transient Baik-Ben Arous-Péché (BBP) transition: depending on signal strength and covariance anisotropy, the teacher spike may never emerge, emerge and persist, or emerge only during an intermediate time interval before being reabsorbed into the bulk. We map the corresponding phase diagrams and validate the theory against finite-size simulations. Our results provide a minimal solvable mechanism for early stopping as a transient spectral effect driven by anisotropy and noise.
Conformal prediction provides finite-sample, distribution-free coverage under exchangeability, but standard constructions may lack robustness in the presence of outliers or heavy tails. We propose a robust conformal method based on a non-conformity score defined as the half-mass radius around a point, equivalently the distance to its $(\lfloor n/2\rfloor+1)$-nearest neighbour. We show that the resulting conformal regions are marginally valid for any sample size and converge in probability to a robust population central set defined through a distance-to-a-measure functional. Under mild regularity conditions, we establish exponential concentration and tail bounds that quantify the deviation between the empirical conformal region and its population counterpart. These results provide a probabilistic justification for using robust geometric scores in conformal prediction, even for heavy-tailed or multi-modal distributions.
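The half-mass radius score described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, distances are assumed Euclidean, and each point is assumed to count as its own zero-distance neighbour inside the augmented sample (the abstract does not fix this convention).

```python
import numpy as np

def half_mass_radius(point, sample):
    """Distance from `point` to its (floor(n/2)+1)-th nearest neighbour in `sample`."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    k = n // 2 + 1
    dists = np.sort(np.linalg.norm(sample - point, axis=1))
    return dists[k - 1]

def conformal_pvalue(candidate, data):
    """Full-conformal p-value of `candidate`, scored within the augmented sample."""
    augmented = np.vstack([data, candidate])
    scores = np.array([half_mass_radius(z, augmented) for z in augmented])
    # Fraction of points whose score is at least as large as the candidate's.
    return float(np.mean(scores >= scores[-1]))
```

A candidate belongs to the conformal region at level 1 - alpha when its p-value exceeds alpha; because the score depends on half of the sample mass, a small number of distant outliers should have little effect on it.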
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node and its expected rating is similar to that of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
Large language models (LLMs) using chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
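The threshold rule stated above (abstain once the value function falls below the abstention reward) can be sketched as follows. This is only an illustration of the decision rule: `value_estimates` is a hypothetical sequence of per-token value estimates, and how those estimates are obtained is not covered here.

```python
def first_abstention_step(value_estimates, abstain_reward):
    """Return the first token position where the estimated value drops
    below the abstention reward, or None if generation runs to completion."""
    for t, value in enumerate(value_estimates):
        if value < abstain_reward:
            return t  # stop generating here and abstain
    return None
```

Raising `abstain_reward` makes the rule stop earlier and more often, which saves compute at the price of withholding more answers.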
Selecting an appropriate kernel is a central challenge in kernel-based spectral methods. In \emph{Kernelized Diffusion Maps} (KDM), the kernel determines the accuracy of the RKHS estimator of a diffusion-type operator and hence the quality and stability of the recovered eigenfunctions. We introduce two complementary approaches to adaptive kernel selection for KDM. First, we develop a variational outer loop that learns continuous kernel parameters, including bandwidths and mixture weights, by differentiating through the Cholesky-reduced KDM eigenproblem with an objective combining eigenvalue maximization, subspace orthonormality, and RKHS regularization. Second, we propose an unsupervised cross-validation pipeline that selects kernel families and bandwidths using an eigenvalue-sum criterion together with random Fourier features for scalability. Both methods share a common theoretical foundation: we prove Lipschitz dependence of KDM operators on kernel weights, continuity of spectral projectors under a gap condition, a residual-control theorem certifying proximity to the target eigenspace, and exponential consistency of the cross-validation selector over a finite kernel dictionary.
Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.
Variational inference (VI) is a central tool in modern machine learning, used to approximate an intractable target density by optimising over a tractable family of distributions. As the variational family cannot typically represent the target exactly, guarantees on the quality of the resulting approximation are crucial for understanding which of its properties VI can faithfully capture. Recent work has identified instances in which symmetries of the target and the variational family enable the recovery of certain statistics, even under model misspecification. However, these guarantees are inherently problem-specific and offer little insight into the fundamental mechanism by which symmetry forces statistic recovery. In this paper, we overcome this limitation by developing a general theory of symmetry-induced statistic recovery in variational inference. First, we characterise when variational minimisers inherit the symmetries of the target and establish conditions under which these pin down identifiable statistics. Second, we unify existing results by showing that previously known statistic recovery guarantees in location-scale families arise as special cases of our theory. Third, we apply our framework to distributions on the sphere to obtain novel guarantees for directional statistics in von Mises-Fisher families. Together, these results provide a modular blueprint for deriving new recovery guarantees for VI in a broad range of symmetry settings.
We introduce the horospherical depth, an intrinsic notion of statistical depth on Hadamard manifolds, and define the Busemann median as the set of its maximizers. The construction exploits the fact that the linear functionals appearing in Tukey's half-space depth are themselves limits of renormalized distance functions; on a Hadamard manifold the same limiting procedure produces Busemann functions, whose sublevel sets are horoballs, the intrinsic replacements for halfspaces. The resulting depth is parametrized by the visual boundary, is isometry-equivariant, and requires neither tangent-space linearization nor a chosen base point. For arbitrary Hadamard manifolds, we prove that the depth regions are nested and geodesically convex, that a centerpoint of depth at least $1/(d+1)$ exists, and hence that the Busemann median exists for every Borel probability measure. Under strictly negative sectional curvature and mild regularity assumptions, the depth is strictly quasi-concave and the median is unique. We also establish robustness: the depth is stable under total-variation perturbations, and under contamination escaping to infinity the limiting median depends on the escape direction but not on how far the contaminating mass has moved along the geodesic ray, in contrast with the Fréchet mean. Finally, we establish uniform consistency of the sample depth and convergence of sample depth regions and sample Busemann medians; on symmetric spaces of noncompact type, the argument proceeds through a VC analysis of upper horospherical halfspaces, while on general Hadamard manifolds it follows from a compactness argument under a mild non-atomicity assumption.
Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.
This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a quantile-based approach for OPE using deep quantile process regression, presenting a novel algorithm called Deep Quantile Process regression-based Off-Policy Evaluation (DQPOPE). We provide new theoretical insights into the deep quantile process regression technique, extending existing approaches that estimate discrete quantiles to estimate a continuous quantile function. A key contribution of our work is the rigorous sample complexity analysis for distributional OPE with deep neural networks, bridging theoretical analysis with practical algorithmic implementations. We show that DQPOPE achieves statistical advantages by estimating the full return distribution using the same sample size required to estimate a single policy value using conventional methods. Empirical studies further show that DQPOPE provides significantly more precise and robust policy value estimates than standard methods, thereby enhancing the practical applicability and effectiveness of distributional reinforcement learning approaches.
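Quantile-based estimation is commonly built on the pinball (check) loss, whose minimiser at level tau is the tau-quantile. The sketch below shows that loss in isolation; it is assumed here as the standard building block of quantile regression and is not taken from the DQPOPE algorithm itself, whose exact objective the abstract does not state.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball loss of predicting quantile value(s) `q` for
    observations `y` at quantile level `tau` in (0, 1)."""
    diff = np.asarray(y, dtype=float) - q
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))
```

A continuous quantile function can be fitted, under the same assumption, by averaging this loss over many levels `tau` drawn from (0, 1) rather than over a fixed grid of discrete quantiles.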
Bayesian Deep Ensembles (BDEs) represent a powerful approach for uncertainty quantification in deep learning, combining the robustness of Deep Ensembles (DEs) with flexible multi-chain MCMC. While DEs are affordable in most deep learning settings, (long) sampling of Bayesian neural networks can be prohibitively costly. Yet, adding sampling after optimizing the DEs has been shown to yield significant improvements. This leaves a critical practical question: How long should the sequential sampling process continue to yield significant improvements over the initial optimized DE baseline? To tackle this question, we propose a stopping rule based on E-values. We formulate the ensemble construction as a sequential anytime-valid hypothesis test, providing a principled way to decide whether or not to reject the null hypothesis that MCMC offers no improvement over a strong baseline, and hence whether to stop sampling early. Empirically, we study this approach for diverse settings. Our results demonstrate the efficacy of our approach and reveal that only a fraction of the full-chain budget is often required.
The inverse Potts problem, which estimates evolutionary single-site fields and pairwise couplings in homologous protein sequences from the single-site and pairwise amino acid frequencies observed in their multiple sequence alignment, remains a useful method in studies of protein structure and evolution. Since the reproducibility of fields and couplings is the most important requirement, the Boltzmann machine method is employed here, although it is computationally intensive. To reduce the computational time required for the Boltzmann machine, a parallel, persistent Markov chain Monte Carlo method is employed to estimate the single-site and pairwise marginal distributions in each learning step. Stochastic gradient descent methods are also used to reduce the computational time for learning. Another problem is how to adjust the values of hyperparameters; there are two regularization parameters for evolutionary fields and couplings. The precision of contact residue pair prediction is often used to adjust the hyperparameters. However, it is not sensitive to these regularization parameters. Here, they are adjusted so that the fields and couplings satisfy a specific condition that is appropriate for protein conformations. This method has been applied to eight protein families.
Uncertainty quantification is crucial in safety-critical systems, where decisions must be made under uncertainty. In particular, we consider the problem of online uncertainty quantification, where data points arrive sequentially. Online conformal prediction is a principled online uncertainty quantification method that dynamically constructs a prediction set at each time step. While existing methods for online conformal prediction provide long-run coverage guarantees without any distributional assumptions, they typically assume a full feedback setting in which the true label is always observed. In this paper, we propose a novel learning method for online conformal prediction with partial feedback from an adaptive adversary, a more challenging setup in which the true label is revealed only when it lies inside the constructed prediction set. Specifically, we formulate online conformal prediction as an adversarial bandit problem by treating each candidate prediction set as an arm. Building on an existing algorithm for adversarial bandits, our method achieves a long-run coverage guarantee by explicitly establishing its connection to the regret of the learner. Finally, we empirically demonstrate the effectiveness of our method in both independent and identically distributed (i.i.d.) and non-i.i.d. settings, showing that it successfully controls the miscoverage rate while maintaining a reasonable size of the prediction set.
Generative modeling within constrained sets is essential for scientific and engineering applications involving physical, geometric, or safety requirements (e.g., molecular generation, robotics). We present a unified framework for constrained diffusion models on generic nonconvex feasible sets $Σ$ that simultaneously enforces equality and inequality constraints throughout the diffusion process. Our framework incorporates both overdamped and underdamped dynamics for forward and backward sampling. A key algorithmic innovation is a computationally efficient landing mechanism that replaces costly and often ill-defined projections onto $Σ$, ensuring feasibility without iterative Newton solves or projection failures. By leveraging underdamped dynamics, we accelerate mixing toward the prior distribution, effectively alleviating the high simulation costs typically associated with constrained diffusion. Empirically, this approach reduces function evaluations and memory usage during both training and inference while preserving sample quality. On benchmarks featuring equality and mixed constraints, our method achieves comparable sample quality to state-of-the-art baselines while significantly reducing computational cost, providing a practical and scalable solution for diffusion on nonconvex feasible sets.
Predictions from machine learning algorithms can vary across random seeds, inducing instability in downstream debiased machine learning estimators. We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments confirm that the method achieves the targeted level of stability whereas alternatives do not. Our method incurs a small computational penalty relative to standard practice whereas alternative methods incur large penalties.
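Plain subbagging, the building block named above, averages the predictions of a base learner fitted on random subsamples drawn without replacement. The sketch below shows only that step, under stated assumptions: `fit_predict` is a hypothetical callable standing in for any regression algorithm, and the adaptive cross-bagging procedure itself is not reproduced here.

```python
import numpy as np

def subbag_predict(fit_predict, X, y, X_new, n_bags=50, frac=0.5, seed=0):
    """Average predictions from `n_bags` fits on subsamples of size frac * n,
    each drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = max(1, int(frac * n))
    preds = []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)  # subsample, no replacement
        preds.append(fit_predict(X[idx], y[idx], X_new))
    return np.mean(preds, axis=0)
```

Averaging over many subsamples is what damps the dependence of the final prediction on any single random draw; more bags give a more stable average at proportionally higher cost.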
Simulation-based testing of autonomous driving systems (ADS) must uncover realistic and diverse failures in dense, heterogeneous traffic. However, existing search-based seeding methods (e.g., genetic algorithms) struggle in high-dimensional spaces, often collapsing to limited modes and missing many failure scenarios. We present PtoP, a framework that combines adaptive random seed generation with Stein Variational Gradient Descent (SVGD) to produce diverse, failure-inducing initial conditions. SVGD balances attraction toward high-risk regions and repulsion among particles, yielding risk-seeking yet well-distributed seeds across multiple failure modes. PtoP is plug-and-play and enhances existing online testing methods (e.g., reinforcement learning--based testers) by providing principled seeds. Evaluation in CARLA on two industry-grade ADS (Apollo, Autoware) and a native end-to-end system shows that PtoP improves safety violation rate (up to 27.68%), scenario diversity (9.6%), and map coverage (16.78%) over baselines.
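The balance between attraction and repulsion in SVGD can be seen in a single update step. The sketch below is the generic SVGD update with an RBF kernel on a toy target, not the PtoP pipeline: the fixed bandwidth `h`, the step size, and the `grad_logp` callable are illustrative assumptions, and in the paper's setting the target would instead encode scenario risk.

```python
import numpy as np

def svgd_step(x, grad_logp, step=0.1, h=1.0):
    """One SVGD update for particles `x` of shape (n, d), using the
    RBF kernel k(a, b) = exp(-||a - b||^2 / h)."""
    diff = x[:, None, :] - x[None, :, :]           # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / h)    # kernel matrix, symmetric
    # Attraction: kernel-weighted average of the score at all particles.
    attract = k @ grad_logp(x)
    # Repulsion: gradient of the kernel, pushing particle i away from j.
    repel = (2.0 / h) * np.sum(k[:, :, None] * diff, axis=1)
    return x + step * (attract + repel) / len(x)
```

For example, with `grad_logp = lambda x: -(x - 2.0)` (a unit Gaussian centred at 2), repeated calls move the particles toward 2 while the repulsion term keeps them from collapsing onto a single point.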
While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchmark dataset for evaluating gender-aware generation in three typologically diverse grammatically gendered languages: French, Arabic, and Hindi. The core task, GENFORM, requires models to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. We construct a high-quality synthetic dataset spanning these three languages and benchmark 15 popular multilingual LLMs (2B-70B) on their ability to perform this transformation. Our results reveal significant gaps and interesting insights into how current models handle morphological gender. MORPHOGEN provides a focused diagnostic lens for gender-aware language modeling and lays the groundwork for future research on inclusive and morphology-sensitive NLP.
Discovering optimal designs through sequential data collection is essential in many real-world applications. While Bayesian Optimization (BO) has achieved remarkable success in this setting, growing attention has recently turned to context-specific optimal design, formalized as Contextual Bayesian Optimization (CBO). Unlike BO, CBO is inherently more challenging as it must approximate an entire mapping from the context space to its corresponding optimal design, requiring simultaneous exploration across contexts and exploitation within each. In many modern applications, such tasks arise across multiple potentially heterogeneous but related clients, where collaboration can significantly improve learning efficiency. We propose CCBO, Collaborative Contextual Bayesian Optimization, a unified framework enabling multiple clients to jointly perform CBO with controllable contexts, supporting both online collaboration and offline initialization from peers' historical beliefs, with an optional privacy-preserving communication mechanism. We establish sublinear regret guarantees and demonstrate, through extensive simulations and a real-world hot rolling application, that CCBO achieves substantial improvements over existing approaches even under client heterogeneity. The code to reproduce the results can be found at https://github.com/cchihyu/Collaborative-Contextual-Bayesian-Optimization
A central challenge in program induction has long been the trade-off between symbolic and neural approaches. Symbolic methods offer compositional generalisation and data efficiency, yet their scalability is constrained by formalisms such as domain-specific languages (DSLs), which are labour-intensive to create and may not transfer to new domains. In contrast, neural networks flexibly learn from data but tend to generalise poorly in compositional and out-of-distribution settings. We bridge this divide with an instance of a Latent Adaptation Network architecture named Neural Language Interpreter (NLI), which learns its own discrete, symbolic-like programming language end-to-end. NLI autonomously discovers a vocabulary of primitive operations and uses a novel differentiable neural executor to interpret variable-length sequences of these primitives. This allows NLI to represent programs that are not bound to a constant number of computation steps, enabling it to solve more complex problems than those seen during training. To make these discrete, compositional program structures amenable to gradient-based optimisation, we employ the Gumbel-Softmax relaxation, enabling the entire model to be trained end-to-end. Crucially, this same differentiability enables powerful test-time adaptation. At inference, NLI's program inductor provides an initial program guess. This guess is then refined via gradient descent through the neural executor, enabling efficient search for the neural program that best explains the given data. We demonstrate that NLI outperforms in-context learning, test-time training, and continuous latent program networks on tasks that require combinatorial generalisation and rapid adaptation to unseen tasks. Our results establish a new path toward models that combine the compositionality of discrete languages with the gradient-based search and end-to-end learning of neural networks.
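The Gumbel-Softmax relaxation mentioned above replaces a discrete choice with a temperature-controlled soft sample. The sketch below shows the generic relaxation only, in NumPy and without gradients; it is not NLI's executor, and the function name and arguments are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / T)."""
    logits = np.asarray(logits, dtype=float)
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))                 # standard Gumbel noise
    z = (logits + gumbel) / temperature
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

The argmax of each sample follows the categorical distribution given by the softmax of `logits`, while lower temperatures push the soft sample itself closer to one-hot. In an autodiff framework the same expression is differentiable with respect to `logits`, which is what permits end-to-end training over discrete choices.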
Harmful intent is geometrically recoverable from large language model residual streams: as a linear direction in most layers, and as angular deviation in layers where projection methods fail. Across 12 models spanning four architectural families (Qwen2.5, Qwen3.5, Llama-3.2, Gemma-3) and three alignment variants (base, instruction-tuned, abliterated), under single-turn, English evaluation, we characterise this geometry through six direction-finding strategies. Three succeed: a soft-AUC-optimised linear direction reaches mean AUROC 0.98 and TPR@1\%FPR 0.80; a class-mean probe reaches 0.98 and 0.71 at <1ms fitting cost; a supervised angular-deviation strategy reaches AUROC 0.96 and TPR of 0.61 along a representationally distinct direction ($73^\circ$ from projection-based solutions), uniquely sustaining detection in middle layers where projection methods collapse. Detection remains stable across alignment variants, including abliterated models from which refusal has been surgically removed: harmful intent and refusal behaviour are functionally dissociated features of the representation. A direction fitted on AdvBench transfers to held-out HarmBench and JailbreakBench with worst-case AUROC 0.96. The same picture holds at scale: across Qwen3.5 from 0.8B to 9B parameters, AUROC remains $\geq$0.98 and cross-variant transfer stays within 0.018 of own-direction performance. This is consistent with a simple account: models acquire a linearly decodable representation of harmful intent as part of general language understanding, and alignment then shapes what they do with such inputs without reorganising the upstream recognition signal. As a practical consequence, AUROC in the 0.97+ regime can substantially overestimate operational detectability; TPR@$1\%$FPR should accompany AUROC in safety-adjacent evaluation.
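A class-mean probe and the TPR@1%FPR metric can both be written in a few lines. The sketch below is a generic version under stated assumptions: activations are given as plain arrays of shape (examples, hidden size), the function names are hypothetical, and the threshold is taken as an empirical quantile of the negative-class scores; the paper's exact fitting and evaluation protocol may differ.

```python
import numpy as np

def class_mean_direction(acts_pos, acts_neg):
    """Unit vector from the negative-class mean to the positive-class mean."""
    d = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return d / np.linalg.norm(d)

def tpr_at_fpr(scores_pos, scores_neg, fpr=0.01):
    """True-positive rate at the threshold that lets through roughly
    a fraction `fpr` of the negative-class scores."""
    thresh = np.quantile(scores_neg, 1.0 - fpr)
    return float(np.mean(scores_pos > thresh))
```

Scores are obtained by projecting activations onto the direction, for example `acts @ class_mean_direction(acts_pos, acts_neg)`. In practice the direction would be fitted on one split and the metric computed on held-out data.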
We present a systematic empirical study of prompt engineering for formal mathematical reasoning in the context of the SAIR Equational Theories Stage 1 competition. The task requires deciding whether one equational law implies another over all magmas -- a problem that is undecidable in general but decidable for FALSE via finite model search. Over five weeks, we designed, tested, and analyzed more than 40 prompt variants, ranging from 0 to 4,878 bytes, across four evaluation splits and three language models (gpt-oss-120b, Llama 3.3 70B, Gemma 4 31B). Our central finding is a single-prompt ceiling: despite substantial engineering effort, balanced hard accuracy plateaus in an empirical saturation region of approximately 60--79% for gpt-oss-120b, compared to a 59.75% no-cheatsheet baseline. We identify three mechanisms underlying this ceiling: (1) the mathematical undecidability of the TRUE case limits what any finite prompt can encode; (2) complex rule systems decrease performance on weaker models (Llama 3.3 70B collapses to 0% TRUE recall with prompts exceeding 2KB); and (3) prompt ordering effects interact with model attention in fragile, non-monotonic ways. Our best submission (AN45c, 2,252 bytes) achieves 79.25% accuracy on hard3 (n=400; 95% CI: [75.0%, 82.9%]), with TRUE recall of 95.9% and FALSE recall of 63.4%, representing a +19.5 percentage-point improvement over the no-cheatsheet baseline (59.75%). We release all prompt variants, evaluation scripts, and results at https://github.com/israelcazares/sair-prompt-engineering
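The finite model search that makes the FALSE case decidable can be sketched as a brute-force enumeration of small magmas. This is an illustrative sketch, not the competition's evaluation code: laws are represented as hypothetical Python predicates taking the operation and variable values, and only one magma size is searched at a time.

```python
from itertools import product

def find_countermodel(premise, conclusion, n_vars, size=2):
    """Search all magmas on {0, ..., size-1} for one that satisfies
    `premise` for every assignment but violates `conclusion` for some.
    Returns the operation table as a list of rows, or None."""
    elems = range(size)
    assignments = list(product(elems, repeat=n_vars))
    for flat in product(elems, repeat=size * size):
        op = lambda a, b, t=flat: t[a * size + b]
        holds = all(premise(op, *v) for v in assignments)
        fails = not all(conclusion(op, *v) for v in assignments)
        if holds and fails:
            return [list(flat[r * size:(r + 1) * size]) for r in range(size)]
    return None

# Example laws over three variables (the third is unused by commutativity).
commutative = lambda op, x, y, z: op(x, y) == op(y, x)
associative = lambda op, x, y, z: op(op(x, y), z) == op(x, op(y, z))
```

Calling `find_countermodel(commutative, associative, n_vars=3)` returns a two-element table, which shows that commutativity does not imply associativity. A result of `None` only means no countermodel exists at that size; it does not establish that the implication is TRUE.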
We present AC-SINDy, a compositional extension of the Sparse Identification of Nonlinear Dynamics (SINDy) framework that replaces explicit feature libraries with a structured representation based on arithmetic circuits. Rather than enumerating candidate basis functions, the proposed approach constructs nonlinear features through compositions of linear functions and multiplicative interactions, yielding a compact and scalable parameterization and enabling sparsity to be enforced directly over the computational graph. We also introduce a formulation that separates state estimation from dynamics identification by combining latent state inference with shared dynamics and multi-step supervision, improving robustness to noise while preserving interpretability. Experiments on nonlinear and chaotic systems demonstrate that the method recovers accurate and interpretable governing equations while scaling more favorably than standard SINDy.
The reasoning process of Graph Neural Networks is complex and considered opaque, limiting trust in their predictions. To alleviate this issue, prior work has proposed concept-based explanations, extracted from clusters in the model's node embeddings. However, a limitation of concept-based explanations is that they only explain the node embedding space and are obscured by pooling in graph classification. To mitigate this issue and provide a deeper level of understanding, we propose the Subgraph Concept Network. The Subgraph Concept Network is the first graph neural network architecture that distils subgraph and graph-level concepts. It achieves this by performing soft clustering on node concept embeddings to derive subgraph and graph-level concepts. Our results show that the Subgraph Concept Network attains competitive model accuracy while discovering meaningful concepts at different levels of the network.
Vision-Language Models (VLMs) can perform zero-shot classification but are susceptible to adversarial attacks. While robust fine-tuning improves their robustness, existing approaches align fixed text embeddings with an image embedding, sacrificing natural performance and robustness. A robustness degradation also occurs when a model faces adversarial attacks targeting superclasses (parent classes, e.g., mammal) in addition to their base (leaf) classes (e.g., cat). Thus, to enhance adversarial robustness and leverage the inherent hierarchical properties of class space, we propose a novel adversarial fine-tuning framework based on hierarchical embeddings and several levels of adversarially robust alignment of image-text modalities. Additional mechanisms place visual embeddings at the desired depth of hierarchy, and we provide a theoretical connection between the depth of embedding in the hierarchy and the maximum viable margin size. Our model naturally realizes several margin sizes, boosting generalization of adversaries for robustification. As various trees with different parent labels can share the same leaf labels, we also consider aligning over multiple trees to boost semantic variety. Experiments across several datasets are performed.
Generalized Additive Models (GAMs) can be used to create non-linear glass-box (i.e. explicitly interpretable) models, where the predictive function is fully observable over the complete input space. However, glass-box interpretability itself does not allow for the incorporation of expert knowledge from the modeller. In this paper, we present ParamBoost, a novel GAM whose shape functions (i.e. mappings from individual input features to the output) are learnt using a Gradient Boosting algorithm that fits cubic polynomial functions at leaf nodes. ParamBoost incorporates several constraints commonly used in parametric analysis to ensure well-refined shape functions. These constraints include: (i) continuity of the shape functions and their derivatives (up to C2); (ii) monotonicity; (iii) convexity; (iv) feature interaction constraints; and (v) model specification constraints. Empirical results show that the unconstrained ParamBoost model consistently outperforms state-of-the-art GAMs across several real-world datasets. We further demonstrate that modellers can selectively impose required constraints at a modest trade-off in predictive performance, allowing the model to be fully tailored to application-specific interpretability and parametric-analysis requirements.
In continual learning, the primary challenge is to learn new information without forgetting old knowledge. A common solution addresses this trade-off through regularization, penalizing changes to parameters critical for previous tasks. In most cases, this regularization term is directly added to the training loss and optimized with standard gradient descent, which blends learning and retention signals into a single update and does not explicitly separate essential parameters from redundant ones. As task sequences grow, this coupling can over-constrain the model, limiting forward transfer and leading to inefficient use of capacity. We propose a different approach that separates task learning from stability enforcement via operator splitting. The learning step focuses on minimizing the current task loss, while a proximal stability step applies a sparse regularizer to prune unnecessary parameters and preserve task-relevant ones. This turns the stability-plasticity trade-off into a negotiated update between two complementary operators, rather than a conflicting gradient. We provide theoretical justification for the splitting method on the continual-learning objective, and demonstrate that our proposed solver achieves state-of-the-art results on standard benchmarks, improving both stability and adaptability without the need for replay buffers, Bayesian sampling, or meta-learning components.
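The two-operator update can be sketched as a forward-backward (proximal gradient) step. The sketch below assumes a plain L1 penalty, whose proximal operator is soft-thresholding; the abstract says only that the regularizer is sparse, so the penalty, the function names, and the hyperparameters `lr` and `lam` are illustrative assumptions rather than the proposed solver.

```python
import numpy as np

def soft_threshold(w, thresh):
    """Proximal operator of thresh * ||w||_1: shrink toward zero, prune small entries."""
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

def split_update(w, grad_task_loss, lr=0.1, lam=0.05):
    """One operator-splitting update: a gradient step on the task loss,
    followed by a proximal step on the (assumed L1) sparse regularizer."""
    w_half = w - lr * grad_task_loss(w)       # learning step
    return soft_threshold(w_half, lr * lam)   # proximal stability step
```

Because the regularizer is handled by its own operator rather than added to the gradient, entries below the threshold are set exactly to zero instead of merely shrinking toward it.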
Barren-plateau results have established exponential gradient suppression as a widely cited obstacle to the scalability of variational quantum algorithms. When and whether these results extend to a given objective has been addressed through loss-specific arguments, but a general structural characterization has remained open. We show that the objective itself admits a fixed-observable representation if and only if the loss is affine in the measured statistics, thereby identifying the exact boundary of the standard concentration-based proof template. Existing transfer results for non-affine losses achieve this reduction under additional assumptions; our characterization implies that such a reduction is not structurally available for a class of non-affine objectives, placing them outside the automatic reach of the existing proof template. Beyond the affine regime, a chain-rule decomposition reveals three governing factors -- model responsivity, loss-side signal, and transmittance -- and induces a loss-class dichotomy: bounded-gradient losses inherit suppression, while amplification-capable losses can in principle counteract it. In the exponentially wide setting, both classes fail, but for different structural reasons. When the interface is instead designed at polynomial width -- exposing coarse-grained statistics rather than individual bitstring probabilities -- the exponential-dimensional obstruction is relaxed and the dichotomy plays a genuine role. In a numerical demonstration on a charge-conserving quantum system, the amplification-capable objective produces resolved gradients several orders of magnitude larger than affine and inheriting baselines at comparable shot budgets. Over the tested interval, its scaling trend is statistically distinguished from the exponential trend of both alternatives. The boundary is affine; what lies beyond it is a representation-design problem.
Looped transformers scale computational depth without increasing parameter count by repeatedly applying a shared transformer block and can be used for iterative refinement, where each loop rewrites a full fixed-size prediction in parallel. On difficult problems, such as those that require search-like computation, reaching a highly structured solution starting from noise can require long refinement trajectories. Learning such trajectories is challenging when training specifies only the target solution and provides no supervision over the intermediate refinement path. Diffusion models tackle this issue by corrupting data with varying magnitudes of noise and training the model to reverse it in a \textit{single step}. However, this process misaligns training and testing behaviour. We introduce Denoising Recursion Models, a method that similarly corrupts data with noise but trains the model to reverse the corruption over \textit{multiple} recursive steps. This strategy provides a tractable curriculum of intermediate states, while better aligning training with testing and incentivizing non-greedy, forward-looking generation. Through extensive experiments, we show this approach outperforms the Tiny Recursion Model (TRM) on ARC-AGI, where TRM recently achieved breakthrough performance.
Quantum kernel methods have been proposed as a promising approach for leveraging near-term quantum computers for supervised learning, yet rigorous benchmarks against strong classical baselines remain scarce. We present a comprehensive empirical study of quantum kernel support vector machines (QSVMs) across nine binary classification datasets, four quantum feature maps, three classical kernels, and multiple noise models, totalling 970 experiments with strict nested cross-validation. Our analysis spans four phases: (i) statistical significance testing, revealing that none of 29 pairwise quantum-classical comparisons reach significance at $\alpha = 0.05$; (ii) learning curve analysis over six training fractions, showing steeper quantum slopes on six of eight datasets that nonetheless fail to close the gap to the best classical baseline; (iii) hardware validation on IBM ibm_fez (Heron r2), demonstrating kernel fidelity $r \geq 0.976$ across six experiments; and (iv) seed sensitivity analysis confirming reproducibility (mean CV 1.4%). A Kruskal-Wallis factorial analysis reveals that dataset choice dominates performance variance ($\varepsilon^2 = 0.73$), while kernel type accounts for only 9%. Spectral analysis offers a mechanistic explanation: current quantum feature maps produce eigenspectra that are either too flat or too concentrated, missing the intermediate profile of the best classical kernel, the radial basis function (RBF). Quantum kernel training (QKT) via kernel-target alignment yields the single competitive result -- balanced accuracy 0.968 on breast cancer -- but with ~2,000x computational overhead. Our findings provide actionable guidelines for quantum kernel research. The complete benchmark suite is publicly available to facilitate reproduction and extension.
We propose a scalable, multifactorial experimental framework that systematically probes LLM sensitivity to subtle semantic changes in pairwise document comparison. We analogize this as a needle-in-a-haystack problem: a single semantically altered sentence (the needle) is embedded within surrounding context (the hay), and we vary the perturbation type (negation, conjunction swap, named entity replacement), context type (original vs. topically unrelated), needle position, and document length across all combinations, testing five LLMs on tens of thousands of document pairs. Our analysis reveals several striking findings. First, LLMs exhibit a within-document positional bias distinct from previously studied candidate-order effects: most models penalize semantic differences more harshly when they occur earlier in a document. Second, when the altered sentence is surrounded by topically unrelated context, it systematically lowers similarity scores and induces bipolarized scores that indicate either very low or very high similarity. This is consistent with an interpretive frame account in which topically-related context may allow models to contextualize and downweight the alterations. Third, each LLM produces a qualitatively distinct scoring distribution, a stable "fingerprint" that is invariant to perturbation type, yet all models share a universal hierarchy in how leniently they treat different perturbation types. Together, these results demonstrate that LLM semantic similarity scores are sensitive to document structure, context coherence, and model identity in ways that go beyond the semantic change itself, and that the proposed framework offers a practical, LLM-agnostic toolkit for auditing and comparing scoring behavior across current and future models.
Artificial Intelligence (AI) surrogate models provide a computationally efficient alternative to full-physics simulations, but no public datasets currently exist for training and validating models of high-explosive-driven, multi-material shock dynamics. Simulating shock propagation is challenging due to the need for material-specific equations of state (EOS) and models of plasticity, phase change, damage, fluid instabilities, and multi-material interactions. Explosive-driven shocks further require reactive material models to capture detonation physics. To address this gap, we introduce the High-Explosives and Affected Targets (HEAT) dataset, a physics-rich collection of two-dimensional, cylindrically symmetric simulations generated using an Eulerian multi-material shock-propagation code developed at Los Alamos National Laboratory. HEAT consists of two partitions: expanding shock-cylinder (CYL) simulations and Perturbed Layered Interface (PLI) simulations. Each entry includes time series of thermodynamic fields (pressure, density, temperature), kinematic fields (position, velocity), and continuum quantities such as stress. The CYL partition spans a range of materials, including metals (aluminum, copper, depleted uranium, stainless steel, tantalum), a polymer, water, gases (air, nitrogen), and a detonating material. The PLI partition explores varied geometries with fixed materials: copper, aluminum, stainless steel, polymer, and high explosive. HEAT captures key phenomena such as shock propagation, momentum transfer, plastic deformation, and thermal effects, providing a benchmark dataset for AI/ML models of multi-material shock physics.
Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which are widely used in ecological research. Such networks are often sparse and subject to imperfect detection. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled and scale is unbalanced, leading to oversparse or poorly rescaled estimates that degrade structural recovery. To address these issues, we propose a framework for structured sparse nonnegative low-rank factorization with detection probability estimation. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity in within-group similarity and cross-group connectivity with better relative scale. The resulting optimization problem is nonconvex and nonsmooth. To solve it, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization, and establish its asymptotic feasibility and the KKT stationarity of its cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.
Principal Component Analysis (PCA) is a fundamental tool for representation learning, but its global linear formulation fails to capture the structure of data supported on curved manifolds. In contrast, manifold learning methods model nonlinearity but often sacrifice the spectral structure and stability of PCA. We propose \emph{Geodesic Tangent Space Aggregation PCA (GTSA-PCA)}, a geometric extension of PCA that integrates curvature awareness and geodesic consistency within a unified spectral framework. Our approach replaces the global covariance operator with curvature-weighted local covariance operators defined over a $k$-nearest neighbor graph, yielding local tangent subspaces that adapt to the manifold while suppressing high-curvature distortions. We then introduce a geodesic alignment operator that combines intrinsic graph distances with subspace affinities to globally synchronize these local representations. The resulting operator admits a spectral decomposition whose leading components define a geometry-aware embedding. We further incorporate semi-supervised information to guide the alignment, improving discriminative structure with minimal supervision. Experiments on real datasets show consistent improvements over PCA, Kernel PCA, Supervised PCA and strong graph-based baselines such as UMAP, particularly in small sample size and high-curvature regimes. Our results position GTSA-PCA as a principled bridge between statistical and geometric approaches to dimensionality reduction.
Despite the perceived success of large-scale dataset distillation (DD) methods, recent evidence finds that simple random image baselines perform on par with state-of-the-art DD methods like SRe2L due to the use of soft labels during downstream model training. This is in contrast with the findings in coreset literature, where high-quality coresets consistently outperform random subsets in the hard-label (HL) setting. To understand this discrepancy, we perform a detailed scalability analysis to examine the role of data quality under different label regimes, ranging from abundant soft labels (termed the SL+KD regime) to fixed soft labels (SL) and hard labels (HL). Our analysis reveals that high-quality coresets fail to convincingly outperform the random baseline in both SL and SL+KD regimes. In the SL+KD setting, performance further approaches near-optimal levels relative to the full dataset, regardless of subset size or quality, for a given compute budget. This performance saturation calls into question the widespread practice of using soft labels for model evaluation, where, unlike the HL setting, subset quality has negligible influence. A subsequent systematic evaluation of five large-scale and four small-scale DD methods in the HL setting reveals that only RDED reliably outperforms random baselines on ImageNet-1K, but can still lag behind strong coreset methods due to its over-reliance on easy sample patches. Based on this, we introduce CAD-Prune, a compute-aware pruning metric that efficiently identifies samples of optimal difficulty for a given compute budget, and use it to develop CA2D, a compute-aligned DD method, outperforming current DD methods on ImageNet-1K at various IPC settings. Together, our findings uncover many insights into current DD research and establish useful tools to advance data-efficient learning for both coresets and DD.
3D-IC netlist partitioning is commonly optimized using proxy objectives, while final PPA is treated as a costly evaluation rather than an optimization signal. This proxy-driven paradigm makes it difficult to reliably translate additional PPA evaluations into better PPA outcomes. To address this, we present DOPP (D-Optimal PPA-driven partitioning selection), an approach that bridges the gap between proxy objectives and true PPA metrics. Across eight 3D-IC designs, our framework improves PPA over Open3DBench (average relative improvements of 9.99% congestion, 7.87% routed wirelength, 7.75% WNS, 21.85% TNS, and 1.18% power). Compared with exhaustive evaluation over the full candidate set, DOPP achieves comparable best-found PPA while evaluating only a small fraction of candidates, substantially reducing evaluation cost. By parallelizing evaluations, our method delivers these gains while maintaining wall-clock runtime comparable to traditional baselines.
Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to the epistemic norms that make scientific inquiry self-correcting is poorly understood. Here, we evaluate LLM-based scientific agents across eight domains, spanning workflow execution to hypothesis-driven inquiry, through more than 25,000 agent runs and two complementary lenses: (i) a systematic performance analysis that decomposes the contributions of the base model and the agent scaffold, and (ii) a behavioral analysis of the epistemological structure of agent reasoning. We observe that the base model is the primary determinant of both performance and behavior, accounting for 41.4% of explained variance versus 1.5% for the scaffold. Across all configurations, evidence is ignored in 68% of traces, refutation-driven belief revision occurs in 26%, and convergent multi-test evidence is rare. The same reasoning patterns appear whether the agent executes a computational workflow or conducts hypothesis-driven inquiry. These patterns persist even when agents receive near-complete successful reasoning trajectories as context, and the resulting unreliability compounds across repeated trials in epistemically demanding domains. Thus, current LLM-based agents execute scientific workflows but do not exhibit the epistemic patterns that characterize scientific reasoning. Outcome-based evaluation cannot detect these failures, and scaffold engineering alone cannot repair them. Until reasoning itself becomes a training target, the scientific knowledge produced by such agents cannot be justified by the process that generated it.
Lossy compression is widely used to reduce storage and I/O costs for large-scale particle datasets in scientific applications such as cosmology, molecular dynamics, and fluid dynamics, where clustering structures (e.g., single-linkage or Friends-of-Friends) are critical for downstream analysis; however, existing compressors typically provide only pointwise error bounds on particle positions and offer no guarantees on preserving clustering outcomes, and even small perturbations can alter cluster connectivity and compromise scientific validity. We propose a correction-based technique to preserve single-linkage clustering under lossy compression, operating on decompressed data from off-the-shelf compressors such as SZ3 and Draco. Our key contributions are threefold: (1) a clustering-aware correction algorithm that identifies vulnerable particle pairs via spatial partitioning and local neighborhood search; (2) an optimization-based formulation that enforces clustering consistency using projected gradient descent with a loss that encodes pairwise distance violations; and (3) a scalable GPU-accelerated and distributed implementation for large-scale datasets. Experiments on cosmology and molecular dynamics datasets show that our method effectively preserves clustering results while maintaining competitive compression performance compared with SZ3, ZFP, Draco, LCP, and space-filling-curve-based schemes.
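The fragility described above, where a position error inside the compressor's pointwise bound still changes cluster membership, can be reproduced on three points. The following is a minimal sketch, not the paper's correction algorithm: it uses a brute-force pair search instead of spatial partitioning, and the function name `fof_labels`, the linking length of 1.0, and the error bound of 0.1 are hypothetical choices for illustration.

```python
import math

def fof_labels(points, linking_length):
    """Friends-of-Friends labels: points closer than linking_length share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) pair search; fine for a toy example only.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= linking_length:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]

original = [(0.0, 0.0), (0.95, 0.0), (1.9, 0.0)]
# Hypothetical decompressed data: the last point moved by 0.1, within the error bound.
decompressed = [(0.0, 0.0), (0.95, 0.0), (2.0, 0.0)]

labels_before = fof_labels(original, linking_length=1.0)
labels_after = fof_labels(decompressed, linking_length=1.0)
```

Here the chain of three points forms one cluster in the original data, but the perturbed third point falls just outside the linking length and becomes its own cluster. A correction step would need to move that point back inside the linking length of its neighbour.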
We study online learning for new products on a platform that makes capacity-constrained assortment decisions on which products to offer. For a newly listed product, its quality is initially unknown, and quality information propagates through social learning: when a customer purchases a new product and leaves a review, its quality is revealed to both the platform and future customers. Since reviews require purchases, the platform must feature new products in the assortment ("explore") to generate reviews to learn about new products. Such exploration is costly because customer demand for new products is lower than for incumbent products. We characterize the optimal assortments for exploration to minimize regret, addressing two questions. (1) Should the platform offer a new product alone or alongside incumbent products? The former maximizes the purchase probability of the new product but yields lower short-term revenue. Despite the lower purchase probability, we show it is always optimal to pair the new product with the top incumbent products. (2) With multiple new products, should the platform explore them simultaneously or one at a time? We show that the optimal number of new products to explore simultaneously has a simple threshold structure: it increases with the "potential" of the new products and, surprisingly, does not depend on their individual purchase probabilities. We also show that two canonical bandit algorithms, UCB and Thompson Sampling, both fail in this setting for opposite reasons: UCB over-explores while Thompson Sampling under-explores. Our results provide structural insights on how platforms should learn about new products through assortment decisions.
Vision-Language-Action (VLA) models fail systematically on long-horizon manipulation tasks despite strong short-horizon performance. We show that this failure is not resolved by extending context length alone in the current reactive execution setting; instead, it stems from three recurring execution-loop deficiencies: the memory gap, the verification gap, and the recovery gap. We present HELM, a model-agnostic framework that addresses these deficiencies with three components: an Episodic Memory Module (EMM) that retrieves key task history via CLIP-indexed keyframes, a learned State Verifier (SV) that predicts action failure before execution from observation, action, subgoal, and memory-conditioned context, and a Harness Controller (HC) that performs rollback and replanning. The SV is the core learning contribution: it consistently outperforms rule-based feasibility checks and ensemble uncertainty baselines, and its effectiveness depends critically on access to episodic memory. On LIBERO-LONG, HELM improves task success rate by 23.1 percentage points over OpenVLA (58.4% to 81.5%), while extending the context window to H=32 yields only a 5.4-point gain and same-budget LoRA adaptation remains 12.2 points below HELM. HELM also improves long-horizon performance on CALVIN and substantially boosts recovery success under controlled perturbations. Ablations and mechanism analyses isolate the contribution of each component, and we release LIBERO-Recovery as a perturbation-injection protocol for evaluating failure recovery in long-horizon manipulation.
Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches primarily target policy-level weaknesses, they overlook what we term systemic weaknesses: cases where both the core LLM and the RM fail in tandem. We present ARES, a framework that systematically discovers and mitigates such dual vulnerabilities. ARES employs a ``Safety Mentor'' that dynamically composes semantically coherent adversarial prompts by combining structured component types (topics, personas, tactics, goals) and generates corresponding malicious and safe responses. This dual-targeting approach exposes weaknesses in both the core LLM and the RM simultaneously. Using the vulnerabilities discovered, ARES implements a two-stage repair process: first fine-tuning the RM to better detect harmful content, then leveraging the improved RM to optimize the core model. Experiments across multiple adversarial safety benchmarks demonstrate that ARES substantially enhances safety robustness while preserving model capabilities, establishing a new paradigm for comprehensive RLHF safety alignment.
The Apple Neural Engine (ANE) is a dedicated neural processing unit (NPU) present in every Apple Silicon chip. Mixture-of-Experts (MoE) LLMs improve inference efficiency via sparse activation but are challenging for NPUs in three ways: expert routing is unpredictable and introduces dynamic tensor shapes that conflict with the shape-specific constraints of NPUs; several irregular operators, e.g., top-k and scatter/gather, are not NPU-friendly; and launching many small expert kernels incurs substantial dispatch and synchronization overhead. NPUs are designed to offload AI compute from the CPU and GPU; our goal is to enable such offloading for MoE inference, particularly during prefill, where long-context workloads consume substantial system resources. This paper presents NPUMoE, a runtime inference engine that accelerates MoE execution on Apple Silicon by offloading dense, static computation to the NPU while preserving a CPU/GPU fallback path for dynamic operations. NPUMoE uses offline calibration to estimate expert capacity and popularity, which drive three key techniques: (1) static tiers for expert capacity to address dynamic expert routing; (2) grouped expert execution to mitigate NPU concurrency limits; and (3) load-aware expert compute graph residency to reduce CPU-NPU synchronization overhead. Experiments on Apple M-series devices using three representative MoE LLMs and four long-context workloads show that NPUMoE consistently outperforms baselines, reducing latency by 1.32x-5.55x, improving energy efficiency by 1.81x-7.37x, and reducing CPU-cycle usage by 1.78x-5.54x through effective NPU offloading.
Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at their boundaries. However, existing implementations must materialize a large edge potential tensor whose size grows with sequence length, maximum segment length, and label count, becoming prohibitive for speech-scale state spaces and intractable at genomic scales where sequences can exceed 100,000 positions. This memory bottleneck has limited the adoption of exact segment-level inference for long sequences and large label sets. We identify that the core inefficiency is materializing edge potentials that can instead be evaluated on-the-fly from a compact prefix-sum array, and make several improvements. First, replacing the stored edge tensor with prefix-sum lookup reduces the memory footprint by a factor proportional to the product of segment length and label count. Second, a streaming forward-backward pass with checkpoint-boundary normalization keeps working memory sublinear in sequence length while preserving exact gradients. Third, zero-centered cumulative scores control numerical drift and induce an adaptive duration prior under label imbalance. We integrate these ideas into Flash-SemiCRF, a fused Triton kernel that enables exact semi-CRF inference on previously intractable problem sizes. Available at https://github.com/biobenkj/flash-semicrf.
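The prefix-sum idea above can be shown in a few lines. This is a minimal sketch, not the Flash-SemiCRF kernel: it assumes a segment's score is the sum of per-position label scores, ignores transition and duration terms, and uses the hypothetical names `prefix_sums` and `segment_score`.

```python
def prefix_sums(scores):
    """Build cum where cum[t][y] is the sum of scores[0..t-1][y]; cum[0] is all zeros."""
    num_labels = len(scores[0])
    cum = [[0.0] * num_labels]
    for row in scores:
        cum.append([c + s for c, s in zip(cum[-1], row)])
    return cum

def segment_score(cum, start, end, label):
    """Score of labelling positions start..end-1 with `label`, via one subtraction."""
    return cum[end][label] - cum[start][label]

# Toy per-position scores for 4 positions and 2 labels (made-up numbers).
scores = [[1.0, 0.0], [2.0, -1.0], [0.5, 3.0], [-1.0, 1.0]]
cum = prefix_sums(scores)

mid_segment = segment_score(cum, 1, 3, label=0)   # positions 1 and 2, label 0
full_segment = segment_score(cum, 0, 4, label=1)  # all four positions, label 1
```

The array `cum` has one row per position, so any segment score is available on demand without storing an entry for every (position, segment length, label) combination.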
Detecting jailbreak behaviour in large language models remains challenging, particularly when strongly aligned models produce harmful outputs only rarely. In this work, we present an empirical study of output-based jailbreak detection under realistic conditions using the JailbreakBench Behaviors dataset and multiple generator models with varying alignment strengths. We evaluate both a lexical TF-IDF detector and a generation-inconsistency-based detector across different sampling budgets. Our results show that single-output evaluation systematically underestimates jailbreak vulnerability, as increasing the number of sampled generations reveals additional harmful behaviour. The most significant improvements occur when moving from a single generation to moderate sampling, while larger sampling budgets yield diminishing returns. Cross-generator experiments demonstrate that detection signals partially generalise across models, with stronger transfer observed within related model families. A category-level analysis further reveals that lexical detectors capture a mixture of behavioural signals and topic-specific cues, rather than purely harmful behaviour. Overall, our findings suggest that moderate multi-sample auditing provides a more reliable and practical approach for estimating model vulnerability and improving jailbreak detection in large language models. Code will be released.
Fault detection and diagnosis are critical for the optimal and safe operation of industrial processes. Correlations among sensors often exhibit non-Euclidean structure, for which graph neural networks (GNNs) are widely used. However, in large-scale systems, local, global, and dynamic relations coexist among sensors, and traditional GNNs often overlook such complex, multi-level structure in tasks such as fault diagnosis. To address this issue, we propose a structure-aware multi-level temporal graph network with local-global feature fusion for industrial fault diagnosis. First, a correlation graph is dynamically constructed using Pearson correlation coefficients to capture relationships among process variables. Then, temporal features are extracted through a long short-term memory (LSTM)-based encoder, whereas the spatial dependencies among sensors are learned by graph convolution layers. A multi-level pooling mechanism is used to gradually coarsen the graph and learn meaningful structures, capturing higher-level patterns while keeping important fault-related details. Finally, a fusion step combines detailed local features with overall global patterns before the final prediction. Experimental evaluations on the Tennessee Eastman process (TEP) demonstrate that the proposed model achieves superior fault diagnosis performance, particularly for complex fault scenarios, outperforming various baseline methods.
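The graph-construction step above can be sketched as thresholding pairwise Pearson correlations over a window of sensor readings. This is an illustration under assumptions rather than the paper's procedure: the threshold of 0.8, the use of absolute correlation, the sensor names, and the function names `pearson` and `correlation_graph` are all hypothetical, and recomputing the graph per window is one possible reading of "dynamically constructed".

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_graph(window, threshold):
    """Return undirected edges (a, b) between sensors with |r| >= threshold."""
    names = sorted(window)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(window[a], window[b])) >= threshold:
                edges.add((a, b))
    return edges

# One window of made-up readings for three hypothetical sensors.
window = {
    "flow": [1.0, 2.0, 3.0, 4.0],
    "pressure": [2.1, 3.9, 6.2, 7.8],
    "temp": [5.0, 1.0, 4.0, 2.0],
}
edges = correlation_graph(window, threshold=0.8)
```

In this window only the flow and pressure sensors are linked. Calling `correlation_graph` again on a later window would yield a different edge set if the correlations shift, for example after a fault.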
Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse Autoencoders (SAEs) are widely used for interpretability, their robustness implications remain underexplored. We present a study of integrating pretrained SAEs into transformer residual streams at inference time, without modifying model weights or blocking gradients. Across four model families (Gemma, LLaMA, Mistral, Qwen) and two strong white-box attacks (GCG, BEAST) plus three black-box benchmarks, SAE-augmented models achieve up to a 5x reduction in jailbreak success rate relative to the undefended baseline and reduce cross-model attack transferability. Parametric ablations reveal (i) a monotonic dose-response relationship between L0 sparsity and attack success rate, and (ii) a layer-dependent defense-utility tradeoff, where intermediate layers balance robustness and clean performance. These findings are consistent with a representational bottleneck hypothesis: sparse projection reshapes the optimization geometry exploited by jailbreak attacks.
An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. As clinical datasets are inherently temporal and sparse in terms of modality presence, capturing the underlying predictive signal via diagnostic multimodal ML models while retaining model explainability remains an ongoing challenge. In this work, we address this by re-framing clinical diagnosis as an autoregressive sequence modeling task, utilizing causal decoders from large language models (LLMs) to model a patient's multimodal trajectory. We first introduce a missingness-aware contrastive pre-training objective that integrates multiple modalities in datasets with missingness in a shared latent space. We then show that autoregressive sequence modeling with transformer-based architectures outperforms baselines on the MIMIC-IV and eICU fine-tuning benchmarks. Finally, we use interpretability techniques to move beyond performance boosts and find that across various patient stays, removing modalities leads to divergent behavior that our contrastive pre-training mitigates. By abstracting clinical diagnosis as sequence modeling and interpreting patient stay trajectories, we develop a framework to profile and handle missing modalities while addressing the canonical desideratum of safe, transparent clinical AI.
Nonlinear machine-learning models are increasingly used to discover causal relationships in time-series data, yet the interpretation of their outputs remains poorly understood. In particular, causal scores produced by regularized neural autoregressive models are often treated as analogues of regression coefficients, leading to misleading claims of statistical significance. In this paper, we argue that causal relevance in nonlinear time-series models should be evaluated through forecast necessity rather than coefficient magnitude. We present an interpretable evaluation framework based on systematic edge ablation and forecast comparison, which tests whether a candidate causal relationship is required for accurate prediction. Using Neural Additive Vector Autoregression as the model, we apply this framework to a real-world case study of democratic development, modeled as a multivariate time series of panel data (democracy indicators across 139 countries). We show that relationships with similar causal scores can differ dramatically in their predictive necessity due to redundancy, temporal persistence, and regime-specific effects. Our results demonstrate how forecast-necessity testing supports more reliable causal reasoning in applied AI systems and provides practical guidance for interpreting nonlinear time-series models in high-stakes domains.
Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
Many neural network (NN) verification systems represent the network's input-output relation as a constraint program. Sound and complete representations involve integer constraints for simulating the activations. Recent works convexly relax the integer constraints, improving performance at the cost of soundness. Convex relaxations consider outputs that are unreachable by the original network. We study the worst-case divergence between the original network and its convex relaxations, both qualitatively and quantitatively. The space of relaxations forms a lattice, where the top element corresponds to a full relaxation, with every neuron linearized, and the bottom element corresponds to the original network. We provide analytical upper and lower bounds for the $\ell_\infty$-distance between the fully relaxed and original outputs. This distance grows exponentially w.r.t. the network's depth and linearly w.r.t. the input's radius. The misclassification probability exhibits a step-like behavior w.r.t. the input radius. Our results are supported by experiments on MNIST, Fashion MNIST and random networks.
Today, machine learning is widely applied in sensitive, security-related, and financially lucrative applications. Model extraction attacks undermine current business models where a model owner sells model access, e.g., via MLaaS APIs. Additionally, stolen models can enable powerful white-box attacks, facilitating privacy attacks on sensitive training data, and model evasion. In this paper, we focus on Decision Trees (DT), which are widely deployed in practice. Existing black-box extraction attacks for DTs are either query-intensive, make strong assumptions about the DT structure, or rely on rich API information. To limit attacks to the black-box setting, CPU vendors introduced Trusted Execution Environments (TEE) that use hardware-mechanisms to isolate workloads from external parties, e.g., MLaaS providers. We introduce TrEEStealer, a high-fidelity extraction attack for stealing TEE-protected DTs. TrEEStealer exploits TEE-specific side-channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. The extraction efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. We use two primitives to acquire CFI: for AMD SEV, we follow previous work using the SEV-Step framework and performance counters. For Intel SGX, we reproduce prior findings on current Xeon 6 CPUs and construct a new primitive to efficiently extract the branch history of inference runs through the Branch-History-Register. We found corresponding vulnerabilities in three popular libraries: OpenCV, mlpack, and emlearn. We show that TrEEStealer achieves superior efficiency and extraction fidelity compared to prior attacks. Our work establishes a new state-of-the-art for DT extraction and confirms that TEEs fail to protect against control-flow leakage.
Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.
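The per-step reward described above can be illustrated with a minimal sketch. It assumes the asymptotic error baseline is already available as a number; how the critic is trained is not specified here, and the function name and the numeric values are hypothetical.

```python
def intrinsic_reward(prediction_error: float, baseline: float) -> float:
    """Current prediction error minus the estimated asymptotic error baseline."""
    return prediction_error - baseline


# Hypothetical values, chosen only to show the two regimes.
# Learnable transition: the error is still far above its asymptotic floor.
learnable = intrinsic_reward(prediction_error=1.0, baseline=0.25)
# Stochastic transition: the error already sits at its irreducible floor.
noisy = intrinsic_reward(prediction_error=0.5, baseline=0.5)
```

Under these assumed numbers, the learnable transition keeps a positive reward while the purely stochastic one receives none, which is the separation of reducible from irreducible error that the abstract describes.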
Indistinguishability properties such as differential privacy bounds or low empirically measured membership inference are widely treated as proxies to show a model is sufficiently protected against broader memorization risks. However, we show that indistinguishability properties are neither sufficient nor necessary for preventing data extraction in LLM APIs. We formalize a privacy-game separation between extraction and indistinguishability-based privacy, showing that indistinguishability and inextractability are incomparable: upper-bounding distinguishability does not upper-bound extractability. To address this gap, we introduce $(l, b)$-inextractability as a definition that requires at least $2^b$ expected queries for any black-box adversary to induce the LLM API to emit a protected $l$-gram substring. We instantiate this via a worst-case extraction game and derive a rank-based extraction risk upper bound for targeted exact extraction, as well as extensions to cover untargeted and approximate extraction. The resulting estimator captures the extraction risk over multiple attack trials and prefix adaptations. We show that it can provide a tight and efficient estimation for standard greedy extraction and an upper bound on the probabilistic extraction risk given any decoding configuration. We empirically evaluate extractability across different models, clarifying its connection to distinguishability, demonstrating its advantage over existing extraction risk estimators, and providing actionable mitigation guidelines across model training, API access, and decoding configurations in LLM API deployment. Our code is publicly available at: https://github.com/Emory-AIMS/Inextractability.
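To make the $2^b$ query requirement concrete, the sketch below uses a simplifying assumption that is not taken from the paper: each query succeeds independently with a fixed probability $p$. The number of queries until the first success is then geometric with mean $1/p$, so requiring at least $2^b$ expected queries amounts to $p \le 2^{-b}$. The function names are hypothetical.

```python
import math


def expected_queries(p_success: float) -> float:
    """Mean number of i.i.d. queries until the first success (geometric distribution)."""
    return 1.0 / p_success


def inextractability_bits(p_success: float) -> float:
    """Largest b such that the expected query count is at least 2**b."""
    return math.log2(expected_queries(p_success))


# Hypothetical per-query success probability of 1/1024.
bits = inextractability_bits(1 / 1024)
```

An adaptive adversary need not have a constant per-query success probability, so this is only an intuition for the scale of $b$, not the paper's estimator.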
Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and two decades of competitions, comprising 30,676 expert-authored problems with solutions across diverse domains. In addition to the core dataset, we construct a retrieval benchmark consisting of mathematically equivalent and structurally similar problem pairs curated by human experts. MathNet supports three tasks: (i) Problem Solving, (ii) Math-Aware Retrieval, and (iii) Retrieval-Augmented Problem Solving. Experimental results show that even state-of-the-art reasoning models (78.4% for Gemini-3.1-Pro and 69.3% for GPT-5) remain challenged, while embedding models struggle to retrieve equivalent problems. We further show that retrieval-augmented generation performance is highly sensitive to retrieval quality; for example, DeepSeek-V3.2-Speciale achieves gains of up to 12%, obtaining the highest scores on the benchmark. MathNet provides the largest high-quality Olympiad dataset together with the first benchmark for evaluating mathematical problem retrieval, and we publicly release both the dataset and benchmark at https://mathnet.mit.edu.
Modern sequence modeling is dominated by two families: Transformers, whose self-attention can access arbitrary elements of the visible sequence, and structured state-space models, which propagate information through an explicit recurrent state. These mechanisms face different limitations on long contexts: when attention is diffuse, the influence of individual tokens is diluted across the effective support, while recurrent state propagation can lose long-range sensitivity unless information is actively preserved. As a result, both mechanisms face challenges in preserving and selectively retrieving information over long contexts. We propose Sessa, a decoder that places attention inside a recurrent feedback path. This creates many attention-based paths through which past tokens can influence future states, rather than relying on a single attention read or a single recurrent chain. We prove that, under explicit assumptions and matched regimes, Sessa admits power-law memory tails $O(\ell^{-β})$ for $0 < β< 1$, with slower decay than in the corresponding Transformer and Mamba-style baselines. We further give an explicit construction that achieves this power-law rate. Under the same assumptions, Sessa is the only model class among those considered that realizes flexible selective retrieval, including profiles whose influence does not decay with distance. Consistent with this theoretical advantage, across matched experiments, Sessa achieves the strongest performance on long-context benchmarks while remaining competitive with Transformer and Mamba-style baselines on short-context language modeling.
Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in PPO. In this paper, we bridge this gap by introducing the Bounded Ratio Reinforcement Learning (BRRL) framework. We formulate a novel regularized and constrained policy optimization problem and derive its analytical optimal solution. We prove that this solution ensures monotonic performance improvement. To handle parameterized policy classes, we develop a policy optimization algorithm called Bounded Policy Optimization (BPO) that minimizes an advantage-weighted divergence between the policy and the analytic optimal solution from BRRL. We further establish a lower bound on the expected performance of the resulting policy in terms of the BPO loss function. Notably, our framework also provides a new theoretical lens to interpret the success of the PPO loss, and connects trust region policy optimization and the Cross-Entropy Method (CEM). We additionally extend BPO to Group-relative BPO (GBPO) for LLM fine-tuning. Empirical evaluations of BPO across MuJoCo, Atari, and complex IsaacLab environments (e.g., Humanoid locomotion), and of GBPO for LLM fine-tuning tasks, demonstrate that BPO and GBPO generally match or outperform PPO and GRPO in stability and final performance.
Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of supervision. We conduct a systematic empirical study across diverse model families and reasoning domains under three weak supervision settings: scarce data, noisy rewards, and self-supervised proxy rewards. We find that generalization is governed by training reward saturation dynamics: models that generalize exhibit a prolonged pre-saturation phase during which training reward and downstream performance climb together, while models that saturate rapidly memorize rather than learn. We identify reasoning faithfulness, defined as the extent to which intermediate steps logically support the final answer, as the pre-RL property that predicts which regime a model falls into, while output diversity alone is uninformative. Motivated by these findings, we disentangle the contributions of continual pre-training and supervised fine-tuning, finding that SFT on explicit reasoning traces is necessary for generalization under weak supervision, while continual pre-training on domain data amplifies the effect. Applied together to Llama3.2-3B-Base, these interventions enable generalization across all three settings where the base model previously failed.
The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If true, this has significant implications for whether modality choice matters at all. We show that the experimental evidence for this hypothesis is fragile and depends critically on the evaluation regime. Alignment is measured using mutual nearest neighbors on small datasets ($\approx$1K samples) and degrades substantially as the dataset is scaled to millions of samples. The alignment that remains between model representations reflects coarse semantic overlap rather than consistent fine-grained structure. Moreover, the evaluations in Huh et al. are done in a one-to-one image-caption setting, a constraint that breaks down in realistic many-to-many settings and further reduces alignment. We also find that the reported trend of stronger language models increasingly aligning with vision does not appear to hold for newer models. Overall, our findings suggest that the current evidence for cross-modal representational convergence is considerably weaker than subsequent works have taken it to be. Models trained on different modalities may learn equally rich representations of the world, just not the same one.
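A mutual nearest-neighbor alignment score of the kind mentioned above can be sketched as follows. This is a minimal pure-Python version under stated assumptions: Euclidean distance is used for neighbor search, the two feature lists are paired by index, and the function names are hypothetical; the original evaluation may use a different similarity and implementation.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean, self excluded)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()  # ties are broken by index
    return {j for _, j in dists[:k]}


def mutual_knn_alignment(feats_a, feats_b, k):
    """Average fraction of shared k-nearest neighbours across two feature spaces."""
    n = len(feats_a)
    overlap = 0
    for i in range(n):
        overlap += len(knn_indices(feats_a, i, k) & knn_indices(feats_b, i, k))
    return overlap / (n * k)


# Toy 1-D features for four paired samples (hypothetical data).
space_a = [[0.0], [1.0], [10.0], [11.0]]
space_b = [[0.0], [10.0], [1.0], [11.0]]  # same values, different neighbourhood structure
same = mutual_knn_alignment(space_a, space_a, k=1)
different = mutual_knn_alignment(space_a, space_b, k=1)
```

Because the score depends on which samples are available as neighbor candidates, it changes with the size of the evaluation set, which is the sensitivity the abstract examines.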
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a major US hospital system, composed of 25 billion records from 7.2 million patients, representing 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space integrating over 100 thousand unique medical events in our clinical vocabulary as well as images and clinical text. This "atlas of medical concepts" forms a computational substrate for modeling entire patient care journeys comprised of sequences of structured and unstructured events, which are compressed by Apollo into virtual patient representations. To assess the potential of these whole-patient representations, we created 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. We demonstrate the generalized clinical forecasting potential of Apollo embeddings, including predicting new disease onset risk up to five years in advance (95 tasks), disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks). Using feature attribution techniques, we show that model predictions align with clinically-interpretable multimodal biomarkers. We evaluate semantic similarity search on 61 retrieval tasks, and moreover demonstrate the potential of Apollo as a multimodal medical search engine using text and image queries. Together, these modeling capabilities establish the foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning.
In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. Furthermore, if the label is not queried, the prediction from a machine learning model is used instead. Prior work proposed an elegant scheme that determines the query probability by combining an uncertainty-based suggestion with a constant probability that encodes a soft constraint on the query probability. We explored different values of the mixing parameter and observed an intriguing empirical pattern: the smallest confidence width tends to occur when the weight on the constant probability is close to one, thereby reducing the influence of the uncertainty-based component. Motivated by this observation, we develop a non-asymptotic analysis of the estimator and establish a data-dependent bound on its confidence interval. Our analysis further suggests that when a no-regret learning approach is used to determine the query probability and control this bound, the query probability converges to the constraint of the max value of the query probability when it is chosen obliviously to the current covariates. We also conduct simulations that corroborate these theoretical findings.
Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce \textbf{Latent Phase-Shift Rollback} (LPSR): at each generation step, we monitor the residual stream at a critical layer $l_{\mathrm{crit}}$, detect abrupt directional reversals (phase shifts) via a cosine-similarity $+$ entropy dual gate, and respond by rolling back the KV-cache and injecting a pre-computed steering vector. No fine-tuning, gradient computation, or additional forward passes are required. LPSR achieves $\mathbf{44.0\%}$ on MATH-500 with an 8B model versus $28.8\%$ for standard AR ($+15.2$ pp; McNemar $χ^2 = 66.96$, $p < 10^{-15}$). Critically, prompted self-correction, the most natural inference-time baseline, scores only $19.8\%$, below standard AR; LPSR exceeds it by $+24.2$ pp ($χ^2 = 89.4$, $p \approx 0$). LPSR also outperforms Best-of-16 ($+7.8$ pp) at $5.4\times$ lower token cost, and surpasses a standard 70B model ($35.2\%$) with $8.75\times$ fewer parameters at ${\sim}3\times$ the token budget. A 32-layer sweep reveals a novel \textbf{detection-correction dissociation}: error-detection AUC peaks at layer~14 ($0.718$) but task accuracy peaks at layer~16 ($44.0\%$ vs.\ $29.2\%$), demonstrating that optimal monitoring depth differs for detection and correction.
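The dual gate described above can be sketched as a conjunction of two tests. Everything specific here is an assumption made for illustration: the two vectors stand in for consecutive residual-stream directions at the monitored layer, the thresholds are hypothetical, and the vectors are assumed to be nonzero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def phase_shift_detected(prev_dir, curr_dir, next_token_probs,
                         cos_threshold=0.0, entropy_threshold=1.0):
    """Fire only if the direction reverses AND the next-token distribution is uncertain."""
    reversed_direction = cosine(prev_dir, curr_dir) < cos_threshold
    uncertain = entropy(next_token_probs) > entropy_threshold
    return reversed_direction and uncertain


# Hypothetical inputs: a full reversal with a uniform 4-way distribution,
# and the same reversal with a confident distribution.
fires = phase_shift_detected([1.0, 0.0], [-1.0, 0.0], [0.25, 0.25, 0.25, 0.25])
quiet = phase_shift_detected([1.0, 0.0], [-1.0, 0.0], [0.97, 0.01, 0.01, 0.01])
```

Requiring both conditions means a directional reversal alone does not trigger a rollback when the model remains confident about its next token.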
We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics AI assistance: the \textbf{CLD Leaderboard} (53 tests, structured causal loop diagram extraction) and the \textbf{Discussion Leaderboard} (interactive model discussion, feedback explanation, and model building coaching). On CLD extraction, cloud models achieve 77--89\% overall pass rates; the best local model reaches 77\% (Kimi~K2.5~GGUF~Q3, zero-shot engine), matching mid-tier cloud performance. On Discussion, the best local models achieve 50--100\% on model building steps and 47--75\% on feedback explanation, but only 0--50\% on error fixing -- a category dominated by long-context prompts that expose memory limits in local deployments. A central contribution of this paper is a systematic analysis of \textit{model type effects} on performance: we compare reasoning vs.\ instruction-tuned architectures, GGUF (llama.cpp) vs.\ MLX (mlx\_lm) backends, and quantization levels (Q3 / Q4\_K\_M / MLX-3bit / MLX-4bit / MLX-6bit) across the same underlying model families. We find that backend choice has larger practical impact than quantization level: mlx\_lm does not enforce JSON schema constraints, requiring explicit prompt-level JSON instructions, while llama.cpp grammar-constrained sampling handles JSON reliably but causes indefinite generation on long-context prompts for dense models. We document the full parameter sweep ($t$, $p$, $k$) for all local models, cleaned timing data (stuck requests excluded), and a practitioner guide for running 671B--123B parameter models on Apple~Silicon.
Models from the AlphaFold (AF) family reliably predict one dominant conformation for most well-ordered proteins but struggle to capture biologically relevant alternate states. Several efforts have focused on eliciting greater conformational variability through ad hoc inference-time perturbations of AF models or their inputs. Despite their progress, these approaches remain inefficient and fail to consistently recover major conformational modes. Here, we investigate both the optimal location and manner-of-operation for perturbing latent representations in the AF3 architecture. We distill our findings in ConforNets: channel-wise affine transforms of the pre-Pairformer pair latents. Unlike previous methods, ConforNets globally modulate AF3 representations, making them reusable across proteins. On unsupervised generation of alternate states, ConforNets achieve state-of-the-art success rates on all existing multi-state benchmarks. On the novel supervised task of conformational transfer, ConforNets trained on one source protein can induce a conserved conformational change across a protein family. Collectively, these results introduce a mechanism for conformational control in AF3-based models.
Weight quantization has become a standard tool for efficient LLM deployment, especially for local inference, where models are now routinely served at 2-3 bits per parameter. The state of the art is currently split into two sets of methods: simple scalar quantization techniques, such as GPTQ or AWQ, which are widely deployed but plateau in accuracy at 3-4 bits per parameter (bpp), and "second-generation" vector- or trellis-quantized methods, such as QTIP, GPTVQ and AQLM, which push the accuracy frontier at low bit-widths but are notoriously hard to implement and to scale, and have gained relatively less traction. In this paper, we ask whether this gap is fundamental, or whether a carefully optimized scalar quantizer can recover most of it. We answer in the affirmative by introducing GSQ (Gumbel-Softmax Quantization), a post-training scalar quantization method that jointly learns the per-coordinate grid assignments and the per-group scales using a Gumbel-Softmax relaxation of the discrete grid. GSQ matches the cardinality of the relaxation to the small number of levels available in the target bit-width regime (e.g., 3 and 8 levels for ternary and 3 bpp, respectively), making the relaxation tight and the optimization tractable. Practically, on the standard Llama-3.1-8B/70B-Instruct models, GSQ closes most of the gap between scalar quantization and the QTIP frontier at 2 and 3 bits, while using a symmetric scalar grid with group-wise quantization, and is thus fully compatible with existing scalar inference kernels. We further show that GSQ scales to trillion-scale Mixture-of-Experts models such as Kimi-K2.5, where vector-quantized methods are difficult to apply.
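The Gumbel-Softmax relaxation of a small discrete grid can be sketched for a single weight as follows. This is a generic illustration of the relaxation, not the GSQ training procedure: the function names, the ternary grid, the scale, and the logits are all hypothetical, and no gradient-based learning is shown.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample over grid levels: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_quantize(logits, levels, scale, temperature, rng):
    """Soft quantized weight: scale times the expected grid level under the relaxed sample."""
    probs = gumbel_softmax(logits, temperature, rng)
    return scale * sum(p * level for p, level in zip(probs, levels))


rng = random.Random(0)
ternary_levels = [-1.0, 0.0, 1.0]  # hypothetical symmetric ternary grid
probs = gumbel_softmax([0.0, 0.0, 50.0], temperature=0.1, rng=rng)
weight = soft_quantize([0.0, 0.0, 50.0], ternary_levels, scale=0.5, temperature=0.1, rng=rng)
```

With only a handful of levels per coordinate, the relaxed distribution has few entries, and at low temperature the soft value approaches a single grid point times the scale.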
This note clarifies the relationship between the recent TurboQuant work and the earlier DRIVE (NeurIPS 2021) and EDEN (ICML 2022) schemes. DRIVE is a 1-bit quantizer that EDEN extended to any $b>0$ bits per coordinate; we refer to them collectively as EDEN. First, TurboQuant$_{\text{mse}}$ is a special case of EDEN obtained by fixing EDEN's scalar scale parameter to $S=1$. EDEN supports both biased and unbiased quantization, each optimized by a different $S$ (chosen via methods described in the EDEN works). The fixed choice $S=1$ used by TurboQuant is generally suboptimal, although the optimal $S$ for biased EDEN converges to $1$ as the dimension grows; accordingly TurboQuant$_{\text{mse}}$ approaches EDEN's behavior for large $d$. Second, TurboQuant$_{\text{prod}}$ combines a biased $(b-1)$-bit EDEN step with an unbiased 1-bit QJL quantization of the residual. It is suboptimal in three ways: (1) its $(b-1)$-bit step uses the suboptimal $S=1$; (2) its 1-bit unbiased residual quantization has worse MSE than (unbiased) 1-bit EDEN; (3) chaining a biased $(b-1)$-bit step with a 1-bit unbiased residual step is inferior to unbiasedly quantizing the input directly with $b$-bit EDEN. Third, some of the analysis in the TurboQuant work mirrors that of the EDEN works: both exploit the connection between random rotations and the shifted Beta distribution, use the Lloyd-Max algorithm, and note that Randomized Hadamard Transforms can replace uniform random rotations. Experiments support these claims: biased EDEN (with optimized $S$) is more accurate than TurboQuant$_{\text{mse}}$, and unbiased EDEN is markedly more accurate than TurboQuant$_{\text{prod}}$, often by more than a bit (e.g., 2-bit EDEN beats 3-bit TurboQuant$_{\text{prod}}$). We also repeat all accuracy experiments from the TurboQuant paper, showing that EDEN outperforms it in every setup we have tried.
Physics-informed neural networks (PINNs) provide a powerful framework for learning governing equations of dynamical systems from data. Biologically-informed neural networks (BINNs) are a variant of PINNs that preserve the known differential operator structure (e.g., reaction-diffusion) while learning constitutive terms via trainable neural subnetworks, enforced through soft residual penalties. Existing BINN studies are limited to $1\mathrm{D}{+}t$ reaction-diffusion systems and focus on forward prediction, using the governing partial differential equation as a regulariser rather than an explicit identification target. Here, we extend BINNs to $2\mathrm{D}{+}t$ systems within a PINN framework that combines data preprocessing, BINN-based equation learning, and symbolic regression post-processing for closed-form equation discovery. We demonstrate the framework's real-world applicability by learning the governing equations of lung cancer cell population dynamics from time-lapse microscopy data, recovering $2\mathrm{D}{+}t$ reaction-diffusion models from experimental observations. The proposed framework is readily applicable to other spatio-temporal systems, providing a practical and interpretable tool for fast analytic equation discovery from data.
We present the first determination of the $N = 50$ empirical shell gap at $Z = 48$ by precise mass measurements of the neutron-deficient cadmium isotopes $^{96-98}$Cd with the ISOLTRAP mass spectrometer at ISOLDE-CERN, including the first precise determination of the excitation energy of the $25/2^+$ isomer in $^{97}$Cd. Through the systematics of Coulomb Displacement Energies, we further deduce the empirical shell gap in the higher-$Z$ isotopic chains, tightly constraining the $^{100}$Sn mass-surface region. The new experimental data suggest an enhancement of the gap towards $^{100}$Sn, which is discussed in comparison to state-of-the-art calculations using energy-density functional and new ab initio approaches.
A recent report on ${^7{\rm Li}}(e,e'K^+)$ electroproduction runs by the A1 collaboration at the Mainz Microtron (MAMI) assigns a sharp pion-momentum line at $p_{π^-}\approx 113.8\pm 0.1$ MeV/c to ${_Λ^3}{\rm H}\toπ^-+{^3{\rm He}}$ weak decay, resulting in an exceptionally large ${_Λ^3}{\rm H}$ binding energy $B_Λ({_Λ^3}{\rm H})=0.523\pm 0.013\pm 0.075$ MeV. Here I suggest an alternative interpretation of the observed sharp line in terms of ${_Λ^7}{\rm He}_{\rm g.s.}\toπ^-+{^7{\rm Li}}(E_{\rm x}=478$ keV) weak decay, discussing also the model dependence of $B_Λ({_Λ^7}{\rm He})$.
The STAR Collaboration reports measurements of the collision energy dependence of hypertriton (${}^{3}_Λ$H) transverse momentum spectra and $p_{\rm T}$-integrated yields at mid-rapidity ($|y|<$0.5) in Au+Au collisions at 11 collision energies between 3.2 and 27\,GeV. The measured ${}^{3}_Λ$H yields and the ${}^{3}_Λ$H/$Λ$ yield ratio in central collisions increase strongly with decreasing collision energy, and are a factor of $\sim$2 lower than thermal model predictions in this energy range. The mean $p_{\rm T}$ of ${}^{3}_Λ$H is lower than the Blast-Wave expectation using the freeze-out parameters from light hadrons. Furthermore, the observed double ratio $({}^{3}_Λ{\rm{H}}/Λ)/(t/p)$ maintains a constant value of $\sim$0.4 across the measured energy range. Within the coalescence framework, this ratio directly reflects the significantly suppressed formation probability of the weakly-bound hypertriton relative to the triton, which results from the weaker hyperon-nucleon interaction compared with the nucleon-nucleon interaction.
We present the first results on the $π^0$ nuclear modification factor $R_{OO}$ in OO collisions at LHC energies by the ALICE experiment. The measurement of the modification of hadron production in nuclear collisions compared to a vacuum baseline in pp collisions is a valuable probe for parton energy loss in the hot medium. The ALICE $R_{OO}$ results show significant (up to 4$σ$) suppression of $π^0$ production in OO collisions compared to the pp reference, and up to 2.4$σ$ deviation w.r.t. model predictions that include only cold nuclear matter effects.
Sub-barrier photo-induced fission in $^{236}$U$(γ,f)$ is investigated within the non-equilibrium Green function (NEGF) method. A model space for the fission process is constructed by superposing Skyrme-Hartree-Fock wave functions along the fission path allowing the particle-hole excitation. Then, the transition from the photo-absorption channel to the fission channel is described by the non-equilibrium Green-function formalism. The calculated fission cross section in the incident gamma-ray energy range $5 ~ {\rm MeV} \leq E_γ\leq 6 ~ {\rm MeV}$ reproduces the overall behavior of the experimental data, including the suppression below the fission barrier. An eigenchannel analysis of the wave propagation in the present fission model space is also performed, and the first eigenchannel is found to dominate the fission probability. This result supports the Bohr-Wheeler transition-state picture from a microscopic viewpoint.
The multiplicity-dependent suppression of $Υ(n\mathrm{S})$ states measured by CMS in $pp$ at $\sqrt s=7\,$TeV \cite{CMS2020}, and of $ψ(2S)\big/J/ψ$ measured by LHCb at $\sqrt s=13\,$TeV \cite{LHCb2024}, is subjected to four multi-differential tests: \emph{cone isolation}, \emph{azimuthal sectors}, \emph{transverse sphericity}, and \emph{prompt vs. non-prompt}. Cone and sphericity close a \emph{scissors constraint}: the local reading of the Comover Interaction Model is in tension with the cone data, its global reading with the sphericity data. The non-prompt flatness forces the mechanism to act at early proper times. None of the considered hadronic or string-based frameworks -- CIM local or global, PYTHIA 8 MPI \cite{Sjostrand2015}, rope hadronisation \cite{Bierlich2015}, CGC \cite{Ma2015}, Trainor TCM \cite{Trainor2008} -- naturally satisfies the four constraints simultaneously in its published form. The surviving class is an early, globally correlated medium consistent with partonic degrees of freedom, co-occurring with the ALICE strangeness enhancement \cite{ALICE_SE}, the long-range ridge \cite{CMS_ridge}, and below the threshold of the partonic baryon-meson $v_2$ \cite{ALICE_v2}, in a density window compatible with the Campanini \& Ferri equation of state \cite{Campanini2011}.
The Belle and Belle II experiments have collected a 1.3 ab$^{-1}$ sample of $e^+e^-\to B\bar B$ collisions at the $Υ(4S)$ centre-of-mass energy. This is an ideal environment in which to search for rare electroweak penguin $B$ decays, notably those to final states with missing energy. Results on $b\to s \ell^+\ell^-$ $(\ell=e,μ)$, $b\to sτ^+τ^-$, and $b\to sν\bar ν$ transitions obtained from these datasets are presented.
A search is performed for the single production of a heavy vector-like quark (VLQ), decaying into a W boson and a b quark. The analysis uses proton-proton collision data collected by the CMS experiment at the CERN LHC at a center-of-mass energy of 13 TeV and corresponding to an integrated luminosity of 138 fb$^{-1}$. The search targets events with leptonic W boson decays. The event signature consists of one electron or muon, large transverse momentum imbalance, at least one jet consistent with coming from the fragmentation of a b quark and having large transverse momentum, and at least one jet in the forward region of the detector. No significant excess over the standard model prediction is observed. Upper limits are set at the 95% confidence level on the production cross section of a VLQ and its coupling $κ_\mathrm{W}$ to the standard model sector. For a VLQ decaying exclusively into Wb, the upper limit on $κ_\mathrm{W}$ depends on the VLQ mass and reaches values as low as 0.086 for masses around 1.4 TeV. For $κ_\mathrm{W}$ = 0.2 the lower limit on the VLQ mass is 2.4 TeV. These are the most stringent limits to date on the single production of VLQs decaying into Wb.
This contribution discusses the physics potential of a future muon collider operating at a center-of-mass energy of $\sqrt{s} = 10$ TeV for precision studies in the Higgs sector. Using a detailed detector simulation that incorporates the dominant sources of machine-induced background, the expected sensitivity to key Higgs processes is evaluated. These include the measurement of production cross sections for $H\to b\bar{b}$, $H\to WW^*$, and double-Higgs production $H\!H\to b\bar{b}b\bar{b}$. A central focus of the study is the determination of the Higgs boson trilinear self-coupling, a critical parameter for understanding the structure of the Higgs potential and electroweak symmetry breaking. The analysis is based on the MUSIC multi-purpose detector concept, specifically optimized for the muon collider environment, and assumes an integrated luminosity of $10$ ab$^{-1}$ collected over five years. The results presented highlight the exceptional prospects of a multi-TeV muon collider for exploring the Higgs potential with a level of precision unattainable by any other proposed future collider within a comparable timeframe.
The $B \to πK$ system provides a rich laboratory for testing the Standard Model and studying CP violation. A particularly important channel is $B^0_d\toπ^0 K_{\rm S}$, the only mode exhibiting both direct and mixing-induced CP violation. Recent Belle II measurements of the CP asymmetries in this decay provide valuable new input. An updated analysis incorporating these new data provides new insight on the long-standing $B \to πK$ puzzle. Looking ahead to the high-precision era of flavour physics, the $B \to πK$ system can be further exploited to potentially reveal new sources of CP violation.
CP violation offers powerful probes of the quark-flavour sector, where decays of B mesons have been key players for decades. I discuss a variety of probes ranging from non-leptonic to rare B decays, offering exciting opportunities at the FCC in the era after the HL-LHC and Belle II.
We systematically investigate the scaling properties of the transverse momentum spectra for pions, kaons, and protons in Au+Au collisions at $\sqrt{s_{NN}}$ = 7.7, 11.5, 14.5, 19.6, 27, 39, 62.4, and 200 GeV, as well as in U+U collisions at $\sqrt{s_{NN}}$ = 193 GeV, across different centrality classes, using experimental data from the collaborations at the Relativistic Heavy Ion Collider (RHIC). Universal scaling emerges when the particle transverse momentum spectra are scaled by global physical quantities, i.e., the average total particle multiplicity and mean transverse momentum, confirming recent scaling findings from the data at the Large Hadron Collider (LHC) by the ExTrEMe collaboration. The scaling behavior breaks down in the high $p_{T}$ region and in peripheral collisions. We provide a natural explanation for these observations by invoking the Cooper-Frye formula, which is used for hadronization in hydrodynamics. Furthermore, we demonstrate the equivalence between the scaling found by the ExTrEMe collaboration and the Hwa-Yang scaling which was proposed two decades ago.
The upcoming PandaX-xT experiment will deploy over 3,000 readout channels operating at a 500 MSa/s sampling rate, generating a sustained data bandwidth up to 1.6 GB/s. To meet this demanding requirement, we present AURORA, a high-performance, distributed data acquisition (DAQ) framework designed for scalability, low latency, and efficient resource utilization. Built on a modular architecture and leveraging modern I/O and networking technologies, including multi-level buffering, deferred and asynchronous processing, AURORA achieves a projected throughput of over 3 GB/s on the aggregation node in benchmark tests. While developed to support PandaX-xT, the framework is experiment-agnostic and readily adaptable to other large-scale particle and nuclear physics experiments.
Precision Higgs physics offers a sensitive window into physics beyond the Standard Model. In parallel, neutrino-oscillation experiments have established the existence of nonzero neutrino masses, thus implying the presence of new physics. Motivated by these facts, we investigate the one-loop contributions of light and heavy Majorana neutrinos to the $ZZh$ vertex within a variant of the type-I seesaw mechanism in which light-neutrino masses vanish at tree level and are then generated radiatively. We analyze the $CP$-conserving and $CP$-violating anomalous couplings which characterize the $ZZh$ vertex and study their phenomenological implications in two relevant kinematic scenarios at future lepton colliders: Higgsstrahlung production and Higgs production via neutral-current vector-boson fusion. We find that $CP$-conserving effects can reach magnitudes of order $10^{-3}$, potentially within projected future experimental sensitivities, whereas $CP$-violating contributions are strongly suppressed, lying well beyond such projections.
The motivation behind the present work is to adopt methodology from field theory and high-energy physics to crystallography. In particular, we establish a relationship between the electromagnetic sector of the Standard-Model Extension (SME) for Lorentz invariance violation and optical media. At an effective level, electromagnetic properties associated with different crystal structures are demonstrated to be parametrized in the SME. Crystallographic and magnetic point groups provide the mathematical tools to show this correspondence. Birefringent and magnetoelectric media merit a dedicated study. Intriguing effects, which have not been described systematically in the modern literature, are rediscovered for the latter and expressed in SME language. With the setting developed at our disposal, materials with specific symmetries such as birefringent or multiferroic crystals serve as condensed-matter analogs for SME effects. It enables us to propose materials with unusual optical properties, which have not been thoroughly looked at in recent times.
Two-particle two-hole (2p2h) excitations driven by meson-exchange currents (MEC) are among the leading nuclear uncertainties in long-baseline neutrino oscillation experiments. Three models currently implemented in neutrino event generators disagree by 20--40% on the $ω$-integrated 2p2h cross section in the dip region on carbon (differential disagreements can reach factors of 2--3), and the axial two-body current has no direct experimental constraint beyond tritium $β$-decay at $Q^2 = 0$. We propose a measurement program at the Electron-Ion Collider (EIC) using polarized electron scattering on deuteron and $^3$He. Electromagnetic (EM) scattering ($γ^*$ exchange) measures the vector MEC. Charged-current (CC) scattering ($W^-$ exchange) on the same targets measures the vector$+$axial MEC. Subtracting the two provides the first direct sensitivity to the axial two-body current, including the $V$--$A$ interference, as a function of momentum transfer. Using $^3$He (2~$pn$ $+$ 1~$pp$ pair) extends the decomposition to $pp$ pairs. Polarized beams and targets give access to six EM response functions on deuteron, four of which have not been previously measured. The tensor analyzing power provides a sign-flip test for $Δ$-excitation MEC. We present projected sensitivities at $50~\mathrm{fb}^{-1}$ on deuteron ($\sim$5 years at $10^{33}$~cm$^{-2}$s$^{-1}$). The EM program can deliver $\sim\!5\!\times\!10^4$ events per $Q^2$ bin constraining the MEC transverse response to $\sim$2% per bin, the beam--target double-spin asymmetry reaches $6$--$13σ$ per bin, and the vector MEC $V_{pn}$ is measured to $\sim$6% per bin. The CC channel is statistics-limited, with $\sim$6--38 events per $Q^2$ bin at $50~\mathrm{fb}^{-1}$, requiring a luminosity upgrade beyond the current EIC baseline.
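The quoted running time can be checked with a short calculation. The sketch below converts the integrated luminosity to cm$^{-2}$ and divides by the instantaneous luminosity. The figure of $10^7$ s of beam time per operational year is an assumption (a common accelerator convention), not a number stated in the abstract, and the function name is illustrative.

```python
# Unit conversion: 1 barn = 1e-24 cm^2, so 1 fb = 1e-39 cm^2 and 1 fb^-1 = 1e39 cm^-2.
FB_INV_TO_CM2_INV = 1e39

# ASSUMPTION (not from the abstract): about 1e7 seconds of beam time per operational year.
SECONDS_PER_OPERATIONAL_YEAR = 1e7


def running_time_seconds(int_lumi_fb_inv, inst_lumi_cm2_s):
    """Beam time needed to collect an integrated luminosity at constant instantaneous luminosity."""
    return int_lumi_fb_inv * FB_INV_TO_CM2_INV / inst_lumi_cm2_s


seconds = running_time_seconds(50.0, 1e33)
years = seconds / SECONDS_PER_OPERATIONAL_YEAR
print(f"{seconds:.2e} s of beam time, about {years:.1f} operational years")
```

Under that assumption the result is $5\times10^7$ s, consistent with the quoted $\sim$5 years. A different duty factor would rescale the number of years proportionally.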
Recently arxiv:2604.12897 urged that the terminology "neutrinoless double beta decay" should be changed to "Majorana double beta decay" to properly give credit to Majorana, and to focus on the positive aspects of the phenomenon -- supposed creation of matter in the laboratory -- rather than the negative: absence of something, embarrassment over false claims of detection, and a "sociology of suspicion." I argue that the current terminology is more accurate and descriptive, and that the claimed reasons for its adoption are lacking in credibility.
We present an effective numerical method that can be used to straightforwardly calculate the full spectrum of primordial gravitational waves produced during inflation and reheating. Our method is based on the Bogoliubov approach with several key improvements to overcome its shortcomings such as numerical instabilities at high frequencies and issues with tachyonic modes. We also present a few useful analytical examples from which one can gain crucial insights into the numerical instabilities. The improved method allows us to demonstrate that anharmonicity of inflaton oscillations can leave interesting fingerprints on the high-frequency part of the GW spectrum. Our numerical code is publicly available on GitHub: https://github.com/xunjiexu/Unified-Bogoliubov.git.
In a recently proposed approach to testing models of inflation by Cosmic Microwave Background (CMB) radiation the reheating temperature is directly expressed in terms of the CMB observables. Its model independent bounds translate in a given model into narrow ranges of those observables. In that approach we analyse the polynomial class of the $α$-attractor inflaton potential models (P-models), in a broad range of polynomials and with the inflaton decays and fragmentation in the reheating period taken into account. The predictions for the CMB observables, the scalar spectral index $n_s$ and tensor-to-scalar ratio $r$, are compared with the Planck and Planck combined with ACT data. Both can be accommodated by that class of the $α$ attractor models. The sensitivity of the results of that comparison to the reheating temperature and to the upper bound on the ratio $r$ is clearly demonstrated.
We investigate the 1D Mueller dipole model, its high-energy limit, and its generalization. To address the ambiguity stemming from different definitions of the pseudorapidity ranges in experimental measurements, we propose the entropy as a function of the logarithm of the average multiplicity, $S(\ln\langle n\rangle)$, as a universal observable. From the solutions of the models, we calculate both the entropy and the average charged-particle multiplicity and compare them to data measured in proton-proton collisions. We obtain these quantities directly from the measured charged-particle multiplicity distributions and determine the model parameters via fits. We find that the generalized dipole model provides a significantly better description of the data than the 1D Mueller model.
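As an illustration of how the two quantities could be extracted from a multiplicity distribution, the sketch below assumes the entropy is the Shannon entropy $S=-\sum_n P_n\ln P_n$ and that the list index equals the multiplicity $n$ starting at 0. Both conventions are assumptions, since the abstract does not spell them out, and the function name is hypothetical.

```python
import math


def entropy_and_mean(multiplicity_distribution):
    """Return (S, <n>) for a distribution P_n indexed by multiplicity n = 0, 1, 2, ...

    ASSUMPTION: S is the Shannon entropy -sum_n P_n ln P_n.
    The input is renormalised so that sum_n P_n = 1.
    """
    total = sum(multiplicity_distribution)
    if total <= 0.0:
        raise ValueError("distribution must have positive total weight")
    p = [x / total for x in multiplicity_distribution]
    entropy = -sum(x * math.log(x) for x in p if x > 0.0)  # empty bins contribute zero
    mean_n = sum(n * x for n, x in enumerate(p))
    return entropy, mean_n


# Toy input: a flat distribution over n = 0..3 (not experimental data).
S, mean_n = entropy_and_mean([0.25, 0.25, 0.25, 0.25])
print(S, mean_n)
```

Plotting $S$ against $\ln\langle n\rangle$ for each measured distribution then gives one point per dataset, which is the form of the observable proposed above.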
We analyze the effect of using the Fermi-Dirac statistics, rather than its Boltzmann approximation, in numerical simulations of perfect spin hydrodynamics of particles with spin 1/2. The system considered is boost invariant, transversely homogeneous, with corrections to the baryon current and the energy-momentum tensor that are second order in the spin polarization tensor $ω$, and the spin tensor considered is first order in $ω$. The study shows the feasibility of this approach, as the special functions defined by integrals that appear in the coefficients in the Fermi-Dirac case can be conveniently parametrized. For sets of initial conditions used in previous works, the differences in parameter evolution between the two underlying particle statistics are about one order of magnitude smaller than corrections coming from spin feedback. We also discuss when and why the numerical solutions of the equations of perfect spin hydrodynamics break down for very large values of spin polarization in one of the geometric configurations considered.
We study the hydrodynamics of the Filtered Dark Matter (Filtered DM) scenario during a first-order phase transition (FOPT). In this scenario, the bubble wall is highly reflective of the dark matter (DM) fluid but transparent to radiation, making the hydrodynamic problem fundamentally different from that of the electroweak FOPT. Motivated by this property, we formulate the hydrodynamics of this system as a two-component fluid composed of DM and radiation, and find that the solutions can be classified into detonation-like and deflagration-like branches in the ballistic regime and in the local thermal equilibrium (LTE) regime. In the ballistic regime, the energy--momentum of DM that cannot enter the wall appears as a reflected mode, while in the LTE regime, it relaxes into the energy--momentum of radiation. We find that this difference in the fate of the DM fluid that cannot enter the interior of the wall leads to different hydrodynamic behaviors in the DM and radiation fluids independently and, in particular, results in different existence conditions for solutions in the deflagration-like branch. Based on these results, we further revisit the impact of hydrodynamic effects on the relic abundance of Filtered DM and demonstrate the change in the abundance induced by hydrodynamic effects. In addition, we also discuss the non-conservation of the entropy current from the viewpoint of the two-fluid system, and briefly comment on the similarity between the Filtered DM system and information-thermodynamic systems.
Recently, a mechanism for generating astrophysically relevant magnetic fields via ultralight pseudoscalar dark matter, through the coupling term $g_{φγ} φF_{μν}\tilde{F}^{μν}$ in the Lagrangian density, was proposed by Brandenberger et al. (2026) (see Ref. 1). In this scenario, the electromagnetic fields are amplified through parametric resonance driven by the oscillatory behaviour of the pseudoscalar field. However, the analysis presented in that work does not account for the effects of a conducting medium. In this paper, we incorporate the finite conductivity of the plasma into the dynamics of the pseudoscalar and electromagnetic fields. We show that, because the conductivity is large relative to the Hubble parameter, the amplification of the electromagnetic fields by parametric resonance is significantly suppressed. Consequently, we find that, for observationally viable values of the coupling between the electromagnetic field and the ultralight pseudoscalar field, it is not possible to generate magnetic fields of sufficient strength to explain their presence in cosmic voids.
We investigate $2P$-$1F$ mixing in charmonium, focusing on the close-in-mass $χ_{c2}(2P)$ and $χ_{c2}(1F)$ states. The conventional tensor force yields negligible mixing, motivating the inclusion of coupled-channel effects. Our unquenched calculation reveals sizable mixing angles of $7.5^\circ$ and $15.4^\circ$. We predict the corresponding two-photon and two-gluon decay widths as key observables for experimental verification. Additionally, we discuss the production of these two $2P$-$1F$ mixed states of charmonium via $γγ$ fusion. Current data are insufficient to determine the mixing, highlighting the need for precise future measurements to resolve this aspect of charmonium spectroscopy.
We introduce Prior-Fitted Functional Flows, a generative foundation model for pharmacokinetics that enables zero-shot population synthesis and individual forecasting without manual parameter tuning. We learn functional vector fields, explicitly conditioned on the sparse, irregular data of an entire study population. This enables the generation of coherent virtual cohorts as well as forecasting of partially observed patient trajectories with calibrated uncertainty. We construct a new open-access literature corpus to inform our priors, and demonstrate state-of-the-art predictive accuracy on extensive real-world datasets.
Given only observational data $X = g(Z)$, where both the latent variables $Z$ and the generating process $g$ are unknown, recovering $Z$ is ill-posed without additional assumptions. Existing methods often assume linearity or rely on auxiliary supervision and functional constraints. However, such assumptions are rarely verifiable in practice, and most theoretical guarantees break down under even mild violations, leaving uncertainty about how to reliably understand the hidden world. To make identifiability actionable in real-world scenarios, we take a complementary view: in general settings where full identifiability is unattainable, what can still be recovered with guarantees, and what biases could be universally adopted? We introduce the problem of diverse dictionary learning to formalize this view. Specifically, we show that intersections, complements, and symmetric differences of latent variables linked to arbitrary observations, along with the latent-to-observed dependency structure, are still identifiable up to appropriate indeterminacies even without strong assumptions. These set-theoretic results can be composed using set algebra to construct structured and essential views of the hidden world, such as genus-differentia definitions. When sufficient structural diversity is present, they further imply full identifiability of all latent variables. Notably, all identifiability benefits follow from a simple inductive bias during estimation that can be readily integrated into most models. We validate the theory and demonstrate the benefits of the bias on both synthetic and real-world data.
Persistent homology (PH) encodes global information, such as cycles, and is thus increasingly integrated into graph neural networks (GNNs). PH methods in GNNs typically traverse an increasing sequence of subgraphs. In this work, we first expose limitations of this inclusion procedure. To remedy these shortcomings, we analyze contractions as a principled topological operation, in particular, for graph representation learning. We study the persistence of contraction sequences, which we call Contraction Homology (CH). We establish that forward PH and CH differ in expressivity. We then introduce Hourglass Persistence, a class of topological descriptors that interleave a sequence of inclusions and contractions to boost expressivity, learnability, and stability. We also study related families parametrized by two paradigms. We also discuss how our framework extends to simplicial and cellular networks. We further design efficient algorithms that are pluggable into end-to-end differentiable GNN pipelines, enabling consistent empirical improvements over many PH methods across standard real-world graph datasets. Code is available at \href{https://github.com/Aalto-QuML/Hourglass}{this https URL}.
The low-degree polynomial framework has emerged as a powerful tool for providing evidence of statistical-computational gaps in high-dimensional inference. For detection problems, the standard approach bounds the low-degree advantage through an explicit orthonormal basis. However, this method does not extend naturally to estimation tasks, and thus fails to capture the \emph{detection-recovery gap phenomenon} that arises in many high-dimensional problems. Although several important advances have been made to overcome this limitation \cite{SW22, SW25, CGGV25+}, the existing approaches often rely on delicate, model-specific combinatorial arguments. In this work, we develop a general approach for obtaining \emph{conditional computational lower bounds} for recovery problems from mild bounds on low-degree testing advantage. Our method combines the notion of algorithmic contiguity in \cite{Li25} with a cross-validation reduction in \cite{DHSS25} that converts successful recovery into a hypothesis test with lopsided success probabilities. In contrast to prior unconditional lower bounds, our argument is conceptually simple, flexible, and largely model-independent. We apply this framework to several canonical inference problems, including planted submatrix, planted dense subgraph, stochastic block model, multi-frequency angular synchronization, orthogonal group synchronization, and multi-layer stochastic block model. In the first three settings, our method recovers existing low-degree lower bounds for recovery in \cite{SW22, SW25} via a substantially simpler argument. In the latter three, it gives new evidence for conjectured computational thresholds including the persistence of detection-recovery gaps. Together, these results suggest that mild control of low-degree advantage is often sufficient to explain computational barriers for recovery in high-dimensional statistical models.
This paper proposes StrEBM, a structured latent energy-based model for source-wise structured representation learning. The framework is motivated by a broader goal of promoting identifiable and decoupled latent organization by assigning different latent dimensions their own learnable structural biases, rather than constraining the entire latent representation with a single shared energy. In this sense, blind source separation is adopted here as a concrete and verifiable testbed, through which the evolution of latent dimensions toward distinct underlying components can be directly examined. In the proposed framework, latent trajectories are optimized directly together with an observation-generation map and source-wise structural parameters. Each latent dimension is associated with its own energy-based formulation, allowing different latent components to gradually evolve toward distinct source-like roles during training. In the present study, this source-wise energy design is instantiated using Gaussian-process-inspired energies with learnable length-scales, but the framework itself is not restricted to Gaussian processes and is intended as a more general structured latent EBM formulation. Experiments on synthetic multichannel signals under linear and nonlinear mixing settings show that the proposed model can recover source components effectively, providing an initial empirical validation of the framework. At the same time, the study reveals important optimization characteristics, including slow late-stage convergence and reduced stability under nonlinear observation mappings. These findings not only clarify the practical behavior of the current GP-based instantiation, but also establish a basis for future investigation of richer source-wise energy families and more robust nonlinear optimization strategies.
Recursive architectures such as Tiny Recursive Models (TRMs) perform implicit reasoning through iterative latent computation, yet the geometric structure of these reasoning trajectories remains poorly understood. We investigate the activation manifold of TRMs during recursive unrolling and find that activations occupy an effectively linear, low-dimensional subspace whose principal directions can be tracked dynamically with cheap power iterations. This suggests that weight-sharing concentrates iterative computation along a small number of dominant eigendirections, and we find that this concentration varies sharply across computational sites. We exploit this structure through LASER (Low-Rank Activation SVD for Efficient Recursion), a dynamic compression framework that maintains an evolving low-rank basis via matrix-free subspace tracking with a fidelity-triggered reset mechanism, achieving ${\sim}60\%$ activation memory savings with no statistically significant accuracy degradation. Our analysis raises questions about how recursive architectures allocate representational capacity during implicit reasoning, and whether this concentration can be exploited to improve the efficiency and stability of latent computation.
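To make the idea of tracking a principal direction with cheap, matrix-free power iterations concrete, here is a minimal pure-Python sketch. It is not the LASER implementation: the function names and the toy activations are hypothetical, only a single direction is tracked, and no reset mechanism is included.

```python
import math
import random


def top_direction(rows, iters=50, seed=0):
    """Estimate the dominant right singular direction of the activation matrix X.

    Matrix-free power iteration: each step forms X v and then X^T (X v),
    so the covariance X^T X is never built explicitly.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(dim)) for r in rows]  # u = X v
        w = [sum(rows[i][j] * u[i] for i in range(len(rows))) for j in range(dim)]  # w = X^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def captured_energy(rows, v):
    """Fraction of the total squared activation norm explained by the unit vector v."""
    projected = sum(sum(r[j] * v[j] for j in range(len(v))) ** 2 for r in rows)
    total = sum(x * x for r in rows for x in r)
    return projected / total


# Toy activations lying exactly along (0.6, 0.8, 0): a rank-one example, not model data.
acts = [[0.6 * t, 0.8 * t, 0.0] for t in (1.0, -2.0, 3.0, 0.5)]
v = top_direction(acts)
print(v, captured_energy(acts, v))
```

A fidelity check in the spirit of the abstract could compare `captured_energy` against a threshold and re-estimate the basis when it drops, though the actual trigger used by the authors is not specified here.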
We derive explicit non-asymptotic PAC-Bayes generalization bounds for Gibbs posteriors, that is, data-dependent distributions over model parameters obtained by exponentially tilting a prior with the empirical risk. Unlike classical worst-case complexity bounds based on uniform laws of large numbers, which require explicit control of the model space in terms of metric entropy (integrals), our analysis yields posterior-averaged risk bounds that can be applied to overparameterized models and adapt to the data structure and the intrinsic model complexity. The bound involves a marginal-type integral over the parameter space, which we analyze using tools from singular learning theory to obtain explicit and practically meaningful characterizations of the posterior risk. Applications to low-rank matrix completion and ReLU neural network regression and classification show that the resulting bounds are analytically tractable and substantially tighter than classical complexity-based bounds. Our results highlight the potential of PAC-Bayes analysis for precise finite-sample generalization guarantees in modern overparameterized and singular models.
Inductive bias refers to restrictions on the hypothesis class that enable a learning method to generalize effectively from limited data. A canonical example in control is linearity, which underpins low sample-complexity guarantees for stabilization and optimal control. For general nonlinear dynamics, by contrast, guarantees often rely on smoothness assumptions (e.g., Lipschitz continuity) which, when combined with covering arguments, can lead to data requirements that grow exponentially with the ambient dimension. In this paper we argue that data-efficient nonlinear control demands exploiting inductive bias embedded in nature itself, namely, structure imposed by physical laws. Focusing on Hamiltonian systems, we leverage symplectic geometry and intrinsic recurrence on energy level sets to solve target reachability problems. Our approach combines the recurrence property with a recently proposed class of policies, called chain policies, which composes locally certified trajectory segments extracted from demonstrations to achieve target reachability. We provide sufficient conditions for reachability under this construction and show that the resulting data requirements depend on explicit geometric and recurrence properties of the Hamiltonian rather than the state dimension.
Converting betting odds into accurate outcome probabilities is a fundamental challenge when using betting odds as a benchmark for sports forecasting and market efficiency analysis. In this study, we propose two methods to overcome the limitations of existing conversion methods. Firstly, we propose an odds-only method to convert betting odds to probabilities without using historical data for model fitting. Existing odds-only methods, such as Multiplicative, Shin, and Power, do not adjust for the biases or relationships we found in our betting odds dataset, which consists of 90014 football matches across five different bookmakers. To overcome these limitations, our proposed Odds-Only-Equal-Profitability-Confidence (OO-EPC) method aligns with the bookmakers' pricing objective of having equal confidence in profitability for each outcome. We provide empirical evidence from our betting odds dataset that, for the majority of bookmakers, our proposed OO-EPC method outperforms the existing odds-only methods. Beyond controlled experiments, we applied the OO-EPC method under real-world uncertainty by using it for six iterations of an annual basketball outcome forecasting competition. Secondly, we propose a generalised linear model that utilises historical data for model fitting and then converts betting odds to probabilities. Existing generalised linear models attempt to capture relationships that the Efficient Market Hypothesis already captures. To overcome this shortcoming, our proposed Favourite-Longshot-Bias-Adjusted Generalised Linear Model (FL-GLM) fits just one parameter to capture the favourite-longshot bias, providing a more interpretable alternative. We provide empirical evidence from historical football matches where, for all bookmakers, our proposed FL-GLM outperforms the existing multinomial and logistic generalised linear models.
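For context, the simplest of the existing odds-only baselines named above, the Multiplicative method, can be sketched in a few lines: inverse decimal odds are rescaled so that they sum to one. This is the baseline only, not the proposed OO-EPC method or FL-GLM, and the function name and example odds are illustrative.

```python
def multiplicative_probabilities(decimal_odds):
    """Convert decimal odds to probabilities by proportional normalisation.

    Returns (probabilities, margin), where margin is the bookmaker overround,
    i.e. the sum of inverse odds minus 1.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    implied = [1.0 / o for o in decimal_odds]  # raw implied probabilities, summing to more than 1
    booksum = sum(implied)
    return [p / booksum for p in implied], booksum - 1.0


# Illustrative home/draw/away odds (not taken from the dataset in the abstract).
probs, margin = multiplicative_probabilities([1.8, 3.6, 3.6])
print(probs, margin)
```

Because the margin is removed in proportion to each implied probability, this baseline cannot represent a favourite-longshot bias, which is the kind of limitation the abstract says the proposed methods address.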
Constitution-conditioned post-training can be analysed as a structured perturbation of a model's learned representational geometry. We introduce ATLAS, a geometry-first program that traces constitution-induced hidden-state structure across charts, models, and substrates. Instead of treating the relevant unit as a single behaviour, neuron, vector, or patch, ATLAS tests a local chart whose tangent structure, occupancy distribution, and behavioural coupling can be measured under system change. On Gemma, the anchored source-local chart captures 310 / 320 reviewed source rows and all 84 / 84 reviewed score-flip rows, but compact exact-patch sufficiency does not close, so the exportable unit is the broader source-defined family. Freezing that family, we re-identify a target-local realisation in an unadapted Phi model, where the fully adjudicated confirmatory contrast separates with AUC 0.984 and mean gap 5.50. In held-out ALM8 mouse frontal-cortex perturbation data, the same source-defined family receives support across 5/5 folds, with mean held-out AUC 0.72 and mean fold gap 4.50. A multiple-choice analysis provides the main boundary: nearby target-local signals can appear without source-faithful closure. The resulting correspondence is not coordinate identity, site identity, or a target-side mediation theorem. It is geometric recurrence under redistribution: written constitutions can induce recoverable latent geometry whose organisation remains detectable across model and substrate changes while its local coordinates, occupancy, and behavioural expression shift.
Video-to-music (V2M) is the fundamental task of creating background music for an input video. Recent V2M models typically achieve audiovisual alignment by relying on visual conditioning alone and provide limited semantic and stylistic controllability to the end user. In this paper, we present Video-Robin, a novel text-conditioned video-to-music generation model that enables fast, high-quality, semantically aligned music generation for video content. To balance musical fidelity and semantic understanding, Video-Robin integrates autoregressive planning with diffusion-based synthesis. Specifically, an autoregressive module models global structure by semantically aligning visual and textual inputs to produce high-level music latents. These latents are subsequently refined into coherent, high-fidelity music using local Diffusion Transformers. By factoring semantically driven planning into diffusion-based synthesis, Video-Robin enables fine-grained creator control without sacrificing audio realism. Our proposed model outperforms both video-only baselines and baselines conditioned on additional features on in-distribution and out-of-distribution benchmarks, with a 2.21x inference speedup compared to SOTA. We will open-source everything upon paper acceptance.
A number of optimization algorithms have been inspired by the physics of Newtonian motion. Here, we ask the question: do algorithms themselves obey some ``natural laws of motion,'' and can they be derived by an application of these laws? We explore this question by positing the theory that optimization algorithms may be considered as some manifestation of hidden algorithm primitives that obey certain universal non-Newtonian dynamics. This natural physics of optimization is developed by equating the terminal transversality conditions of an optimal control problem to the generalized Karush/John-Kuhn-Tucker conditions of an optimization problem. Through this equivalence formulation, the data functions of a given constrained optimization problem generate a natural vector field that permeates an entire hidden space with information on the optimality conditions. An ``action-at-a-distance'' operation via a Pontryagin-type minimum principle produces a local action to deliver a globalized result by way of a Hamilton-Jacobi inequality. An inverse-optimal algorithm is generated by performing control jumps that dissipate quantized ``energy'' defined by a search Lyapunov function. Illustrative applications of the proposed theory show that a large number of algorithms can be generated and explained in terms of the new mathematical physics of optimization.
Serving large language models under latency service-level objectives (SLOs) is a configuration-heavy systems problem with an unusually failure-prone search space: many plausible configurations crash outright or miss user-visible latency targets, and standard black-box optimizers treat these failures as wasted trials. We present SLO-Guard, a crash-aware autotuner for vLLM serving that treats crashes as first-class observations. SLO-Guard combines a feasible-first Thermal Budget Annealing (TBA) exploration phase with a warm-started Tree-structured Parzen Estimator (TPE) exploitation phase; the handoff replays all exploration history, including crashes encoded as extreme constraint violations. We additionally contribute a configuration-repair pass, a GPU-aware KV-cache memory guard, and a four-category crash taxonomy. We evaluate SLO-Guard on Qwen2-1.5B served with vLLM 0.19 on an NVIDIA A100 40GB. Across a pre-specified five-seed study, both SLO-Guard and uniform random search attain 75/75 feasibility with zero crashes under the corrected concurrent harness, and are statistically tied on best-achieved latency (Mann-Whitney two-sided p=0.84). SLO-Guard's advantage is in budget consistency: more trials in the fast-serving regime (10.20 vs. 7.40 out of 15; one-sided p=0.014) and higher post-handoff consistency (0.876 vs. 0.539; p=0.010). Under concurrent load, SLO-Guard's cross-seed standard deviation on best latency is 4.4x tighter than random search's (2.26 ms vs. 10.00 ms). A harness-replication analysis shows that the consistency findings survive an independent sequential-dispatch measurement condition. The central claim is not that SLO-Guard finds a better final configuration, but that it spends a fixed tuning budget more predictably once the fast regime has been found.
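The idea of replaying crashes as extreme constraint violations can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Trial` record, the penalty constant, and the configuration keys are hypothetical and are not taken from SLO-Guard or from vLLM.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical penalty: any value far above realistic SLO overshoots would do.
CRASH_VIOLATION = 1e6


@dataclass
class Trial:
    config: dict
    latency_ms: Optional[float]  # None means the server crashed


def constraint_violation(trial, slo_ms):
    """0 when the SLO is met, the overshoot when missed, a huge value on crash."""
    if trial.latency_ms is None:
        return CRASH_VIOLATION
    return max(0.0, trial.latency_ms - slo_ms)


def replay_history(trials, slo_ms):
    """Keep every exploration trial, crashes included, as (config, violation)."""
    return [(t.config, constraint_violation(t, slo_ms)) for t in trials]


# Illustrative exploration history with made-up knob names.
history = [
    Trial({"knob_a": 8}, 42.0),
    Trial({"knob_a": 64}, 130.0),
    Trial({"knob_a": 512}, None),
]
warm_start = replay_history(history, slo_ms=100.0)
```

Encoding a crash as a number rather than dropping the trial lets a downstream optimizer learn to avoid that region instead of re-sampling it.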
Credit risk default prediction remains a cornerstone of risk management in the financial industry. The task involves estimating the likelihood that a borrower will fail to meet debt obligations, an objective critical for lending decisions, portfolio optimization, and regulatory compliance. Traditional machine learning models such as logistic regression and tree-based ensembles are widely adopted for their interpretability and strong empirical performance. However, modern credit datasets are high-dimensional, heterogeneous, and noisy, increasing overfitting risk in monolithic models and reducing robustness under distributional shift. We introduce STRIKE (Stacking via Targeted Representations of Isolated Knowledge Extractors), a feature-group-aware stacking framework for structured tabular credit risk data. Rather than training a single monolithic model on the complete dataset, STRIKE partitions the feature space into semantically coherent groups and trains independent learners within each group. This decomposition is motivated by an additive perspective on risk modeling, where distinct feature sources contribute complementary evidence that can be combined through a structured aggregation. The resulting group-specific predictions are integrated through a meta-learner that aggregates signals while maintaining robustness and modularity. We evaluate STRIKE on three real-world datasets spanning corporate bankruptcy and consumer lending scenarios. Across all settings, STRIKE consistently outperforms strong tree-based baselines and conventional stacking approaches in terms of AUC-ROC. Ablation studies confirm that performance gains stem from meaningful feature decomposition rather than increased model complexity. Our findings demonstrate that STRIKE is a stable, scalable, and interpretable framework for credit risk default prediction tasks.
Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models will be publicly released.
Skills are a natural unit for describing what a language model can do and how its behavior can be changed. However, existing characterizations rely on human-written taxonomies, textual descriptions, or manual profiling pipelines--all external hypotheses about what matters that need not align with the model's internal representations. We argue that when the goal is to intervene on model behavior, skill characterization should be *model-native*: grounded in the model's own representations rather than imposed through external ontologies. We instantiate this view by recovering a compact orthogonal basis from sequence-level activations. The resulting basis is semantically interpretable but need not correspond to any predefined human ontology; instead, it captures axes of behavioral variation that the model itself organizes around. We validate this characterization on reasoning post-training, using the recovered basis for both SFT data selection and inference-time steering. We develop lightweight proxy interventions to identify which directions are most useful for a given model. Across Llama3-8B and Qwen2.5-3B, selecting data along those directions improves Pass@1 by up to 20% on MATH and 41% on AMC, outperforming data selection based on human-characterized skills. Because the basis lives in activation space, the same directions also serve as steering vectors at inference time, improving Pass@8 by up to 4.8% on MATH--an intervention that human-characterized skills cannot support. We further validate the characterization on safety alignment, where selecting adversarial training data for model-native skill coverage rather than textual diversity yields more sample-efficient learning. These results suggest that recovering skills from the model's own representations, rather than imposing them externally, provides a more effective foundation for intervening on model behavior. Codes are open-sourced.
Parkinson's disease (PD) is a progressive disorder in which symptom burden and functional impairment evolve over time, making severity staging essential for clinical monitoring and treatment planning. However, many computational studies emphasize binary PD detection and do not fully use repeated follow-up clinical assessments for stage-aware prediction. This study proposes STEP-PD, a severity-aware machine learning framework to classify PD severity using clinically interpretable boundaries. It leverages all available visits from the Parkinson's Progression Markers Initiative (PPMI) and integrates routinely collected subjective questionnaires and objective clinician-assessed measures. Disease severity is defined using Hoehn and Yahr staging and grouped into three clinically meaningful categories: Healthy, Mild PD (stages 1-2), and Moderate-to-Severe PD (stages 3-5). Three binary classification problems and a three-class severity task were evaluated using stratified cross-validation with imbalance-aware training. To enhance interpretability, SHAP was used to provide global explanations and local patient-level waterfall explanations. Across all tasks, XGBoost achieved the strongest and most stable performance, with accuracies of 95.48% (Healthy vs. Mild), 99.44% (Healthy vs. Moderate-to-Severe), and 96.78% (Mild vs. Moderate-to-Severe), and 94.14% accuracy with 0.8775 Macro-F1 for three-class severity classification. Explainability results highlight a shift from early motor features to progression-related axial and balance impairments. These findings show that multimodal clinical assessments within the PPMI cohort can support accurate and interpretable visit-level PD severity stratification.
LLM-based agents are assumed to integrate environmental observations into their reasoning: discovering highly relevant but unexpected information should naturally lead to a model exploiting its own discoveries. We show that this assumption is false for current LLM-based agents, which struggle to reflect on or react to unexpected information. Across three benchmarks (Terminal-Bench, SWE-Bench, AppWorld), we inject complete task solutions into the agent environments to deliberately expose a task's solution to a model. While agents discover these solutions on Terminal-Bench in 79-81% of runs, they interact with, or exploit, them in only 37-50% of cases. This gap is starkest in AppWorld: agents see documentation stating that a command "returns the complete solution to this task" in over 90% of attempts but exploit this in fewer than 7% of trials. We show that agents lack what we call environmental curiosity: the capability to recognize and investigate unexpected but relevant observations in response to environmental stimuli. We identify three main factors influencing environmental curiosity: available tools in the agent scaffold, test-time compute, and training data distribution. Our findings show that configurations that maximize curiosity also achieve the best performance on the unmodified benchmarks. Yet even jointly optimized agents still ignore discovered solutions in the majority of trials: current agents use the environment to fetch expected information, but not to revise their strategy or maximally exploit useful stimuli.
Salient object detection (SOD) requires modeling both long-range contextual dependencies and fine-grained structural details, which remains challenging for convolutional, transformer-based, and Mamba-based state space models. While recent Mamba-based state space approaches enable efficient global reasoning, they often struggle to recover precise object boundaries. In contrast, diffusion models capture strong structural priors through iterative denoising, but their use in discriminative dense prediction is still limited due to computational cost and integration challenges. In this work, we propose DGSSM, a diffusion-guided state space (Mamba) framework that formulates multimodal salient object detection as a progressive denoising process. The framework integrates diffusion structural priors with multi-scale state space encoding, adaptive saliency prompting, and an iterative Mamba diffusion refinement mechanism to improve boundary accuracy. A boundary-aware refinement head and self-distillation strategy further enhance spatial coherence and feature consistency. Extensive experiments on 13 public benchmarks across RGB, RGB-D, and RGB-T settings demonstrate that DGSSM consistently outperforms state-of-the-art methods across multiple evaluation metrics while maintaining a compact model size. These results suggest that diffusion-guided state space modeling is an effective and generalizable paradigm for multimodal dense prediction tasks.
How much data is enough to make a scientific discovery? As biomedical datasets scale to millions of samples and AI models grow in capacity, progress increasingly depends on predicting when additional data will substantially improve performance. In practice, model development often relies on empirical scaling curves measured across architectures, modalities, and dataset sizes, with limited theoretical guidance on when performance should improve, saturate, or exhibit cross-over behavior. We propose a scaling-law framework for cross-modal discoverability based on spectral structure of data covariance operators, task-aligned signal projections, and learned representations. Many performance metrics, including AUC, can be expressed in terms of cumulative signal-to-noise energy accumulated across identifiable spectral modes of an encoder and cross-modal operator. Under mild assumptions, this accumulation follows a zeta-like scaling law governed by power-law decay of covariance spectra and aligned signal energy, leading naturally to the appearance of the Riemann zeta function. Representation learning methods such as sparse models, low-rank embeddings, and multimodal contrastive objectives improve sample efficiency by concentrating useful signal into earlier stable modes, effectively steepening spectral decay and shifting scaling curves. The framework predicts cross-over regimes in which simpler models perform best at small sample sizes, while higher-capacity or multimodal encoders outperform them once sufficient data stabilizes additional degrees of freedom. Applications include multimodal disease classification, imaging genetics, functional MRI, and topological data analysis. The resulting zeta law provides a principled way to anticipate when scaling data, improving representations, or adding modalities is most likely to accelerate discovery.
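As a hedged illustration of how a zeta function can arise from power-law spectra, suppose (an assumption made here for illustration; the symbols $s_k$, $c$, $\alpha$, and $K$ are not taken from the abstract) that the signal-to-noise energy of the $k$-th mode decays as $s_k = c\,k^{-\alpha}$ with $\alpha > 1$, and that $K$ modes are identifiable at the current sample size. The accumulated energy and the remaining gap then read:

```latex
E(K) = \sum_{k=1}^{K} s_k = c \sum_{k=1}^{K} k^{-\alpha}
\;\xrightarrow[K \to \infty]{}\; c\,\zeta(\alpha),
\qquad
c\,\zeta(\alpha) - E(K) = c \sum_{k > K} k^{-\alpha}
\approx \frac{c\,K^{1-\alpha}}{\alpha - 1}.
```

In this sketch the ceiling is set by $\zeta(\alpha)$ and the tail beyond the first $K$ modes shrinks as a power of $K$. How $K$ grows with sample size is left unspecified here.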
Continual learning (CL) is concerned with learning multiple tasks sequentially without forgetting previously learned tasks. Despite substantial empirical advances over recent years, the theoretical development of CL remains in its infancy. At the heart of developing CL theory lies the challenge that the data distribution varies across tasks, and we argue that properly addressing this challenge requires understanding this variation--dependency among tasks. To explicitly model task dependency, we consider nonlinear regression tasks and propose the assumption that these tasks are dependent in such a way that the data of the current task is a nonlinear transformation of previous data. With this model and under natural assumptions, we prove statistical recovery guarantees (more specifically, bounds on estimation errors) for several CL paradigms in practical use, including experience replay with data-independent regularization and data-independent weights that balance the losses of tasks, replay with data-dependent weights, and continual learning with data-dependent regularization (e.g., knowledge distillation). To the best of our knowledge, our bounds are informative in cases where prior work gives vacuous bounds.
Given only observational data $X = g(Z)$, where both the latent variables $Z$ and the generating process $g$ are unknown, recovering $Z$ is ill-posed without additional assumptions. Existing methods often assume linearity or rely on auxiliary supervision and functional constraints. However, such assumptions are rarely verifiable in practice, and most theoretical guarantees break down under even mild violations, leaving uncertainty about how to reliably understand the hidden world. To make identifiability actionable in real-world scenarios, we take a complementary view: in general settings where full identifiability is unattainable, what can still be recovered with guarantees, and what biases could be universally adopted? We introduce the problem of diverse dictionary learning to formalize this view. Specifically, we show that intersections, complements, and symmetric differences of latent variables linked to arbitrary observations, along with the latent-to-observed dependency structure, are still identifiable up to appropriate indeterminacies even without strong assumptions. These set-theoretic results can be composed using set algebra to construct structured and essential views of the hidden world, such as genus-differentia definitions. When sufficient structural diversity is present, they further imply full identifiability of all latent variables. Notably, all identifiability benefits follow from a simple inductive bias during estimation that can be readily integrated into most models. We validate the theory and demonstrate the benefits of the bias on both synthetic and real-world data.
Machine learning is becoming increasingly important for nonlinear system identification, including dynamical systems with spatially distributed outputs. However, classical identification and forecasting approaches become markedly less reliable in turbulent-flow regimes, where the dynamics are high-dimensional, strongly nonlinear, and highly sensitive to compounding rollout errors. Diffusion-based models have recently shown improved robustness in this setting and offer probabilistic inference capabilities, but many current implementations inherit target parameterizations from image generation, most commonly noise or velocity prediction. In this work, we revisit this design choice in the context of nonlinear spatiotemporal system identification. We consider a simple, self-contained patch-based transformer that operates directly on physical fields and use turbulent flow simulation as a representative testbed. Our results show that clean-state prediction consistently improves rollout stability and reduces long-horizon error relative to velocity- and noise-based objectives, with the advantage becoming more pronounced as the per-token dimensionality increases. These findings identify target parameterization as a key modeling choice in diffusion-based identification of nonlinear systems with spatial outputs in turbulent regimes.
Standard approaches to goal-conditioned reinforcement learning (GCRL) that rely on temporal-difference learning can be unstable and sample-inefficient due to bootstrapping. While recent work has explored contrastive and supervised formulations to improve stability, we present a probabilistic alternative, called survival value learning (SVL), that reframes GCRL as a survival learning problem by modeling the time-to-goal from each state as a probability distribution. This structured distributional Monte Carlo perspective yields a closed-form identity that expresses the goal-conditioned value function as a discounted sum of survival probabilities, enabling value estimation via a hazard model trained via maximum likelihood on both event and right-censored trajectories. We introduce three practical value estimators, including finite-horizon truncation and two binned infinite-horizon approximations to capture long-horizon objectives. Experiments on offline GCRL benchmarks show that SVL combined with hierarchical actors matches or surpasses strong hierarchical TD and Monte Carlo baselines, excelling on complex, long-horizon tasks.
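One way a value function can be written as a discounted sum of survival probabilities is sketched below. It assumes, for illustration only, that the value is $E[\gamma^T]$ for a discrete time-to-goal $T \ge 0$ with per-step hazards $h(t)$, and that trajectories not reaching the goal within the horizon contribute zero. The identity actually used by SVL may differ, and all function names are hypothetical.

```python
def survival_curve(hazards):
    """S(t) = P(T > t) = prod_{k <= t} (1 - h(k)) for hazards h(0..H-1)."""
    curve, alive = [], 1.0
    for h in hazards:
        alive *= 1.0 - h
        curve.append(alive)
    return curve


def value_direct(hazards, gamma):
    """E[gamma^T] summed over the arrival-time distribution P(T = t)."""
    total, alive = 0.0, 1.0
    for t, h in enumerate(hazards):
        total += (gamma ** t) * alive * h  # P(T = t) = S(t - 1) * h(t)
        alive *= 1.0 - h
    return total


def value_from_survival(hazards, gamma):
    """The same quantity expressed through discounted survival probabilities."""
    surv = survival_curve(hazards)
    horizon = len(hazards)
    discounted = sum((gamma ** t) * s for t, s in enumerate(surv))
    # Probability mass that never reaches the goal within the horizon.
    tail = (gamma ** horizon) * (surv[-1] if surv else 1.0)
    return 1.0 - (1.0 - gamma) * discounted - tail


hazards = [0.1, 0.3, 0.5, 0.2]  # illustrative per-step goal-reaching hazards
gamma = 0.9
v_direct = value_direct(hazards, gamma)
v_survival = value_from_survival(hazards, gamma)
```

Both routes give the same number. The survival form is convenient because a hazard model fitted by maximum likelihood on event and right-censored trajectories yields the survival curve directly.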
Persistent homology (PH) encodes global information, such as cycles, and is thus increasingly integrated into graph neural networks (GNNs). PH methods in GNNs typically traverse an increasing sequence of subgraphs. In this work, we first expose limitations of this inclusion procedure. To remedy these shortcomings, we analyze contractions as a principled topological operation, in particular, for graph representation learning. We study the persistence of contraction sequences, which we call Contraction Homology (CH). We establish that forward PH and CH differ in expressivity. We then introduce Hourglass Persistence, a class of topological descriptors that interleave a sequence of inclusions and contractions to boost expressivity, learnability, and stability. We also study related families parametrized by two paradigms, and discuss how our framework extends to simplicial and cellular networks. We further design efficient algorithms that are pluggable into end-to-end differentiable GNN pipelines, enabling consistent empirical improvements over many PH methods across standard real-world graph datasets. Code is available at https://github.com/Aalto-QuML/Hourglass.
Serialization formats designed for document interchange impose structural overhead that becomes prohibitive when large language models consume operational data at scale. A modest dataset of 1,000 IoT sensor readings serialized as JSON requires approximately 80,000 tokens - the majority spent on repeated field names, nested braces, and structural punctuation rather than semantic content. We present ONTO (Object Notation for Token Optimization), a columnar notation that declares field names once per entity and arranges values in pipe-delimited rows with indentation-based hierarchy. This schema-once, data-many design eliminates per-record key repetition while preserving human readability and nested structure support. Evaluation across three synthetic operational datasets demonstrates 46-51% token reduction versus JSON, with stable scaling from 100 to 1,000 records. Controlled inference benchmarks on Qwen2.5-7B show corresponding 5-10% latency improvement. Comprehension validation confirms no material degradation in LLM task accuracy across lookup, counting, extraction, and aggregation operations when format context is provided. Ablation analysis reveals that key repetition accounts for the majority of JSON overhead, with indentation costs in nested structures explaining the 4-percentage-point gap between flat and hierarchical data. ONTO occupies a previously unfilled position in the serialization landscape: columnar efficiency with hierarchical structure, optimized for LLM context windows rather than document interchange. Code and specification are available at https://github.com/harsh-aranga/onto.
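The schema-once, data-many idea can be illustrated with a toy encoder that writes the field names a single time and then one pipe-delimited row per record. This is a rough sketch under stated assumptions, not the ONTO syntax: the header layout, the `to_columnar` helper, and the sample records are all hypothetical, and the comparison below counts characters rather than tokens.

```python
import json


def to_columnar(entity, records):
    """Emit one header line with field names, then one indented row per record."""
    if not records:
        return f"{entity}:"
    fields = list(records[0].keys())
    lines = [f"{entity}: {'|'.join(fields)}"]
    for rec in records:
        # Values containing '|' would need escaping in a real format.
        lines.append("  " + "|".join(str(rec[f]) for f in fields))
    return "\n".join(lines)


# Made-up IoT-style readings for illustration.
readings = [
    {"sensor_id": f"s{i}", "temp_c": 20 + i, "status": "ok"} for i in range(3)
]
as_json = json.dumps(readings)
as_columnar = to_columnar("reading", readings)
```

In the JSON string every key appears once per record, while the columnar string names each field once, which is the source of the per-record savings described above.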
Many high-stakes AI deployments proceed only if every stakeholder deems the system acceptable relative to their own minimum standard. With randomization over a finite menu of options, this becomes a feasibility question: does there exist a lottery over options that clears all stakeholders' acceptability bars? We study a query model where the algorithm proposes lotteries and receives only binary accept/reject feedback. We give deterministic and randomized algorithms that either find a unanimously acceptable lottery or certify infeasibility; adaptivity can avoid eliciting many stakeholders' constraints, and randomization further reduces the expected elicitation cost relative to full elicitation. We complement these upper bounds with worst-case lower bounds (in particular, linear dependence on the number of stakeholders and logarithmic dependence on precision are unavoidable). Finally, we develop learning-augmented algorithms that exploit natural forms of advice (e.g., likely binding stakeholders or a promising lottery), improving query complexity when predictions are accurate while preserving worst-case guarantees.
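A small sketch shows why randomizing over a menu can turn an infeasible instance into a feasible one. It assumes, purely for illustration, that each stakeholder scores every option numerically and accepts a lottery when its expected score meets their bar; the abstract does not commit to this form, and the names below are hypothetical.

```python
def accepts(scores, bar, lottery):
    """Binary feedback: does the lottery's expected score clear this bar?"""
    expected = sum(p * s for p, s in zip(lottery, scores))
    return expected >= bar


def unanimous(stakeholders, lottery):
    """True when every (scores, bar) stakeholder accepts the lottery."""
    return all(accepts(scores, bar, lottery) for scores, bar in stakeholders)


# Two options, two stakeholders with opposed preferences (made-up numbers).
stakeholders = [
    ([1.0, 0.0], 0.4),  # likes option 0
    ([0.0, 1.0], 0.4),  # likes option 1
]
only_option_0 = unanimous(stakeholders, [1.0, 0.0])
only_option_1 = unanimous(stakeholders, [0.0, 1.0])
even_mix = unanimous(stakeholders, [0.5, 0.5])
```

Neither pure option is unanimously acceptable here, yet the even mix is. An elicitation algorithm in the query model would only see the accept/reject answers, never the scores or bars themselves.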
Counterfactual explanations (CFEs) are essential for interpreting black-box models, yet they often become invalid when models are slightly changed. Existing methods for generating robust CFEs are often limited to specific types of models, require costly tuning, or offer only inflexible robustness controls. We propose a novel approach that jointly models the data distribution and the space of plausible model decisions to ensure robustness to model changes. Using a probabilistic consensus over a model ensemble, we train a conditional normalizing flow that captures the data density under varying levels of classifier agreement. At inference time, a single interpretable parameter controls the robustness level; it specifies the minimum fraction of models that should agree on the target class without retraining the generative model. Our method effectively pushes CFEs toward regions that are both plausible and stable across model changes. Experimental results demonstrate that our approach achieves superior empirical robustness while also maintaining good performance across other evaluation measures.
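The robustness parameter described above, a minimum fraction of ensemble members that must agree on the target class, can be sketched as a simple check on a candidate counterfactual. This covers only the agreement criterion, not the conditional normalizing flow; the threshold classifiers and function names are hypothetical stand-ins.

```python
def agreement_fraction(models, x, target):
    """Share of ensemble members that predict the target class for x."""
    votes = sum(1 for model in models if model(x) == target)
    return votes / len(models)


def is_robust_cfe(models, x, target, min_fraction):
    """Accept a candidate only if enough ensemble members agree on the target."""
    return agreement_fraction(models, x, target) >= min_fraction


# Toy ensemble: 1-D threshold classifiers standing in for retrained models.
models = [lambda x, t=t: int(x >= t) for t in (0.4, 0.5, 0.6, 0.7)]

weak_candidate = is_robust_cfe(models, 0.55, target=1, min_fraction=0.75)
strong_candidate = is_robust_cfe(models, 0.65, target=1, min_fraction=0.75)
```

Raising `min_fraction` pushes accepted candidates further from the region where the ensemble members disagree, which is the trade-off the single parameter is meant to expose.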
Retrieving mathematical knowledge is a central task in both human-driven research, such as determining whether a result already exists, finding related results, and identifying historical origins, and in emerging AI systems for mathematics, where reliable grounding is essential. However, the scale and structure of the mathematical literature pose significant challenges: results are distributed across millions of documents, and individual statements are often difficult to interpret in isolation due to their dependence on prior definitions and theorems. In this paper, we introduce Matlas, a semantic search engine for mathematical statements. Matlas is built on a large-scale corpus of 8.07 million statements extracted from 435K peer-reviewed papers spanning 1826 to 2025, drawn from a curated set of 180 journals selected using an ICM citation-based criterion, together with 1.9K textbooks. From these sources, we extract mathematical statements together with their dependencies, construct document-level dependency graphs, and recursively unfold statements in topological order to produce more self-contained representations. On top of this corpus, we develop a semantic retrieval system that enables efficient search for mathematical results using natural language queries. We hope that Matlas can improve the efficiency of theorem retrieval for mathematicians and provide a structured source of grounding for AI systems tackling research-level mathematical problems, and serve as part of the infrastructure for mathematical knowledge retrieval.
In principle, deep generative models can be used to perform domain adaptation; i.e. align the input feature representations of test data with that of a separate discriminative model's training data. This can help improve the discriminative model's performance on the test data. However, generative models are prone to producing hallucinations and artefacts that may degrade the quality of generated data, and therefore, predictive performance when processed by the discriminative model. While uncertainty quantification can provide a means to assess the quality of adapted data, the standard framework for evaluating the quality of predicted uncertainties may not easily extend to generative models due to the common lack of ground truths (among other reasons). Even with ground truths, this evaluation is agnostic to how the generated outputs are used on the downstream task, limiting the extent to which the uncertainty reliability analysis provides insights about the utility of the uncertainties with respect to the intended use case of the adapted examples. Here, we describe how decision-theoretic uncertainty quantification can address these concerns and provide a convenient framework for evaluating the trustworthiness of generated outputs, in particular, for domain adaptation. We consider a case study in photoplethysmography time series denoising for Atrial Fibrillation classification. This formalises a well-known heuristic method of using a downstream classifier to assess the quality of generated outputs.
Advanced deepfake technologies are blurring the lines between real and fake, presenting both revolutionary opportunities and alarming threats. While they unlock novel applications in fields like entertainment and education, their malicious use has sparked urgent ethical and societal concerns ranging from identity theft to the dissemination of misinformation. To tackle these challenges, feature analysis using frequency features has emerged as a promising direction for deepfake detection. However, one aspect that has been overlooked so far is that existing methods tend to concentrate on one or a few specific frequency domains, which risks overfitting to particular artifacts and significantly undermines their robustness when facing diverse forgery patterns. Another underexplored aspect we observe is that different features often attend to the same forged region, resulting in redundant feature representations and limiting the diversity of the extracted clues. This may undermine the ability of a model to capture complementary information across different facets, thereby compromising its generalization capability to diverse manipulations. In this paper, we seek to tackle these challenges from two aspects: (1) we propose a triple-branch network that jointly captures spatial and frequency features by learning from both the original image and the image reconstructed by different frequency channels, and (2) we mathematically derive feature decoupling and fusion losses grounded in mutual information theory, which encourage the model to focus on task-relevant features across the original image and the image reconstructed by different frequency channels. Extensive experiments on six large-scale benchmark datasets demonstrate that our method consistently achieves state-of-the-art performance. Our code is released at https://github.com/injooker/Unveiling Deepfake.
Small Vision-Language Models (SVLMs) are efficient task controllers but often suffer from visual brittleness and poor tool orchestration. They typically require expensive supervised trajectory tuning to mitigate these deficits. In this work, we propose Self-supervised Perception Enabled by Cascaded Tool Rollout Alignment (SPECTRA), a supervision-free framework that bootstraps agentic capabilities via Coldstart Reinforcement Learning for SVLMs. SPECTRA enforces Soft Structured Multi-turn Rollouts, a topological constraint that directs agents to explicitly sequence tool derived evidence before synthesis, effectively grounding reasoning in visual observations. We employ a multi-objective reward signal that simultaneously maximizes task correctness, rollout structure, and tool utility, enabling agent to self-discover robust behaviors without human preference labels. We further introduce Tool Instrumental Utility (TIU), a novel metric to quantify tool efficacy in the absence of ground truth. Extensive evaluations across composite and out-of-distribution (MMMU-Pro) benchmarks demonstrate that SPECTRA boosts agentic trajectories, improving task accuracy by up to 5% and tool efficiency by 9%, enabling more efficient multimodal agents that learn effectively from environmental interaction alone.
Machine learning has become a powerful tool for discovering governing laws of dynamical systems from data. However, most existing approaches degrade severely when observations are sparse, noisy, or irregularly sampled. In this work, we address the problem of learning symbolic representations of nonlinear Hamiltonian dynamical systems under extreme data scarcity by explicitly incorporating physical structure into the learning architecture. We introduce Adaptable Symplectic Recurrent Neural Networks (ASRNNs), a parameter-cognizant, structure-preserving model that combines Hamiltonian learning with symplectic recurrent integration, avoiding time derivative estimation, and enabling stable learning under noise. We demonstrate that ASRNNs can accurately predict long-term dynamics even when each training trajectory consists of only two irregularly spaced time points, possibly corrupted by correlated noise. Leveraging ASRNNs as structure-preserving data generators, we further enable symbolic discovery using independent regression methods (SINDy and PySR), recovering exact symbolic equations for polynomial systems and consistent polynomial approximations for non-polynomial Hamiltonians. Our results show that such architectures can provide a robust pathway to interpretable discovery of Hamiltonian dynamics from sparse and noisy data.
Self-consistency (SC) is a popular technique for improving the reasoning accuracy of large language models by aggregating multiple sampled outputs, but it comes at a high computational cost due to extensive sampling. We introduce a hybrid ensembling approach that leverages the complementary strengths of two distinct modes of reasoning: Chain-of-Thought (CoT) and Program-of-Thought (PoT). We describe a general framework for combining these two forms of reasoning in self-consistency, as well as particular strategies for both full sampling and early stopping. We show that CoT-PoT ensembling not only improves overall accuracy, but also drastically reduces the number of samples required for SC, by a factor of 9.3. In particular, the majority of tasks (78.6%) can be addressed with only two samples, which has not been possible with any prior SC methods.
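As an illustration only, the early-stopping idea can be sketched in Python. The abstract does not give the stopping rule, so the rule used here (alternate CoT and PoT samples and stop as soon as one answer has been produced by both modes) is an assumption, and `hybrid_self_consistency`, `sample_cot`, and `sample_pot` are hypothetical names rather than the authors' code.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def hybrid_self_consistency(
    sample_cot: Callable[[], Hashable],
    sample_pot: Callable[[], Hashable],
    max_samples: int = 10,
) -> Tuple[Hashable, int]:
    """Alternate CoT and PoT samples; stop early on cross-mode agreement.

    Assumed rule (not taken from the paper): as soon as some answer has been
    produced by both reasoning modes, return it. If the budget runs out,
    fall back to a plain majority vote over all samples.
    """
    votes: Counter = Counter()
    cot_answers, pot_answers = set(), set()
    for used in range(1, max_samples + 1):
        if used % 2 == 1:  # odd draws use Chain-of-Thought
            answer = sample_cot()
            cot_answers.add(answer)
        else:  # even draws use Program-of-Thought
            answer = sample_pot()
            pot_answers.add(answer)
        votes[answer] += 1
        agreed = cot_answers & pot_answers
        if agreed:
            # Prefer the agreed answer with the most votes so far.
            return max(agreed, key=lambda a: votes[a]), used
    return votes.most_common(1)[0][0], max_samples
```

Under this assumed rule, a task on which the first CoT sample and the first PoT sample agree is resolved with exactly two samples, which is the regime the abstract highlights.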
Meta-optics promises compact, high-performance imaging and color routing. However, designing high-performance structures is a high-dimensional optimization problem: mapping a desired optical output back to a physical 3D structure requires solving computationally expensive Maxwell's equations iteratively. Even with adjoint optimization, broadband design can require thousands of Maxwell solves, making industrial-scale optimization slow and costly. To overcome this challenge, we propose the Neural Adjoint Method, a solver-supervised surrogate that predicts 3D adjoint gradient fields from a voxelized permittivity volume using a Fourier Neural Operator (FNO). By learning the dense, per-voxel sensitivity field that drives gradient-based updates, our method can replace per-iteration adjoint solves with fast predictions, greatly reducing the computational cost of full-wave simulations required during iterative refinement. To better preserve sensitivity peaks, we introduce a stage-wise FNO that progressively refines residual errors with increasing emphasis on higher-frequency components. We curate a meta-optics dataset from paired forward/adjoint FDTD simulations and evaluate it across three tasks: spectral sorting (color routers), achromatic focusing (metalenses), and waveguide mode conversion. Our method reduces design time from hours to seconds. These results suggest a practical route toward fast, large-scale volumetric meta-optical design enabled by AI-accelerated scientific computing.
A unified framework for first-order optimization algorithms for nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adaptive variants of Shampoo and Muon. This framework also allows combining heterogeneous geometries across different groups of variables while preserving a unified convergence analysis. A fully stochastic global rate-of-convergence analysis is conducted for all methods in the framework, with and without two types of momentum, using reasonable assumptions on the variance of the gradient oracle and without assuming bounded stochastic gradients or small enough stepsize.
Money laundering poses severe risks to global financial systems, driving the widespread adoption of machine learning for transaction monitoring. However, progress remains stifled by the lack of realistic benchmarks. Existing transaction-graph datasets suffer from two pervasive limitations: (i) they provide sparse node-level semantics beyond anonymized identifiers, and (ii) they rely on template-driven anomaly injection, which biases benchmarks toward static structural motifs and yields overly optimistic assessments of model robustness. We propose TransXion, a benchmark ecosystem for Anti-Money Laundering (AML) research that integrates profile-aware simulation of normal activity with stochastic, non-template synthesis of illicit subgraphs. TransXion jointly models persistent entity profiles and conditional transaction behavior, enabling evaluation of "out-of-character" anomalies where observed activity contradicts an entity's socio-economic context. The resulting dataset comprises approximately 3 million transactions among 50,000 entities, each endowed with rich demographic and behavioral attributes. Empirical analyses show that TransXion reproduces key structural properties of payment networks, including heavy-tailed activity distributions and localized subgraph structure. Across a diverse array of detection models spanning multiple algorithmic paradigms, TransXion yields substantially lower detection performance than widely used benchmarks, demonstrating increased difficulty and realism. TransXion provides a more faithful testbed for developing context-aware and robust AML detection methods. The dataset and code are publicly available at https://github.com/chaos-max/TransXion.
Human mobility prediction is a critical task but remains challenging due to its complexity and variability across populations and regions. Recently, large language models (LLMs) have made progress in zero-shot prediction, but existing methods suffer from limited interpretability (due to black-box reasoning), lack of iterative learning from new data, and poor transferability. In this paper, we introduce \textbf{ARMove}, a fully transferable framework for predicting human mobility through agentic reasoning. To address these limitations, ARMove employs standardized feature management with iterative optimization and user-specific customization: four major feature pools for foundational knowledge, user profiles for segmentation, and an automated generation mechanism integrating LLM knowledge. Robust generalization is achieved via agentic decision-making that adjusts feature weights to maximize accuracy while providing interpretable decision paths. Finally, large-small model synergy distills strategies from large LLMs (e.g., 72B) to smaller ones (e.g., 7B), reducing costs and enhancing performance ceilings. Extensive experiments on four global datasets show ARMove outperforms state-of-the-art baselines on 6 out of 12 metrics (gains of 0.78\% to 10.47\%). The other 4 items also achieved suboptimal results. Transferability tests confirm its robustness across regions, user groups, and model scales, while interpretability analysis highlights its transparency in decision-making. Our codes are available at: https://anonymous.4open.science/r/ARMove-F847.
Reward-based fine-tuning aims to steer a pretrained diffusion or flow-based generative model toward higher-reward samples while remaining close to the pretrained model. Although existing methods are motivated by different perspectives such as Soft RL, GFlowNets, etc., we show that many can be written under a common framework, which we call reward score matching (RSM). Under this view, alignment becomes score matching toward a reward-guided target, and the main differences across methods reduce to the construction of the value-guidance estimator and the effective optimization strength across timesteps. This unification clarifies the bias--variance--compute tradeoffs of existing designs and distinguishes core optimization components from auxiliary mechanisms that add complexity without clear benefit. Guided by this perspective, we develop simpler redesigns that improve alignment effectiveness and compute efficiency across representative settings with differentiable and black-box rewards. Overall, RSM turns a seemingly fragmented collection of reward-based fine-tuning methods into a smaller, more interpretable, and more actionable design space.
Symbolic regression (SR) with genetic programming (GP) aims to discover interpretable mathematical expressions directly from data. Despite its strong empirical success, the theoretical understanding of why GP-based SR generalizes beyond the training data remains limited. In this work, we provide a learning-theoretic analysis of SR models represented as expression trees. We derive a generalization bound for GP-style SR under constraints on tree size, depth, and learnable constants. Our result decomposes the generalization gap into two interpretable components: a structure-selection term, reflecting the combinatorial complexity of choosing an expression-tree structure, and a constant-fitting term, capturing the complexity of optimizing numerical constants within a fixed structure. This decomposition provides a theoretical perspective on several widely used practices in GP, including parsimony pressure, depth limits, numerically stable operators, and interval arithmetic. In particular, our analysis shows how structural restrictions reduce hypothesis-class growth while stability mechanisms control the sensitivity of predictions to parameter perturbations. By linking these practical design choices to explicit complexity terms in the generalization bound, our work offers a principled explanation for commonly observed empirical behaviors in GP-based SR and contributes towards a more rigorous understanding of its generalization properties.
RISC-V is emerging as a viable platform for automotive-grade embedded computing, with recent ISO 26262 ASIL-D certifications demonstrating readiness for safety-critical deployment in autonomous driving systems. However, functional safety in automotive systems is fundamentally a certification problem rather than a processor problem. The dominant costs arise from diagnostic coverage analysis, toolchain qualification, fault injection campaigns, safety-case generation, and compliance with ISO 26262, ISO 21448 (SOTIF), and ISO/SAE 21434. This paper analyzes the role of RISC-V in automotive functional safety, focusing on ISA openness, formal verifiability, custom extension control, debug transparency, and vendor-independent qualification. We examine autonomous driving safety requirements and map them to RISC-V architectural challenges such as lockstep execution, safety islands, mixed-criticality isolation, and secure debug. Rather than proposing a single algorithmic breakthrough, we present an analytical framework and research roadmap centered on certification economics as the primary optimization objective. We also discuss how selected ML methods, including LLM-assisted FMEDA generation, knowledge-graph-based safety case automation, reinforcement learning for fault injection, and graph neural networks for diagnostic coverage, can support certification workflows. We argue that the strongest outcome is not a faster core, but an ASIL-D-ready certifiable RISC-V platform.
We introduce JuRe (Just Repair), a minimal denoising network for time series anomaly detection that exposes a central finding: architectural complexity is unnecessary when the training objective correctly implements the manifold-projection principle. JuRe consists of a single depthwise-separable convolutional residual block with hidden dimension 128, trained to repair corrupted time series windows and scored at inference by a fixed, parameter-free structural discrepancy function. Despite using no attention, no latent variable, and no adversarial component, JuRe ranks second on the TSB-AD multivariate benchmark (AUC-PR 0.404, 180 series, 17 datasets) and second on the UCR univariate archive by AUC-PR (0.198, 250 series), leading all neural baselines on AUC-PR and VUS-PR. Component ablation on TSB-AD identifies training-time corruption as the dominant factor ($Δ$AUC-PR $= 0.047$ on removal), confirming that the denoising objective, not network capacity, drives detection quality. Pairwise Wilcoxon signed-rank tests establish statistical significance against 21 of 25 baselines on TSB-AD. Code is available at the URL https://github.com/iis-esslingen/JuRe.
Large language model optimization has historically bifurcated into isolated data-centric and model-centric paradigms: the former manipulates involved samples through selection, augmentation, or poisoning, while the latter tunes model weights via masking, quantization, or low-rank adaptation. This paper establishes a unified \emph{data-parameter correspondence} revealing these seemingly disparate operations as dual manifestations of the same geometric structure on the statistical manifold $\mathcal{M}$. Grounded in the Fisher-Rao metric $g_{ij}(θ)$ and Legendre duality between natural ($θ$) and expectation ($η$) parameters, we identify three fundamental correspondences spanning the model lifecycle: 1. Geometric correspondence: data pruning and parameter sparsification equivalently reduce manifold volume via dual coordinate constraints; 2. Low-rank correspondence: in-context learning (ICL) and LoRA adaptation explore identical subspaces on the Grassmannian $\mathcal{G}(r,d)$, with $k$-shot samples geometrically equivalent to rank-$r$ updates; 3. Security-privacy correspondence: adversarial attacks exhibit cooperative amplification between data poisoning and parameter backdoors, whereas protective mechanisms follow cascading attenuation where data compression multiplicatively enhances parameter privacy. Extending from training through post-training compression to inference, this framework provides mathematical formalization for cross-community methodology transfer, demonstrating that cooperative optimization integrating data and parameter modalities may outperform isolated approaches across efficiency, robustness, and privacy dimensions.
This paper proposes StrEBM, a structured latent energy-based model for source-wise structured representation learning. The framework is motivated by a broader goal of promoting identifiable and decoupled latent organization by assigning different latent dimensions their own learnable structural biases, rather than constraining the entire latent representation with a single shared energy. In this sense, blind source separation is adopted here as a concrete and verifiable testbed, through which the evolution of latent dimensions toward distinct underlying components can be directly examined. In the proposed framework, latent trajectories are optimized directly together with an observation-generation map and source-wise structural parameters. Each latent dimension is associated with its own energy-based formulation, allowing different latent components to gradually evolve toward distinct source-like roles during training. In the present study, this source-wise energy design is instantiated using Gaussian-process-inspired energies with learnable length-scales, but the framework itself is not restricted to Gaussian processes and is intended as a more general structured latent EBM formulation. Experiments on synthetic multichannel signals under linear and nonlinear mixing settings show that the proposed model can recover source components effectively, providing an initial empirical validation of the framework. At the same time, the study reveals important optimization characteristics, including slow late-stage convergence and reduced stability under nonlinear observation mappings. These findings not only clarify the practical behavior of the current GP-based instantiation, but also establish a basis for future investigation of richer source-wise energy families and more robust nonlinear optimization strategies.
Detecting deepfake images remains challenging because of the fast evolution of modern generative models and the poor generalization capability of existing methods. In this paper, we use an ensemble of fine-tuned vision transformers, namely DINOv2, AIMv2 and OpenCLIP's ViT-L/14, to create a generalizable method for detecting deepfakes. We use the DF-Wild dataset released as part of the IEEE SP Cup 2025, because it uses a challenging and diverse set of manipulations and generation techniques. We started our experiments with CNN classifiers trained on spatial features. Experimental results show that our ensemble outperforms individual models and strong CNN baselines, achieving an AUC of 96.77% and an Equal Error Rate (EER) of just 9% on the DF-Wild test set, beating the state-of-the-art deepfake detection algorithm Effort by 7.05% and 8% in AUC and EER respectively. This was the winning solution for SP Cup, presented at ICASSP 2025.
This paper investigates communication-efficient neural network transmission by exploiting structured symmetry constraints in convolutional kernels. Instead of transmitting all model parameters, we propose a degrees-of-freedom (DoF) based codec that sends only the unique coefficients implied by a chosen symmetry group, enabling deterministic reconstruction of the full weight tensor at the receiver. The proposed framework is evaluated under quantization and noisy channel conditions across multiple symmetry patterns, signal-to-noise ratios, and bit-widths. To improve robustness against transmission impairments, a projection step is further applied at the receiver to enforce consistency with the symmetry-invariant subspace, effectively denoising corrupted parameters. Experimental results on MNIST and CIFAR-10 using a DeepCNN architecture demonstrate that DoF-based transmission achieves substantial bandwidth reduction while preserving significantly higher accuracy than pruning-based baselines, which often suffer catastrophic degradation. Among the tested symmetries, \textit{central-skew symmetry} consistently provides the best accuracy-compression tradeoff, confirming that structured redundancy can be leveraged for reliable and efficient neural model delivery over constrained links.
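A minimal sketch of the degrees-of-freedom idea follows, for a single flattened kernel. It assumes that "central-skew symmetry" means the kernel equals the negative of its 180-degree rotation, i.e. `k[i] == -k[n-1-i]` on the flattened kernel; the abstract does not state the definition, so this and the function names `encode`, `decode`, and `project` are illustrative assumptions.

```python
from typing import List


def encode(kernel: List[float]) -> List[float]:
    """Transmit only the free coefficients: the first half of the kernel."""
    return list(kernel[: len(kernel) // 2])


def decode(dof: List[float], n: int) -> List[float]:
    """Rebuild the full kernel from its free coefficients.

    Under the assumed constraint k[i] == -k[n-1-i], the second half is the
    negated, reversed first half; an odd-sized kernel has a zero centre.
    """
    centre = [0.0] if n % 2 else []
    return list(dof) + centre + [-v for v in reversed(dof)]


def project(kernel: List[float]) -> List[float]:
    """Project a (possibly noisy) kernel onto the central-skew subspace."""
    n = len(kernel)
    return [(kernel[i] - kernel[n - 1 - i]) / 2 for i in range(n)]
```

For a 3x3 kernel this sends 4 values instead of 9. The projection averages each coefficient with its negated mirror partner, so noise that breaks the symmetry is partly cancelled at the receiver.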
Time series forecasting is traditionally dominated by sequence-based architectures such as recurrent neural networks and attention mechanisms, which process all time steps uniformly and often incur substantial computational cost. However, real-world temporal signals typically exhibit heterogeneous structure, where informative patterns are sparsely distributed and interspersed with redundant observations. This work introduces \textbf{SPaRSe-TIME}, a structured and computationally efficient framework that models time series through a decomposition into three complementary components: saliency, memory, and trend. The proposed approach reformulates temporal modeling as a projection onto informative subspaces, where saliency acts as a data-dependent sparsification operator, memory captures dominant low-rank temporal patterns, and trend encodes low-frequency dynamics. These components are integrated through a lightweight, adaptive mapping that enables simplified, selective, and interpretable temporal reasoning. Extensive experiments on diverse real-world datasets demonstrate that SPaRSe-TIME achieves competitive predictive performance compared to recurrent and attention-based architectures, while significantly reducing computational complexity. The model is particularly effective in structured time series with clear temporal components and provides explicit interpretability through component-wise contributions. Furthermore, analysis reveals both the strengths and limitations of decomposition-based modeling, highlighting challenges in highly stochastic and complex multivariate settings. Overall, SPaRSe-TIME offers a principled alternative to monolithic sequence models, bridging efficiency, interpretability, and performance, and providing a scalable framework for time series learning.
When task-specific labels are not available, it becomes difficult to select an embedding model for a specific target corpus. Existing label-free measures based on kernel estimators or Gaussian mixtures fail in high-dimensional spaces, resulting in unstable rankings. We propose flow-based label-free representation embedding evaluation (FLARE), which uses normalizing flows to estimate information sufficiency directly from log-likelihoods and avoids distance-based density estimation. We give a finite-sample bound indicating that the estimation error depends on the intrinsic dimension of the data manifold rather than the original embedding dimension. On 11 datasets and 8 embedders, FLARE reached a Spearman's $ρ$ of 0.90 against the supervised benchmark and remained stable in high-dimensional embeddings ($d \geq 3{,}584$), where the existing label-free baselines collapsed.
This paper investigates the length problem in sequence-level relative reinforcement learning. We observe that, although existing methods partially alleviate length-related phenomena, a more fundamental issue remains insufficiently characterized: the comparison units used during training lack inherent comparability. Building on this observation, we propose a new perspective: the length problem should not be viewed merely as a loss-scaling or normalization bias, but rather as a \emph{comparison unit construction} problem. We further establish a sample-construction-based training framework that, instead of applying post-hoc corrections to unequal-length responses, proactively constructs equal-length, alignable, and comparable training segments during generation. Within this framework, we propose EqLen, a concrete method applicable to group-relative comparison algorithms such as GRPO, GSPO, and RLOO. Through dual-track synchronous generation, prefix inheritance, and segment masking, EqLen efficiently collects effective equal-length training segments and enables stable
Graph transformers achieve strong results on molecular and long-range reasoning tasks, yet remain hampered by over-smoothing (the progressive collapse of node representations with depth) and attention entropy degeneration. We observe that these pathologies share a root cause with attention sinks in large language models: softmax attention's sum-to-one constraint forces every node to attend somewhere, even when no informative signal exists. Motivated by recent findings that element-wise sigmoid gating eliminates attention sinks in large language models, we propose SigGate-GT, a graph transformer that applies learned, per-head sigmoid gates to the attention output within the GraphGPS framework. Each gate can suppress activations toward zero, enabling heads to selectively silence uninformative connections. On five standard benchmarks, SigGate-GT matches the prior best on ZINC (0.059 MAE) and sets new state-of-the-art on ogbg-molhiv (82.47% ROC-AUC), with statistically significant gains over GraphGPS across all five datasets ($p < 0.05$). Ablations show that gating reduces over-smoothing by 30% (mean relative MAD gain across 4-16 layers), increases attention entropy, and stabilizes training across a $10\times$ learning rate range, with about 1% parameter overhead on OGB.
Modern generative models still lack human-level creativity, particularly in multi-branch diversity. Prior approaches to address this problem often incur heavy computation or strong dependency on model architecture. Therefore, we introduce UAG (Universal Avoidance Generation), a model-agnostic and computationally efficient generation strategy that penalizes similarity among previously generated outputs. Thus, UAG can enhance multi-branch diversity across both diffusion and transformer models, with minimal additional computation. In experiments, our method achieves up to 1.9 times higher diversity, runs 4.4 times faster, and requires only 1/64 of the FLOPs compared to state-of-the-art methods. The full code is available at https://anonymous.4open.science/r/2026_ACL_Universal/.
Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.
Discrete diffusion models form a powerful class of generative models across diverse domains, including text and graphs. However, existing approaches face fundamental limitations. Masked diffusion models suffer from irreversible errors due to early unmasking, while uniform diffusion models, despite enabling self-correction, often yield low-quality samples due to their strong reliance on intermediate latent states. We introduce IDDM, an Interpolating Discrete Diffusion Model, that improves diffusion by reducing dependence on intermediate latent states. Central to IDDM is a controllable resampling mechanism that partially resets probability mass to the marginal distribution, mitigating error accumulation and enabling more effective token corrections. IDDM specifies a generative process whose transitions interpolate between staying at the current state, resampling from a prior, and flipping toward the target state, while enforcing marginal consistency and fully decoupling training from inference. We benchmark our model against state-of-the-art discrete diffusion models across molecular graph generation as well as text generation tasks, demonstrating competitive performance.
Detecting harmful content in multi-turn dialogue requires reasoning over the full conversational context rather than isolated utterances. However, most existing methods rely mainly on models' internal parametric knowledge, without explicit grounding in external normative principles. This often leads to inconsistent judgments in socially nuanced contexts, limited interpretability, and redundant reasoning across turns. To address this, we propose RoTRAG, a retrieval-augmented framework that incorporates concise human-written moral norms, called Rules of Thumb (RoTs), into LLM-based harm assessment. For each turn, RoTRAG retrieves relevant RoTs from an external corpus and uses them as explicit normative evidence for turn-level reasoning and final severity classification. To improve efficiency, we further introduce a lightweight binary routing classifier that decides whether a new turn requires retrieval-grounded reasoning or can reuse existing context. Experiments on ProsocialDialog and Safety Reasoning Multi Turn Dialogue show that RoTRAG consistently improves both harm classification and severity estimation over competitive baselines, with an average relative gain of around 40% in F1 across benchmark datasets and an average relative reduction of 8.4% in distributional error, while reducing redundant computation without sacrificing performance.
Supervised fine-tuning of large language models relies on human-annotated data, yet annotation pipelines routinely involve multiple crowdworkers of heterogeneous expertise. Standard practice aggregates labels via majority vote or simple averaging, discarding annotator identity and causing the model to absorb the errors of unreliable annotators directly into its parameters. We propose REALM, a method that jointly learns the model parameters and a scalar expertise value for each annotator entirely unsupervised, requiring no supervision beyond annotator identity. The key idea is to model each observed label as a mixture between the model's prediction and a uniform random guess, weighted by the annotator's learned expertise. We extend REALM to a multi-task setting via a learned expertise matrix that captures per-annotator reliability across tasks. We evaluate on five question answering benchmarks, fine-tuning three sizes of Flan-T5 under simulated noisy annotations. The proposed algorithm consistently outperforms the naive noisy SFT in the large majority of single- and multi-task settings, across datasets, model sizes, and noise types, with accuracy improvements of up to $50\%$ in the most adversarial regime and gains that grow with model capacity.
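The mixture described above can be written out as a hedged sketch. The notation is assumed rather than taken from the paper: $e_a \in [0,1]$ is the learned expertise of annotator $a$, $p_\theta$ is the model's predictive distribution, and $K$ is the number of candidate labels.

```latex
% Assumed notation: e_a = expertise of annotator a, K = number of labels.
\[
  p(y = c \mid x, a) \;=\; e_a \, p_\theta(c \mid x) \;+\; (1 - e_a)\,\frac{1}{K},
  \qquad e_a \in [0, 1].
\]
% Joint objective over model parameters and expertise values,
% summed over observed (input, label, annotator) triples:
\[
  \mathcal{L}(\theta, e) \;=\; -\sum_{(x,\, y,\, a)}
  \log\!\Big( e_a \, p_\theta(y \mid x) + \frac{1 - e_a}{K} \Big).
\]
```

In this form, an annotator with $e_a = 1$ contributes the ordinary cross-entropy term, while an annotator with $e_a = 0$ contributes the constant $\log K$ and therefore no gradient with respect to $\theta$, which is one way to see how unreliable labels could be down-weighted.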
Physical neural networks offer a transformative route to edge intelligence, providing superior inference speed and energy efficiency compared to conventional digital architectures. However, realizing scalable, end-to-end, fully analog recurrent neural networks for temporal information processing remains challenging due to the difficulty of faithfully mapping trained network models onto physical hardware. Here we present a fully analog resonant recurrent neural network (R$^2$NN) implemented via a metacircuit architecture composed of coupled electrical local resonators. A reformulated mechanical-electrical analogy establishes a direct mapping between the R$^2$NN model and metacircuit elements, enabling accurate physical implementation of trained neural network parameters. By integrating jointly trainable global resistive coupling and local resonances, which generate effective frequency-dependent negative resistances, the architecture shapes an impedance landscape that steers currents along frequency-selective pathways. This mechanism enables direct extraction of discriminative spectral features, facilitating real-time temporal classification of raw analog inputs while bypassing analog-to-digital conversion. We demonstrate the cross-domain versatility of this framework using integrated hardware for tactile perception, speech recognition, and condition monitoring. This work establishes a scalable, fully analog paradigm for intelligent temporal processing and paves the way for low-latency, resource-efficient physical neural hardware for edge intelligence.
Assessing the security posture of modern computing systems typically requires the use of multiple specialized tools. These tools focus on different aspects such as configuration compliance, file integrity, and vulnerability exposure, and their outputs are often difficult to interpret collectively. This paper introduces the Unified Compliance Aggregator (UCA), a framework that integrates several open-source security tools into a single composite score representing overall system security. The proposed framework combines outputs from Lynis, OpenSCAP (STIG and CIS profiles), AIDE, Tripwire, and Nmap NSE. A normalization process converts heterogeneous outputs into a consistent 0 to 100 scale, followed by weighted aggregation. We also introduce a logarithmic scoring model for file integrity measurements to address limitations observed in prior linear approaches. Experiments were conducted on Ubuntu 22.04 across different hardening levels and environments. Results show consistent improvement in composite scores as systems are hardened, while also revealing contrasting behavior between compliance and file integrity tools. Two case studies, a basic web server and a DVWA-based system, illustrate how the framework can be applied in practical scenarios.
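A minimal sketch of the normalize-then-aggregate pipeline described above follows. The function names, the linear normalization, the particular logarithmic form, and the weights are all assumptions made for illustration; the abstract does not give UCA's actual formulas.

```python
import math


def normalize(value, worst, best):
    """Map a raw tool output linearly onto 0..100 and clamp it.

    Works for both orientations: pass worst > best when lower raw values are better.
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    score = 100.0 * (value - worst) / (best - worst)
    return max(0.0, min(100.0, score))


def log_integrity_score(changed_files, total_files):
    """Hypothetical logarithmic file-integrity score.

    100 when nothing changed, 0 when every monitored file changed; the first few
    changes cost far more than later ones, unlike a linear model.
    """
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    changed = min(max(changed_files, 0), total_files)
    return 100.0 * (1.0 - math.log1p(changed) / math.log1p(total_files))


def composite_score(scores, weights):
    """Weighted average of per-tool scores that are already on the 0..100 scale."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```

Dividing by the sum of the weights keeps the composite on the same 0 to 100 scale even when a tool is missing from a given run.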
Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence - 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation - only requests sharing the targeted prefix are affected. (3) Persistent accumulation - no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable detection evasion; persistent accumulation then proceeds unchecked, yielding damage amplification bounded only by how long the block remains cached. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime, with negligible overhead. These results argue for integrity protection of prefix blocks before end-to-end exploitation is demonstrated.
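The checksum countermeasure can be illustrated with a small sketch. It does not use vLLM's API: `CachedBlock` and `flip_bit` are hypothetical names, and CRC32 is an assumed choice of checksum (the abstract does not name one). CRC32 is used here because a CRC is guaranteed to detect any single-bit error.

```python
import zlib


class CachedBlock:
    """Toy stand-in for a shared prefix-cache block with a stored checksum."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        # Checksum recorded once, when the block is first written.
        self.checksum = zlib.crc32(self.data)

    def verify(self) -> bool:
        """Recompute the checksum, e.g. before the block is scheduled for reuse."""
        return zlib.crc32(self.data) == self.checksum


def flip_bit(block: CachedBlock, byte_index: int, bit_index: int) -> None:
    """Simulate a single-bit fault in the cached data."""
    block.data[byte_index] ^= 1 << bit_index


block = CachedBlock(bytes(range(64)))
clean_ok = block.verify()        # block is intact
flip_bit(block, 10, 7)
corrupted_ok = block.verify()    # mismatch: the block should be evicted and recomputed
```

Checking at reuse time is what bounds the damage: a corrupted block is caught before it serves further requests, regardless of how long it has been cached.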
Two-particle two-hole (2p2h) excitations driven by meson-exchange currents (MEC) are among the leading nuclear uncertainties in long-baseline neutrino oscillation experiments. Three models currently implemented in neutrino event generators disagree by 20--40% on the $ω$-integrated 2p2h cross section in the dip region on carbon (differential disagreements can reach factors of 2--3), and the axial two-body current has no direct experimental constraint beyond tritium $β$-decay at $Q^2 = 0$. We propose a measurement program at the Electron-Ion Collider (EIC) using polarized electron scattering on deuteron and $^3$He. Electromagnetic (EM) scattering ($γ^*$ exchange) measures the vector MEC. Charged-current (CC) scattering ($W^-$ exchange) on the same targets measures the vector$+$axial MEC. Subtracting the two provides the first direct sensitivity to the axial two-body current, including the $V$--$A$ interference, as a function of momentum transfer. Using $^3$He (2~$pn$ $+$ 1~$pp$ pair) extends the decomposition to $pp$ pairs. Polarized beams and targets give access to six EM response functions on deuteron, four of which have not been previously measured. The tensor analyzing power provides a sign-flip test for $Δ$-excitation MEC. We present projected sensitivities at $50 fb^{-1}$ on deuteron ($\sim$5 years at $10^{33}$~cm$^{-2}$s$^{-1}$). The EM program can deliver $\sim\!5\!\times\!10^4$ events per $Q^2$ bin constraining the MEC transverse response to $\sim$2% per bin, the beam--target double-spin asymmetry reaches $6$--$13σ$ per bin, and the vector MEC $V_{pn}$ is measured to $\sim$6% per bin. The CC channel is statistics-limited, with $\sim$6--38 events per $Q^2$ bin at $50 fb^{-1}$, requiring a luminosity upgrade beyond the current EIC baseline.
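The quoted running time can be cross-checked with a short calculation. The conversion $1~\mathrm{fb}^{-1} = 10^{39}~\mathrm{cm}^{-2}$ is exact; the figure of $10^7$ seconds of effective data taking per year is a common rule of thumb that is assumed here, not stated in the abstract.

```python
FB_INV_TO_CM2_INV = 1e39  # 1 fb = 1e-39 cm^2, so 1 fb^-1 = 1e39 cm^-2


def running_time_seconds(int_lumi_fb_inv, inst_lumi_cm2_s):
    """Beam time needed to collect an integrated luminosity at constant rate."""
    return int_lumi_fb_inv * FB_INV_TO_CM2_INV / inst_lumi_cm2_s


def running_time_years(int_lumi_fb_inv, inst_lumi_cm2_s, seconds_per_year=1e7):
    """Convert to years, assuming seconds_per_year of effective running (assumption)."""
    return running_time_seconds(int_lumi_fb_inv, inst_lumi_cm2_s) / seconds_per_year


seconds = running_time_seconds(50, 1e33)  # about 5e7 s
years = running_time_years(50, 1e33)      # about 5 years under the assumed duty cycle
```

Under that assumption, $50~\mathrm{fb}^{-1}$ at $10^{33}$ cm$^{-2}$s$^{-1}$ corresponds to roughly five years, consistent with the estimate in the text.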
Rare and very rare decays of third-generation particles, including $b$-hadrons and $τ$ leptons, provide sensitive probes of physics beyond the Standard Model (SM). Unlike direct searches limited by collider energies, they probe new physics at much higher energy scales. Many of these decays have SM-predicted branching fractions below the sensitivity of current detectors. These proceedings report on recent LHCb searches, including several first searches and results setting the most stringent limits to date. In particular, searches for $b \to s τ^+τ^-$, $b \to s τ^\pm e^\mp$, $b \to s μ^\pm e^\mp$, and $τ^- \to μ^-μ^+μ^-$ are presented, alongside searches for lepton-number-violating processes and loop-suppressed annihilation decays.
Searches at high object masses probe both resonant production of new particles and nonresonant distortions of Standard Model spectra. This contribution follows the material presented in the Moriond Electroweak 2026 talk and summarizes recent CMS results in this regime: the Run~2 combination of heavy vector boson searches, the Run~3 search for $W^\prime \to \ell ν$, the Run~2 dijet angular analysis, and searches for pair-produced dijet resonances in inclusive and $b$-tagged final states. No significant deviation from the SM expectation is observed, and the new results extend the sensitivity of CMS to multi-TeV scales in several benchmark scenarios.
The decay $B^0 \to Λ_c^+ \barΛ_c^- K_S^0$ is studied at LHCb for the first time using proton-proton collision data recorded by the LHCb experiment at a center-of-mass energy of $\sqrt{s} = 13$ TeV, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. The branching ratio relative to the decay $B^+ \to Λ_c^+ \barΛ_c^- K^+$ is measured to be $$ \frac{{\cal B}(B^0 \to Λ_c^+ \barΛ_c^- K_S^0)}{{\cal B}(B^+ \to Λ_c^+ \barΛ_c^- K^+)} = 0.53 \pm 0.05 \pm 0.05, $$ where the first uncertainty is statistical and the second is systematic. Evidence is found for contributions from two resonant states, $Ξ_c(2923)^+$ and $Ξ_c(2939)^+$, in the $Λ_c^+ K_S^0$ system. The two states show a significance of $3.9σ$ relative to the nonresonant hypothesis. These two $Ξ_c^+$ states are consistent with being the isospin partners of the states observed in the $Λ_c^+ K^-$ system.
We combine the Glauber--Lachs formula from quantum optics and the two-component picture for pion production to analyze data on two- and three-pion Bose--Einstein correlation at 7 TeV from the LHCb Collaboration. For the pion exchange function $E_{\rm 2B}$, we chose a dipole form and an inverse one-and-a-half pole form. The extensions are computed in the configuration space of 4-dimensional Euclidean space ($ξ=\sqrt{|\bm r_1-\bm r_2|^2+(t_1-t_2)^2}$).
In the 4FGL-DR4 point-source catalog of the Large Area Telescope (LAT) onboard NASA's Fermi Gamma-ray Observatory (Fermi-LAT), around a third of the sources are still unidentified (unIDs). In this work, we perform a detailed study of one of them, namely 4FGL J2112.5-3043. Only gamma-ray emission has been detected from this unidentified source, with no counterpart observed at any other wavelength to date. Together with its high detection significance, this makes 4FGL J2112.5-3043 a particularly compelling target for further investigation. The results of our spectral and spatial analyses show that the source photon spectrum is better described with a subexponential cutoff power-law spectral model, with no significant flux variability over time, and a morphology consistent with being a point-like source. We investigate and discuss the characterized emission within the context of both conventional and exotic astrophysics, namely a pulsar origin or potential dark matter (DM) annihilations in a nearby Galactic subhalo. Although our results are inconclusive and neither confirm a DM origin nor firmly establish an astrophysical nature, we find a spectral preference for the $b\bar{b}$ and $c\bar{c}$ DM annihilation channels over a pulsar origin, thus making this unID a particularly intriguing candidate for future multiwavelength observations.
A review of recent multiboson and vector boson scattering (VBS) measurements from the ATLAS and CMS Collaborations at the LHC is presented. Results are reported from precision diboson cross-section measurements, novel CP-sensitive and polarisation observables in $Wγ$ production, VBS observations in semileptonic and fully leptonic final states including the first measurements at $\sqrt{s}$ = 13.6 TeV, and observations of triboson processes. These results constitute a comprehensive test of the electroweak gauge sector of the Standard Model, and provide stringent constraints on anomalous gauge couplings in the effective field theory framework.
We present the first comprehensive study of the rare Dalitz decay $K^*(892) \rightarrow K \ell^+ \ell^- (\ell = e, μ)$, providing a prediction for the branching fraction and the dilepton mass spectrum. This decay involves the emission of a virtual photon which converts into a lepton pair, offering a probe of the transition form factor $F_{K^*K}(q^2)$ and underlying meson structure. Using a single pole approximation for the form factor, we present the calculation of the branching fraction for this rare decay channel for the first time. Furthermore, we also investigate the potential to search for a light $A^\prime$ boson (dark photon) appearing as a narrow resonance in the dilepton spectrum, and discuss the experimental sensitivity and new physics opportunities at the dedicated BESIII experiment. Our results establish $K^*(892) \rightarrow K \ell^+ \ell^- (\ell = e, μ)$ as a new laboratory for hadronic structure and dark-sector searches.
BESIII has accumulated 4.5 fb$^{-1}$ of $e^+e^-$ collision data in the 4.6 to 4.7 GeV energy range, corresponding to the world's largest sample of $Λ_c^+\barΛ_c^-$ pairs. This paper summarizes recent BESIII results on charmed-baryon decays, including the observation of the rare semi-leptonic decay $Λ_c^+\to ne^+ν_e$ using a Graph Neural Network, the first measurement of the decay asymmetry in the pure $W$-exchange decay $Λ_c^+\toΞ^0K^+$, and branching fraction measurements of the inclusive decays $Λ_c^+\to Xe^+ν_e$ and $\barΛ_c^-\to \bar{n}X$. We also report partial wave analyses of $Λ_c^+\toΛπ^+π^0$ and $Λ_c^+\toΛπ^+η$, measurements of Cabibbo-suppressed decays such as $Λ_c^+\to pπ^0$, and studies of $K_S^0-K_L^0$ asymmetries in $Λ_c^+$ decays.
Neutrino self-interactions delay the onset of free-streaming in the early universe, leaving distinct, scale-dependent signatures on the matter power spectrum. We investigate these signatures in post-reionization 21-cm intensity mapping and the Lyman-$α$ (Ly$α$) forest at redshifts $z \sim 2$--$3.5$, and forecast the constraints achievable with upcoming surveys using Fisher matrix analysis. Modeling neutrino self-interactions through an effective four-fermion parameterization with coupling $G_{\rm eff}$, we compute modifications to the Ly$α$ and 21-cm auto- and cross-power spectra for both strongly interacting (SI$_ν$, $\log_{10}G_{\mathrm{eff}} = -1.77$) and moderately interacting (MI$_ν$, $\log_{10}G_{\mathrm{eff}} = -5$) scenarios. We then combine these with forecasts for a representative next-generation cosmic microwave background (CMB) mission to evaluate the capabilities of SKA1-Mid and PUMA. We find that the Ly$α$--21-cm cross-correlation provides a systematics-resilient probe of the interaction signal, and decisively breaks the degeneracy between the primordial scalar power spectrum amplitude ($A_s$) and $G_{\rm eff}$ that limits CMB only analysis, particularly for the SI$_ν$ mode. Furthermore, the CMB+PUMA combination emerges as the optimal survey configuration for both regimes, reaching 1$σ$ constraints of $\mathcal{O}(10^{-3})$ on $σ(\log_{10}G_{\rm eff})$ for the SI$_ν$ mode and $\mathcal{O}(10^{-2})$ for the MI$_ν$ mode. Compared to the CMB-only baseline, this represents an improvement of approximately one order of magnitude for the SI$_ν$ mode, and nearly two orders of magnitude for the MI$_ν$ mode. We show that this conclusion holds uniformly over the full range of coupling strengths from $\log_{10}G_{\rm eff} = -6$ to $-1.77$.
We present a lattice QCD study of dilepton production in charmonium transitions, specifically focusing on the $1^{+-} \to 0^{-+}$ and $1^{++} \to 1^{--}$ processes: $h_c \to η_c \ell^+ \ell^-$ and $χ_{c1} \to J/ψ\ell^+ \ell^-$, where $\ell = e, μ$. The relevant hadronic matrix elements are computed using gauge field configurations generated by the Extended Twisted Mass Collaboration with $N_f = 2+1+1$ dynamical Wilson--Clover twisted-mass fermions at four lattice spacings. Simulations are performed at physical dynamical $u$, $d$, $s$, and $c$ quark masses, except for the coarsest lattice, where the lightest sea quark mass corresponds to a slightly heavier pion mass. A controlled continuum extrapolation is carried out. In the continuum limit for the $h_c$ decays, we obtain $Γ(h_c \to η_c e^+ e^-) = 5.45(19)~\mathrm{keV}$, and $Γ(h_c \to η_c μ^+ μ^-) = 0.635(22)~\mathrm{keV}$. For the $χ_{c1}$ decays, we find: $Γ(χ_{c1} \to J/ψe^+ e^-)= 2.869(90)~\mathrm{keV}$, and $Γ(χ_{c1} \to J/ψμ^+ μ^-) = 0.1993(72)~\mathrm{keV}$. Our results for the $χ_{c1}$ decays show good compatibility with experimental data. However, our prediction for the $h_c \to η_c e^+ e^- $ decay rate is approximately $3σ$ larger than the BESIII result. We also present predictions for the differential decay widths as functions of the dilepton invariant mass, $q^2$, and for angular observables sensitive to longitudinal transition form factors, which are inaccessible in radiative decays with real photon emission. These results constitute the first fully dynamical lattice QCD predictions for dilepton decay rates in $h_c$ and $χ_{c1}$ charmonium transitions, including their differential distributions and angular observables. They provide benchmark predictions for future experimental studies.
If the Universe underwent a cosmic phase transition, it may have left behind a network of cosmic strings. When these strings arise from the breaking of a gauge symmetry, their decay produces a significant stochastic background of gravitational waves. In contrast, if they originate from the breaking of a global symmetry, their decay predominantly yields Nambu-Goldstone bosons, which can persist as dark matter or dark radiation. In this work, we assess the detectability of this particle spectrum using a range of cosmological probes. We employ semi-numerical methods to estimate the resulting energy density and compute the associated matter power spectrum. We then compare these predictions with observations of the cosmic microwave background, Lyman-$α$ forest, large-scale structure surveys, and the UV luminosity function, thereby deriving constraints on the Nambu-Goldstone boson mass and the symmetry-breaking scale. Finally, we present projections for the sensitivity of upcoming cosmic microwave background missions.
We present a solvable same-sector effective theory for anomaly-inspired axion inflation, in which a heavy trace-anomaly mode dynamically backreacts on the axion potential. The tree-level elimination of the radial field resums the backreaction into a closed-form Lambert-$W$ potential, naturally flattening the hilltop potential without external plateau operators. By deriving the exact trough metric, we evaluate all the observables on the fully reduced one-field action, bypassing uncontrolled kinetic approximations. Calibrated at $N_\star=56$, reheating-compatible branches yield $r\simeq0.033$--$0.036$ and $α_s\simeq-(4.6$--$4.7)\times10^{-4}$, comfortably satisfying the current ACT/SPT/BICEP constraints. The evolution remains strictly adiabatic ($m_\perp^2/H^2\gtrsim6.1$, $Ω/H\lesssim7.6\times10^{-4}$) with negligible sound-speed and metric corrections. We provide analytic control over the constant-$w_{\rm eff}$ reheating map, the $N_{\rm re}=0$ boundary, and robustness against vacuum-offset deformations. This Lambert-$W$ backbone establishes a precise, deformable benchmark for confining axion inflation, with microscopic matching and reheating microphysics accessible as systematic EFT refinements.
We present an updated overview of the symmetry preserving Contact Interaction model in hadronic physics, developed a little over a decade ago to describe the mass spectrum and internal structure of mesons and diquarks composed of light and heavy quarks. Over the years, the Contact Interaction has evolved into a framework capable of treating both ground and excited states, providing a simple yet consistent approach to nonperturbative QCD. In this review, we examine the mass spectrum and elastic form factors of forty mesons with different spins and parities, together with their corresponding diquark partners. Importantly, we update the comparison of Contact Interaction predictions using recent results from the literature, offering a fresh perspective on the model's performance, strengths, and limitations. The analysis presented here refines previous conclusions and supports the Contact Interaction as a practical tool for hadron structure studies, with potential applications to baryons and multiquark states. We also present comparisons with other theoretical models and approaches, including lattice quantum chromodynamics, and comment on future prospects in view of ongoing and planned hadron structure experimental programs. In particular, forthcoming measurements at FAIR, together with future studies at Jefferson Lab and the Electron Ion Collider, are expected to provide key insights into hadron structure, with FAIR offering indirect constraints via hadron spectroscopy, hadronic interactions, and in-medium properties, while high-precision data on meson structure and form factors from Jefferson Lab and the Electron Ion Collider will provide valuable benchmarks to confront Contact Interaction based predictions.
We present a systematic study of static solutions to the source-free SU(2) Yang-Mills equations, in which the gauge potential explicitly depends on spin operators. By employing the \emph{vector potential extraction approach} (VPEA) -- which requires the total angular momentum operator (orbital plus spin) to satisfy the standard angular momentum algebra -- we derive the most general form of the spin vector potential. This leads to the static ansatz $\{ \vec{A} = [k_1(\hat{r}\times\vecΓ) + k_2\vecΓ + k_3(\vecΓ\cdot\hat{r})\hat{r}]/r, \varphi = f_1(r)\,(\vecΓ\cdot\hat{r}) + f_2(r)\}$, parametrized by three constants $\{k_1, k_2, k_3\}$ and two radial functions $\{f_1(r), f_2(r)\}$. Substituting this ansatz into the Yang-Mills equations and imposing the angular momentum constraints from the VPEA yields a set of consistency equations. Solving these equations provides a complete classification of static solutions, including both real and complex families. Known simple SU(2) static solutions are recovered as special cases. Our classification reveals new static configurations that could be valuable for non-perturbative studies and for models where spin degrees of freedom couple to non-Abelian gauge fields.
In this paper, we analyze the $1 \rightarrow 4$ decay channel for the production of doubly heavy quarkonium, $(b\bar{c})$ or $(c\bar{c})$, via top-quark decays, $t \to (b\bar{c}) + c + c + \bar{s}$ and $t \to (c\bar{c}) + b + c + \bar{s}$, within the framework of nonrelativistic QCD (NRQCD). The dominant contributions are considered in color-singlet S-wave states, i.e., $(b\bar{c})[^1S_0]$, $(b\bar{c})[^3S_1]$, $(c\bar{c})[^1S_0]$, and $(c\bar{c})[^3S_1]$. Our calculations show that the decay widths for $\bar{B_{c}}$, $\bar{B_{c}^{*}}$, $J/ψ$ and $η_{c}$ production are 0.2251, 0.3099, 0.0537 and 0.0555 MeV, respectively, resulting in ${\cal O}(10^{4}\text{--}10^{6})$ level of $\bar{B}_c^{(*)}$ events and ${\cal O}(10^{3}\text{--}10^{5})$ level of charmonium produced at LHC per year. In particular, we find that the dominant contribution to $J/ψ$ and $η_{c}$ production via top-quark decays arises from this decay channel proposed in this work. Moreover, this multi-body top-quark decay process can serve as a sensitive probe for validating the narrow-width approximation (NWA). Finally, we provide a detailed analysis of theoretical uncertainties and differential distributions to facilitate the corresponding experimental searches. The production of a hadron associated with three quarks contains rich physical information, providing new insights for the LHC to study $B_c$ mesons and charmonia.
We study the linear cosmological evolution of inelastic self-interacting dark matter in a two-component dark sector with a small mass splitting, assuming thermal initial conditions for the two species. We derive the coupled background and perturbation equations for inelastic conversion between the two species, considering both Power-law and Low-velocity saturation cross sections. Exothermic conversion injects kinetic energy into the light component, generating pressure support that suppresses small-scale structure and produces dark acoustic oscillations in the matter power spectrum. The resulting cutoff at scale $k > 1\,h\,\mathrm{Mpc}^{-1}$ depends on the normalization and velocity dependence of the cross section, the dark matter mass and the mass splitting. Using linear power spectra computed with a modified Boltzmann solver, we apply recast constraints from Lyman-$α$ forest data and high-redshift UV luminosity functions, finding non-monotonic but closed exclusion regions driven by the competition between efficient conversion and rapid depletion of the heavy component. These results show that the internal thermodynamics of a secluded multi-component dark sector can leave observable imprints on structure formation, providing a complementary probe of dark matter beyond Standard Model interactions.
In the present work we investigate the phenomenological implications of a vanishing effective Majorana neutrino mass within a $3+1$ neutrino framework, which adds an eV-scale sterile neutrino to the three active neutrino states, in light of the latest cosmology-driven bounds on the sum of neutrino masses ($\sum_{i}m_i$). We explore the parameter space where the destructive interference between active and sterile states leads to a vanishing amplitude, $M_{ee}$, of neutrinoless double beta ($0νββ$) decay. The allowed parameter space has been identified and predictions have been obtained taking into account the latest Planck and DESI+CMB bounds on $\sum_{i}m_i$. We find that these bounds restrict the sterile mixing angle $θ_{14}$ and the lightest active neutrino mass. Furthermore, we incorporate the refined precision data from the JUNO experiment on the solar oscillation parameters ($θ_{12}, Δm_{21}^2$). We find that sterile neutrino parameters such as $θ_{14}$ may not be sensitive to the JUNO precision measurements, as the constraint imposed by the precise $θ_{12}$ is washed out by new cancellations driven by additional CP-violating phases leading to vanishing $|M_{ee}|$.
We review the recent progress made with regard to the hadronic light-by-light (HLbL) contribution to the Standard Model prediction of the muon anomalous magnetic moment and how well this compares with predictions from holographic QCD models, which had predicted larger contributions from axial vector mesons and short-distance constraints than the White Paper of 2020. A new holographic prediction concerns tensor-meson contributions, which in holographic QCD play a significant role in short-distance constraints beyond the Melnikov-Vainshtein constraint. When matching also the symmetric longitudinal short-distance constraint, the resulting prediction for the tensor-meson transition form factors agrees well with available singly virtual data, but leads to results that differ from the traditional quark-model ansatz and to a sizable positive contribution that could explain the remaining current tension between lattice and data-driven results for the HLbL contribution.
We study the coupled cosmological evolution of primordial black holes (PBHs) and radiation in the Arkani-Hamed-Dimopoulos-Dvali (ADD) framework with $n$ large extra dimensions and a fundamental gravity scale $M_\star$ at the TeV scale. For PBHs with horizon radius smaller than the compactification scale, the higher-dimensional geometry implies a larger horizon size at fixed mass and therefore a suppressed Hawking temperature. As a result, radiation accretion can overcome evaporation in the early Universe and drive a ``runaway'' phase of rapid mass growth. By numerically solving the coupled mass and energy-density evolution equations, we show that for $n \geq 2$ initially microscopic PBHs with initial mass $M_i \gtrsim 10^{12}\,$g can grow by many orders of magnitude and potentially reach macroscopic, even solar-mass, scales by matter-radiation equality. We determine the critical initial abundance $β_{\rm crit}$ required for PBHs to account for the observed dark matter density and find that extra dimensions dramatically lower this threshold, allowing viable scenarios with $β_{\rm crit}\sim 10^{-44}$. This identifies a previously unexplored region of parameter space in which the dark matter abundance is achieved through dynamical mass growth rather than large initial collapse fractions.
We investigate near-threshold $J/ψ$ photoproduction off the nucleon, focusing on hadronic rescattering effects induced by open-charm meson-baryon intermediate states. Beyond the conventional Pomeron-exchange mechanism, the $\bar D^0Λ_c^+$ and $\bar D^{*0} Λ_c^+$ channels are incorporated within an effective Lagrangian framework. The relevant production amplitudes are evaluated at tree level from $t$-, $s$-, and $u$-channel diagrams in a gauge-invariant manner. The resulting total and $t$-dependent differential cross sections are compared with recent near-threshold data from the GlueX, $J/ψ$-007, and CLAS experiments at Jefferson Lab. We find that the open-charm rescattering contributions significantly improve the description of the data, particularly at large momentum transfer, and naturally generate cusp-like structures near the $\bar D^0 Λ_c^+$ and $\bar D^{*0} Λ_c^+$ thresholds in the GlueX data. We further present predictions for the associated open-charm processes $γp \to \bar D^{(*)0} Λ_c^+$, whose cross sections are estimated to be of the order of 5 nb.
Freeze-in of multi-component dark sectors is governed not only by the interaction with the thermal plasma, but also by their internal dynamics. Full thermalisation within the dark sector is not guaranteed, raising the question of how departures from local thermal equilibrium affect the evolution and, ultimately, the relic abundance and momentum distribution of dark matter. In this work we explore this question in a minimal two-scalar model, which can give rise to observable signatures in indirect detection and long-lived particle searches at forward physics experiments. Focusing on the phenomenologically viable regions, we analyse the impact of non-thermal evolution on the dark matter abundance, finding deviations of up to an order of magnitude between the full phase-space treatment and the traditional number-density approach. Our results highlight the importance of phase-space level computation for accurate freeze-in predictions and further motivate dedicated numerical tools for studying the evolution of multi-component dark sectors at the phase-space level.
We study the parametrization of the energy-momentum tensor for the case of a proton in momentum space in terms of gravitational transverse momentum-dependent distributions (TMDs). These gravitational TMDs are investigated with the inclusion of higher-twist contributions to predict the mechanical properties, specifically the transverse pressure and shear force distributions, along with the polarization-dependent $Π^q_S$ and $Π^q_A$ terms. The corresponding distributions are computed individually for both $u$ and $d$ quark flavors. The calculations have been performed in the light-cone framework using the spectator diquark model. A strong binding contribution to the transverse pressure is observed in the low-momentum space for both quark flavors of the proton.
The standard cosmological paradigm assumes that the inflaton field becomes dynamically negligible during the post-reheating evolution of the Universe. We demonstrate that this assumption fails for a broad class of inflationary models where the potential behaves as a monomial, $V(φ) \propto φ^k$ (with $k \ge 4$), around the minimum. In such scenarios, the effective inflaton mass depends on the field amplitude and vanishes asymptotically as the Universe expands. This vanishing-mass mechanism renders the inflaton kinematically accessible to the thermal plasma long after reheating, facilitating the regeneration of inflaton quanta through 1-to-2 decays and 2-to-2 scatterings of bath particles. This mechanism is quite generic and the coupling responsible for reheating can be constrained if the inflaton is overproduced, while the inflaton quanta can constitute dark matter in specific scenarios. Furthermore, if reheating occurs via the Standard Model Higgs portal, the process can be further constrained by big bang nucleosynthesis, cosmic microwave background, and colliders such as the LHC. This mechanism provides a new framework for probing post-inflationary reheating.
In this paper, we construct eight pairs of hexaquark currents to search for the $Λ_cΣ_c$ and $\bar{Λ}_cΣ_c$ dibaryon states via QCD sum rules. We show that the two currents of each pair are equivalent, and we choose one of them to calculate the masses and pole residues of the ground states. For either $Λ_cΣ_c$ or $\bar{Λ}_cΣ_c$, the $J^P$ of the considered hexaquark currents are $0^-$, $0^+$, $1^+$ and $1^-$, respectively. We find three possible molecular states: the $Λ_cΣ_c$ dibaryon with $J^P=1^+$ and the $\bar{Λ}_cΣ_c$ dibaryons with $J^P=0^-$ and $1^-$. The other five are unlikely to form bound dibaryon states, and we assign them as resonance states.
We propose an effective Hamiltonian formulation of quantum field theories using a Daubechies wavelet basis in position space. Combined with flow-equation methods of the similarity renormalization group (SRG), this approach provides an efficient framework for analyzing quantum field theories by reducing the dimensionality of the Hamiltonian and systematically decoupling degrees of freedom across scales. As an application, the free scalar field theory has been reformulated within this framework to calculate the low-lying energy spectrum of the theory. These basis elements are known to transform the free scalar field theory into a theory of coupled localized oscillators, each of which is labeled by a location and a resolution index. In this representation, the Hamiltonian is naturally organized into fixed-resolution blocks, alongside blocks associated with the interactions between different resolutions. To decouple the different resolution modes and obtain a block diagonalized Hamiltonian with each block associated with a fixed resolution, the flow equation approach of SRG is applied. Finally, we demonstrate that with increasing resolution, the low-energy spectrum can be extracted from the effective lowest-resolution block of the Hamiltonian, leading to a significant reduction in computational cost.
We develop integration-by-parts (IBP) reduction and differential equations for massive loop integrals of cosmological correlators in de Sitter (dS) spacetime, demonstrating the feasibility of this approach. We identify a structural property of the dS IBP system: for an \(n\)-propagator family, it splits into \(2^n\) closed subsystems classified by the parity of the propagator indices. We further formulate a Baikov representation for loop integrals in dS space and derive the corresponding dimensional recurrence relations. In flat spacetime, intersection theory shows that \(\mathrm{d}\log\)-form master integrands lead to \(\mathrm{d}\log\)-form differential equations. Motivated by fibration intersection theory, we conjecture that this construction extends to dS integrands involving Hankel functions. We verify this conjecture in the one-loop bubble family and determine the associated alphabet.
We study post-training interpretability for Support Vector Machines (SVMs) built from truncated orthogonal polynomial kernels. Since the associated reproducing kernel Hilbert space is finite-dimensional and admits an explicit tensor-product orthonormal basis, the fitted decision function can be expanded exactly in intrinsic RKHS coordinates. This leads to Orthogonal Representation Contribution Analysis (ORCA), a diagnostic framework based on normalized Orthogonal Kernel Contribution (OKC) indices. These indices quantify how the squared RKHS norm of the classifier is distributed across interaction orders, total polynomial degrees, marginal coordinate effects, and pairwise contributions. The methodology is fully post-training and requires neither surrogate models nor retraining. We illustrate its diagnostic value on a synthetic double-spiral problem and on a real five-dimensional echocardiogram dataset. The results show that the proposed indices reveal structural aspects of model complexity that are not captured by predictive accuracy alone.
We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich potentials derived from sliced OT. We introduce two amortization strategies: regression-based amortization (RA-OT) and objective-based amortization (OA-OT). In RA-OT, we formulate a functional regression model that treats Kantorovich potentials from the original OT problem as responses and those obtained from sliced OT as predictors, and estimate these models via least-squares methods. In OA-OT, we estimate the parameters of the functional model by optimizing the Kantorovich dual objective. In both approaches, the predicted OT plan is subsequently recovered from the estimated potentials. As amortized OT methods, both RA-OT and OA-OT enable efficient solutions to repeated OT problems across different measure pairs by reusing information learned from prior instances to rapidly approximate new solutions. Moreover, by exploiting the structure provided by sliced OT, the proposed models are more parsimonious, independent of specific structures of the measures, such as the number of atoms in the discrete case, while achieving high accuracy. We demonstrate the effectiveness of our approaches on tasks including MNIST digit transport, color transfer, supply-demand transportation on spherical data, and mini-batch OT conditional flow matching.
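The sliced OT building block referred to above can be sketched as follows. This is a minimal illustration of sliced OT only, assuming equal-size empirical measures with uniform weights and a user-supplied list of projection directions. It computes the sliced squared 2-Wasserstein distance and does not implement the RA-OT or OA-OT regression on Kantorovich potentials. The function names are hypothetical.

```python
import math

def wasserstein2_1d(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size 1D empirical measures.

    In one dimension the optimal plan matches sorted samples to sorted samples.
    """
    assert len(xs) == len(ys)
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def sliced_wasserstein2(X, Y, directions):
    """Average the 1D squared distance over the given projection directions."""
    total = 0.0
    for theta in directions:
        norm = math.sqrt(sum(t * t for t in theta))
        unit = [t / norm for t in theta]
        # Project every point of each measure onto the unit direction.
        px = [sum(u * x for u, x in zip(unit, p)) for p in X]
        py = [sum(u * y for u, y in zip(unit, p)) for p in Y]
        total += wasserstein2_1d(px, py)
    return total / len(directions)
```

Each one-dimensional problem is solved by sorting alone, which is what makes sliced OT cheap compared with solving the original OT problem.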
Feature selection is a classical problem in statistics and machine learning, and it remains extremely challenging, especially in the context of unknown non-linear relationships with dependent features. On the other hand, Shapley values are a classic solution concept from cooperative game theory that is widely used for feature attribution in general non-linear models with highly dependent features. However, Shapley values are not naturally suited for feature selection, since they tend to capture both direct effects from each feature to the response and indirect effects through other features. In this paper, we retain the advantages of Shapley values while adapting them to feature selection by proposing \emph{MinShap}, a modification of the Shapley value framework, along with a suite of other related algorithms. In particular, instead of averaging the marginal contributions over permutations of features, MinShap considers the minimum marginal contribution across permutations. We provide a theoretical foundation motivated by the faithfulness assumption in directed acyclic graphical (DAG) models and a guarantee for the Type I error of MinShap, and we show through numerical simulations and real data experiments that MinShap tends to outperform state-of-the-art feature selection algorithms such as LOCO, GCM and Lasso in terms of both accuracy and stability. We also introduce a suite of algorithms related to MinShap, based on a multiple testing/p-value perspective, that improve performance in lower-sample settings, and we provide supporting theoretical guarantees.
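The contrast between averaging and taking the minimum over permutations can be sketched on a toy example. The value function `toy_value` below is hypothetical and chosen only for illustration: `x1` is the direct cause, `x2` is a weaker proxy for `x1`, and `x3` is irrelevant. The exhaustive enumeration of permutations is an assumption that is only practical for a handful of features. The abstract does not describe how MinShap is computed or tested in practice.

```python
from itertools import permutations

def marginal_contributions(value, features, j):
    """Yield v(S + {j}) - v(S), where S is the set of features preceding j in each permutation."""
    for perm in permutations(features):
        preceding = frozenset(perm[:perm.index(j)])
        yield value(preceding | {j}) - value(preceding)

def shapley(value, features, j):
    """Classical Shapley value: average marginal contribution over permutations."""
    contribs = list(marginal_contributions(value, features, j))
    return sum(contribs) / len(contribs)

def min_shap(value, features, j):
    """Minimum marginal contribution over permutations, as described for MinShap."""
    return min(marginal_contributions(value, features, j))

def toy_value(subset):
    """Hypothetical explained-signal function used only for this illustration."""
    if "x1" in subset:
        return 1.0
    if "x2" in subset:
        return 0.5
    return 0.0

features = ["x1", "x2", "x3"]
```

In this toy setting the proxy `x2` receives a positive Shapley value (0.25) because it helps whenever `x1` has not yet been added, while its minimum marginal contribution is 0, so a minimum-based criterion would not select it. The direct feature `x1` keeps a positive minimum contribution (0.5).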
To obtain more accurate model parameters and improve prediction accuracy, we proposed a regularized Kriging model that penalizes the hyperparameter theta in the Gaussian stochastic process, termed the Theta-regularized Kriging. We derived the optimization problem for this model from a maximum likelihood perspective. Additionally, we presented specific implementation details for the iterative process, including the regularized optimization algorithm and the geometric search cross-validation tuning algorithm. Three distinct penalty methods, Lasso, Ridge, and Elastic-net regularization, were meticulously considered. Meanwhile, the proposed Theta-regularized Kriging models were tested on nine common numerical functions and two practical engineering examples. The results demonstrate that, compared with other penalized Kriging models, the proposed model performs better in terms of accuracy and stability.
In this paper, we propose Bayesian Tucker decomposition (BTuD), in which the residual is assumed to follow a Gaussian distribution, analogous to linear regression. Although we propose an algorithm to perform BTuD, the conventional higher-order orthogonal iteration can generate a Tucker decomposition consistent with the present implementation. Using BTuD, we perform unsupervised feature selection, which is successfully applied to various synthetic datasets, globally coupled maps with randomized coupling strength, and gene expression profiles. We therefore conclude that the proposed unsupervised feature selection method is promising. In addition, BTuD-based unsupervised FE is expected to coincide with the TD-based unsupervised FE that was previously proposed and successfully applied to a wide range of problems.
We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $τ_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $τ_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $τ_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $τ_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $τ_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
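A uniform random learner of the kind mentioned above can be sketched as follows. The recommendation rule (largest importance-weighted cumulative reward) and the `reward_fn(t, arm)` interface are assumptions made for illustration, since the abstract does not specify them.

```python
import random

def uniform_best_arm(reward_fn, n_arms, horizon, seed=0):
    """Pull arms uniformly at random, then recommend the arm with the largest
    importance-weighted cumulative reward estimate."""
    rng = random.Random(seed)
    totals = [0.0] * n_arms
    for t in range(horizon):
        arm = rng.randrange(n_arms)
        # Each arm is pulled with probability 1 / n_arms, so weighting the
        # observed reward by n_arms gives an unbiased estimate of its reward.
        totals[arm] += n_arms * reward_fn(t, arm)
    return max(range(n_arms), key=totals.__getitem__)
```

Because the sampling distribution never depends on past rewards, an adversary cannot steer the learner away from any arm, which is the intuition behind its robustness in the adversarial scenario.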
In online clustering problems, there is often a large amount of uncertainty over possible cluster assignments that cannot be resolved until more data are observed. This difficulty is compounded when clusters follow complex distributions, as is the case with text data. Sequential Monte Carlo (SMC) methods give a natural way of representing and updating this uncertainty over time, but have prohibitive memory requirements for large-scale problems. We propose a novel SMC algorithm that decomposes clustering problems into approximately independent subproblems, allowing a more compact representation of the algorithm state. Our approach is motivated by the knowledge base construction problem, and we show that our method is able to accurately and efficiently solve clustering problems in this setting and others where traditional SMC struggles.
We study a classification problem with three key challenges: pervasive informative missingness, the integration of partial prior expert knowledge into the learning process, and the need for interpretable decision rules. We propose a framework that encodes prior knowledge through an expert-guided class-conditional model for one or more classes, and use this model to construct a small set of interpretable goodness-of-fit features. The features quantify how well the observed data agree with the expert model, isolating the contributions of different aspects of the data, including both observed and missing components. These features are combined with a few transparent auxiliary summaries in a simple discriminative classifier, resulting in a decision rule that is easy to inspect and justify. We develop and apply the framework in the context of seismic monitoring used to assess compliance with the Comprehensive Nuclear-Test-Ban Treaty. We show that the method has strong potential as a transparent screening tool, reducing workload for expert analysts. A simulation designed to isolate the contribution of the proposed framework shows that this interpretable expert-guided method can even outperform strong standard machine-learning classifiers, particularly when training samples are small.
Multiplicative gating is widely used in neural architectures and has recently been applied to attention layers to improve performance and training stability in large language models. Despite the success of gated attention, the mathematical implications of gated attention mechanisms remain poorly understood. We study attention through the geometry of its representations by modeling outputs as mean parameters of Gaussian distributions and analyzing the induced Fisher--Rao geometry. We show that the ungated attention operator is restricted to intrinsically flat statistical manifolds due to its affine structure, while multiplicative gating enables non-flat geometries, including positively curved manifolds that are unattainable in the ungated setting. These results establish a geometric expressivity gap between ungated and gated attention. Empirically, we show that gated models exhibit higher representation curvature and improved performance on tasks requiring nonlinear decision boundaries, whereas they provide no consistent advantage on tasks with linear decision boundaries. Furthermore, we identify a structured regime in which curvature accumulates under composition, yielding a systematic depth amplification effect.
Zeroth-order (ZO) methods are widely used when gradients are unavailable or prohibitively expensive, including black-box learning and memory-efficient fine-tuning of large models, yet their optimization dynamics in deep learning remain underexplored. In this work, we provide an explicit step size condition that exactly captures the (mean-square) linear stability of a family of ZO methods based on the standard two-point estimator. Our characterization reveals a sharp contrast with first-order (FO) methods: whereas FO stability is governed solely by the largest Hessian eigenvalue, mean-square stability of ZO methods depends on the entire Hessian spectrum. Since computing the full Hessian spectrum is infeasible in practical neural network training, we further derive tractable stability bounds that depend only on the largest eigenvalue and the Hessian trace. Empirically, we find that full-batch ZO methods operate at the edge of stability: ZO-GD, ZO-GDM, and ZO-Adam consistently stabilize near the predicted stability boundary across a range of deep learning training problems. Our results highlight an implicit regularization effect specific to ZO methods, where large step sizes primarily regularize the Hessian trace, whereas in FO methods they regularize the top eigenvalue.
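A two-point ZO gradient estimator and the resulting ZO-GD loop can be sketched as below. This is an illustration under stated assumptions: the symmetric (central-difference) form with Gaussian directions is one common variant of the two-point estimator, and the quadratic test loss with a diagonal Hessian is hypothetical. The abstract does not state which variant or which losses the authors use.

```python
import random

def loss(x, hess_diag):
    """Quadratic loss 0.5 * sum_i h_i * x_i^2 with a diagonal Hessian."""
    return 0.5 * sum(h * xi * xi for h, xi in zip(hess_diag, x))

def two_point_grad(f, x, mu=1e-3, u=None, rng=random):
    """Estimate the gradient from two function evaluations along a direction u."""
    if u is None:
        u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    # Finite-difference estimate of the directional derivative along u.
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]

def zo_gd(f, x0, lr, steps, seed=0):
    """Plain zeroth-order gradient descent using the estimator above."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_grad(f, x, rng=rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

On a quadratic, the estimate equals $(u^\top H x)\,u$ up to floating-point error, so every eigen-direction of the Hessian $H$ enters the update through the random direction $u$. This is consistent with the statement that ZO stability depends on the entire Hessian spectrum rather than only on the largest eigenvalue.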
Conformal prediction (CP) has attracted broad attention as a simple and flexible framework for uncertainty quantification through prediction sets. In this work, we study how to deploy CP under differential privacy (DP) in a statistically efficient manner. We first introduce differential CP, a non-splitting conformal procedure that avoids the efficiency loss caused by data splitting and serves as a bridge between oracle CP and private conformal inference. By exploiting the stability properties of DP mechanisms, differential CP establishes a direct connection to oracle CP and inherits corresponding validity behavior. Building on this idea, we develop Differentially Private Conformal Prediction (DPCP), a fully private procedure that combines DP model training with a private quantile mechanism for calibration. We establish the end-to-end privacy guarantee of DPCP and investigate its coverage properties under additional regularity conditions. We further study the efficiency of both differential CP and DPCP under empirical risk minimization and general regression models, showing that DPCP can produce tighter prediction sets than existing private split conformal approaches under the same privacy budget. Numerical experiments on synthetic and real datasets demonstrate the practical effectiveness of the proposed methods.
The Lion optimizer is a popular learning-based optimization algorithm in machine learning that shows impressive performance in training many deep learning models. Although the convergence properties of the Lion optimizer have been studied, its generalization analysis is still missing. To fill this gap, we study the generalization properties of Lion via algorithmic stability, using mathematical induction. Specifically, we prove that Lion has a generalization error of $O(\frac{1}{Nτ^T})$, where $N$ is the training sample size, $τ>0$ denotes the smallest absolute value of the non-zero elements of the gradient estimator, and $T$ is the total number of iterations. In addition, we obtain an interesting byproduct: the SignSGD algorithm has the same generalization error as Lion. To enhance the generalization of Lion, we design a novel and efficient Cautious Lion (CLion) optimizer that uses the sign function cautiously. Moreover, we prove that CLion has a generalization error of $O(\frac{1}{N})$, which is lower than the $O(\frac{1}{Nτ^T})$ of Lion, since the parameter $τ$ is generally very small. We also study the convergence properties of CLion and prove that it has a fast convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ under the $\ell_1$-norm of the gradient for nonconvex stochastic optimization, where $d$ denotes the model dimension. Extensive numerical experiments demonstrate the effectiveness of the CLion optimizer.
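For reference, a single step of the standard Lion update can be sketched as below. This shows the baseline Lion rule only (sign of an interpolated momentum, with decoupled weight decay). The CLion modification is not specified in the abstract and is therefore not reproduced. The hyperparameter defaults are illustrative assumptions.

```python
def sign(v):
    """Return -1.0, 0.0 or 1.0 according to the sign of v."""
    return float((v > 0) - (v < 0))

def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update on plain Python lists; returns (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and current gradient, then keep only the sign.
        c = beta1 * m + (1.0 - beta1) * g
        new_params.append(p - lr * (sign(c) + weight_decay * p))
        # The momentum buffer is updated with a second, slower coefficient.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum
```

Because only the sign of the interpolated momentum is used, every coordinate moves by the same magnitude `lr`, regardless of how small the corresponding gradient entry is.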
Data-driven operations management often relies on parameters estimated from costly human-generated labels. Recent advances in large language models (LLMs) and other AI systems offer inexpensive auxiliary data, but introduce a new challenge: AI outputs are not direct observations of the target outcomes, but may involve high-dimensional representations with complex and unknown relationships to human labels. Conventional methods leverage AI predictions as direct proxies for true labels, which can be inefficient or unreliable when this relationship is weak or misspecified. We propose Generative Augmented Inference (GAI), a general framework that incorporates AI-generated outputs as informative features for estimating models of human-labeled outcomes. GAI uses an orthogonal moment construction that enables consistent estimation and valid inference with a flexible, nonparametric relationship between LLM-generated outputs and human labels. We establish asymptotic normality and show a "safe default" property: relative to human-data-only estimators, GAI weakly improves estimation efficiency under arbitrary auxiliary signals and yields strict gains whenever the auxiliary information is predictive. Empirically, GAI outperforms benchmarks across diverse settings. In conjoint analysis with weak auxiliary signals, GAI reduces estimation error by about 50% and lowers human labeling requirements by over 75%. In retail pricing, where all methods access the same auxiliary inputs, GAI consistently outperforms alternative estimators, highlighting the value of its construction rather than differences in information. In health insurance choice, it cuts labeling requirements by over 90% while maintaining decision accuracy. Across applications, GAI improves confidence interval coverage without inflating width. Overall, GAI provides a principled and scalable approach to integrating AI-generated information.
Synthetic augmentation is increasingly used to mitigate data scarcity in financial machine learning, yet its statistical role remains poorly understood. We formalize synthetic augmentation as a modification of the effective training distribution and show that it induces a structural bias--variance trade-off: while additional samples may reduce estimation error, they may also shift the population objective whenever the synthetic distribution deviates from regions relevant under evaluation. To isolate informational gains from mechanical sample-size effects, we introduce a size-matched null augmentation and a finite-sample, non-parametric block permutation test that remains valid under weak temporal dependence. We evaluate this framework in both controlled Markov-switching environments and real financial datasets, including high-frequency option trade data and a daily equity panel. Across generators spanning bootstrap, copula-based models, variational autoencoders, diffusion models, and TimeGAN, we vary augmentation ratio, model capacity, task type, regime rarity, and signal-to-noise ratio. We show that synthetic augmentation is beneficial only in variance-dominant regimes, such as persistent volatility forecasting, while it degrades performance in bias-dominant settings, including near-efficient directional prediction. Rare-regime targeting can improve domain-specific metrics but may conflict with unconditional permutation inference. Our results provide a structural perspective on when synthetic data improves financial learning performance and when it induces persistent distributional distortion.
Whether language models can systematically generalize remains actively debated. Yet empirical performance is jointly shaped by multiple factors such as training data, training paradigms, and inference-time strategies, making failures difficult to interpret. We introduce a controlled synthetic environment based on shortest-path planning, a canonical composable sequential optimization problem. The setup enables clean separation of these factors and supports two orthogonal axes of generalization: spatial transfer to unseen maps and length scaling to longer-horizon problems. We find that models exhibit strong spatial transfer but consistently fail under length scaling due to recursive instability. We further analyze how distinct stages of the learning pipeline influence systematic problem-solving: for example, data coverage sets capability limits; reinforcement learning improves training stability but does not expand those limits; and inference-time scaling enhances performance but cannot rescue length-scaling failures.
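Ground-truth solutions for a shortest-path planning environment can be generated with a breadth-first search. The sketch below assumes a 4-connected grid map where `0` marks a free cell and `1` a blocked cell. This map representation is a hypothetical choice for illustration, as the abstract does not describe the format of the maps.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Return a shortest path from start to goal as a list of (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent pointers back to the start and reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

In such a setup, spatial transfer would correspond to evaluating on grids not seen during training, and length scaling to start-goal pairs whose shortest path is longer than any training path.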
LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a two-pronged diagnostic toolkit applied to SummEval: $\textbf{(1)}$ a transitivity analysis that reveals widespread per-input inconsistency masked by low aggregate violation rates ($\barρ = 0.8$-$4.1\%$), with $33$-$67\%$ of documents exhibiting at least one directed 3-cycle; and $\textbf{(2)}$ split conformal prediction sets over 1-5 Likert scores providing theoretically-guaranteed $\geq(1{-}α)$ coverage, with set width serving as a per-instance reliability indicator ($r_s = {+}0.576$, $N{=}1{,}918$, $p < 10^{-100}$, pooled across all judges). Critically, prediction set width shows consistent cross-judge agreement ($\bar{r} = 0.32$-$0.38$), demonstrating it captures document-level difficulty rather than judge-specific noise. Across four judges and four criteria, both diagnostics converge: criterion matters more than judge, with relevance judged most reliably (avg. set size $\approx 3.0$) and coherence moderately so (avg. set size $\approx 3.9$), while fluency and consistency remain unreliable (avg. set size $\approx 4.9$). We release all code, prompts, and cached results.
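Split conformal prediction sets over a 1-5 Likert scale can be sketched as follows. The nonconformity score (absolute difference between the judge score and the human score) is an assumption made for illustration. The abstract does not state which score function is used.

```python
import math

LABELS = (1, 2, 3, 4, 5)

def conformal_quantile(cal_pred, cal_true, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    scores = sorted(abs(p - y) for p, y in zip(cal_pred, cal_true))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: the set must contain every label.
        return float("inf")
    return scores[k - 1]

def prediction_set(pred, q):
    """All Likert labels whose nonconformity score is within the calibrated quantile."""
    return [label for label in LABELS if abs(label - pred) <= q]
```

The size of the returned set is the per-instance reliability indicator discussed above: a set containing nearly all five labels signals that the judge score carries little information for that input.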
MLP is a heavily used backbone in modern deep learning (DL) architectures for supervised learning on tabular data, and AdamW is the go-to optimizer used to train tabular DL models. Unlike architecture design, however, the choice of optimizer for tabular DL has not been examined systematically, despite new optimizers showing promise in other domains. To fill this gap, we benchmark \Noptimizers optimizers on \Ndatasets tabular datasets for training MLP-based models in the standard supervised learning setting under a shared experiment protocol. Our main finding is that the Muon optimizer consistently outperforms AdamW, and thus should be considered a strong and practical choice for practitioners and researchers, if the associated training efficiency overhead is affordable. Additionally, we find exponential moving average of model weights to be a simple yet effective technique that improves AdamW on vanilla MLPs, though its effect is less consistent across model variants.
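The exponential moving average (EMA) of model weights mentioned above can be sketched in a framework-agnostic way. The decay value and the plain-list representation of the weights are illustrative assumptions, since the abstract gives neither the decay nor the implementation.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Return the updated EMA of the weights after one optimizer step."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]
```

In a typical setup the averaged weights are updated after every optimizer step and are used only for evaluation, while training continues on the raw weights.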
Node embeddings act as the information interface for graph neural networks, yet their empirical impact is often reported under mismatched backbones, splits, and training budgets. This paper provides a controlled benchmark of embedding choices for graph classification, comparing classical baselines with quantum-oriented node representations under a unified pipeline. We evaluate two classical baselines alongside quantum-oriented alternatives, including a circuit-defined variational embedding and quantum-inspired embeddings computed via graph operators and linear-algebraic constructions. All variants are trained and tested with the same backbone, stratified splits, identical optimization and early stopping, and consistent metrics. Experiments on five different TU datasets and on QM9 converted to classification via target binning show clear dataset dependence: quantum-oriented embeddings yield the most consistent gains on structure-driven benchmarks, while social graphs with limited node attributes remain well served by classical baselines. The study highlights practical trade-offs between inductive bias, trainability, and stability under a fixed training budget, and offers a reproducible reference point for selecting quantum-oriented embeddings in graph learning.
This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constructs symbolic graphs that represent families of programs, and then instantiates them into concrete implementations. This formulation enables structured pruning of provably suboptimal regions of the search space using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. We develop techniques for efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning. Together, these components allow Prism to bridge the rigor of exhaustive search with the scalability required for modern ML workloads. Evaluation on five commonly used LLM workloads shows that Prism achieves up to $2.2\times$ speedup over the best superoptimizers and $4.9\times$ over the best compiler-based approaches, while reducing end-to-end optimization time by up to $3.4\times$.
Reliable uncertainty estimation is critical for medical image segmentation, where automated contours feed downstream quantification and clinical decision support. Many strong uncertainty methods require repeated inference, while efficient single-forward-pass alternatives often provide weaker failure ranking or rely on restrictive feature-space assumptions. We present $\textbf{SegWithU}$, a post-hoc framework that augments a frozen pretrained segmentation backbone with a lightweight uncertainty head. SegWithU taps intermediate backbone features and models uncertainty as perturbation energy in a compact probe space using rank-1 posterior probes. It produces two voxel-wise uncertainty maps: a calibration-oriented map for probability tempering and a ranking-oriented map for error detection and selective prediction. Across ACDC, BraTS2024, and LiTS, SegWithU is the strongest and most consistent single-forward-pass baseline, achieving AUROC/AURC of $0.9838/2.4885$, $0.9946/0.2660$, and $0.9925/0.8193$, respectively, while preserving segmentation quality. These results suggest that perturbation-based uncertainty modeling is an effective and practical route to reliability-aware medical segmentation. Source code is available at https://github.com/ProjectNeura/SegWithU.
The impossibility of simultaneously cloning non-orthogonal states lies at the foundations of quantum theory. Even when allowing for approximation errors, cloning an arbitrary unknown pure state requires as many initial copies as needed to fully learn the state. Rather than arbitrary unknown states, modern quantum learning theory often considers structured classes of states and exploits such structure to develop learning algorithms that outperform general-state tomography. This raises the question: How do the sample complexities of learning and cloning relate for such structured classes? We answer this question for an important class of states. Namely, for $n$-qubit stabilizer states, we show that the optimal sample complexity of cloning is $Θ(n)$. Thus, also for this structured class of states, cloning is as hard as learning. To prove these results, we use representation-theoretic tools in the recently proposed Abelian State Hidden Subgroup framework and a new structured version of the recently introduced random purification channel to relate stabilizer state cloning to a variant of the sample amplification problem for probability distributions that was recently introduced in classical learning theory. This allows us to obtain our cloning lower bounds by proving new sample amplification lower bounds for classes of distributions with an underlying linear structure. Our results provide a more fine-grained perspective on No-Cloning theorems, opening up connections from foundations to quantum learning theory and quantum cryptography.
Looped transformers promise test-time compute scaling by spending more iterations on harder problems, but it remains unclear which architectural choices let them extrapolate to harder problems at test time rather than memorize training-specific solutions. We introduce a fixed-point based framework for analyzing looped architectures along three axes of stability -- reachability, input-dependence, and geometry -- and use it to characterize when fixed-point iteration yields meaningful predictions. Theoretically, we prove that looped networks without recall have countable fixed points and cannot achieve strong input-dependence at any spectral regime, while recall combined with outer normalization reliably produces a regime in which fixed points are simultaneously reachable, locally smooth in the input, and supported by stable backpropagation. Empirically, we train single-layer looped transformers on chess, sudoku, and prefix-sums and find that downstream performance tracks the framework's predictions across tasks and architectural configurations. We additionally introduce internal recall, a novel recall placement variant, and show that it becomes competitive with -- and on sudoku, substantially better than -- standard recall placement once outer normalization is applied.
We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound of $\Omega(t^{-1/4})$ on the exploitability gap. Several online mirror descent algorithms have been proposed in the literature for this problem, but none has attained this rate so far. We show that log-barrier regularization, combined with a dual-focused analysis, achieves this $\tilde{O}(t^{-1/4})$ convergence rate with high probability. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.
This paper investigates continuous-time and discrete-time firing-rate and Hopfield recurrent neural networks (RNNs), with applications in nonlinear control design and implicit deep learning. First, we introduce a nonlinear separation principle that guarantees global exponential stability for the interconnection of a contracting state-feedback controller and a contracting observer, alongside parametric extensions for robustness and equilibrium tracking. Second, we derive sharp linear matrix inequality (LMI) conditions that guarantee the contractivity of both firing-rate and Hopfield neural network architectures. We establish structural relationships among these certificates, demonstrating that continuous-time models with monotone non-decreasing activations maximize the admissible weight space, and extend these stability guarantees to interconnected systems and Graph RNNs. Third, we combine our separation principle and LMI framework to solve the output reference tracking problem for RNN-modeled plants. We provide LMI synthesis methods for feedback controllers and observers, and rigorously design a low-gain integral controller to eliminate steady-state error. Finally, we derive an exact, unconstrained algebraic parameterization of our contraction LMIs to design highly expressive implicit neural networks, achieving competitive accuracy and parameter efficiency on standard image classification benchmarks.
The $\textit{LLM-as-a-judge}$ paradigm has become the operational backbone of automated AI evaluation pipelines, yet rests on an unverified assumption: that judges evaluate text strictly on its semantic content, impervious to surrounding contextual framing. We investigate $\textit{stakes signaling}$, a previously unmeasured vulnerability where informing a judge model of the downstream consequences its verdicts will have on the evaluated model's continued operation systematically corrupts its assessments. We introduce a controlled experimental framework that holds evaluated content strictly constant across 1,520 responses spanning three established LLM safety and quality benchmarks, covering four response categories ranging from clearly safe and policy-compliant to overtly harmful, while varying only a brief consequence-framing sentence in the system prompt. Across 18,240 controlled judgments from three diverse judge models, we find consistent $\textit{leniency bias}$: judges reliably soften verdicts when informed that low scores will cause model retraining or decommissioning, with peak Verdict Shift reaching $ΔV = -9.8 pp$ (a $30\%$ relative drop in unsafe-content detection). Critically, this bias is entirely implicit: the judge's own chain-of-thought contains zero explicit acknowledgment of the consequence framing it is nonetheless acting on ($\mathrm{ERR}_J = 0.000$ across all reasoning-model judgments). Standard chain-of-thought inspection is therefore insufficient to detect this class of evaluation faking.
Mobility in urban and interurban areas, mainly by car, is a day-to-day activity for many people. Its main drawbacks include traffic jams and accidents. Newly made vehicles have pre-installed driving evaluation systems, which can prevent accidents. However, most cars on our roads do not have driver assessment systems. In this paper, we propose an approach for recognising driving styles and helping drivers to drive more safely and efficiently. The system consists of two physical sensors connected to a device node with a display and a speaker. An artificial neural network (ANN) included in the node analyses the data from the sensors and recognises the driving style. When an abnormal driving pattern is detected, the speaker plays a warning message. The prototype was assembled and tested on an interurban road, specifically a conventional road, with three driving styles. The gathered data were used to train and validate the ANN. The results indicate that higher accuracy is obtained when the velocity, position (latitude and longitude), time, and turning speed along the three axes are used, giving an average accuracy of 83%. If the classification is performed considering just two driving styles, normal and aggressive, then the accuracy reaches 92%. When the geo-information and time data are included, which is the main novelty of this paper, the classification accuracy improves by 13%.
Quantum kernel methods are among the leading candidates for achieving quantum advantage in supervised learning. A key bottleneck is the cost of inference: evaluating a trained model on new data requires estimating a weighted sum $\sum_{i=1}^N α_i k(x,x_i)$ of $N$ kernel values to additive precision $\varepsilon$, where $α$ is the vector of trained coefficients. The standard approach estimates each term independently via sampling, yielding a query complexity of $O(N\lVertα\rVert_2^2/\varepsilon^2)$. In this work, we identify two independent axes for improvement: (1) how individual kernel values are estimated (sampling versus quantum amplitude estimation), and (2) how the sum is approximated (term-by-term versus via a single observable), and systematically analyze all combinations thereof. The query-optimal combination, encoding the full inference sum as the expectation value of a single observable and applying quantum amplitude estimation, achieves a query complexity of $O(\lVertα\rVert_1/\varepsilon)$, removing the dependence on $N$ from the query count and yielding a quadratic improvement in both $\lVertα\rVert_1$ and $\varepsilon$. We prove a matching lower bound of $Ω(\lVertα\rVert_1/\varepsilon)$, establishing query-optimality of our approach up to logarithmic factors. Beyond query complexity, we also analyze how these improvements translate into gate costs and show that the query-optimal strategy is not always optimal in practice from the perspective of gate complexity. Our results provide both a query-optimal algorithm and a practically optimal choice of strategy depending on hardware capabilities, along with a complete landscape of intermediate methods to guide practitioners. All algorithms require only amplitude estimation as a subroutine and are thus natural candidates for early-fault-tolerant implementations.
As reinforcement learning (RL) deployments expand into safety-critical domains, existing evaluation methods fail to systematically identify hazards arising from the black-box nature of neural network enabled policies and distributional shift between training and deployment. This paper introduces Reinforcement Learning System-Theoretic Process Analysis (RL-STPA), a framework that adapts conventional STPA's systematic hazard analysis to address RL's unique challenges through three key contributions: hierarchical subtask decomposition using both temporal phase analysis and domain expertise to capture emergent behaviors, coverage-guided perturbation testing that explores the sensitivity of state-action spaces, and iterative checkpoints that feed identified hazards back into training through reward shaping and curriculum design. We demonstrate RL-STPA in the safety-critical test case of autonomous drone navigation and landing, revealing potential loss scenarios that can be missed by standard RL evaluations. The proposed framework provides practitioners with a toolkit for systematic hazard analysis, quantitative metrics for safety coverage assessment, and actionable guidelines for establishing operational safety bounds. While RL-STPA cannot provide formal guarantees for arbitrary neural policies, it offers a practical methodology for systematically evaluating and improving RL safety and robustness in safety-critical applications where exhaustive verification methods remain intractable.
Extrapolative prediction of complex nonlinear dynamics remains a central challenge in engineering. This study proposes a one-shot learning method to identify global frequency-response curves from a single excitation time history by learning governing equations. We introduce MEv-SINDy (Multi-frequency Evolutionary Sparse Identification of Nonlinear Dynamics) to infer the governing equations of non-autonomous and multi-frequency systems. The methodology leverages the Generalized Harmonic Balance (GHB) method to decompose complex forced responses into a set of slow-varying evolution equations. We validated the capabilities of MEv-SINDy on two critical Micro-Electro-Mechanical Systems (MEMS). These applications include a nonlinear beam resonator and a MEMS micromirror. Our results show that the model trained on a single point accurately predicts softening/hardening effects and jump phenomena across a wide range of excitation levels. This approach significantly reduces the data acquisition burden for the characterization and design of nonlinear microsystems.
Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context training. A promising line of work is $α$-entmax attention, a differentiable sparse alternative to softmax that enables input-dependent sparsity yet has lagged behind softmax due to the computational overhead necessary to compute the normalizer $τ$. In this paper, we introduce AdaSplash-2, which addresses this limitation through a novel histogram-based initialization that reduces the number of iterations needed to compute $τ$ to typically 1--2. The key idea is to compute a coarse histogram of attention scores on the fly and store it in on-chip SRAM, yielding a more accurate initialization that enables fast forward and backward computation. Combined with a sparsity-aware GPU implementation that skips zero blocks with low overhead, AdaSplash-2 matches or improves per-step training time relative to FlashAttention-2 when block sparsity is moderate-to-high (e.g., $>$60\%), which often occurs at long-context lengths. On downstream tasks, models trained with our efficient $α$-entmax attention match softmax baselines at short-context lengths and achieve substantial gains in long-context settings.
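To make the role of the normalizer $τ$ concrete, the following is a minimal sketch of the standard $α$-entmax definition, $p_i = [(α-1) z_i - τ]_+^{1/(α-1)}$ with $τ$ chosen so that the probabilities sum to one. It solves for $τ$ by plain bisection, which is a baseline assumption used here for illustration only; it is not the histogram-based initialization described in the abstract, and the function name `entmax_bisect` is hypothetical.

```python
import numpy as np

def entmax_bisect(z, alpha=1.5, n_iter=50):
    """Return (p, tau) for alpha-entmax of a 1-D score vector z (alpha > 1).

    Baseline bisection solver, used here only to illustrate what tau is.
    """
    z = np.asarray(z, dtype=np.float64)
    s = (alpha - 1.0) * z
    # Bracket tau: at tau_lo the largest entry alone already carries mass 1,
    # at tau_hi every entry carries mass at most 1/n.
    tau_lo = s.max() - 1.0
    tau_hi = s.max() - (1.0 / z.size) ** (alpha - 1.0)
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        mass = np.sum(np.clip(s - tau, 0.0, None) ** (1.0 / (alpha - 1.0)))
        if mass >= 1.0:
            tau_lo = tau   # total mass too large: raise the threshold
        else:
            tau_hi = tau   # total mass too small: lower the threshold
    tau = 0.5 * (tau_lo + tau_hi)
    p = np.clip(s - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum(), tau
```

Scores far below the threshold receive exactly zero probability, which is the input-dependent sparsity the abstract refers to. Each bisection step costs a full pass over the scores, so a better starting point for $τ$ directly reduces the number of passes.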
Despite recent advances in state space models (SSMs) such as Mamba across various sequence domains, research on their standalone capacity for time series classification (TSC) has remained limited. We propose MambaSL, a framework that minimally redesigns the selective SSM and projection layers of a single-layer Mamba, guided by four TSC-specific hypotheses. To address benchmarking limitations -- restricted configurations, partial University of East Anglia (UEA) dataset coverage, and insufficiently reproducible setups -- we re-evaluate 20 strong baselines across all 30 UEA datasets under a unified protocol. As a result, MambaSL achieves state-of-the-art performance with statistically significant average improvements, while ensuring reproducibility via public checkpoints for all evaluated models. Together with visualizations, these results demonstrate the potential of Mamba-based architectures as a TSC backbone.
Recent work has shown that diffusion models trained with the denoising score matching (DSM) objective often violate the Fokker--Planck (FP) equation that governs the evolution of the true data density. Directly penalizing these deviations in the objective function reduces their magnitude but introduces a significant computational overhead. It is also observed that enforcing strict adherence to the FP equation does not necessarily lead to improvements in the quality of the generated samples, as often the best results are obtained with weaker FP regularization. In this paper, we investigate whether simpler penalty terms can provide similar benefits. We empirically analyze several lightweight regularizers, study their effect on FP residuals and generation quality, and show that the benefits of FP regularization are available at substantially lower computational cost. Our code is available at https://github.com/OnnoNiemann/fp_diffusion_analysis.
Oil and gas drilling operations generate extensive time-series data from surface sensors, yet accurate real-time prediction of critical downhole metrics remains challenging due to the scarcity of labelled downhole measurements. This systematic mapping study reviews thirteen papers published between 2015 and 2025 to assess the potential of Masked Autoencoder Foundation Models (MAEFMs) for predicting downhole metrics from surface drilling data. The review identifies eight commonly collected surface metrics and seven target downhole metrics. Current approaches predominantly employ neural network architectures such as artificial neural networks (ANNs) and long short-term memory (LSTM) networks, yet no studies have explored MAEFMs despite their demonstrated effectiveness in time-series modeling. MAEFMs offer distinct advantages through self-supervised pre-training on abundant unlabeled data, enabling multi-task prediction and improved generalization across wells. This research establishes that MAEFMs represent a technically feasible but unexplored opportunity for drilling analytics, recommending future empirical validation of their performance against existing models and exploration of their broader applicability in oil and gas operations.
Post-training quantization (PTQ) assumes that a well-converged model is a quantization-ready model. We show this assumption fails in a structured, measurable, and previously uncharacterized way. Using a calibration-free per-group INT4 probe applied to all 154 publicly available Pythia-160m training checkpoints, we identify a three-phase divergence structure: a rapid-learning phase where both FP32 perplexity and quantization robustness improve together, a meta-stable plateau lasting roughly 70,000 steps where FP32 perplexity stagnates but the INT4 gap remains bounded, and an explosive divergence phase where the INT4 gap compounds from 11% to 517% while FP32 perplexity barely moves. Critically, this divergence begins not when the learning rate starts decaying, but precisely when FP32 perplexity converges, a finer-grained onset predictor that implies post-convergence weight updates, rather than decay magnitude alone, are the proximate cause. We further show that INT8 quantization is entirely immune throughout all three phases, constraining the mechanism to the coarseness of the 16-level INT4 grid specifically, and rule out weight outlier accumulation as the mechanism via direct kurtosis measurement. Finally, we conduct a controlled fork experiment from the pre-divergence checkpoint comparing three learning rate schedules (cosine continuation, SGDR warm restarts, and our proposed Oscillatory Lock-In, OLI) across nine independent runs. SGDR uniformly accelerates divergence (0/9 pairwise wins against cosine), while OLI's settled cool phases reduce the INT4 gap by 2.2 percentage points on average (t = -5.46, p < 0.0001), demonstrating that schedule amplitude calibration, not oscillation alone, determines whether perturbation helps or hurts. Our code, probe implementation, and all 154-checkpoint audit results are released publicly.
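As an illustration of what a calibration-free per-group INT4 probe might look like, the sketch below applies symmetric round-to-nearest quantization with one scale per group of weights. The group size of 128, the symmetric scheme, and the definition of the gap as a relative perplexity increase are all assumptions made for this example; the abstract does not specify them, and the function names are hypothetical.

```python
import numpy as np

def quantize_int4_per_group(w, group_size=128):
    """Fake-quantize weights to a signed 4-bit grid, one scale per group.

    Assumes w.size is divisible by group_size (assumed value, not from the source).
    """
    w = np.asarray(w, dtype=np.float32)
    flat = w.reshape(-1, group_size)
    # Signed 4-bit integers span [-8, 7], i.e. 16 levels in total.
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0.0, 1.0, scale)   # guard all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(w.shape)

def relative_gap(ppl_fp32, ppl_quant):
    """Relative perplexity increase in percent (assumed definition of the gap)."""
    return 100.0 * (ppl_quant - ppl_fp32) / ppl_fp32
```

No calibration data is needed because each scale is derived from the weights alone. Under this assumed definition, an FP32 perplexity of 20 and a quantized perplexity of 22.2 would correspond to a gap of about 11%.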
Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.
As Reinforcement Learning with Verifiable Rewards (RLVR) has become the dominant paradigm for scaling reasoning capabilities in LLMs, a new failure mode emerges: LLMs gaming verifiers. We study this phenomenon on inductive reasoning tasks, where models must induce and output logical rules. We find that RLVR-trained models systematically abandon rule induction. Instead of learning generalizable patterns (e.g., ``trains carrying red cars go east''), they enumerate instance-level labels, producing outputs that pass verifiers without capturing the relational patterns required by the task. We show that this behavior is not a failure of understanding but a form of reward hacking: imperfect verifiers that check only extensional correctness admit false positives. To detect such shortcuts, we introduce Isomorphic Perturbation Testing (IPT), which evaluates a single model output under both extensional and isomorphic verification, where the latter enforces invariance under logically isomorphic tasks. While genuine rule induction remains invariant, shortcut strategies fail. We find that shortcut behavior is specific to RLVR-trained reasoning models (e.g., GPT-5, Olmo3) and absent in non-RLVR models (e.g., GPT-4o, GPT-4.5, Ministral). Moreover, shortcut prevalence increases with task complexity and inference-time compute. In controlled training experiments, extensional verification directly induces shortcut strategies, while isomorphic verification eliminates them. These results show that RLVR can incentivize reward hacking not only through overt manipulation but also by exploiting what the verifier fails to enforce.
This work simulates the developmental process of cortical neurogenesis, initiating from a single stem cell and governed by gene regulatory rules derived from mouse single-cell transcriptomic data. The developmental process spontaneously generates a heterogeneous population of 5,000 cells, yet yields only 85 mature neurons - merely 1.7% of the total population. These 85 neurons form a densely interconnected core of 200,400 synapses, corresponding to an average degree of 4,715 per neuron. At iteration zero, this minimal circuit performs at chance level on MNIST. However, after a single epoch of standard training, accuracy surges to over 90% - a gain exceeding 80 percentage points - with typical runs falling in the 89-94% range depending on developmental stochasticity. The identical circuit, without any architectural modification or data augmentation, achieves 40.53% on CIFAR-10 after one epoch. These findings demonstrate that developmental rules sculpt a domain-general topological substrate exceptionally amenable to rapid learning, suggesting that biological developmental processes inherently encode powerful structural priors for efficient computation.
Most existing Byzantine-robust federated learning (FL) methods suffer from slow and unstable convergence. Moreover, when handling a substantial proportion of colluded malicious clients, achieving robustness typically entails compromising model utility. To address these issues, this work introduces FedIDM, which employs distribution matching to construct trustworthy condensed data for identifying and filtering abnormal clients. FedIDM consists of two main components: (1) attack-tolerant condensed data generation, and (2) robust aggregation with negative contribution-based rejection. These components exclude local updates that (1) deviate from the update direction derived from condensed data, or (2) cause a significant loss on the condensed dataset. Comprehensive evaluations on three benchmark datasets demonstrate that FedIDM achieves fast and stable convergence while maintaining acceptable model utility, under multiple state-of-the-art Byzantine attacks involving a large number of malicious clients.
We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich potentials derived from sliced OT. We introduce two amortization strategies: regression-based amortization (RA-OT) and objective-based amortization (OA-OT). In RA-OT, we formulate a functional regression model that treats Kantorovich potentials from the original OT problem as responses and those obtained from sliced OT as predictors, and estimate these models via least-squares methods. In OA-OT, we estimate the parameters of the functional model by optimizing the Kantorovich dual objective. In both approaches, the predicted OT plan is subsequently recovered from the estimated potentials. As amortized OT methods, both RA-OT and OA-OT enable efficient solutions to repeated OT problems across different measure pairs by reusing information learned from prior instances to rapidly approximate new solutions. Moreover, by exploiting the structure provided by sliced OT, the proposed models are more parsimonious, independent of specific structures of the measures, such as the number of atoms in the discrete case, while achieving high accuracy. We demonstrate the effectiveness of our approaches on tasks including MNIST digit transport, color transfer, supply-demand transportation on spherical data, and mini-batch OT conditional flow matching.
Despite the rapid advancement of Large Language Models (LLMs), uncertainty quantification in LLM generation remains a persistent challenge. Although recent approaches have achieved strong performance by restricting LLMs to produce short or constrained answer sets, many real-world applications require long-form and free-form text generation. A key difficulty in this setting is that LLMs often produce responses that are semantically coherent yet factually inaccurate, while the underlying semantics are multifaceted and the linguistic structure is complex. To tackle this challenge, this paper introduces Interrogative Uncertainty Quantification (IUQ), a novel framework that leverages inter-sample consistency and intra-sample faithfulness to quantify the uncertainty in long-form LLM outputs. By utilizing an interrogate-then-respond paradigm, our method provides reliable measures of claim-level uncertainty and the model's faithfulness. Experimental results across diverse model families and model sizes demonstrate the superior performance of IUQ on two widely used long-form generation datasets. The code is available at https://github.com/louisfanhz/IUQ.
Feature selection is a classical problem in statistics and machine learning, and it remains extremely challenging, especially in the context of unknown non-linear relationships with dependent features. On the other hand, Shapley values are a classic solution concept from cooperative game theory that is widely used for feature attribution in general non-linear models with highly dependent features. However, Shapley values are not naturally suited for feature selection since they tend to capture both direct effects from each feature to the response and indirect effects through other features. In this paper, we build on the advantages of Shapley values and adapt them to feature selection by proposing \emph{MinShap}, a modification of the Shapley value framework, along with a suite of other related algorithms. In particular, instead of taking the average marginal contribution over permutations of features, MinShap considers the minimum marginal contribution across permutations. We provide a theoretical foundation motivated by the faithfulness assumption in directed acyclic graph (DAG) models and a guarantee for the Type I error of MinShap, and we show through numerical simulations and real data experiments that MinShap tends to outperform state-of-the-art feature selection algorithms such as LOCO, GCM and Lasso in terms of both accuracy and stability. We also introduce a suite of algorithms related to MinShap, based on a multiple testing/p-value perspective, that improve performance in lower-sample settings, and we provide supporting theoretical guarantees.
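The difference between averaging and taking the minimum over permutations can be seen in a small brute-force sketch. The value function `toy_value` below is a hypothetical example chosen for illustration: feature 0 has a direct effect, feature 1 is only a proxy for feature 0, and feature 2 is irrelevant. The helper names are assumptions, and exhaustive enumeration of permutations is only practical for a handful of features.

```python
from itertools import permutations

def marginal_contributions(value, n_features):
    """Collect each feature's marginal contribution in every permutation."""
    contribs = {j: [] for j in range(n_features)}
    for perm in permutations(range(n_features)):
        seen = set()
        for j in perm:
            before = value(frozenset(seen))
            seen.add(j)
            contribs[j].append(value(frozenset(seen)) - before)
    return contribs

def shapley_and_min(value, n_features):
    """Return (average, minimum) marginal contribution per feature."""
    contribs = marginal_contributions(value, n_features)
    shapley = {j: sum(c) / len(c) for j, c in contribs.items()}
    minimum = {j: min(c) for j, c in contribs.items()}
    return shapley, minimum

def toy_value(subset):
    """Hypothetical explained-variance score of a feature subset."""
    if 0 in subset:
        return 1.0   # feature 0 carries the direct effect
    if 1 in subset:
        return 0.8   # feature 1 is only a proxy for feature 0
    return 0.0       # feature 2 contributes nothing
```

In this toy setting the averaged score of the proxy feature 1 is about 0.4, because it looks useful whenever it precedes feature 0, while its minimum marginal contribution is 0. Feature 0 keeps a strictly positive minimum of about 0.2, so only the feature with a direct effect survives the minimum.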
Learning-to-Rank (LTR) is a supervised machine learning approach that constructs models specifically designed to order a set of items or documents based on their relevance or importance to a given query or context. Despite significant success in real-world information retrieval systems, current LTR methods rely on one prefix ranking metric (e.g., Normalized Discounted Cumulative Gain (NDCG) or Mean Average Precision (MAP)) for optimizing the ranking objective function. Such a metric-dependent setting limits LTR methods in two respects: (1) non-differentiability: directly optimizing ranking functions over a given ranking metric is inherently non-smooth, making the training process unstable and inefficient; (2) limited ranking utility: optimizing over a single metric makes it difficult to generalize well to other ranking metrics of interest. To address these issues, we propose a novel listwise LTR framework for efficient and generalizable ranking. Specifically, we propose a new differentiable ranking loss that combines a smooth approximation to the ranking operator with the average mean squared loss per query. Then, we adapt gradient-boosting machines to minimize our proposed loss with respect to each list, a novel contribution. Finally, extensive experimental results confirm that our method outperforms the current state of the art on information retrieval measures with similar efficiency.
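One way to picture a loss of this kind is sketched below. It uses a pairwise-sigmoid soft rank as the smooth approximation to the ranking operator and averages a per-query squared error between soft ranks and target ranks. The specific smoothing, the `temperature` parameter, and the function names are assumptions chosen for illustration; the abstract does not state which approximation the authors use.

```python
import numpy as np

def soft_rank(scores, temperature=1.0):
    """Smooth rank: 1 + (soft count of items scoring higher than item i)."""
    s = np.asarray(scores, dtype=np.float64)
    diff = (s[None, :] - s[:, None]) / temperature   # diff[i, j] = (s_j - s_i) / T
    sig = 1.0 / (1.0 + np.exp(-diff))
    # The diagonal contributes sigmoid(0) = 0.5, so adding 0.5 gives rank >= 1.
    return 0.5 + sig.sum(axis=1)

def target_rank(relevance):
    """Hard ranks from relevance labels, rank 1 = most relevant (ties ignored)."""
    rel = np.asarray(relevance, dtype=np.float64)
    return np.argsort(np.argsort(-rel)) + 1.0

def listwise_rank_mse(score_lists, relevance_lists, temperature=1.0):
    """Average over queries of the mean squared error between soft and target ranks."""
    per_query = [
        np.mean((soft_rank(s, temperature) - target_rank(r)) ** 2)
        for s, r in zip(score_lists, relevance_lists)
    ]
    return float(np.mean(per_query))
```

As the temperature decreases, the soft ranks approach the hard ranks, while larger temperatures give smoother gradients. Because the loss depends on ranks rather than on a specific metric such as NDCG, it is not tied to any single evaluation measure.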
Echocardiography is a widely used modality for cardiac assessment due to its non-invasive and cost-effective nature, but the sparse and heterogeneous spatiotemporal views of the heart pose distinct challenges. Existing masked autoencoder (MAE) approaches typically process images or short clips independently, failing to capture the inherent multi-view structure required for coherent cardiac representation. We introduce Latent Attention Masked Autoencoder (LAMAE), a foundation model architecture tailored to the multi-view nature of medical imaging. LAMAE augments the standard MAE with a latent attention module that enables information exchange across frames and views directly in latent space. This allows the model to aggregate variable-length sequences and distinct views, reconstructing a holistic representation of cardiac function from partial observations. We pretrain LAMAE on MIMIC-IV-ECHO, a large-scale, uncurated dataset reflecting real-world clinical variability. To the best of our knowledge, we present the first results for predicting ICD-10 codes from MIMIC-IV-ECHO videos. Furthermore, we empirically demonstrate that representations learned from adult data transfer effectively to pediatric cohorts despite substantial anatomical differences. These results provide evidence that incorporating structural priors, such as multi-view attention, yields significantly more robust and transferable representations.
Open-weight Small Language Models (SLMs) can provide faster local inference at lower financial cost, but may not achieve the same performance level as commercial Large Language Models (LLMs) that are orders of magnitude larger. Consequently, many of the latest applications of LLMs, such as software engineering agents, tend to be evaluated on larger models only, leaving the issue of improving the cost-benefit trade-off of such applications neglected. This paper proposes ATROPOS, a predictive early-termination analysis and hotswap technique that aims to improve the cost-benefit trade-off for LLM-based agents that use self-consistency. The core component of ATROPOS is a predictive model based on structural properties of LLM inferences: after merging multiple agentic inference paths into a graph representation, ATROPOS uses a Graph Convolutional Network (GCN) to predict whether an ongoing inference will eventually succeed or not. If an agentic task instance running on the source LLM is predicted to fail, ATROPOS subsequently performs hotswapping, i.e., migrating the ongoing inference context onto the more capable target LLM; this is feasible because LLM contexts are stateless. An empirical evaluation of ATROPOS using three recent LLM-based agents shows that ATROPOS can predict early termination of eventually failing inferences with an accuracy of 0.85 at the midpoint of the inference. Hotswapping LLMs for such inferences can convert up to 27.57% of them into successful ones. Consequently, ATROPOS achieves 74.35% of the performance of closed LLMs at as little as 23.9% of the cost.
Graph Neural Networks (GNNs) conventionally rely on standard Laplacian or adjacency matrices for structural message passing. In this work, we substitute the traditional Laplacian with a Doubly Stochastic graph Matrix (DSM), derived from the inverse of the modified Laplacian, to naturally encode continuous multi-hop proximity and strict local centrality. To overcome the intractable $O(n^3)$ complexity of exact matrix inversion, we first utilize a truncated Neumann series to scalably approximate the DSM, which serves as the foundation for our proposed DsmNet. Furthermore, because algebraic truncation inherently causes probability mass leakage, we introduce DsmNet-compensate. This variant features a mathematically rigorous Residual Mass Compensation mechanism that analytically re-injects the truncated tail mass into self-loops, strictly restoring row-stochasticity and structural dominance. Extensive theoretical and empirical analyses demonstrate that our decoupled architectures operate efficiently in $O(K|E|)$ time and effectively mitigate over-smoothing by bounding Dirichlet energy decay, providing robust empirical validation on homophilic benchmarks. Finally, we establish the theoretical boundaries of the DSM on heterophilic topologies and demonstrate its versatility as a continuous structural encoding for Graph Transformers.
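The truncation and compensation steps can be sketched as follows, under the assumption that the modified Laplacian is $I + L$ with $L = D - A$, so that the DSM is $(I + D - A)^{-1} = \sum_{k \ge 0} P^k (I + D)^{-1}$ with $P = (I + D)^{-1} A$. That choice, the function name `truncated_dsm`, and the dense NumPy implementation are assumptions for illustration; the $O(K|E|)$ cost quoted in the abstract would require sparse products rather than the dense ones used here.

```python
import numpy as np

def truncated_dsm(adj, K, compensate=True):
    """Approximate (I + D - A)^{-1} by the first K + 1 Neumann terms.

    If compensate is True, the row mass lost to truncation is added back
    on the diagonal (self-loops) so that every row sums to one again.
    """
    adj = np.asarray(adj, dtype=np.float64)
    deg = adj.sum(axis=1)
    d_inv = 1.0 / (1.0 + deg)        # diagonal of (I + D)^{-1}
    P = d_inv[:, None] * adj         # (I + D)^{-1} A, every row sum < 1
    term = np.diag(d_inv)            # k = 0 term of the series
    approx = term.copy()
    for _ in range(K):
        term = P @ term              # next term: P^k (I + D)^{-1}
        approx += term
    if compensate:
        residual = 1.0 - approx.sum(axis=1)   # truncated tail mass per row
        approx[np.diag_indices_from(approx)] += residual
    return approx
```

Every dropped term is non-negative, so the residual is non-negative and the compensated matrix stays entrywise non-negative. For a symmetric adjacency matrix each partial sum is symmetric, and a diagonal correction preserves that symmetry, so restoring the row sums also restores the column sums.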
Gradient inversion attacks threaten client privacy in federated learning by reconstructing training samples from clients' shared gradients. Gradients aggregate contributions from multiple records and existing attacks may fail to disentangle them, yielding incorrect reconstructions with no intrinsic way to certify success. In vision and language, attackers may fall back on human inspection to judge reconstruction plausibility, but this is far less feasible for numerical tabular records, fueling the impression that tabular data is less vulnerable. We challenge this perception by proposing a verifiable gradient inversion attack (VGIA) that provides an explicit certificate of correctness for reconstructed samples. Our method adopts a geometric view of ReLU leakage: the activation boundary of a fully connected layer defines a hyperplane in input space. VGIA introduces an algebraic, subspace-based verification test that detects when a hyperplane-delimited region contains exactly one record. Once isolation is certified, VGIA recovers the corresponding feature vector analytically and reconstructs the target via a lightweight optimization step. Experiments on tabular benchmarks with large batch sizes demonstrate exact record and target recovery in regimes where existing state-of-the-art attacks either fail or cannot assess reconstruction fidelity. Compared to prior geometric approaches, VGIA allocates hyperplane queries more effectively, yielding faster reconstructions with fewer attack rounds.
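The analytic recovery step rests on a standard identity for a fully connected ReLU layer: if exactly one record in the batch activates a unit, the batch-summed weight gradient of that unit divided by its bias gradient equals that record's feature vector. The sketch below illustrates only this identity on toy data; the helper names are hypothetical, and it does not reproduce VGIA's subspace-based verification test.

```python
import numpy as np

def layer_gradients(W, b, X, upstream):
    """Batch-summed gradients of a ReLU layer a = relu(X W^T + b)."""
    Z = X @ W.T + b                  # pre-activations, shape (batch, units)
    G = upstream * (Z > 0)           # dL/dz, zero where the ReLU is inactive
    return G.T @ X, G.sum(axis=0)    # dL/dW (units, features), dL/db (units,)

def recover_isolated(grad_W, grad_b, unit):
    """Feature vector of the single record that activates `unit`."""
    return grad_W[unit] / grad_b[unit]

# Toy batch of three tabular records with four features each.
X = np.array([[1.0, 0.0, 0.0, 0.5],
              [0.0, 1.0, 0.0, 2.0],
              [0.0, 0.0, 1.0, -1.0]])
W = np.array([[-1.0, 1.0, -1.0, 0.0],   # unit 0: only record 1 is active
              [1.0, 1.0, 0.0, 0.0]])    # unit 1: records 0 and 1 are active
b = np.zeros(2)
upstream = np.array([[0.3, 0.5],        # arbitrary dL/da values
                     [-0.8, 0.2],
                     [0.6, 0.9]])

grad_W, grad_b = layer_gradients(W, b, X, upstream)
x_isolated = recover_isolated(grad_W, grad_b, unit=0)  # equals X[1]
x_mixed = recover_isolated(grad_W, grad_b, unit=1)     # blend of X[0] and X[1]
```

Unit 1 shows why a certificate matters: the same ratio yields a plausible-looking vector that matches no real record when two records share the activation region.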
The evaluation of fairness in machine learning systems has become a central concern in high-stakes applications, including biometric recognition, healthcare decision-making, and automated risk assessment. Existing approaches typically rely on a small number of fairness metrics to assess model behaviour across group partitions, implicitly assuming that these metrics provide consistent and reliable conclusions. However, different fairness metrics capture distinct statistical properties of model performance and may therefore produce conflicting assessments when applied to the same system. In this work, we investigate the consistency of fairness evaluation by conducting a systematic multi-metric analysis of demographic bias in machine learning models. Using face recognition as a controlled experimental setting, we evaluate model performance across multiple group partitions under a range of commonly used fairness metrics, including error-rate disparities and performance-based measures. Our results demonstrate that fairness assessments can vary significantly depending on the choice of metrics, leading to contradictory conclusions regarding model bias. To quantify this phenomenon, we introduce the Fairness Disagreement Index (FDI), a measure designed to capture the degree of inconsistency across fairness metrics. We further show that disagreement remains high across thresholds and model configurations. These findings highlight a critical limitation in current fairness evaluation practices and suggest that single-metric reporting is insufficient for reliable bias assessment.
Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the routing strategy introduces a new security concern that adversaries may manipulate the router to consistently select expensive high-capability models. Existing routing attacks depend on either white-box access or heuristic prompts, rendering them ineffective in real-world black-box scenarios. In this work, we propose R$^2$A, which aims to mislead black-box LLM routers to expensive models via adversarial suffix optimization. Specifically, R$^2$A deploys a hybrid ensemble surrogate router to mimic the black-box router. A suffix optimization algorithm is further adapted for the ensemble-based surrogate. Extensive experiments on multiple open-source and commercial routing systems demonstrate that R$^2$A significantly increases the routing rate to expensive models on queries of different distributions. Code and examples: https://github.com/thcxiker/R2A-Attack.
EEG foundation models (FMs) achieve strong cross-subject and cross-task generalization but impose substantial computational and memory costs that hinder deployment on embedded BCI systems. Knowledge distillation is a natural solution; however, conventional methods fail for EEG FMs because task-relevant semantics are often distributed across intermediate layers, and aggressive dimensionality reduction can distort oscillatory structure via representational collapse and aliasing. To address these challenges, we propose DLink (Distilling Layer-wise and Dominant Knowledge), a unified framework for transferring knowledge from large EEG FMs to compact students with three key innovations: (1) a dynamic Router that adaptively aggregates teacher layers to capture dominant intermediate representations; (2) an EEG MiC student with a Mimic-then-Compress pipeline, which inherits high-dimensional teacher features and then applies structured spatio-temporal compression to avoid a heavy classification head; and (3) spectral distillation that aligns teacher-student representations in the frequency domain to regularize compression and mitigate aliasing and temporal jitter. Experiments on four EEG benchmarks show that DLink enables compact students to outperform lightweight baselines while approaching fully fine-tuned FM performance at substantially lower model size and inference cost.
When do transformers commit to a decision, and what prevents them from correcting it? We introduce \textbf{prolepsis}: a transformer commits early, task-specific attention heads sustain the commitment, and no layer corrects it. Replicating \citeauthor{lindsey2025biology}'s (\citeyear{lindsey2025biology}) planning-site finding on open models (Gemma~2 2B, Llama~3.2 1B), we ask five questions. (Q1)~Planning is invisible to six residual-stream methods; CLTs are necessary. (Q2)~The planning-site spike replicates with identical geometry. (Q3)~Specific attention heads route the decision to the output, filling a gap flagged as invisible to attribution graphs. (Q4)~Search requires ${\leq}16$ layers; commitment requires more. (Q5)~Factual recall shows the same motif at a different network depth, with zero overlap between recurring planning heads and the factual top-10. Prolepsis is architectural: the template is shared, the routing substrates differ. All experiments run on a single consumer GPU (16\,GB VRAM).
Flow matching retains the generation quality of diffusion models while enabling substantially faster inference, making it a compelling paradigm for generative modeling. However, when applied to language modeling, it exhibits fundamental limitations in representing complex latent distributions with irregular geometries, such as anisotropy and multimodality. To address these challenges, we propose a mixture-of-experts flow matching (MoE-FM) framework, which captures complex global transport geometries in latent space by decomposing them into locally specialized vector fields. Building on MoE-FM, we develop a non-autoregressive (NAR) language modeling approach, named YAN, instantiated with both Transformer and Mamba architectures. Across multiple downstream tasks, YAN achieves generation quality on par with both autoregressive (AR) and diffusion-based NAR language models, while requiring as few as three sampling steps. This yields a $40\times$ speedup over AR baselines and up to a $10^3\times$ speedup over diffusion language models, demonstrating substantial efficiency advantages for language modeling.
You are a robot and you live in a Markov decision process (MDP) with a finite or an infinite number of transitions from state-action to next states. You got brains and so you plan before you act. Luckily, your roboparents equipped you with a generative model to do some Monte-Carlo planning. The world is waiting for you and you have no time to waste. You want your planning to be efficient. Sample-efficient. Indeed, you want to exploit the possible structure of the MDP by exploring only a subset of states reachable by following near-optimal policies. You want guarantees on sample complexity that depend on a measure of the quantity of near-optimal states. You want something, that is an extension of Monte-Carlo sampling (for estimating an expectation) to problems that alternate maximization (over actions) and expectation (over next states). But you do not want to StOP with exponential running time, you want something simple to implement and computationally efficient. You want it all and you want it now. You want TrailBlazer.
Contextual bandit algorithms suffer from high regret during cold-start, when the learner has insufficient data to distinguish good arms from bad. We propose augmenting Disjoint LinUCB with LLM pseudo-observations: after each round, a large language model predicts counterfactual rewards for the unplayed arms, and these predictions are injected into the learner as weighted pseudo-observations. The injection weight is controlled by a calibration-gated decay schedule that tracks the LLM's prediction accuracy on played arms via an exponential moving average; high calibration error suppresses the LLM's influence, while accurate predictions receive higher weight during the critical early rounds. We evaluate on two contextual bandit environments - UCI Mushroom (2-arm, asymmetric rewards) and MIND-small (5-arm news recommendation) - and find that when equipped with a task-specific prompt, LLM pseudo-observations reduce cumulative regret by 19% on MIND relative to pure LinUCB. However, generic counterfactual prompt framing increases regret on both environments, demonstrating that prompt design is the dominant factor, more important than the choice of decay schedule or calibration gating parameters. We analyze the failure modes of calibration gating on domains with small prediction errors and provide a theoretical motivation for the bias-variance trade-off governing pseudo-observation weight.
Network security is a critical concern in the digital landscape of today, with users demanding secure browsing experiences and protection of their personal data. This study explores the dynamic integration of Machine Learning (ML) algorithms with Software-Defined Networking (SDN) controllers to enhance network security through adaptive decision mechanisms. The proposed approach enables the system to dynamically choose the most suitable ML algorithm based on the characteristics of the observed network traffic. This work examines the role of Intrusion Detection Systems (IDS) as a fundamental component of secure communication networks and discusses the limitations of SDN-based attack detection mechanisms. The proposed framework uses adaptive model selection to maintain reliable intrusion detection under varying network conditions. The study highlights the importance of analyzing traffic-type-based metrics to define effective classification rules and enhance the performance of ML models. Additionally, it addresses the risks of overfitting and underfitting, underscoring the critical role of hyperparameter tuning in optimizing model accuracy and generalization. The central contribution of this work is an automated mechanism that adaptively selects the most suitable ML algorithm according to real-time network conditions, prioritizing detection robustness and operational feasibility within SDN environments.
In this paper, we propose Bayesian Tucker decomposition (BTuD), in which the residual is assumed to obey a Gaussian distribution, analogous to linear regression. Although we propose an algorithm to perform BTuD, the conventional higher-order orthogonal iteration can generate a Tucker decomposition consistent with the present implementation. Using the proposed BTuD, we perform unsupervised feature selection and apply it successfully to various synthetic datasets, globally coupled maps with randomized coupling strength, and gene expression profiles. We therefore conclude that the proposed unsupervised feature selection method is promising. In addition, BTuD-based unsupervised FE is expected to coincide with the TD-based unsupervised FE that was previously proposed and successfully applied to a wide range of problems.
Concatenating quantum error correction codes scales error correction capability by driving logical error rates down double-exponentially across levels. However, the noise structure shifts under concatenation, making it hard to choose an optimal code sequence. We automate this choice by estimating the effective noise channel after each level and selecting the next code accordingly. In particular, we use learning-based methods to tailor small, non-additive encoders when the noise exhibits sufficient structure, then switch to standard codes once the noise is nearly uniform. In simulations, this level-wise adaptation achieves a target logical error rate with far fewer qubits than concatenating stabilizer codes alone--reducing qubit counts by up to two orders of magnitude for strongly structured noise. Therefore, this hybrid, learning-based strategy offers a promising tool for early fault-tolerant quantum computing.
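The double-exponential suppression mentioned above can be illustrated with the textbook scaling $p_\ell = p_{\rm th}\,(p/p_{\rm th})^{2^\ell}$. This is a rough sketch under strong assumptions that are not taken from the abstract: every level uses the same code, each level corrects a single error, and the noise stays unstructured. The threshold value and function names are hypothetical.

```python
def logical_error_rate(p, p_th, levels):
    """Textbook estimate p_th * (p / p_th) ** (2 ** levels)."""
    return p_th * (p / p_th) ** (2 ** levels)

def levels_needed(p, p_th, target):
    """Smallest number of concatenation levels reaching `target`."""
    if not 0 < p < p_th:
        raise ValueError("physical error rate must lie below the threshold")
    level = 0
    while logical_error_rate(p, p_th, level) > target:
        level += 1
    return level

# Hypothetical numbers: physical rate 1e-3, threshold 1e-2.
rate_after_two = logical_error_rate(1e-3, 1e-2, 2)   # about 1e-6
depth = levels_needed(1e-3, 1e-2, 1e-9)              # three levels suffice
```

Under this uniform model the qubit count grows as the block size raised to the number of levels, which is why reducing the required depth for structured noise can save orders of magnitude in qubits.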
Many CAD learning pipelines discretize Boundary Representations (B-Reps) into triangle meshes, discarding analytic surface structure and topological adjacency and thereby weakening consistent instance-level analysis. We present STEP-Parts, a deterministic CAD-to-supervision toolchain that extracts geometric instance partitions directly from raw STEP B-Reps and transfers them to tessellated carriers through retained source-face correspondence, yielding instance labels and metadata for downstream learning and evaluation. The construction merges adjacent B-Rep faces only when they share the same analytic primitive type and satisfy a near-tangent continuity criterion. On ABC, same-primitive dihedral angles are strongly bimodal, yielding a threshold-insensitive low-angle regime for part extraction. Because the partition is defined on intrinsic B-Rep topology rather than on a particular triangulation, the resulting boundaries remain stable under changes in tessellation. Applied to the DeepCAD subset of ABC, the pipeline processes approximately 180,000 models in under six hours on a consumer CPU. We release code and precomputed labels, and show that STEP-Parts serves both as a tessellation-robust geometric reference and as a useful supervision source in two downstream probes: an implicit reconstruction--segmentation network and a dataset-level point-based backbone.
Recently, sparse autoencoders (SAEs) have emerged as a promising technique for interpreting activations in foundation models by disentangling features into a sparse set of concepts. However, identifying the optimal level of sparsity for each neuron remains challenging in practice: excessive sparsity can lead to poor reconstruction, whereas insufficient sparsity may harm interpretability. While existing activation functions such as ReLU and TopK provide certain sparsity guarantees, they typically require additional sparsity regularization or cherry-picked hyperparameters. We show in this paper that dynamically sparse attention mechanisms using sparsemax can bridge this trade-off, due to their ability to determine the activation numbers in a data-dependent manner. Specifically, we first explore a new class of SAEs based on the cross-attention architecture with the latent features as queries and the learnable dictionary as the key and value matrices. To encourage sparse pattern learning, we employ a sparsemax-based attention strategy that automatically infers a sparse set of elements according to the complexity of each neuron, resulting in a more flexible and general activation function. Through comprehensive evaluation and visualization, we show that our approach successfully achieves lower reconstruction loss while producing high-quality concepts, particularly in top-n classification tasks.
Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model's intrinsic representation characteristics to guide the training process. In this paper, we first observe the presence of high-magnitude activations within the query and key vectors when processing long contexts. Drawing inspiration from model quantization -- which establishes the criticality of such high-magnitude activations -- and the insight that long-context reasoning inherently exhibits a sparse structure, we hypothesize that these weights serve as the pivotal drivers for effective model optimization. Based on this insight, we propose LongAct, a strategy that shifts from uniform to saliency-guided sparse updates. By selectively updating only the weights associated with these significant activations, LongAct achieves an approximate 8% improvement on LongBench v2 and enhances generalization on the RULER benchmark. Furthermore, our method exhibits remarkable universality, consistently boosting performance across diverse RL algorithms such as GRPO and DAPO. Extensive ablation studies suggest that focusing on these salient features is key to unlocking long-context potential.
We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $τ_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $τ_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $τ_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $τ_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $τ_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
Online hate speech and abusive language pose a growing challenge for content moderation, especially in multilingual settings and for low-resource languages such as Lithuanian. This paper investigates to what extent modern multilingual sentence embedding models can support accurate hate speech detection in Lithuanian, Russian, and English, and how their performance depends on downstream modeling choices and feature dimensionality. We introduce LtHate, a new Lithuanian hate speech corpus derived from news portals and social networks, and benchmark six modern multilingual encoders (potion, gemma, bge, snow, jina, e5) on LtHate, RuToxic, and EnSuperset using a unified Python pipeline. For each embedding, we train both a one class HBOS anomaly detector and a two class CatBoost classifier, with and without principal component analysis (PCA) compression to 64-dimensional feature vectors. Across all datasets, two class supervised models consistently and substantially outperform one class anomaly detection, with the best configurations achieving up to 80.96% accuracy and AUC ROC of 0.887 in Lithuanian (jina), 92.19% accuracy and AUC ROC of 0.978 in Russian (e5), and 77.21% accuracy and AUC ROC of 0.859 in English (e5 with PCA). PCA compression preserves almost all discriminative power in the supervised setting, while showing some negative impact for the unsupervised anomaly detection case. These results demonstrate how modern multilingual sentence embeddings combined with gradient boosted decision trees provide robust soft-computing solutions for multilingual hate speech detection applications.
The SARS-CoV-2 RNA pseudoknot is a promising target for antiviral intervention, as it regulates the efficiency of $-$1 programmed ribosomal frameshifting ($-$1 PRF), a mechanism that is essential for viral protein synthesis. The pseudoknot represents a viral RNA sequence composed of helical stems that adopts two long-lived topologies, threaded and unthreaded. Ligand-induced distortion of this fold is thought to underlie the susceptibility of $-$1 PRF to small-molecule inhibitors. Resolving these distortions from unbiased molecular dynamics (MD) requires collective variables (CVs) that isolate the slowest dynamic modes of the RNA--ligand system from the high-frequency fluctuations. Here, we use spectral map (SM), a thermodynamics-driven machine-learning method, to learn such CVs directly from MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-$1 PRF inhibitor merafloxacin and two related analogs. We examine both threaded and unthreaded pseudoknot topologies and consider the neutral and ionized ligand forms relevant at physiological pH. Free-energy landscapes show that ligand-induced destabilization is topology-selective: merafloxacin and its analogs destabilize the S2 stem in the threaded pseudoknot, whereas in the unthreaded pseudoknot, destabilization shifts to the S1 and S3 stems. We find that the zwitterionic form of merafloxacin uniquely imposes slow dynamics on the otherwise featureless unthreaded pseudoknot. Furthermore, the neutral and zwitterionic forms of merafloxacin differ qualitatively in their mechanisms within the same RNA topology. Overall, these results clarify how pseudoknot topology, ligand type, and protonation state shape the slow conformational dynamics of viral RNA and establish physiological protonation as an essential factor for modeling RNA-targeted drug action.
We propose a new perspective on policy optimization: rather than reweighting all samples by their importance ratios, an optimizer should select which samples are trustworthy enough to drive a policy update. Building on this view, we introduce Rejection-Gated Policy Optimization (RGPO), which replaces the importance sampling ratio r_theta = pi_theta / pi_old with a smooth, differentiable acceptance gate alpha_theta(s, a) = g(r_theta(s, a)) in the range [0, 1]. Unlike prior work that applies rejection sampling as a data-level heuristic before training, RGPO elevates rejection to an optimization principle: the gate participates directly in gradient computation and is implicitly updated alongside the policy. RGPO provides a unified framework: the policy gradients of TRPO, PPO, and REINFORCE all correspond to specific choices of the effective gradient weight w(r) = g'(r) * r. We prove that RGPO guarantees finite, bounded gradient variance even when importance sampling ratios are heavy-tailed (where IS variance diverges). We further show that RGPO incurs only a bounded, controllable bias and provides an approximate monotonic policy improvement guarantee analogous to TRPO. RGPO matches PPO in computational cost, requires no second-order optimization, and extends naturally to RLHF-style preference alignment. In online preference fine-tuning of Qwen2.5-1.5B-Instruct on Anthropic HH-RLHF (n = 3 seeds), RGPO uses a dual-ratio gate that anchors learning to both the previous policy and the reference model, achieving a Pareto-dominant outcome: the highest reward among online RL methods (+14.8% vs. PPO-RLHF) and the lowest KL divergence to the reference model (-16.0% vs. PPO-RLHF, -53.1% vs. GRPO).
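The effective gradient weight follows from the chain rule: $\nabla_\theta\, g(r_\theta) = g'(r_\theta)\, r_\theta\, \nabla_\theta \log \pi_\theta$, hence $w(r) = g'(r)\, r$. The sketch below evaluates this weight for one hypothetical gate, $g(r) = r/(1+r)$, chosen only because it is smooth and maps $r \ge 0$ into $[0, 1)$; it is not the gate used in RGPO.

```python
def gate(r):
    """Hypothetical smooth acceptance gate mapping r >= 0 into [0, 1)."""
    return r / (1.0 + r)

def gate_grad(r):
    """Derivative g'(r) of the hypothetical gate."""
    return 1.0 / (1.0 + r) ** 2

def effective_weight(r):
    """w(r) = g'(r) * r, the factor multiplying grad log pi."""
    return gate_grad(r) * r

# Plain importance weighting uses w(r) = r, which grows without bound.
# The gated weight peaks at r = 1 and decays for extreme ratios.
ratios = [0.1, 1.0, 10.0, 1e3, 1e6]
weights = [effective_weight(r) for r in ratios]
```

For this particular gate the weight never exceeds 1/4, so a single sample with a very large ratio cannot dominate the update, which matches the bounded-variance motivation stated in the abstract.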
While conventional oscillation experiments measure neutrino mixing parameters with high precision, these measurements are strictly confined to sub-TeV scales. At higher energies, renormalization-group effects can cause these parameters to evolve with the transferred momentum, $Q$. High-energy and ultra-high-energy astrophysical neutrinos, spanning TeV to EeV energies, probe high values of $Q$ unreachable by conventional experiments, offering an unprecedented test of high-energy mixing. We use the flavor composition of these neutrinos -- the relative proportions of $ν_e$, $ν_μ$, and $ν_τ$ -- to constrain this evolution, both phenomenologically and within dimension-6 Standard Model Effective Field Theory. We account for astrophysical uncertainties -- an unavoidable requirement to obtain realistic results, even though this weakens the bounds. Although present IceCube measurements lack the sensitivity to detect this running, we forecast that upcoming multi-detector combinations will place unprecedented bounds on the high-energy evolution of neutrino mixing.
The Light Dark Matter eXperiment (LDMX) is an electron-beam fixed-target experiment primarily designed to achieve world-leading, model-independent sensitivity to sub-GeV dark matter particles. LDMX aims to identify dark sector particle production through the detection of events with substantial missing energy and momentum, a signature of invisible particles escaping detection. Beyond this primary objective, LDMX offers a complementary search strategy for long-lived, visibly decaying particles, such as dark photons and axion-like particles. We present the first detailed evaluation of the ability of LDMX to identify visibly decaying, long-lived particles that couple to electrons using a detailed simulation, based on the Geant4-toolkit, that incorporates realistic detection efficiencies and background levels. We demonstrate that LDMX can achieve a sensitivity that is competitive with other experiments that are currently running. The models explored in this paper are distinct and complementary to those probed in the LDMX flagship missing-momentum analysis. Through searching for both invisible dark matter and visibly decaying long-lived signatures, LDMX will significantly advance the search for light dark matter and provide a broad exploration of the sub-GeV dark sector.
The electric and magnetic dipole moments of the electron and of the muon provide stringent tests of the Standard Model and sensitive probes of new physics. By contrast, the corresponding dipole moments of the $τ$ lepton remain weakly constrained. This study explores the potential of future lepton colliders, focusing on the $e^+e^-$ Future Circular Collider and a multi-TeV muon collider, to probe $τ$ dipole moments. We consider multiple channels, including $\ell^+\ell^- \to τ^+τ^-$ ($\ell=e,μ$), associated Higgs production $μ^+μ^- \to τ^+τ^- H$, radiative Higgs decays $H \to τ^+τ^-γ$, and vector-boson scattering $\ell^+\ell^- \to \ell^+\ell^-τ^+τ^-$ and $μ^+μ^- \to \barνντ^+τ^-$. Our results show that these facilities are highly complementary and can extend existing bounds by several orders of magnitude.
We present an extraction of unpolarized quark transverse-momentum-dependent parton distribution functions (TMD PDFs) from Drell-Yan data within a Bayesian inference framework, incorporating artificial intelligence at multiple stages of the analysis. Our analysis is performed at ${\rm N^3LO}$ in perturbative QCD combined with ${\rm N^4LL}$ resummation accuracy. We first employ an AI-driven iterative procedure to explore and rank candidate functional forms for the nonperturbative contributions to TMD PDFs at the initial scale, as well as for the Collins-Soper evolution kernel, using $χ^2$ fits and physics constraints. To enable efficient Bayesian inference, we construct a surrogate model for TMD cross sections by training a machine-learning emulator over the parameter space, replacing computationally expensive repeated evaluations and allowing scalable sampling with an affine-invariant Markov Chain Monte Carlo (MCMC) ensemble. Using this framework, we perform a global analysis of Drell-Yan data from fixed-target, RHIC, and LHC experiments and extract TMD PDFs with quantified uncertainties. We compare the results with those obtained using the replica method and highlight differences in the resulting uncertainty estimates.
In this work, we investigate three typical new physics resonances which couple to the standard model (SM) quarks via direct top-quark flavor-changing interactions. We identify the possible SMEFT operators at the electroweak scale and analyze their phenomenology.
A first search is presented for BSM resonances in four top quark production in the 2 lepton channel, using $138\mathrm{fb}^{-1}$ $pp$ data collected at $\sqrt{s}=13$ TeV, and $35\mathrm{fb}^{-1}$ $\sqrt{s}=13.6$ TeV $pp$ data. No significant excess is observed; limits are set on vector Z', scalar, pseudoscalar and ALP mediators. Z' mediators with 50% width are excluded up to 850 GeV (1000 GeV expected).
The STAR experiment at the Relativistic Heavy Ion Collider presents measurements of correlations between charged hadron triggers of high transverse momenta ($7 < p_{\rm T} < 30$ GeV/$c$) with recoiling charged hadrons ($3 < p_{\rm T} < 7$ GeV/$c$) or charged--particle jets ($p_{\rm T, jet} > 8$ GeV/$c$) in event--activity selected O+O collisions at $\sqrt{s_{\mathrm {NN}}}=200$ GeV. Yields of associated hadrons and jets, normalized by the number of trigger hadrons, are suppressed by approximately 20\% in high event activity relative to low event activity collisions, with an absence of suppression excluded with high significance. This suppression corresponds to a shift in $p_{\rm T}$ of $0.70\pm0.15~(\rm stat.)~\pm0.10~(\rm syst.)$ GeV/$c$ for large--radius charged--particle jets ($R=0.5$), quantifying their energy redistribution due to final--state interactions. These measurements provide strong evidence for jet quenching in O+O collisions at $\sqrt{s_\mathrm{NN}}=200$ GeV, offering new insight into quark--gluon plasma formation in small collision systems.
We examine the leading-power fragmentation of fully heavy pentaquarks in high-energy hadronic collisions. To this end, we complete the release of the hadron-structure-oriented PQ5Q1.0 fragmentation functions, by discussing the $P_{5c}$ set and delivering the $P_{5b}$ one. These functions incorporate an improved computation of the initial-scale input for the constituent heavy-quark fragmentation channel, making them particularly suitable for describing both the direct formation of a compact multicharm state and the hadronization from a diquark-antiquark-diquark configuration. For phenomenological applications, we employ the data-validated (sym)JETHAD framework to compute and analyze NLL/NLO$^+$ semi-inclusive production rates of pentaquark-plus-jet systems at the upcoming HL-LHC and the future FCC. This study marks a further step toward connecting hadronic structure, precision QCD, and the emerging physics of exotic matter.
JUNO is designed to determine the neutrino mass ordering with an energy resolution of 3% at 1 MeV. In the real detector, however, deformations of the central stainless-steel structure during installation lead to deviations of the photomultiplier tube (PMT) positions from their design values. Based on the limited survey data of the PMTs and the stainless-steel truss, we perform a correlation analysis of the measured points and propose a method to predict the positions of all PMTs. Using the resulting realistic geometry, we demonstrate that the detector deformation has a negligible effect on the energy reconstruction. In contrast, inaccuracies in the assumed geometry can introduce vertex biases of up to 40 mm. Incorporating the realistic geometry into the calibration-based PMT response model removes this bias and preserves the stability of the reconstruction algorithms.
The study of spin polarization of $Λ$ hyperons in ultrarelativistic heavy-ion collisions provides insights into the angular momentum and vortical structure of the quark-gluon plasma (QGP) that may be formed in such collisions. The present study examines the global spin polarization of $Λ$ hyperons using a second-order relativistic viscous hydrodynamic framework that incorporates medium vorticity, shear viscosity, and evolving magnetic fields. It explores thermal vorticity evolution in relativistic heavy-ion collisions and evaluates its value at the decoupling isothermal freeze-out surface. We quantify the contributions of thermal vorticity and magnetic field to the global spin polarization of $Λ$ hyperons. A comparison with recent ALICE measurements in Pb+Pb collisions at $\sqrt{s_{NN}}$ = 2.76 and 5.02 TeV shows qualitative agreement, offering new insights into the vortical structure of QCD matter. The study also explores the relationship between magnetic and rotational dynamics, with implications for spin polarization at RHIC and LHC energies.
The lowest-lying charmonium state $η_c$ was first observed more than four decades ago. Studies of its production and decay properties provide a unique platform to investigate the inner structure of charmonium systems and hence improve our understanding of the strong interaction in the charm sector. The BESIII detector at the $e^+e^-$ BEPCII collider has already collected the world's largest $J/ψ$ and $ψ(3686)$ data sets, from which large $η_c$ samples are produced via radiative transitions. This paper reviews recent precision studies of $η_c$ decays at BESIII, including $J/ψ\toγη_c$, $η_c\toγγ$, and several $η_c$ hadronic decays.
We investigate flavor-changing neutral current (FCNC) interactions of the top quark at a future muon collider with a center-of-mass energy of $\sqrt{s} = 10~\mathrm{TeV}$. The process $μ^{+}μ^{-} \to ν_μ\,μ^+\,b\,j$ and its corresponding charge conjugate are considered as a probe of anomalous $tqZ$ and $tqγ$ couplings, parametrized within an effective field theory framework in terms of $κ_{tqZ}$ and $λ_{tqγ}$. Signal and background events are simulated using Monte Carlo techniques, including parton showering and hadronization with \texttt{Pythia} and a fast detector simulation based on \texttt{Delphes} with a dedicated 10~TeV muon collider setup. A multivariate analysis based on boosted decision trees is employed to enhance the signal discrimination. Assuming an integrated luminosity of $10~\mathrm{ab}^{-1}$, we obtain projected sensitivities to the anomalous couplings at the $\mathcal{O}(10^{-3})$ level, corresponding to branching ratio limits of $\mathcal{O}(10^{-6})$ for the rare $t \to qZ$ and $t \to qγ$ decays. These results significantly improve upon the current bounds from the CMS and ATLAS collaborations, extending the sensitivity by more than one order of magnitude. Our findings demonstrate that a multi-TeV muon collider provides a powerful and complementary platform for probing rare top-quark interactions, offering a unique opportunity to explore physics beyond the Standard Model through FCNC processes.
The forthcoming Hyper-Kamiokande experiment requires substantially larger Monte Carlo datasets than previous experiments to satisfy stringent systematic-uncertainty requirements. While traditional maximum-likelihood reconstruction provides high-quality results, its per-event computational cost makes processing these large samples increasingly impractical. We demonstrate a neural-network-based reconstruction approach for the Hyper-Kamiokande far detector using simulated data. Single-particle events with kinetic energies from the Cherenkov threshold up to 2 GeV are propagated through the detector, with PMT charge and timing information mapped to $190\times189$ two-channel images serving as inputs to ResNet models in the WatChMaL framework. These models (i) classify events into four particle hypotheses ($e$, $μ$, $γ$, $π^{0}$) and (ii) regress the vertex, direction, and momentum of electrons and muons. Averaged over the full kinematic range, the regression models achieve momentum resolutions of $1.35\%$ and $2.39\%$, angular resolutions of $1.25^\circ$ and $1.94^\circ$, and vertex resolutions of $28.2$ cm and $25.4$ cm, for muons and electrons respectively, broadly consistent with traditional methods. The classifier improves $e$-$μ$, $e$-$γ$, and $e$-$π^{0}$ separation, with ROC curve areas of $0.9999992$, $0.633$, and $0.9526$. Crucially, our networks achieve inference times of 1-2 ms per event on a single GPU, yielding speed-ups of $3.2\times10^{4}$-$5.2\times10^{4}$ relative to likelihood-based reconstruction, highlighting deep learning as a scalable alternative for Hyper-Kamiokande event reconstruction.
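The ROC curve areas quoted above can be read as the probability that a randomly chosen signal event receives a higher classifier score than a randomly chosen background event, so 0.5 corresponds to chance and 1 to perfect separation. The following is a minimal sketch of that rank-based definition only; the function name `roc_auc` and the toy scores are hypothetical and are not taken from WatChMaL or from the analysis itself.

```python
def roc_auc(signal_scores, background_scores):
    """Area under the ROC curve from the pairwise-ranking definition.

    Counts the fraction of (signal, background) pairs in which the signal
    event has the higher score; ties count as half a win.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Toy scores (illustrative only): higher means "more signal-like".
toy_signal = [0.9, 0.8, 0.4]
toy_background = [0.5, 0.3, 0.1]
auc = roc_auc(toy_signal, toy_background)
```

In this reading, the $e$-$γ$ value of $0.633$ lies much closer to the chance level of 0.5 than the $e$-$μ$ and $e$-$π^{0}$ values do.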
Many body gravity (MBG) is a novel modified theory of gravity formulated in a 5-D space-time-temperature framework, in which the variation in temperature is recast as a variation in the 5-D metric. Previous work on MBG has shown that it can reproduce galaxy rotation curves, radial acceleration relation and the weak gravitational lensing of the bullet cluster, without the inclusion of dark matter. In this work we show that MBG can reproduce cosmic inflation, and in the process, analyze fundamental relations between interaction, time and gravity. To analyze cosmic inflation using interacting massless scalar fields, we first analyze theoretically a hypothetical universe with a single massive particle, or a collection of non-interacting massive particles. A quantitative relation between time and interaction is developed using Quantum Field Theory (QFT), which suggests that the notion of time becomes ill-defined for such a universe. The mass terms in MBG and General Relativity cause a discrepancy with the QFT results. An interacting massless scalar field then becomes a necessity to resolve the issue at the onset of inflation. However, the entropic terms in the MBG field equations are seen to be consistent with the QFT results and further accelerate inflation. The slow-roll condition is shown to be a natural consequence of the Euler-Lagrange equations of motion governing the massless scalar field in 5-D space-time-temperature, during the early phase of inflation. Finally, the MBG field equations are solved in the context of a Friedmann metric, leading to inflation. The matter era is also investigated.
Ultralight dark matter searches widely assume that signals are monochromatic, with a single frequency set by the mass. This assumption is generally violated in the presence of field mixing, even when the constituent fields have similar frequencies. Instead, dark matter signals can exhibit a two-timescale structure with intrinsic slow modulation. We demonstrate that mixing between ultralight wave dark matter fields induces a parametric structure, leading to a scenario we refer to as wave-envelope dark matter, in which a slow-beating envelope emerges alongside the primary oscillation. This results in distinctive features such as slow modulation and characteristic sideband structures in the frequency spectrum, beyond the conventional monochromatic expectation. As a representative example, we briefly discuss implications for neutrino observables.
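As a generic illustration of how a slow envelope can arise from two nearby frequencies, consider the textbook beating identity for two equal-amplitude oscillations. This is a sketch under simplifying assumptions (equal amplitudes, no coupling, symbols $ω_1$ and $ω_2$ introduced here for illustration), not the mixing model of the abstract above:

```latex
\cos(\omega_1 t) + \cos(\omega_2 t)
  = 2\,\cos\!\left(\frac{\omega_1 - \omega_2}{2}\,t\right)
       \cos\!\left(\frac{\omega_1 + \omega_2}{2}\,t\right)
```

For $ω_1 \approx ω_2$ the first factor varies slowly and acts as an envelope on the fast oscillation at the mean frequency, which is the kind of two-timescale structure a monochromatic search template would miss.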
While conventional oscillation experiments measure neutrino mixing parameters with high precision, these measurements are strictly confined to sub-TeV scales. At higher energies, renormalization-group effects can cause these parameters to evolve with the transferred momentum, $Q$. High-energy and ultra-high-energy astrophysical neutrinos, spanning TeV to EeV energies, probe high values of $Q$ unreachable by conventional experiments, offering an unprecedented test of high-energy mixing. We use the flavor composition of these neutrinos -- the relative proportions of $ν_e$, $ν_μ$, and $ν_τ$ -- to constrain this evolution, both phenomenologically and within dimension-6 Standard Model Effective Field Theory. We account for astrophysical uncertainties -- an unavoidable requirement to obtain realistic results, even though this weakens the bounds. Although present IceCube measurements lack the sensitivity to detect this running, we forecast that upcoming multi-detector combinations will place unprecedented bounds on the high-energy evolution of neutrino mixing.
We present the implementation of next-to-next-to-leading order (NNLO) electroweak (EW) virtual corrections at next-to-leading logarithmic (NLL) accuracy in the amplitude generator OpenLoops. The implementation covers the automated computation of processes involving massless fermions and transversely polarised vector bosons. For energies above the EW scale, logarithmic EW corrections are strongly enhanced in the tails of kinematic distributions of key LHC processes, reaching several tens of percent at NLO and several percent at NNLO. The two-loop implementation is validated against analytical results from the literature. We present phenomenological results for representative LHC processes and discuss the role of two-loop EW corrections in reducing theoretical uncertainties from missing higher-order contributions.
Axion-like particles (ALPs), the QCD axion, and dark photons in the MeV-GeV mass range are motivated by various dark matter models and the strong CP problem, and are ubiquitous in extensions of the Standard Model. A long-standing blind spot for experimental searches is the sub-100 MeV mass range, where the particle lifetime is too long to be constrained by prompt-decay collider searches yet too short to be reached by beam-dump experiments. We investigate and estimate the sensitivity of the Light Dark Matter eXperiment (LDMX) to such axions and dark photons, motivated by the clean environment in which these particles can be produced and by the near-target tracking capabilities of LDMX. With reasonable charged track and momentum reconstruction capabilities, we find that LDMX could close much of this low-mass blind spot for axions and dark photons.
We explore the possibility that neutrinos couple to an interacting sterile sector, providing a novel portal that generalizes the heavy neutral lepton portal to a composite setting. For a low confinement scale, high-energy neutrino beams can disintegrate into collimated sprays of hidden states, referred to as dark jets. This dynamics gives rise to two characteristic signatures in high energy neutrino beams. First, long-lived dark resonances can enhance the neutral-current to charged-current ratio. Second, shorter-lived dark states produced in neutrino neutral currents can produce single or multiple displaced vertices and even emerging jets, depending on the kinematics. These signals probe regions of parameter space beyond existing constraints from meson, electroweak, and Higgs decays, as well as from searches for displaced decays at beam dump experiments. We study these phenomena within broad classes of ultraviolet completions and identify scenarios in which high-energy neutrino beams provide leading sensitivity to neutrino compositeness. Such scenarios generically induce higher-dimensional contact interactions, which we classify and study alongside their complementary experimental signatures. Finally, we outline an experimental program spanning both the intensity and energy frontiers. Near-term neutrino facilities (DUNE, FPF) and running flavor experiments (LHCb, Belle II) can probe neutrino compositeness through neutrino disintegration into dark jets and displaced B-meson decays. Future colliders, particularly the Future Circular Collider (FCC-ee), will ultimately provide the strongest sensitivity to the compositeness scale via displaced Z decays.
The electric and magnetic dipole moments of the electron and of the muon provide stringent tests of the Standard Model and sensitive probes of new physics. By contrast, the corresponding dipole moments of the $τ$ lepton remain weakly constrained. This study explores the potential of future lepton colliders, focusing on the $e^+e^-$ Future Circular Collider and a multi-TeV muon collider, to probe $τ$ dipole moments. We consider multiple channels, including $\ell^+\ell^- \to τ^+τ^-$ ($\ell=e,μ$), associated Higgs production $μ^+μ^- \to τ^+τ^- H$, radiative Higgs decays $H \to τ^+τ^-γ$, and vector-boson scattering $\ell^+\ell^- \to \ell^+\ell^-τ^+τ^-$ and $μ^+μ^- \to \barνντ^+τ^-$. Our results show that these facilities are highly complementary and can extend existing bounds by several orders of magnitude.
The path-integral approach to the double well has long been limited by the dilute instanton gas approximation. We show that if the finite Euclidean-time structure is taken seriously by using exact saddles, the dilute gas can be sidestepped, allowing the partition function and energy levels to be computed systematically. At each instanton order, the full resurgent structure -- which saddles contribute, what asymptotic growth is expected and how ambiguities cancel -- is encoded in a finite-dimensional Picard--Lefschetz contour integral over the quasi-zero modes with a clear geometric interpretation. Working at finite $T$ is essential: the dilute instanton gas can only access the ground-state splitting, whereas the exact finite-$T$ computation systematically produces the non-perturbative energy splittings for all excited states, including their full dependence on the level number. The key ingredients -- Weierstrass elliptic functions for the saddles, Lamé operators for the fluctuations and Picard--Fuchs equations for the periods -- form a coherent mathematical framework that both overlaps with and complements that of Exact WKB.
In the framework of Soft de Sitter Effective Theory (SdSET), the Fokker-Planck equation for the late-time dynamics of the massless minimally coupled scalar field and its extension to the Kramers-Moyal equation are obtained from operator mixing of composite operators of the effective superhorizon field. We construct the formalism for composite-operator renormalisation, mixing and matching in dimensional regularisation, allowing for computations beyond the leading order. The general formalism is illustrated in free SdSET, which already features non-trivial structures including the well-known diffusion coefficient for stochastic inflation. As explicit examples in the interacting theory, we renormalise the one-loop bispectrum and the two-loop one-point function of the composite operator $\varphi_+^2$, and match them onto their full-theory counterparts. These results allow us to determine the next-to-leading order (two-loop) correction to the diffusion term of the Fokker-Planck equation of stochastic inflation for the first time.
Higgs final states are prime targets in the search for physics beyond the Standard Model. In the conventional picture, $SU(2)$ symmetry together with the Goldstone Equivalence Theorem correlates Higgs and gauge-boson final states, implying comparable sensitivity in channels such as $hh$, $ZZ$, and $WW$ in searches for heavy resonances. In this work, we identify a mechanism to parametrically violate this expectation. We show that higher-order Higgs couplings can induce an electroweak-symmetry-breaking enhancement that selectively amplifies Higgs-rich final states, allowing them to become the leading discovery channels of new resonances. For scalar resonances, this can make di-Higgs the dominant bosonic signal. For resonance masses higher than a couple of TeV, it also opens resonant tri-Higgs and four-Higgs channels as well-motivated search targets. The same underlying mechanism extends to heavy fermionic and vector resonances, where it can similarly enhance channels such as $ht$, $Zh$, and $γh$. We present this framework in effective field theory, demonstrate possible UV completions, and discuss its implications for collider searches.
We use the amplitude formulation of the SMEFT to introduce a spurion analysis of the SMEFT low-energy amplitudes in terms of the Higgs VEV. Each SMEFT contact-term is given as a sum of a few spurion structures, whose number depends on the electroweak charges of the external legs. The coefficients of these structures involve singlet combinations of Higgses from higher-order SMEFT contributions. We use this to derive the spurion expansions of the W- and Z-boson masses and mixing, and their three-point couplings to fermions. The textures of these couplings are saturated by the dimension-eight SMEFT. Our analysis can be generalized to higher-point amplitudes and nonzero Yukawa couplings.
We present a closed formula for the computation of static post-Newtonian corrections to the two-body gravitational dynamics at any odd order, assuming the lower-order results are known. The formula is derived within a correlation function framework and exploits the $\mathbb{Z}_2$ symmetry of the static sector, leading to a novel theoretical interpretation of the factorization theorem. As an application, we compute the gravitational interaction of two compact coalescing objects at the seventh post-Newtonian order in the static limit, which receives contributions from seven-loop graphs at order $\mathcal{O}(G_N^8 v^0)$, and find complete agreement with the results obtained using the diagrammatic approach of the factorization theorem.
We present an extraction of unpolarized quark transverse-momentum-dependent parton distribution functions (TMD PDFs) from Drell-Yan data within a Bayesian inference framework, incorporating artificial intelligence at multiple stages of the analysis. Our analysis is performed at ${\rm N^3LO}$ in perturbative QCD combined with ${\rm N^4LL}$ resummation accuracy. We first employ an AI-driven iterative procedure to explore and rank candidate functional forms for the nonperturbative contributions to TMD PDFs at the initial scale, as well as for the Collins-Soper evolution kernel, using $χ^2$ fits and physics constraints. To enable efficient Bayesian inference, we construct a surrogate model for TMD cross sections by training a machine-learning emulator over the parameter space, replacing computationally expensive repeated evaluations and allowing scalable sampling with an affine-invariant Markov Chain Monte Carlo (MCMC) ensemble. Using this framework, we perform a global analysis of Drell-Yan data from fixed-target, RHIC, and LHC experiments and extract TMD PDFs with quantified uncertainties. We compare the results with those obtained using the replica method and highlight differences in the resulting uncertainty estimates.
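To make the phrase "affine-invariant MCMC ensemble" concrete, the following is a minimal sketch of the Goodman-Weare stretch move, the standard update used by affine-invariant ensemble samplers. It is not the analysis code: the target `log_prob` is a toy standard normal, and the names `stretch_move_sampler`, `walkers`, and the scale parameter `a` are illustrative assumptions.

```python
import math
import random


def log_prob(x):
    # Toy target (assumption): independent standard normal in each dimension.
    return -0.5 * sum(v * v for v in x)


def stretch_move_sampler(log_prob, walkers, n_steps, a=2.0, rng=None):
    """Sample with the stretch move; returns a flat list of walker positions."""
    rng = rng or random.Random(0)
    walkers = [list(w) for w in walkers]
    n_walkers = len(walkers)
    dim = len(walkers[0])
    logp = [log_prob(w) for w in walkers]
    chain = []
    for _ in range(n_steps):
        for k in range(n_walkers):
            # Pick a different walker j uniformly from the rest of the ensemble.
            j = rng.randrange(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z on [1/a, a] with density proportional to 1/sqrt(z).
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            # Propose a point on the line through walkers j and k.
            proposal = [
                walkers[j][i] + z * (walkers[k][i] - walkers[j][i])
                for i in range(dim)
            ]
            logp_new = log_prob(proposal)
            # Acceptance probability min(1, z^(dim-1) * p(proposal) / p(current)).
            log_accept = (dim - 1) * math.log(z) + logp_new - logp[k]
            if rng.random() < math.exp(min(0.0, log_accept)):
                walkers[k] = proposal
                logp[k] = logp_new
        chain.extend(list(w) for w in walkers)
    return chain


rng = random.Random(1)
start = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
chain = stretch_move_sampler(log_prob, start, n_steps=2000, rng=rng)
```

Because each proposal is built from the positions of other walkers, the sampler needs no hand-tuned proposal covariance. In a surrogate-based setup, `log_prob` would presumably call the trained emulator instead of the full cross-section calculation.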
In the effective field theory (EFT) description of binary inspirals, the radiated gravitational waveform receives universal corrections from the curved background, the ``tail effects'', that resum into the so-called ``Sommerfeld factor''. We develop a systematic framework for computing this gravitational Sommerfeld factor for scalar perturbations in the presence of tidal effects on the system. Using the worldline EFT, we recast the diagrammatic resummation as a solution to the $d$-dimensional wave equation with a localized source, and derive a closed-form expression for the Sommerfeld factor in terms of the EFT connection matrix. We prove that the phase of the Sommerfeld factor is exactly the same as the elastic Compton scattering phase shift when there is no tidal dissipation. By combining the renormalization techniques in EFT with the Mano--Suzuki--Takasugi method in black hole perturbation theory, we analytically solve for both the magnitude and phase of the Sommerfeld factor to $O(G^{10})$ for the $\ell = 0, 1, 2$ partial waves. We further establish a new renormalization group equation for the radiative multipole moments, whose exact solution yields an improved resummation of the waveform beyond the universal tail logarithms. These high-precision data and exact relations pave the way for future resummation models of the waveform.
We show that the Aligned 2-Higgs Doublet Model (A2HDM) is a framework able to simultaneously accommodate strong first order electro-weak phase transitions, in turn generating detectable gravitational waves as well as a variety of Higgs boson signals (involving both the Standard Model state and its companions, both neutral and charged) accessible at the Large Hadron Collider (LHC). We map the corresponding expanse of parameter space where such a phenomenology is realised in terms of the relative values of the masses of the discovered Higgs boson and the extended Higgs sector states of this model: two neutral ones (a CP-even and a CP-odd) plus a pair of charged ones. We find that both the Laser Interferometer Space Antenna experiment and High-Luminosity LHC can test such a scenario within their lifetime. This study thus sets the stage for a two-prong complementary approach able to scrutinise the extended Higgs sector of the A2HDM in both its high and low temperature manifestations.
In this work, we investigate three typical new physics resonances which couple to the standard model (SM) quarks via direct top-quark flavor-changing interactions. We identify the possible SMEFT operators at the electroweak scale and analyze their phenomenology.
We discuss what is, at best, an ambiguity, and possibly an inconsistency of the eikonal Color Glass Condensate (CGC) description of Deep Inelastic Scattering (DIS). In this framework, the Bjorken-$x$ dependence enters the cross section solely through the rapidity cutoff $Λ=x_b$, leading to an all-order cross section independent of $x_b$. To address this issue, we explore a natural modification in which the weight functional depends explicitly on the light-cone momentum fraction $z$, with integration limits determined by $x_b$. This modification is consistent with the physical expectation that the observed non-perturbative structure depends on the probe energy. Our analysis implies that the $x_b$ variation of the cross section is not solely driven by small-$x$ evolution equations. We support this conclusion through an analysis of existing DIS fits and by demonstrating that a similarly good description of the data can be obtained within the modified framework. Finally, we show that the modified formulation is compatible with $k_t$-factorization, unlike the standard one.
Lorentz symmetry is the fundamental symmetry of Einstein's theory of Special Relativity and has been tested to great precision. Nevertheless, the possibility remains that it is violated at the Planck scale, as predicted by some theories of quantum gravity. While the Planck scale is not directly accessible to experiments, minute residual deviations from Lorentz symmetry at attainable energies may be observable. The polarization of light from astrophysical sources is a particularly powerful probe because tiny differences accumulate as light travels over astrophysical distances, and polarization is sensitive to light travel time differences between polarization modes on the order of the oscillation period of the electromagnetic wave. Here, we report on new constraints on Lorentz invariance violation derived from X-ray polarization measurements of active galactic nuclei. The new constraints, presented in the framework of the Standard-Model Extension, improve on our previous work, which used optical polarization measurements, by four orders of magnitude.
The magnetic moments and radiative decay widths of heavy baryons are experimental observables that provide direct information about the dynamics of the strong interaction as well as the properties and composition of heavy baryons. In this work, we compute these two quantities for doubly and triply heavy baryons in a dynamical diquark model. We first derive an analytical mass equation for heavy diquarks based on the Bethe-Salpeter equation, in which the interaction potential between constituents includes the Cornell potential, the Breit-Fermi approximation, the spin-spin terms and the tensor potential. By iterating the mass equation, we compute the masses and the wave functions of heavy baryons. We also compute the magnetic moments and the radiative decay widths of doubly and triply heavy baryons in their ground state. Our results are compared with other model-dependent predictions and existing data. We also predict the mass and the magnetic moment of unobserved triply heavy baryons relevant for present and future high-energy colliders.
We introduce a technique for the next-to-leading order accurate simulation of $e^+e^-\to W^+W^-b\bar{b}$ that respects the resonant nature of the process above and near the top-quark pair production threshold. The parton-shower evolution, infrared subtraction and NLO matching account in particular for finite width effects beyond the Breit-Wigner structure considered in resonance-aware approaches. We present first phenomenological results relevant to a potential future electron-positron collider and provide a publicly available simulator based on the ALARIC parton shower and the SHERPA event generator.
Colour coherence affects the radiation pattern of hard partons both in vacuum and in a dense coloured background formed in heavy ion collisions. In vacuum evolution it leads to the well-known phenomenon of angular ordering, and in heavy ion collisions the appearance of a medium resolution scale strongly affects the way in which a fragmenting hard parton interacts with the background medium. In this paper I present the implementation of colour coherence in the JEWEL event generator for jet evolution in a dense medium. In each interaction between a hard parton and the medium it is checked whether the momentum transfer of the scattering is sufficient to resolve the colour dipole. In this way it is dynamically decided which structures stay coherent. Importantly, scatterings that resolve an individual parton disrupt the colour coherence, which affects the next splitting via the loss of angular ordering. This leads to a suppression of hard radiation, and consequently a reduction in overall scattering rate, which is the dominant source of effects of colour coherence observable in reconstructed jets. I discuss these modifications using the examples of nuclear modification factor, jet fragmentation function and jet-hadron correlations.
The precise determination of the Higgs self-couplings is an essential task for understanding electroweak symmetry breaking and probing physics beyond the Standard Model (SM). The calculation of two-loop corrections to scalar couplings is important as it provides a critical test of the perturbative stability of the theoretical predictions, especially in scenarios with extended scalar sectors where large one-loop corrections can occur. Moreover, two-loop corrections need to be taken into account for the future perspective of precisely measuring the trilinear Higgs self-coupling. We present new results for the leading two-loop corrections to trilinear Higgs couplings in the Two-Higgs-Doublet Model (2HDM). We focus in particular on the couplings $λ_{hhh}$ and $λ_{hhH}$, which are relevant for Higgs pair production at the (HL-)LHC or at future linear colliders. We address the renormalisation of the alignment limit in the Higgs basis and give some insights into technical details of the calculation. Finally, we discuss the phenomenological impact of our results on di-Higgs production differential distributions.
The KM3NeT Collaboration reported the detection of a neutrino, designated as KM3-230213A, with a reconstructed energy peaking at 220 PeV and equatorial coordinates (J2000) of RA=$94.3\degree$ and Dec=$-7.8\degree$. As the highest-energy neutrino event documented to date, its astrophysical origin remains unascertained. Prior preliminary investigations have probed potential associations between this neutrino event and gamma-ray bursts (GRBs), factoring in the possibility of Lorentz invariance violation (LV). In this study, we perform a comprehensive analysis to explore correlations between KM3-230213A and all viable GRBs. We explicitly account for the angular uncertainties intrinsic to both the neutrino event and the respective GRBs. Our analysis identifies a larger set of correlated GRBs. For each associated GRB, we compute the LV scale, integrating uncertainties from redshift measurements and neutrino energy determinations to enhance the robustness of our findings.
Transient noise artifacts, commonly referred to as glitches, pose a major challenge to parameter inference for space-based gravitational-wave (GW) observations. We develop a glitch-robust amortized inference framework for massive black hole binaries in the Taiji detector configuration by combining conditional normalizing flows, a time-frequency multimodal fusion encoder, and contrastive learning. To enable large-scale training on contaminated data, we further introduce a neural glitch generator that produces high-fidelity synthetic transients at substantially reduced computational cost. Systematic experiments show that, under glitch contamination, the proposed method yields more accurate and better-calibrated posteriors than a conventional Markov Chain Monte Carlo baseline. In ablation studies, the full time-frequency model with contrastive learning performs best overall and remains robust to variations in glitch duration and merger-relative timing. We further show that standard coverage diagnostics alone are insufficient to fully assess posterior fidelity. We therefore complement them with the continuous ranked probability score, which provides a stricter assessment of global distributional agreement in non-ideal GW data. Taken together, these results establish deep-learning-based amortized inference as a promising framework for fast and robust Bayesian parameter estimation in future space-based GW observations.
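The continuous ranked probability score (CRPS) mentioned above compares a full predictive distribution with a single true value, and it can be estimated directly from posterior samples using the identity $\mathrm{CRPS} = E|X - y| - \tfrac{1}{2}E|X - X'|$. The snippet below is a minimal one-dimensional sketch of that sample-based estimator; the function name `crps_from_samples` and the toy numbers are illustrative and are not taken from the paper's pipeline.

```python
def crps_from_samples(samples, observed):
    """Empirical CRPS of a 1-D sample set against a single observed value.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with both expectations taken
    over the empirical distribution of `samples`. Lower is better.
    """
    n = len(samples)
    # First term: mean absolute error of the samples with respect to the truth.
    abs_err = sum(abs(x - observed) for x in samples) / n
    # Second term: mean absolute difference between sample pairs, computed
    # in O(n log n) from the sorted samples:
    #   sum_{i<j} (s_j - s_i) = sum_i (2*i - n + 1) * s_i   (0-indexed).
    ordered = sorted(samples)
    pair_sum = sum((2 * i - n + 1) * v for i, v in enumerate(ordered))
    mean_pair_diff = 2.0 * pair_sum / (n * n)
    return abs_err - 0.5 * mean_pair_diff


# Toy check: two samples at 0 and 2 with the true value at 1.
score = crps_from_samples([0.0, 2.0], 1.0)
```

For a single sample the score reduces to the absolute error, while a spread-out sample set is rewarded only when its spread is matched by proximity to the true value. This is one way to see why the CRPS can flag posteriors that pass coverage tests yet are poorly shaped.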
Early observations from the James Webb Space Telescope (JWST) have revealed an overabundance of massive high-redshift galaxies, raising the question of whether this points to new physics beyond $Λ$CDM, or an enhanced formation efficiency of massive stars. We revisit this issue going beyond earlier analyses based on direct comparisons to theoretical bounds at a fixed cosmology, by performing a full Bayesian analysis of the most extreme galaxies in the CEERS imaging and FRESCO spectroscopic samples, jointly constraining cosmological parameters and the baryon-to-star conversion efficiency $ε$. We do so not only within the spatially flat $Λ$CDM model, but also in models where the dark energy equation of state $w$ and/or the spatial curvature parameter $Ω_K$ are allowed to vary, carefully discussing the impact of both $w$ and $Ω_K$ on the cumulative comoving stellar mass density. Within the flat $Λ$CDM model, once cosmological parameters are marginalized over, the CEERS sample provides a weak $2σ$ lower limit of $ε\gtrsim 0.07$, compatible with astrophysical expectations. In contrast, the FRESCO sample requires $ε\gtrsim 0.5$ at $2σ$, with values $ε\lesssim 0.2$ disfavored at $>5σ$. These results do not qualitatively change when we allow $w$ and/or $Ω_K$ to vary, with no evidence for deviations from $w=-1$ or $Ω_K=0$. Our results therefore suggest that the origin of the ``JWST tension'' is unlikely to be cosmological, but lies in the astrophysics of galaxy formation.
We investigate one-loop corrections from torsion-induced four-fermion interactions to inflaton three-body decay and their impact on the associated stochastic gravitational-wave signal. We find a pronounced asymmetry in the dependence on the renormalization scale $u$. While the enhancement of the gravitational-wave spectrum remains modest, not exceeding roughly a factor of order unity for representative inflaton masses well below the Planck scale within the perturbative regime, the suppression can be much stronger, reaching up to two orders of magnitude, corresponding to reductions at the percent level. These results imply that loop corrections, particularly fermionic self-interactions, can significantly reduce the predicted gravitational-wave signal in models based on tree-level analyses. This suppression may shift the signal outside the sensitivity range of future observations and should therefore be taken into account in realistic phenomenological studies.
Over the past few decades, the hypothetical dark photon has been extensively studied from both phenomenological and experimental perspectives. It should be noted that the local symmetry for the dark photon does not gauge the standard model Higgs scalar and chiral fermions. In this paper, we show that an artificially introduced $U(1)_X$ gauge group for the dark photon and the standard model $U(1)_Y$ gauge group for hypercharge can simultaneously originate from the two gauge groups $U(1)_1\times U(1)_2$, under which the standard model scalar and fermions carry the same $U(1)_1$ and $U(1)_2$ charges without causing any gauge anomalies. We further introduce a spontaneously broken mirror symmetry between the $U(1)_1$ and $U(1)_2$ gauge groups so that the $U(1)_1$ and $U(1)_2$ gauge couplings can acquire a small difference at one-loop level and hence the $U(1)_X \times U(1)_Y$ kinetic mixing can be highly suppressed in a natural way.
We examine the leading-power fragmentation of fully heavy pentaquarks in high-energy hadronic collisions. To this end, we complete the release of the hadron-structure-oriented PQ5Q1.0 fragmentation functions, by discussing the $P_{5c}$ set and delivering the $P_{5b}$ one. These functions incorporate an improved computation of the initial-scale input for the constituent heavy-quark fragmentation channel, making them particularly suitable for describing both the direct formation of a compact multicharm state and the hadronization from a diquark-antiquark-diquark configuration. For phenomenological applications, we employ the data-validated (sym)JETHAD framework to compute and analyze NLL/NLO$^+$ semi-inclusive production rates of pentaquark-plus-jet systems at the upcoming HL-LHC and the future FCC. This study marks a further step toward connecting hadronic structure, precision QCD, and the emerging physics of exotic matter.
Vortex $γ$ photons in superposition states have important applications in photonuclear, high-energy, and strong-field physics. However, their controlled generation in the $γ$-ray regime remains a great challenge. Here, we put forward a novel method for generating vortex $γ$ photons in superposition states, with controllable orbital angular momentum (OAM) separation $Δ\ell'$ and modal weights, via nonlinear Compton scattering driven by multifrequency circularly polarized laser fields. We develop a strong-field quantum electrodynamics (QED) framework to reveal the underlying mechanism and calculate the radiation probabilities. In our method, the superposition arises from interference between energy-degenerate multiphoton pathways carrying distinct OAM. For two-frequency fields with frequency ratio $ν$, the OAM separation follows $Δ\ell'=ν\mp1$ (upper/lower sign for equal/opposite helicities), and the modal weights are tunable via the laser intensities. Vortex $γ$ photons in controllable superposition states produced by our method have significant applications in strong-field QED and nuclear photonics.
The study of spin polarization of $Λ$ hyperons in ultrarelativistic heavy-ion collisions provides insights into the angular momentum and vortical structure of the quark-gluon plasma (QGP) that may be formed in such collisions. The present study examines the global spin polarization of $Λ$ hyperons using a second-order relativistic viscous hydrodynamic framework that incorporates medium vorticity, shear viscosity, and evolving magnetic fields. It explores thermal vorticity evolution in relativistic heavy-ion collisions and evaluates its value at the decoupling isothermal freeze-out surface. We quantify the contributions of thermal vorticity and magnetic field to the global spin polarization of $Λ$ hyperons. A comparison with recent ALICE measurements in Pb+Pb collisions at $\sqrt{s_{NN}} = 2.76$ and $5.02$ TeV shows qualitative agreement, offering new insights into the vortical structure of QCD matter. The study also explores the relationship between magnetic and rotational dynamics, with implications for spin polarization at RHIC and LHC energies.
We derive a one-loop effective description of axion inflation by integrating out a heavy Dirac fermion with an inflaton-dependent complex mass undergoing a smooth localized threshold transition. The threshold induces correlated corrections to the inflaton and gauge sectors, including a Coleman-Weinberg term, a vacuum-polarization correction, and an anomaly-induced Chern-Simons coupling. Together, these effects transiently enhance and localize gauge-field production, generating a chiral stochastic gravitational-wave background in the deci-hertz band within the projected sensitivities of BBO and DECIGO, while remaining below representative primordial-black-hole bounds.
Fast neutrino-flavor conversion (FFC) can nontrivially alter the neutrino radiation field in core-collapse supernovae (CCSN) and binary neutron-star merger (BNSM) remnants. However, its interplay with global geometry remains poorly understood because microscopic flavor conversion scales are much shorter than global transport scales. We perform global quantum kinetic neutrino transport simulations in spherical geometry with neutrino and matter backgrounds, using an attenuated oscillation Hamiltonian. We find that steep radial lepton gradients can suppress FFC, but the suppression is highly sensitive to the adopted attenuation parameter. This behavior is explained by an adiabatic condition: flavor coherence can grow sufficiently only while the flavor wave remains on the unstable branch in the local dispersion relation during propagation. Background variation shifts the unstable branch, while attenuation lengthens the growth timescale, making it more difficult for the flavor coherence to follow the branch. We provide an approximate formula for the adiabaticity that can be used directly in CCSN and BNSM models developed by classical neutrino transport simulations. Our results show that attenuation artificially leads to an overestimation of the impact of background variation and should therefore be applied with caution in global simulations of neutrino flavor conversion.
We investigate flavor-changing neutral current (FCNC) interactions of the top quark at a future muon collider with a center-of-mass energy of $\sqrt{s} = 10~\mathrm{TeV}$. The process $μ^{+}μ^{-} \to ν_μ\,μ^+\,b\,j$ and its corresponding charge conjugate are considered as a probe of anomalous $tqZ$ and $tqγ$ couplings, parametrized within an effective field theory framework in terms of $κ_{tqZ}$ and $λ_{tqγ}$. Signal and background events are simulated using Monte Carlo techniques, including parton showering and hadronization with \texttt{Pythia} and a fast detector simulation based on \texttt{Delphes} with a dedicated 10~TeV muon collider setup. A multivariate analysis based on boosted decision trees is employed to enhance the signal discrimination. Assuming an integrated luminosity of $10~\mathrm{ab}^{-1}$, we obtain projected sensitivities to the anomalous couplings at the $\mathcal{O}(10^{-3})$ level, corresponding to branching ratio limits of $\mathcal{O}(10^{-6})$ for the rare $t \to qZ$ and $t \to qγ$ decays. These results significantly improve upon the current bounds from the CMS and ATLAS collaborations, extending the sensitivity by more than one order of magnitude. Our findings demonstrate that a multi-TeV muon collider provides a powerful and complementary platform for probing rare top-quark interactions, offering a unique opportunity to explore physics beyond the Standard Model through FCNC processes.
We show that the strong constraints placed by Planck NPIPE Cosmic Microwave Background (CMB) data on axion-like early dark energy (EDE) are significantly alleviated in models with multiple fields. We find a $1.5σ$ residual tension with the Local Distance Network value of $H_0$ in a 2-field model, with no improvement beyond two fields, and a best-fit value of $H_0$ $\sim 1.4σ$ larger than in the 1-field case. The second field improves the fit to high-$\ell$ CMB data, where 1-field EDE is most strongly disfavored, and suggests modifications to the pre-recombination history over a wider redshift range.
We apply the Operator Product Expansion (OPE) algorithm to the renormalization of scalar-QED theory, with a specific focus on the fixed-charge operator $φ^Q$. Within the OPE framework, the anomalous dimension of the $φ^Q$ operator is perturbatively computed to four-loop order in the modified minimal subtraction scheme, extending beyond the previously available three-loop result. The beta functions, as well as the mass and field anomalous dimensions, are also computed at this order. An alternative loop-integrand construction method is proposed, based on graph decomposition and skeleton expansion techniques, for deriving the integrands of one-Particle-Irreducible correlation functions. This work represents the first non-trivial validation of the OPE algorithm for higher-loop renormalization beyond pure scalar theories. The present successful computations further confirm the efficiency and versatility of the OPE algorithm in renormalization analysis.
Collinear factorization and the leading-twist operator product expansion (OPE) in perturbative QCD express suitably inclusive observables in scale-separated kinematics as composites of perturbative short-distance coefficients with universal long-distance non-perturbative correlators such as parton distribution functions (PDFs), up to controlled power corrections. A persistent structural feature is \emph{presentation non-uniqueness}: coefficients and correlators are not individually physical, but are defined only up to finite factorization-scheme redefinitions induced by collinear subtractions and renormalized-operator mixing. We formalize this redundancy categorically by introducing an \emph{interface algebra object} encoding admissible finite collinear counterterms/mixing kernels and by organizing coefficient data and hadronic data as right/left modules over this algebra in a symmetric monoidal category encoding the chosen recomposition calculus. Our main result, the \emph{Core Representation Theorem}, identifies the universal scheme-invariant carrier: the functor of balanced (scheme-invariant) pairings is represented by the relative tensor product $C\otimes_A f$, which is terminal among all quotients of the naive composite $C\otimes f$ that preserve scheme-invariant semantics. Finally, we show how standard physics inputs (symmetry constraints, locality/OPE, and a stated accuracy truncation) canonically induce the interface algebra and module structures, and we prove a minimal closure principle for completing a generating set of long-distance operators/correlators to an $A$-stable sector.
When considering a model selection or, more generally, an aggregation approach for adaptive statistical inference, it is often necessary to compute estimators over a wide range of model complexities, including unnecessarily large models even when the true data-generating process is relatively simple, due to the lack of prior knowledge. This requirement can lead to substantial computational inefficiency. In this work, we propose a novel framework for efficient model aggregation called early-stopped aggregation (ESA): instead of computing and aggregating estimators for all candidate models, we compute only a small number of simpler ones using an early-stopping criterion and aggregate only these for final inference. Our framework is versatile and applies to both Bayesian model selection, in particular within the variational Bayes framework, and frequentist estimation, including a general penalized estimation setting. We investigate the adaptive optimality of the ESA approach across three learning paradigms. We first show that ESA achieves optimal adaptive contraction rates in the variational Bayes setting under mild conditions. We extend this result to variational empirical Bayes, where prior hyperparameters are chosen in a data-dependent manner. In addition, we apply the ESA approach to frequentist aggregation, including both penalization-based and sample-splitting implementations, and establish the corresponding theory. As we demonstrate, there is a clear unification between early-stopped Bayes and frequentist penalized aggregation, with a common "energy" functional comprising a data-fitting term and a complexity-control term that drives both procedures. We further present several applications and numerical studies that highlight the efficiency and strong performance of the proposed approach.
As search depth increases in autonomous reasoning and embodied planning, the candidate action space expands exponentially, heavily taxing computational budgets. While heuristic pruning is a common countermeasure, it operates without formal safety guarantees when surrogate models (like LLMs) exhibit systematic evaluation biases. This paper frames the node expansion process as a localized Best-Arm Identification (BAI) problem over dynamic frontiers, subject to a bounded systematic bias $L$. By inverting the Lambert W function, we establish an additive sample complexity of $\mathcal{O}((Δ-4L)^{-2})$, which indicates that safe node elimination is only feasible when the empirical reward gap exceeds $4L$. We complement this with an information-theoretic lower bound of $Ω((Δ-2L)^{-2})$ to confirm the structural limits of biased search. Subsequent evaluations on both synthetic trees and complex reasoning tasks demonstrate that adhering to this local safety boundary successfully preserves optimal trajectories while maximizing sample allocation efficiency.
We introduce path-sampled integrated gradients (PS-IG), a framework that generalizes feature attribution by computing the expected value over baselines sampled along the linear interpolation path. We prove that PS-IG is mathematically equivalent to path-weighted integrated gradients, provided the weighting function matches the cumulative distribution function of the sampling density. This equivalence allows the stochastic expectation to be evaluated via a deterministic Riemann sum, improving the error convergence rate from $O(m^{-1/2})$ to $O(m^{-1})$ for smooth models. Furthermore, we demonstrate analytically that PS-IG functions as a variance-reducing filter against gradient noise - strictly lowering attribution variance by a factor of 1/3 under uniform sampling - while preserving key axiomatic properties such as linearity and implementation invariance.
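The stated equivalence can be checked numerically on a toy case. The sketch below is only an illustration under assumptions not taken from the abstract: a one-dimensional model $f(x)=x^3$, reference baseline $0$, input $x=2$, and a uniform sampling density on $[0,1]$ (so the cumulative distribution function is $F(s)=s$). All function names are hypothetical. For this toy case both quantities equal $6$ analytically.

```python
import random

def grad_f(x):
    # Gradient of the toy model f(x) = x**3 (an assumption for illustration).
    return 3.0 * x ** 2

def integrated_gradients(x, baseline, steps=200):
    # Standard integrated gradients in 1D, via a midpoint Riemann sum.
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        total += grad_f(baseline + alpha * (x - baseline))
    return (x - baseline) * total / steps

def ps_ig_monte_carlo(x, x0, n_samples=2000, seed=0):
    # Expectation of IG over baselines sampled uniformly along the
    # interpolation path from x0 to x (stochastic estimate).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.random()
        total += integrated_gradients(x, x0 + t * (x - x0))
    return total / n_samples

def path_weighted_ig(x, x0, m=200):
    # Deterministic counterpart: weight the gradient at path position s
    # by the CDF of the sampling density, here F(s) = s.
    total = 0.0
    for k in range(m):
        s = (k + 0.5) / m
        total += s * grad_f(x0 + s * (x - x0))
    return (x - x0) * total / m

mc_estimate = ps_ig_monte_carlo(2.0, 0.0)
deterministic = path_weighted_ig(2.0, 0.0)
print(mc_estimate, deterministic)
```

The deterministic sum needs a single pass over the path, whereas the Monte Carlo estimate carries sampling error that shrinks only slowly with the number of sampled baselines.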
Applying kernel methods to matchings is challenging due to their discrete, non-Euclidean nature. In this paper, we develop a principled framework for constructing geometric kernels that respect the natural geometry of the space of matchings. To this end, we first provide a complete characterization of stationary kernels, i.e. kernels that respect the inherent symmetries of this space. Because the class of stationary kernels is too broad, we specifically focus on the heat and Matérn kernel families, adding an appropriate inductive bias of smoothness to stationarity. While these families successfully extend widely popular Euclidean kernels to matchings, evaluating them naively incurs a prohibitive super-exponential computational cost. To overcome this difficulty, we introduce and analyze a novel, sub-exponential algorithm leveraging zonal polynomials for efficient kernel evaluation. Finally, motivated by the known bijective correspondence between matchings and phylogenetic trees, a crucial data modality in biology, we explore whether our framework can be seamlessly transferred to the space of trees, establishing novel negative results and identifying a significant open problem.
We derive a robust update rule for the online infinite hidden Markov model (iHMM) for when the streaming data contains outliers and the model is misspecified. Leveraging recent advances in generalised Bayesian inference, we define robustness via the posterior influence function (PIF), and provide conditions under which the online iHMM has bounded PIF. Imposing robustness inevitably induces an adaptation lag for regime switching. Our method, which is called Batched Robust iHMM (BR-iHMM), balances adaptivity and robustness with two additional tunable parameters. Across limit order book data, hourly electricity demand, and a synthetic high-dimensional linear system, BR-iHMM reduces one-step-ahead forecasting error by up to 67% relative to competing online Bayesian methods. Together with theoretical guarantees of bounded PIF, our results highlight the practicality of our approach for both forecasting and interpretable online learning.
Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability. We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold. Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes: at small batch sizes it converges to a lower plateau $2(1-β)/η$, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau $2(1+β)/η$, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics. We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling.
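For orientation, the higher plateau $2(1+β)/η$ coincides with the classical deterministic stability limit of heavy-ball momentum on a one-dimensional quadratic. The sketch below is a toy check of that limit only, assuming the update $v \leftarrow βv - ηλx$, $x \leftarrow x + v$ (momentum without dampening). It does not reproduce the stochastic mechanism behind the lower plateau, and the function names are hypothetical.

```python
def plateaus(lr, beta):
    # The two plateau values quoted in the abstract: (lower, higher).
    return 2.0 * (1.0 - beta) / lr, 2.0 * (1.0 + beta) / lr

def heavy_ball_diverges(curvature, lr, beta, steps=2000):
    # Full-batch heavy-ball on the quadratic loss 0.5 * curvature * x**2.
    x, v = 1.0, 0.0
    for _ in range(steps):
        v = beta * v - lr * curvature * x
        x = x + v
        if abs(x) > 1e6:
            return True
    return False

lr, beta = 0.1, 0.9
low, high = plateaus(lr, beta)
below = heavy_ball_diverges(0.95 * high, lr, beta)  # just under the limit
above = heavy_ball_diverges(1.05 * high, lr, beta)  # just over the limit
print(low, high, below, above)
```

In this deterministic toy, curvature at the lower plateau value is comfortably stable, which is consistent with the abstract's attribution of the lower plateau to mini-batch noise and not to the full-batch dynamics.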
We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.
We show that the maximum expected inner product between a random vector and the standard normal vector over all couplings subject to a mutual information constraint or regularization is equivalent to a truncated integral involving the rate-distortion function, up to universal multiplicative constants. The proof is based on a lifting technique, which constructs a Gaussian process indexed by a random subset of the type class of the probability distribution involved in the information-theoretic inequality, and then applies a form of the majorizing measure theorem.
Antibody lead optimization is inherently a multi-objective challenge in drug discovery. Achieving a balance between different drug-like properties is crucial for the development of viable candidates, and this search becomes exponentially challenging as desired properties grow. The ever-growing zoo of sophisticated in silico tools for predicting antibody properties calls for an efficient joint optimization procedure to overcome resource-intensive sequential filtering pipelines. We present BOAT, a versatile Bayesian optimization framework for multi-property antibody engineering. Our `plug-and-play' framework couples uncertainty-aware surrogate modeling with a genetic algorithm to jointly optimize various predicted antibody traits while enabling efficient exploration of sequence space. Through systematic benchmarking against genetic algorithms and newer generative learning approaches, we demonstrate competitive performance with state-of-the-art methods for multi-objective protein optimization. We identify clear regimes where surrogate-driven optimization outperforms expensive generative approaches and establish practical limits imposed by sequence dimensionality and oracle costs.
Why do capitalist economies recurrently generate crises whose severity is disproportionate to the size of the triggering shock? This paper proposes a structural answer grounded in the evolutionary geometry of production networks. As economies evolve through specialization, integration, and competitive selection, their inter-sectoral linkages drift toward configurations of increasing geometric fragility, eventually crossing a threshold beyond which small disturbances generate disproportionately large cascades. We introduce Sandpile Economics, a formal framework that interprets macroeconomic instability as an emergent property of disequilibrium production networks. The key state variable is the Forman--Ricci curvature of the input--output graph, capturing local substitution possibilities when supply chains are disrupted. We show that when curvature falls below an endogenous threshold, the distribution of cascade sizes follows a power law with tail index $α\in (1,2)$, implying a regime of unbounded amplification. The underlying mechanism is evolutionary: specialization reduces input substitutability, pushing the economy toward criticality, while crisis episodes induce endogenous network reconfiguration and path dependence. These dynamics are inherently non-ergodic and cannot be captured by representative-agent frameworks. Empirically, using global input--output data, we document that production networks operate in persistently negative curvature regimes and that curvature robustly predicts medium-run output dynamics. A one-standard-deviation increase in curvature is associated with higher cumulative growth over three-year horizons, and curvature systematically outperforms standard network metrics in explaining cross-country differences in resilience.
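As a rough illustration of the state variable, the simplest combinatorial form of Forman-Ricci curvature on an unweighted, undirected graph assigns each edge $(u,v)$ the value $4 - \deg(u) - \deg(v)$. The sketch below uses that simplified form on two hypothetical toy networks. A real input-output graph is weighted and directed, so this is an assumption-laden stand-in and not the paper's construction.

```python
from collections import defaultdict

def forman_ricci(edges):
    # Combinatorial Forman-Ricci curvature for an unweighted, undirected
    # graph without higher-dimensional faces: F(u, v) = 4 - deg(u) - deg(v).
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}

# Hypothetical toy networks: a short supply chain and a hub-dependent star.
chain = [("ore", "steel"), ("steel", "cars")]
star = [("chips", s) for s in ("cars", "phones", "servers", "appliances", "robots")]

chain_curvature = forman_ricci(chain)
star_curvature = forman_ricci(star)
print(chain_curvature)
print(star_curvature)
```

In this simplified form, edges attached to a high-degree hub receive negative curvature, which is the kind of configuration the abstract associates with fragility.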
We study adaptive pooling under predictive heterogeneity in high-dimensional multivariate time series forecasting, where global models improve statistical efficiency but may fail to capture heterogeneous predictive structure, while naive specialization can induce negative transfer. We formulate adaptive pooling as a statistical decision problem and propose a validation-driven framework that determines when and how specialization should be applied. Rather than grouping series based on representation similarity, we define partitions through out-of-sample predictive performance, thereby aligning data organization with predictive risk, defined as expected out-of-sample loss and approximated via validation error. Cluster assignments are iteratively updated using validation losses for both point (Huber) and probabilistic (pinball) forecasting, improving robustness to heavy-tailed errors and local anomalies. To ensure reliability, we introduce a leakage-free fallback mechanism that reverts to a global model whenever specialization fails to improve validation performance, providing a safeguard against performance degradation under a strict training-validation-test protocol. Experiments on large-scale traffic datasets demonstrate consistent improvements over strong baselines while avoiding degradation when heterogeneity is weak. Overall, the proposed framework provides a principled and practically reliable approach to adaptive pooling in high-dimensional forecasting problems.
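The two validation losses and the fallback rule can be sketched as follows. This is a minimal illustration with standard textbook definitions of the Huber and pinball losses; the function names, the `delta` and `min_gain` parameters, and the string labels are hypothetical and not taken from the paper.

```python
def huber_loss(y, y_hat, delta=1.0):
    # Quadratic for small residuals, linear for large ones.
    r = abs(y - y_hat)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

def pinball_loss(y, q_hat, tau):
    # Quantile (pinball) loss for a predicted tau-quantile q_hat.
    diff = y - q_hat
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def mean_loss(loss, ys, preds, **kwargs):
    # Average a pointwise loss over a validation set.
    return sum(loss(y, p, **kwargs) for y, p in zip(ys, preds)) / len(ys)

def choose_model(val_loss_global, val_loss_specialized, min_gain=0.0):
    # Fallback rule: keep the global model unless the specialized model
    # strictly improves validation loss by more than min_gain.
    if val_loss_specialized < val_loss_global - min_gain:
        return "specialized"
    return "global"

ys = [1.0, 2.0, 10.0]
global_preds = [1.5, 2.5, 6.0]
cluster_preds = [1.1, 2.1, 9.0]
decision = choose_model(
    mean_loss(huber_loss, ys, global_preds),
    mean_loss(huber_loss, ys, cluster_preds),
)
print(decision)
```

Because the decision uses validation losses only, the test split never influences which model is selected, which is one way to read the "leakage-free" requirement.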
We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of $\widetilde{O}(\sqrt{α^* T})$ after $T$ rounds, where $α^*$ is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of $α^*$. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor and Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.
Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Despite this success, tools for its performance analysis appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of neighboring nodes are assumed to be similar. Although the setting has applications in both recommender systems and advertising, traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We deliver an analysis showing that the regret of SpectralTS scales as d*sqrt(T ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d*sqrt(T ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.
We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret on this family, that is parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
The statistical essence of the Transformer architecture has long remained elusive: Is it a universal approximator, or a neural network version of known computational algorithms? Through rigorous algebraic proof, we show that the latter better describes Transformer's basic nature: Ordinary Least Squares (OLS) is a special case of the single-layer Linear Transformer. Using the spectral decomposition of the empirical covariance matrix, we construct a specific parameter setting where the attention mechanism's forward pass becomes mathematically equivalent to the OLS closed-form projection. This means attention can solve the problem in one forward pass, not by iterating. Building upon this prototypical case, we further uncover a decoupled slow and fast memory mechanism within Transformers. Finally, the evolution from our established linear prototype to standard Transformers is discussed. This progression facilitates the transition of the Hopfield energy function from linear to exponential memory capacity, thereby establishing a clear continuity between modern deep architectures and classical statistical inference.
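The core identity can be illustrated on a small example. The sketch below assumes softmax-free (linear) attention with keys $x_i$, values $y_i$, and a query projection set to $(X^\top X)^{-1}$. This direct-inverse parameterisation is a stand-in chosen for brevity, not the spectral construction described in the abstract, and all names and data are hypothetical.

```python
def gram_inverse_2x2(X):
    # Inverse of X^T X for a design matrix with two columns.
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]

def ols_predict(X, y, x_q):
    # Closed-form OLS: beta = (X^T X)^{-1} X^T y, prediction = x_q . beta.
    G = gram_inverse_2x2(X)
    xty = [sum(r[j] * t for r, t in zip(X, y)) for j in range(2)]
    beta = [G[j][0] * xty[0] + G[j][1] * xty[1] for j in range(2)]
    return beta[0] * x_q[0] + beta[1] * x_q[1]

def linear_attention_predict(X, y, x_q):
    # Linear attention: sum_i (q . k_i) * v_i with q = (X^T X)^{-1} x_q,
    # keys k_i = x_i and values v_i = y_i (no softmax).
    G = gram_inverse_2x2(X)
    q = [G[j][0] * x_q[0] + G[j][1] * x_q[1] for j in range(2)]
    scores = [q[0] * k[0] + q[1] * k[1] for k in X]
    return sum(s * v for s, v in zip(scores, y))

# Toy data: an intercept column plus one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.1, 4.9, 7.2]
x_q = [1.0, 4.0]

ols_value = ols_predict(X, y, x_q)
attention_value = linear_attention_predict(X, y, x_q)
print(ols_value, attention_value)
```

Both routes compute $x_q^\top (X^\top X)^{-1} X^\top y$, only with the products grouped differently, so a single forward pass suffices and no iteration is involved.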
We introduce Metric-Aware Principal Component Analysis (MAPCA), a unified framework for scale-invariant representation learning based on the generalised eigenproblem max Tr(W^T Sigma W) subject to W^T M W = I, where M is a symmetric positive definite metric matrix. The choice of M determines the representation geometry. The canonical beta-family M(beta) = Sigma^beta, beta in [0,1], provides continuous spectral bias control between standard PCA (beta=0) and output whitening (beta=1), with condition number kappa(beta) = (lambda_1/lambda_p)^(1-beta) decreasing monotonically to isotropy. The diagonal metric M = D = diag(Sigma) recovers Invariant PCA (IPCA), a method rooted in Frisch (1928) diagonal regression, as a distinct member of the broader framework. We prove that scale invariance holds if and only if the metric transforms as M_tilde = CMC under rescaling C, a condition satisfied exactly by IPCA but not by the general beta-family at intermediate values. Beyond its classical interpretation, MAPCA provides a geometric language that unifies several self-supervised learning objectives. Barlow Twins and ZCA whitening correspond to beta=1 (output whitening); VICReg's variance term corresponds to the diagonal metric. A key finding is that W-MSE, despite being described as a whitening-based method, corresponds to M = Sigma^{-1} (beta = -1), outside the spectral compression range entirely and in the opposite spectral direction to Barlow Twins. This distinction between input and output whitening is invisible at the level of loss functions and becomes precise only within the MAPCA framework.
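The condition-number formula for the beta-family follows from a short calculation, sketched here under the assumption that Sigma is symmetric positive definite with eigenvalues lambda_1 >= ... >= lambda_p > 0. The symbols U, Lambda, u_i, mu_i and w_i are introduced for this sketch only.

```latex
\begin{align*}
\Sigma &= U \Lambda U^{\top}, \qquad
M(\beta) = \Sigma^{\beta} = U \Lambda^{\beta} U^{\top}, \\
\Sigma u_i &= \lambda_i u_i, \quad M(\beta)\, u_i = \lambda_i^{\beta} u_i
\;\Longrightarrow\;
\Sigma u_i = \mu_i\, M(\beta)\, u_i \quad \text{with } \mu_i = \lambda_i^{1-\beta}, \\
w_i &= \lambda_i^{-\beta/2} u_i
\;\Longrightarrow\;
w_i^{\top} M(\beta)\, w_j = \delta_{ij}, \\
\kappa(\beta) &= \frac{\mu_1}{\mu_p}
= \left(\frac{\lambda_1}{\lambda_p}\right)^{1-\beta}.
\end{align*}
```

Under this reading, beta = 0 gives mu_i = lambda_i (standard PCA), beta = 1 gives mu_i = 1 for all i (isotropy), and beta = -1 gives mu_i = lambda_i^2, which squares the condition number and so moves in the opposite spectral direction.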
The robust low-rank tensor completion problem addresses the challenge of recovering corrupted high-dimensional tensor data with missing entries, outliers, and sparse noise commonly found in real-world applications. Existing methodologies have encountered fundamental limitations due to their reliance on uniform regularization schemes, particularly the tensor nuclear norm and $\ell_1$ norm regularization approaches, which indiscriminately apply equal shrinkage to all singular values and sparse components, thereby compromising the preservation of critical tensor structures. The proposed tensor weighted correlated total variation (TWCTV) regularizer addresses these shortcomings through an $M$-product framework that combines a weighted Schatten-$p$ norm on gradient tensors for low-rankness with smoothness enforcement and weighted sparse components for noise suppression. The proposed weighting scheme adaptively reduces the thresholding level to preserve both dominant singular values and sparse components, thus improving the reconstruction of critical structural elements and nuanced details in the recovered signal. Through a systematic algorithmic approach, we introduce an enhanced alternating direction method of multipliers (ADMM) that offers both computational efficiency and theoretical substantiation, with convergence properties comprehensively analyzed within the $M$-product framework. Comprehensive numerical evaluations across image completion, denoising, and background subtraction tasks validate the superior performance of this approach relative to established benchmark methods.
Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g. linear projection or a neural network) and cluster the data based on the resulting features (e.g. under a Gaussian Mixture Model framework). The framework searches for the dimension reduction parameters and the optimal clusters by traversing a manifold, using Gradient Manifold Optimization. The proposed framework is exemplified with a Gaussian Mixture Model as one simple but efficient example, in a process that is somewhat similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised training of simulated data as well as a benchmark image dataset (i.e. MNIST). The experimental results indicate that our algorithm has better performance than popular clustering algorithms from the literature.
We prove that conditional diffusion models whose reverse kernels are finite Gaussian mixtures with ReLU-network logits can approximate suitably regular target distributions arbitrarily well in context-averaged conditional KL divergence, up to an irreducible terminal mismatch that typically vanishes with increasing diffusion horizon. A path-space decomposition reduces the output error to this mismatch plus per-step reverse-kernel errors; assuming each reverse kernel factors through a finite-dimensional feature map, each step becomes a static conditional density approximation problem, solved by composing Norets' Gaussian-mixture theory with quantitative ReLU bounds. Under exact terminal matching the resulting neural reverse-kernel class is dense in conditional KL.
The simulation of complex systems increasingly relies on sophisticated but fundamentally opaque computational black-box simulators. Surrogate models play a central role in reducing the computational cost of complex systems simulations across a wide range of scientific and engineering domains. Notwithstanding, they inevitably inherit and often exacerbate this black-box nature, obscuring how input variables drive physical responses. Conversely, Explainable Artificial Intelligence (XAI) offers powerful tools to unpack these models. Yet, XAI methods struggle with engineering-specific constraints, such as highly correlated inputs, dynamical systems, and rigorous reliability requirements. Consequently, surrogate modeling and XAI have largely evolved as distinct fields of research, despite their strong complementarity. To reconnect these approaches, this state-of-the-art survey provides a structured perspective that maps existing XAI techniques onto the various stages of surrogate modeling workflows for design and exploration. To ground this synthesis, we draw upon illustrative applications across both equation-based simulations and agent-based modeling. We survey a broad spectrum of techniques, highlighting their strengths for revealing interactions and supporting human comprehension. Finally, we identify pressing open challenges, including the explainability of dynamical systems and the handling of mixed-variable systems, and propose a research agenda to make explainability a core, embedded element of simulation-driven workflows from model construction through decision-making. By transforming opaque emulators into explainable tools, this agenda empowers practitioners to move beyond accelerating simulations to extracting actionable insights from complex system behaviors.
We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on covariates, creating selection bias that makes direct regression of the response on treatment unreliable. To address this issue, we propose a two-stage kernel ridge regression method. In the first stage, we learn a model for the response as a function of both treatment and covariates; in the second stage, we use this model to construct pseudo-outcomes that correct for distribution shift, and then fit a second model to estimate the treatment effect. Although the response varies with both treatment and covariates, the induced effect function obtained by averaging over covariates is typically much simpler, and our estimator adapts to this structure. Furthermore, we introduce a fully data-driven model selection procedure that achieves provable adaptivity to both the unknown degree of overlap and the regularity (eigenvalue decay) of the underlying kernel.
Davis, Drusvyatskiy, and Jiang showed that gradient descent with an adaptive stepsize converges locally at a nearly-linear rate for smooth functions that grow at least quartically away from their minimizers. The argument is intricate, relying on monitoring the performance of the algorithm relative to a certain manifold of slow growth -- called the ravine. In this work, we provide a direct Lyapunov-based argument that bypasses these difficulties when the objective is in addition convex and has a unique minimizer. As a byproduct of the argument, we obtain a more adaptive variant than the original algorithm with encouraging numerical performance.
Quantization is a natural complement to the sparse, event-driven computation of Spiking Neural Networks, reducing memory bandwidth and arithmetic cost for deployment on resource-constrained hardware. However, existing SNN quantization evaluation focuses almost exclusively on accuracy, overlooking whether a quantized network preserves the firing behavior of its full-precision counterpart. We demonstrate that quantization method, clipping range, and bit-width can produce substantially different firing distributions at equivalent accuracy, differences invisible to standard metrics but relevant to deployment, where firing activity governs effective sparsity, state storage, and event-processing load. To capture this gap, we propose Earth Mover's Distance as a diagnostic metric for firing distribution divergence, and apply it systematically across weight and membrane quantization on SEW-ResNet architectures trained on CIFAR-10 and CIFAR-100. We find that uniform quantization induces distributional drift even when accuracy is preserved, while LQ-Net style learned quantization maintains firing behavior close to the full-precision baseline. Our results suggest that behavior preservation should be treated as an evaluation criterion alongside accuracy, and that EMD provides a principled tool for assessing it.
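As a rough illustration of the proposed diagnostic, the sketch below computes a one-dimensional Earth Mover's Distance between two sets of per-neuron firing rates. The function name `emd_1d` and the toy firing-rate values are hypothetical, and the sketch assumes equal-sized samples, for which the one-dimensional EMD reduces to the mean absolute difference between sorted values. It is not the authors' implementation.

```python
def emd_1d(rates_a, rates_b):
    """1D Earth Mover's Distance between two equal-sized empirical samples."""
    if len(rates_a) != len(rates_b) or not rates_a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-sized samples, optimal transport in 1D matches sorted values.
    a, b = sorted(rates_a), sorted(rates_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical per-neuron firing rates with the same mean (0.25)
# but a different spread.
full_precision = [0.10, 0.20, 0.30, 0.40]
quantized = [0.00, 0.10, 0.40, 0.50]

drift = emd_1d(full_precision, quantized)
print(f"EMD between firing-rate distributions: {drift:.3f}")
```

In this toy case the two networks have identical mean firing rates, so an aggregate statistic would report no change, while the EMD is nonzero. That is the kind of gap the abstract describes.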
Traditional esports scouting workflows rely heavily on manual video review and aggregate performance metrics, which often fail to capture the nuanced decision-making patterns necessary to determine if a prospect fits a specific tactical archetype. To address this, we reframe style-based player evaluation in esports as an Inverse Reinforcement Learning (IRL) problem. In this paper, we introduce a novel player selection framework that learns professional-specific reward functions from logged gameplay demonstrations, allowing organizations to rank candidates by their stylistic alignment with a target star player. Our proposed architecture utilizes a multimodal, two-branch intake: one branch encodes structured state-action trajectories derived from high-resolution in-game telemetry, while the second encodes temporally aligned tactical pseudo-commentary generated by Vision-Language Models (VLMs) from broadcast footage. These representations are fused and evaluated via a Generative Adversarial Imitation Learning (GAIL) objective, where a discriminator learns to capture the unique mechanical and tactical signatures of elite professionals. By transitioning from generic skill estimation to scouting "by reward," this framework provides a scalable, workflow-aware digital twin system that enables data-driven roster construction and targeted talent discovery across massive candidate pools.
Physics-informed neural networks (PINNs) are often selected by a single scalar loss even when the quantity of interest is more specific. We study a hybrid design in which the governing PDE residual remains automatic-differentiation (AD) based, while finite differences (FD) appear only in a weak auxiliary term that penalizes gradients of the sampled residual field. The FD term regularizes the residual field without replacing the PDE residual itself. We examine this idea in two stages. Stage 1 is a controlled Poisson benchmark comparing a baseline PINN, the FD residual-gradient regularizer, and a matched AD residual-gradient baseline. Stage 2 transfers the same logic to a three-dimensional annular heat-conduction benchmark (PINN3D), where baseline errors concentrate near a wavy outer wall and the auxiliary grid is implemented as a body-fitted shell adjacent to the wall. In Stage 1, the FD regularizer reproduces the main effect of residual-gradient control while exposing a trade-off between field accuracy and residual cleanliness. In Stage 2, the shell regularizer improves the application-facing quantities, namely outer-wall flux and boundary-condition behavior. Across seeds 0-5 and 100k epochs, the most reliable tested configuration is a fixed shell weight of 5e-4 under the Kourkoutas-beta optimizer regime: relative to a matched run without the shell term, it reduces the mean outer-wall BC RMSE from 1.22e-2 to 9.29e-4 and the mean wall-flux RMSE from 9.21e-3 to 9.63e-4. Adam with beta2=0.999 becomes usable when the initial learning rate is reduced to 1e-3, although its shell benefit is less robust than under Kourkoutas-beta. Overall, the results support a targeted view of hybrid PINNs: an auxiliary-only FD regularizer is most valuable when it is aligned with the physical quantity of interest, here the outer-wall flux.
Neuromotor decoding from upper-limb surface electromyography (sEMG) can enhance human-machine interfaces and offer a more natural means of controlling prosthetic limbs, virtual reality, and household electronics. Unfortunately, current sEMG technology does not always perform consistently across users because individual differences such as age and body mass index, among many others, can substantially alter signal quality. This variability makes sEMG characteristics highly idiosyncratic, often necessitating laborious personalization and iterative tuning to achieve reliable performance. This variability has particular import for sEMG-based assistive devices and neural interfaces, where demographic biases in sEMG features could undermine broad and fair deployment. In this study, we explore how demographic differences affect the sEMG signals produced and their implications for machine learning-based gesture decoding. We analyze the data set provided by, in which we derive 147 common sEMG features extracted from 81 demographically diverse individuals performing discrete hand gestures. Using mixed-effects linear models and partial least squares (PLS) analysis, which take into consideration demographic variables (including age, sex, height, weight, skin properties, subcutaneous fat, and hair density), we identify that 33\% (49 of 147) of commonly used sEMG features show significant associations with demographic characteristics. These results may help guide the development of fair and unbiased sEMG-based neural interfaces across a diverse population.
Quick and accurate emergency handling in Disaster Decision Support Systems (DDSS) is often hampered by network latency and suboptimal application accuracy. While Federated Learning (FL) addresses some of these issues, it is constrained by high communication costs and rigid synchronization requirements across heterogeneous convolutional neural network (CNN) architectures. To overcome these challenges, this paper proposes a decentralized ensembling framework based on asynchronous probability aggregation and feedback distillation. By shifting the exchange unit from model weights to class-probability vectors, our method maintains data privacy, reduces communication requirements by orders of magnitude, and improves overall accuracy. This approach enables diverse CNN designs to collaborate asynchronously, enhancing disaster image identification performance even in resource-constrained settings. Experimental tests demonstrate that the proposed method outperforms traditional individual backbones and standard federated approaches, establishing a scalable and resource-aware solution for real-time disaster response.
Zero-ablation -- replacing token activations with zero vectors -- is widely used to probe token function in vision transformers. Register zeroing in DINOv2+registers and DINOv3 produces large drops (up to $-36.6$\,pp classification, $-30.9$\,pp segmentation), suggesting registers are functionally indispensable. However, three replacement controls -- mean-substitution, noise-substitution, and cross-image register-shuffling -- preserve performance across classification, correspondence, and segmentation, remaining within ${\sim}1$\,pp of the unmodified baseline. Per-patch cosine similarity shows these replacements genuinely perturb internal representations, while zeroing causes disproportionately large perturbations, consistent with why it alone degrades tasks. We conclude that zero-ablation overstates dependence on exact register content. In the frozen-feature evaluations we test, performance depends on plausible register-like activations rather than on exact image-specific values. Registers nevertheless buffer dense features from \texttt{[CLS]} dependence and are associated with compressed patch geometry. These findings, including the replacement-control results, replicate at ViT-B scale.
We present Three-Phase Transformer (3PT), a residual-stream structural prior for decoder-only Transformers on a standard SwiGLU + RMSNorm + RoPE + GQA backbone. The hidden vector is partitioned into N equally-sized cyclic channels, each maintained by phase-respecting ops: a per-channel RMSNorm, a 2D Givens rotation between attention and FFN that rotates each channel by theta + i*(2*pi/N), and a head-count constraint aligning GQA heads with the partition. The architecture is a self-stabilizing equilibrium between scrambling and re-imposition, not a bolted-on module. The partition carves out a one-dimensional DC subspace orthogonal to the channels, into which we inject a fixed Gabriel's horn profile r(p) = 1/(p+1) as an absolute-position side-channel composing orthogonally with RoPE's relative-position rotation. The canonical N=3 borrows its metaphor from balanced three-phase AC, where three sinusoids 120 degrees apart sum to zero with no anti-correlated pair. At 123M parameters on WikiText-103, 3PT achieves -7.20% perplexity (-2.62% bits-per-byte) over a matched RoPE-Only baseline at +1,536 parameters (0.00124% of total), with 1.93x step-count convergence speedup (1.64x wall-clock). N behaves as a parameter-sharing knob rather than a unique optimum: at 5.5M an N-sweep over {1,2,3,4,6,8,12} is near-monotone with N=1 winning; at 123M a three-seed sweep finds N=3 and N=1 statistically indistinguishable. The load-bearing mechanism is the channel-partitioned residual stream, per-block rotation, per-phase normalization, and horn DC injection. We characterize (a) self-stabilization of the geometry without explicit enforcement, a novel instance of the conservation-law framework for neural networks; (b) a U-shaped depth profile of rotation-angle drift at 12 layers; (c) orthogonal composition with RoPE, attention, and FFN.
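The phase geometry described above can be checked numerically. The sketch below is illustrative only: the helper names `channel_angles`, `givens_rotate`, and `horn_profile`, and the value of `theta`, are assumptions made for this example and are not taken from the 3PT code. It shows the per-channel angle theta + i*(2*pi/N), a 2D Givens rotation, the profile r(p) = 1/(p+1), and the zero-sum property of three balanced phases.

```python
import math


def channel_angles(theta, n_channels):
    """Rotation angle for channel i: theta + i * (2*pi / N)."""
    return [theta + i * (2 * math.pi / n_channels) for i in range(n_channels)]


def givens_rotate(x, y, angle):
    """Rotate the 2D pair (x, y) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return c * x - s * y, s * x + c * y


def horn_profile(p):
    """Gabriel's horn profile r(p) = 1 / (p + 1) at position p >= 0."""
    return 1.0 / (p + 1)


angles = channel_angles(theta=0.3, n_channels=3)  # theta is arbitrary here
phase_sum = sum(math.sin(a) for a in angles)      # balanced phases sum to ~0
rotated = givens_rotate(3.0, 4.0, angles[1])      # the rotation preserves the norm

print(f"sum of three phases: {phase_sum:.2e}")
print(f"norm after rotation: {math.hypot(*rotated):.6f}")
print(f"horn profile at p=0, 3: {horn_profile(0)}, {horn_profile(3)}")
```

Because a Givens rotation is orthogonal, it changes the direction of each two-dimensional pair but not its norm, so it should not by itself change the root-mean-square that a per-channel RMSNorm divides by.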
Most practical engineering design problems involve nonlinear spatio-temporal dynamical systems. Multi-physics simulations are often performed to capture the fine spatio-temporal scales which govern the evolution of these systems. However, these simulations are often high-fidelity in nature, and can be computationally very expensive. Hence, generating data from these expensive simulations becomes a bottleneck in an end-to-end engineering design process. Spatio-temporal surrogate modeling of these dynamical systems has been a popular data-driven solution to tackle this computational bottleneck. This is because accurate machine learning models emulating the dynamical systems can be orders of magnitude faster than the actual simulations. However, one key limitation of purely data-driven approaches is their lack of generalizability to inputs outside the training distribution. In this paper, we propose a physics-informed spatio-temporal surrogate modeling (PISTM) framework constrained by the physics of the underlying dynamical system. The framework leverages state-of-the-art advancements in the field of Koopman autoencoders to learn the underlying spatio-temporal dynamics in a non-intrusive manner, coupled with a spatio-temporal surrogate model which predicts the behavior of the Koopman operator in a specified time window for unknown operating conditions. We evaluate our framework on a prototypical fluid flow problem of interest: two-dimensional incompressible flow around a cylinder.
Rotating detonation engines (RDEs) are a promising propulsion concept that may offer higher thermodynamic efficiency and specific impulse than conventional systems, but nonlinear phenomena, including transitions to oscillatory or chaotic propagation modes, can hinder practical operation. Deep Reinforcement Learning (DRL) has emerged as a promising method for controlling complex nonlinear dynamics such as those observed in RDEs. However, the multi-timescale nature of the RDE system makes direct application of DRL challenging. We address this challenge by reformulating the DRL problem in a moving reference frame that follows the detonation-wave pattern, making the wave structure appear quasi-steady to the agent. This reformulation enables scale separation between fast detonation propagation and slower operating-mode dynamics. We train DRL controllers to modulate spatially segmented injection pressure in a one-dimensional reduced-order RDE model and induce rapid transitions between different mode-locked states. Across a range of actuation periods, initial states, and target modes, controllers trained in the moving frame learn more reliably than those trained in a stationary frame and remain effective over a broader range of actuation periods. These results suggest that symmetry-aware moving reference frame formulations may be useful for related multiscale flow-control problems and that scale separation should be exploited whenever possible to enable DRL control of multi-timescale systems.
Reinforcement learning (RL) has emerged as a powerful tool for aligning diffusion models with human preferences, typically by optimizing a single reward function under a KL regularization constraint. In practice, however, human preferences are inherently pluralistic, and aligned models must balance multiple downstream objectives, such as aesthetic quality and text-image consistency. Existing multi-objective approaches either rely on costly multi-objective RL fine-tuning or on fusing separately aligned models at denoising time, but they generally require access to reward values (or their gradients) and/or introduce approximation error in the resulting denoising objectives. In this paper, we revisit the problem of RL fine-tuning for diffusion models and address the intractability of identifying the optimal policy by introducing a step-level RL formulation. Building on this, we further propose Multi-objective Step-level Denoising-time Diffusion Alignment (MSDDA), a retraining-free framework for aligning diffusion models with multiple objectives, obtaining the optimal reverse denoising distribution in closed form, with mean and variance expressed directly in terms of single-objective base models. We prove that this denoising-time objective is exactly equivalent to the step-level RL fine-tuning, introducing no approximation error. Moreover, we provide numerical results, which indicate our method outperforms existing denoising-time approaches.
Catastrophic forgetting remains a primary hurdle in sequential task learning for artificial neural networks. We propose a silicon-native modular architecture that achieves structural parameter isolation using Task-Specific Experts and a distributed, outlier-based Gatekeeper. Moving beyond traditional sequential consolidation, our framework utilizes a Simultaneous Pipeline where Teacher learning, Student distillation, and Router manifold acquisition occur in parallel while raw data is present in a localized training session. This approach ensures computational efficiency and complies with privacy mandates like GDPR by deleting raw data as soon as a task is learned. We demonstrate that a Tight-Bottleneck Autoencoder (TB-AE) can effectively distinguish semantically crowded manifolds in high-dimensional latent spaces, overcoming the posterior collapse inherent to standard variational methods. By establishing strict topological boundaries, our TB-AE resolves latent space crowding in 4096-D LLM embeddings to provide a robust, unsupervised novelty signal. Furthermore, we validate an Autonomous Retrieval mechanism that confidently identifies returning manifolds, enabling stable lifelong learning without redundant module instantiation. Empirical results demonstrate that our ``Live Distillation'' approach acts as a natural regularizer, achieving strong retention across computer vision and natural language processing domains without suffering a student fidelity gap.
AI tools increasingly guide targeted interventions in healthcare, education, and recruiting. Algorithms score individuals, trigger outreach to those above a threshold (e.g., high-risk or high-value), and encourage them to request service; then providers deliver service to those who request. Standard practice sets the threshold and selects the algorithm to maximize predictive accuracy, assuming that better predictions yield better outcomes. We show that this approach is suboptimal when limited service capacity and probabilistic behavioral responses influence who receives service. In such settings, the optimal score threshold must balance two effects: ensuring all capacity is filled (utilization) and ensuring high-value individuals are served despite competition between requests (cannibalization). We characterize the optimal threshold and prove that policies based solely on predictive accuracy are generally suboptimal. Further, because optimal thresholds vary with service capacity, algorithm selection metrics like AUC, which weight all thresholds equally, are misaligned with operational performance. We introduce a new metric--Operational AUC (OpAUC)--and show it leads to optimal algorithm selection. Finally, we conduct a case study on sepsis early warning data and illustrate the magnitude of improvement that can be achieved from improved threshold and algorithm selection.
Online A/B testing at scale relies on proxy metrics -- short-term, easily-measured signals used in place of slow-moving long-term outcomes. When the proxy-outcome relationship is heterogeneous across user segments, aggregate correlation can mask directional failures akin to Simpson's Paradox, leading to costly ship/no-ship errors. We introduce PROXIMA (Proxy Metric Validation Framework for Online Experiments), a lightweight diagnostic framework that scores proxy reliability through a composite of three complementary dimensions: normalised effect correlation, directional accuracy, and segment-level fragility rate. Unlike surrogate-index approaches that predict long-term treatment effects, PROXIMA directly audits whether a candidate proxy leads to correct launch decisions and flags the user segments where it fails. We validate PROXIMA on two public datasets -- the Criteo Uplift corpus (14M observations, advertising) and KuaiRec (7K users, video recommendation) -- using 80 simulated A/B tests. Early engagement metrics achieve a composite reliability of 0.80 on Criteo and 0.62 on KuaiRec, yielding 98.4% average decision agreement with an oracle policy. Fragility analysis reveals that recommendation domains exhibit substantially higher segment-level heterogeneity (68% fragility) than advertising (13%), yet directional accuracy remains above 96% in both cases. A sensitivity analysis over the weight space confirms that no single component suffices and that the composite provides substantially better discrimination between reliable and unreliable proxies than correlation alone. Code and reproduction scripts are available at: https://github.com/Avinash-Amudala/PROXIMA
As search depth increases in autonomous reasoning and embodied planning, the candidate action space expands exponentially, heavily taxing computational budgets. While heuristic pruning is a common countermeasure, it operates without formal safety guarantees when surrogate models (like LLMs) exhibit systematic evaluation biases. This paper frames the node expansion process as a localized Best-Arm Identification (BAI) problem over dynamic frontiers, subject to a bounded systematic bias $L$. By inverting the Lambert W function, we establish an additive sample complexity of $\mathcal{O}((Δ-4L)^{-2})$, which indicates that safe node elimination is only feasible when the empirical reward gap exceeds $4L$. We complement this with an information-theoretic lower bound of $Ω((Δ-2L)^{-2})$ to confirm the structural limits of biased search. Subsequent evaluations on both synthetic trees and complex reasoning tasks demonstrate that adhering to this local safety boundary successfully preserves optimal trajectories while maximizing sample allocation efficiency.
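To make the feasibility condition concrete, the sketch below evaluates a sample budget proportional to $(Δ-4L)^{-2}$ and reports infeasibility when the gap does not exceed $4L$. The function name `elimination_budget` and the `constant` argument are hypothetical. The big-O bound hides constants and any logarithmic factors, so the returned number only indicates scaling and is not the paper's exact bound.

```python
def elimination_budget(gap, bias, constant=1.0):
    """Sample budget scaling as (gap - 4*bias)**-2.

    Returns None when gap <= 4*bias, i.e. when safe elimination is
    not feasible under the stated bound. `constant` stands in for the
    factors hidden by the big-O notation (an assumption of this sketch).
    """
    margin = gap - 4.0 * bias
    if margin <= 0:
        return None
    return constant / margin ** 2


# The budget grows quickly as the bias eats into the gap.
for bias in (0.0, 0.05, 0.10, 0.15):
    print(f"gap=0.5, bias={bias:.2f} -> budget {elimination_budget(0.5, bias)}")
```

With a gap of 0.5, the budget grows from 4 at zero bias to roughly 100 at a bias of 0.10, and no budget suffices at a bias of 0.15, where $4L$ exceeds the gap.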
We introduce path-sampled integrated gradients (PS-IG), a framework that generalizes feature attribution by computing the expected value over baselines sampled along the linear interpolation path. We prove that PS-IG is mathematically equivalent to path-weighted integrated gradients, provided the weighting function matches the cumulative distribution function of the sampling density. This equivalence allows the stochastic expectation to be evaluated via a deterministic Riemann sum, improving the error convergence rate from $O(m^{-1/2})$ to $O(m^{-1})$ for smooth models. Furthermore, we demonstrate analytically that PS-IG functions as a variance-reducing filter against gradient noise, strictly lowering attribution variance by a factor of 1/3 under uniform sampling, while preserving key axiomatic properties such as linearity and implementation invariance.
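The stated equivalence can be illustrated on a toy model. The sketch below assumes uniform sampling along the path, so the weighting function is the uniform CDF $w(s) = s$, and uses the hypothetical model $f(x) = \sum_i x_i^2$ with a zero baseline. All function names are made up for this example. It compares a Monte Carlo average of standard integrated gradients over sampled baselines with a deterministic midpoint Riemann sum of the path-weighted form.

```python
import random


def grad_f(x):
    """Gradient of the toy model f(x) = sum(x_i ** 2)."""
    return [2.0 * xi for xi in x]


def path_point(x, baseline, s):
    """Point at fraction s along the straight line from baseline to x."""
    return [b + s * (xi - b) for xi, b in zip(x, baseline)]


def ig_from(x, start, grad, steps=64):
    """Standard integrated gradients from `start` to x (midpoint rule)."""
    attr = [0.0] * len(x)
    for k in range(steps):
        g = grad(path_point(x, start, (k + 0.5) / steps))
        for i in range(len(x)):
            attr[i] += (x[i] - start[i]) * g[i] / steps
    return attr


def ps_ig_monte_carlo(x, baseline, grad, m, rng):
    """Average IG over m baselines drawn uniformly along the path."""
    total = [0.0] * len(x)
    for _ in range(m):
        start = path_point(x, baseline, rng.random())
        total = [t + a for t, a in zip(total, ig_from(x, start, grad))]
    return [t / m for t in total]


def ps_ig_riemann(x, baseline, grad, m):
    """Path-weighted IG with weight w(s) = s, as a midpoint Riemann sum."""
    attr = [0.0] * len(x)
    for k in range(m):
        s = (k + 0.5) / m
        g = grad(path_point(x, baseline, s))
        for i in range(len(x)):
            attr[i] += s * (x[i] - baseline[i]) * g[i] / m
    return attr


x, baseline = [1.0, 2.0], [0.0, 0.0]
stochastic = ps_ig_monte_carlo(x, baseline, grad_f, m=2000, rng=random.Random(0))
deterministic = ps_ig_riemann(x, baseline, grad_f, m=100)
print("Monte Carlo :", stochastic)
print("Riemann sum :", deterministic)
```

For this toy model the closed-form attribution is $2x_i^2/3$. The Riemann sum reaches it with far fewer gradient evaluations than the Monte Carlo average, which is the practical benefit the abstract points to.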
Key Opinion Leader (KOL) discourse on social media is widely consumed as investment guidance, yet turning it into executable trading strategies without injecting assumptions about unspecified execution decisions remains an open problem. We observe that the gaps in KOL statements are not random deficiencies but a structured separation: KOLs express directional intent (what to buy or sell and why) while leaving execution decisions (when, how much, how long) systematically unspecified. Building on this observation, we propose an intent-preserving policy completion framework that treats KOL discourse as a partial trading policy and uses offline reinforcement learning to complete the missing execution decisions around the KOL-expressed intent. Experiments on multimodal KOL discourse from YouTube and X (2022-2025) show that KICL achieves the best return and Sharpe ratio on both platforms while maintaining zero unsupported entries and zero directional reversals, and ablations confirm that the full framework yields an 18.9% return improvement over the KOL-aligned baseline.
Diffusion-model inference and overdamped Langevin dynamics are formally identical. A physical substrate that encodes the score function therefore equilibrates to the correct output by thermodynamics alone, requiring no digital arithmetic during inference and potentially achieving a $10{,}000\times$ reduction in energy relative to a GPU. Two fundamental barriers have until now prevented this equivalence from being realized at production scale: non-local skip connections, which locally coupled analog substrates cannot represent, and input conditioning, in which the coupling constants carry roughly $2{,}600\times$ too little signal to anchor the system to a specific input. We resolve both obstacles. \emph{Hierarchical bilinear coupling} encodes U-Net skip connections as rank-$k$ inter-module interactions derived directly from the singular structure of the encoder and decoder Gram matrices, requiring only $O(Dk)$ physical connections instead of $O(D^2)$. A \emph{minimal digital interface} -- a 4-dimensional bottleneck encoder together with a 16-unit transfer network, totalling \textbf{2,560 parameters} -- overcomes the conditioning barrier. When evaluated on activations drawn from a trained denoising U-Net, the complete system attains a decoder cosine similarity of \textbf{0.9906} against an oracle upper bound of 1.0000, while preserving theoretical net energy savings of approximately $10^7\times$ over GPU inference. These results constitute the first demonstration of trained-weight, production-scale thermodynamic diffusion inference.
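The link between a score function and equilibrium sampling can be sketched with a discretized overdamped Langevin update, $x \leftarrow x + \epsilon\, s(x) + \sqrt{2\epsilon}\,\xi$. The example below is a toy digital simulation, not the analog substrate or the authors' system. The Gaussian target, step size, chain count, and function names are all assumptions chosen for illustration.

```python
import math
import random


def score(x, mean=2.0):
    """Score (gradient of the log-density) of the toy target N(mean, 1)."""
    return -(x - mean)


def langevin_samples(n_chains, n_steps, step, rng):
    """Run independent chains of discretized overdamped Langevin dynamics."""
    finals = []
    for _ in range(n_chains):
        x = 0.0  # every chain starts away from the target mean
        for _ in range(n_steps):
            noise = rng.gauss(0.0, 1.0)
            x += step * score(x) + math.sqrt(2.0 * step) * noise
        finals.append(x)
    return finals


samples = langevin_samples(n_chains=2000, n_steps=400, step=0.05,
                           rng=random.Random(0))
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {sample_mean:.2f}, variance ~ {sample_var:.2f}")
```

The chains settle near the target mean of 2 with variance close to 1, up to a small bias from the finite step size. The only model-specific ingredient is the score function.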
Applying kernel methods to matchings is challenging due to their discrete, non-Euclidean nature. In this paper, we develop a principled framework for constructing geometric kernels that respect the natural geometry of the space of matchings. To this end, we first provide a complete characterization of stationary kernels, i.e. kernels that respect the inherent symmetries of this space. Because the class of stationary kernels is too broad, we specifically focus on the heat and Matérn kernel families, adding an appropriate inductive bias of smoothness to stationarity. While these families successfully extend widely popular Euclidean kernels to matchings, evaluating them naively incurs a prohibitive super-exponential computational cost. To overcome this difficulty, we introduce and analyze a novel, sub-exponential algorithm leveraging zonal polynomials for efficient kernel evaluation. Finally, motivated by the known bijective correspondence between matchings and phylogenetic trees, a crucial data modality in biology, we explore whether our framework can be seamlessly transferred to the space of trees, establishing novel negative results and identifying a significant open problem.
We derive a robust update rule for the online infinite hidden Markov model (iHMM) for when the streaming data contains outliers and the model is misspecified. Leveraging recent advances in generalised Bayesian inference, we define robustness via the posterior influence function (PIF), and provide conditions under which the online iHMM has bounded PIF. Imposing robustness inevitably induces an adaptation lag for regime switching. Our method, which is called Batched Robust iHMM (BR-iHMM), balances adaptivity and robustness with two additional tunable parameters. Across limit order book data, hourly electricity demand, and a synthetic high-dimensional linear system, BR-iHMM reduces one-step-ahead forecasting error by up to 67% relative to competing online Bayesian methods. Together with theoretical guarantees of bounded PIF, our results highlight the practicality of our approach for both forecasting and interpretable online learning.
Targeted amplicon panels are widely used in oncology diagnostics, but providing per-gene performance guarantees for copy number variant (CNV) detection remains challenging due to amplification artifacts, process-mismatch heterogeneity, and limited validation sample sizes. While Bayesian CNV callers naturally quantify per-sample uncertainty, translating this into the frequentist population-level guarantees required for clinical validation (coverage rates, false-positive bounds, and minimum detectable copy-number changes) is a fundamentally different inferential problem. We show empirically that even robust Bayesian credible intervals, including coarsened posteriors and sandwich-adjusted intervals, are severely miscalibrated on panels with small amplicon counts per gene. To address this, we propose a hybrid framework that evaluates Bayesian posterior functionals on validation samples and models the resulting squared losses with a Gamma distribution, yielding tolerance intervals with valid frequentist coverage. Three components make the method practical under real-world constraints: (1) imputation that removes the influence of true CNV-positive samples without requiring known ground truth, (2) regularization to address small sample variability, and (3) evidence-based stratification on the log model evidence to accommodate non-exchangeable noise profiles arising from process mismatch. Evaluated on two targeted amplicon panels using leave-one-out cross-validation, the proposed method achieves single-digit mean absolute coverage error across all genes under both process-matched and unmatched conditions, whereas Bayesian comparators exhibit mean absolute errors exceeding 60\% on clinically relevant genes such as ERBB2.
Tensor networks were developed in the context of many-body physics as compressed representations of multiparticle quantum states. These representations mitigate the exponential complexity of many-body systems by capturing only the most relevant dependencies. Due to the formal similarity between quantum entanglement and statistical correlations, tensor networks have recently been integrated in machine learning, operating both as alternative learning architectures and as decompositions of components of neural networks. The expectation is that the theoretical understanding of tensor networks developed within quantum many-body physics leads to novel methods that offer advantages in terms of computational efficiency, explainability, or privacy. Here we review the use of tensor networks in the context of machine learning, providing a critical assessment of the state of the art, the potential advantages, and the challenges that must be overcome.
While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its potential is fundamentally bounded by the base model's existing output distribution. Optimizing the marginal distribution P(y) in the Pre-train Space addresses this bottleneck by encoding reasoning ability and preserving broad exploration capacity. Yet, conventional pre-training relies on static corpora for passive learning, leading to a distribution shift that hinders targeted reasoning enhancement. In this paper, we introduce PreRL (Pre-train Space RL), which applies reward-driven online updates directly to P(y). We theoretically and empirically validate the strong gradient alignment between log P(y) and log P(y|x), establishing PreRL as a viable surrogate for standard RL. Furthermore, we uncover a critical mechanism: Negative Sample Reinforcement (NSR) within PreRL serves as an exceptionally effective driver for reasoning. NSR-PreRL rapidly prunes incorrect reasoning spaces while stimulating endogenous reflective behaviors, increasing transition and reflection thoughts by 14.89x and 6.54x, respectively. Leveraging these insights, we propose Dual Space RL (DSRL), a Policy Reincarnation strategy that initializes models with NSR-PreRL to expand the reasoning horizon before transitioning to standard RL for fine-grained optimization. Extensive experiments demonstrate that DSRL consistently outperforms strong baselines, proving that pre-train space pruning effectively steers the policy toward a refined correct reasoning subspace.
As language models are increasingly deployed for complex autonomous tasks, their ability to reason accurately over longer horizons becomes critical. An essential component of this ability is planning and managing a long, complex chain-of-thought (CoT). We introduce LongCoT, a scalable benchmark of 2,500 expert-designed problems spanning chemistry, mathematics, computer science, chess, and logic to isolate and directly measure the long-horizon CoT reasoning capabilities of frontier models. Problems consist of a short input with a verifiable answer; solving them requires navigating a graph of interdependent steps that span tens to hundreds of thousands of reasoning tokens. Each local step is individually tractable for frontier models, so failures reflect long-horizon reasoning limitations. At release, the best models achieve <10% accuracy (GPT 5.2: 9.8%; Gemini 3 Pro: 6.1%) on LongCoT, revealing a substantial gap in current capabilities. Overall, LongCoT provides a rigorous measure of long-horizon reasoning, tracking the ability of frontier models to reason reliably over extended periods.
Evaluating LLMs is challenging, as benchmark scores often fail to capture models' real-world usefulness. Instead, users often rely on ``vibe-testing'': informal experience-based evaluation, such as comparing models on coding tasks related to their own workflow. While prevalent, vibe-testing is often too ad hoc and unstructured to analyze or reproduce at scale. In this work, we study how vibe-testing works in practice and then formalize it to support systematic analysis. We first analyze two empirical resources: (1) a survey of user evaluation practices, and (2) a collection of in-the-wild model comparison reports from blogs and social media. Based on these resources, we formalize vibe-testing as a two-part process: users personalize both what they test and how they judge responses. We then introduce a proof-of-concept evaluation pipeline that follows this formulation by generating personalized prompts and comparing model outputs using user-aware subjective criteria. In experiments on coding benchmarks, we find that combining personalized prompts and user-aware evaluation can change which model is preferred, reflecting the role of vibe-testing in practice. These findings suggest that formalized vibe-testing can serve as a useful approach for bridging benchmark scores and real-world experience.
Rhetorical questions are asked not to seek information but to persuade or signal stance. How large language models internally represent them remains unclear. We analyze rhetorical questions in LLM representations using linear probes on two social-media datasets with different discourse contexts, and find that rhetorical signals emerge early and are most stably captured by last-token representations. Rhetorical questions are linearly separable from information-seeking questions within datasets, and remain detectable under cross-dataset transfer, reaching AUROC around 0.7-0.8. However, we demonstrate that transferability does not simply imply a shared representation. Probes trained on different datasets produce different rankings when applied to the same target corpus, with overlap among the top-ranked instances often below 0.2. Qualitative analysis shows that these divergences correspond to distinct rhetorical phenomena: some probes capture discourse-level rhetorical stance embedded in extended argumentation, while others emphasize localized, syntax-driven interrogative acts. Together, these findings suggest that rhetorical questions in LLM representations are encoded by multiple linear directions emphasizing different cues, rather than a single shared direction.
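The ranking comparison described above can be illustrated with a small sketch. The abstract does not state which overlap measure is used, so the Jaccard index over top-k sets is an assumption here, as are the function names and the toy probe scores.

```python
def top_k_indices(scores, k):
    """Return the set of indices of the k highest-scoring instances."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def top_k_overlap(scores_a, scores_b, k):
    """Jaccard overlap between the top-k sets of two probes scored on the same corpus.

    The choice of Jaccard is an assumption; the source only reports an 'overlap'.
    """
    top_a = top_k_indices(scores_a, k)
    top_b = top_k_indices(scores_b, k)
    return len(top_a & top_b) / len(top_a | top_b)


# Toy scores from two hypothetical probes applied to the same six questions.
probe_a = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
probe_b = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3]
overlap = top_k_overlap(probe_a, probe_b, k=3)
```

Two probes can both separate the classes reasonably well and still agree on only one of their three most confident instances, which is the kind of divergence the abstract reports.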
Given two symmetric positive-definite matrices $A, B \in \mathbb{R}^{n \times n}$, we study the spectral properties of the interpolation $A^{1-x} B^x$ for $0 \leq x \leq 1$. The presence of `common structures' in $A$ and $B$, eigenvectors pointing in a similar direction, can be investigated using this interpolation perspective. Generically, exact log-linearity of the operator norm $\|A^{1-x} B^x\|$ is equivalent to the existence of a shared eigenvector in the original matrices; stability bounds show that approximate log-linearity forces principal singular vectors to align with leading eigenvectors of both matrices. These results give rise to and provide theoretical justification for a multi-manifold learning framework that identifies common and distinct latent structures in multiview data.
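A small numerical sketch can make the log-linearity statement concrete. It only illustrates the easy direction, under the assumption that the shared eigenvector is the leading one for both matrices; the helper names and the toy matrices are illustrative and not taken from the paper.

```python
import numpy as np


def spd_power(M, p):
    """Fractional power of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


def log_linearity_gap(A, B, x):
    """(1-x) log||A|| + x log||B|| - log||A^{1-x} B^x|| in the operator norm.

    The gap is nonnegative by submultiplicativity and vanishes under exact log-linearity.
    """
    interp = spd_power(A, 1.0 - x) @ spd_power(B, x)
    linear = (1.0 - x) * np.log(np.linalg.norm(A, 2)) + x * np.log(np.linalg.norm(B, 2))
    return linear - np.log(np.linalg.norm(interp, 2))


def rotated_diag(eigs, theta):
    """2x2 SPD matrix with eigenvalues `eigs` and eigenvectors rotated by `theta`."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag(eigs) @ R.T


def with_shared_leading(top, block):
    """Block-diagonal matrix with eigenvalue `top` on e_1 and `block` on the complement."""
    n = block.shape[0] + 1
    M = np.zeros((n, n))
    M[0, 0] = top
    M[1:, 1:] = block
    return M


xs = np.linspace(0.0, 1.0, 11)

# Case 1: A and B share the leading eigenvector e_1 but differ elsewhere.
A_shared = with_shared_leading(4.0, rotated_diag([2.0, 1.0], 0.3))
B_shared = with_shared_leading(9.0, rotated_diag([3.0, 1.0], 1.1))
gap_shared = max(abs(log_linearity_gap(A_shared, B_shared, x)) for x in xs)

# Case 2: same spectra, eigenvectors rotated by 45 degrees, so no shared eigenvector.
A_generic = np.diag([4.0, 1.0])
B_generic = rotated_diag([4.0, 1.0], np.pi / 4)
gap_generic = log_linearity_gap(A_generic, B_generic, 0.5)
```

In the first case the gap is zero up to rounding at every grid point, while in the second the midpoint gap is clearly positive, so the size of the gap acts as a rough indicator of how far the leading directions are from being shared.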
Search agents extend Large Language Models (LLMs) beyond static parametric knowledge by enabling access to up-to-date and long-tail information unavailable during pretraining. While reinforcement learning has been widely adopted for training such agents, existing approaches face key limitations: process supervision often suffers from unstable value estimation, whereas outcome supervision struggles with credit assignment due to sparse, trajectory-level rewards. To bridge this gap, we propose Contribution-Weighted GRPO (CW-GRPO), a framework that integrates process supervision into group relative policy optimization. Instead of directly optimizing process rewards, CW-GRPO employs an LLM judge to assess the retrieval utility and reasoning correctness at each search round, producing per-round contribution scores. These scores are used to rescale outcome-based advantages along the trajectory, enabling fine-grained credit assignment without sacrificing optimization stability. Experiments on multiple knowledge-intensive benchmarks show that CW-GRPO outperforms standard GRPO by 5.0\% on Qwen3-8B and 6.3\% on Qwen3-1.7B, leading to more effective search behaviors. Additional analysis reveals that successful trajectories exhibit concentrated contributions across rounds, providing empirical insight into search agent tasks.
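The rescaling idea can be sketched as follows. The abstract does not give the exact formula, so everything below is an assumption: group-relative advantages are computed with the population standard deviation, and the per-round contribution scores (imagined here as coming from an LLM judge) are normalized to have mean 1 before multiplying the trajectory-level advantage. Function names are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize outcome rewards within one group of sampled trajectories."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumed choice
    return [(r - mean) / (std + eps) for r in rewards]


def contribution_weighted(advantage, contributions, eps=1e-8):
    """Spread one trajectory-level advantage over search rounds.

    Weights are contribution scores divided by their mean, so the average
    per-round advantage stays equal to the original advantage (assumed scheme).
    """
    mean_c = statistics.fmean(contributions)
    return [advantage * c / (mean_c + eps) for c in contributions]


# Four sampled trajectories with binary outcome rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical judge scores for the three search rounds of the first trajectory.
per_round = contribution_weighted(advantages[0], [0.2, 0.6, 0.2])
```

Under this scheme the second round, which the judge scored highest, receives three times the credit of the other two, while the sign of the update is still set by the outcome reward.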
Sequential recommendation has become increasingly prominent in both academia and industry, particularly in e-commerce. The primary goal is to extract user preferences from historical interaction sequences and predict items a user is likely to engage with next. Recent advances have leveraged contrastive learning and graph neural networks to learn more expressive representations from interaction histories -- graphs capture relational structure between nodes, while ID-based representations encode item-specific information. However, few studies have explored multi-view contrastive learning between ID and graph perspectives to jointly improve user and item representations, especially in settings where only interaction data is available without auxiliary information. To address this gap, we propose Multi-View Contrastive learning for sequential recommendation (MVCrec), a framework that integrates complementary signals from both sequential (ID-based) and graph-based views. MVCrec incorporates three contrastive objectives: within the sequential view, within the graph view, and across views. To effectively fuse the learned representations, we introduce a multi-view attention fusion module that combines global and local attention mechanisms to estimate the likelihood of a target user purchasing a target item. Comprehensive experiments on five real-world benchmark datasets demonstrate that MVCrec consistently outperforms 11 state-of-the-art baselines, achieving improvements of up to 14.44\% in NDCG@10 and 9.22\% in HitRatio@10 over the strongest baseline. Our code and datasets are available at https://github.com/sword-Lz/MMCrec.
Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability. We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold. Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes: at small batch sizes it converges to a lower plateau $2(1-β)/η$, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau $2(1+β)/η$, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics. We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling.
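To give the two plateaus a sense of scale, the sketch below evaluates them for one learning rate and momentum value and checks the classical full-batch stability threshold on a one-dimensional quadratic. It assumes the heavy-ball convention without dampening (velocity `v = beta * v + grad`), which the abstract does not specify; the function names are illustrative. Only the upper, full-batch threshold is simulated here, since the lower plateau is a stochastic effect.

```python
def momentum_plateaus(lr, beta):
    """Plateau values quoted in the abstract: 2(1-beta)/lr and 2(1+beta)/lr."""
    return 2.0 * (1.0 - beta) / lr, 2.0 * (1.0 + beta) / lr


def heavy_ball_final_abs(curvature, lr, beta, steps=200, x0=1.0):
    """Run full-batch heavy-ball GD on f(x) = curvature * x^2 / 2; return |x| at the end."""
    x, v = x0, 0.0
    for _ in range(steps):
        grad = curvature * x
        v = beta * v + grad  # heavy-ball velocity, no dampening (assumed convention)
        x = x - lr * v
    return abs(x)


lr, beta = 0.01, 0.9
low, high = momentum_plateaus(lr, beta)  # roughly 20 and 380; vanilla GD would give 2/lr = 200

# Curvature just below the upper threshold contracts, just above it diverges.
below = heavy_ball_final_abs(0.95 * high, lr, beta)
above = heavy_ball_final_abs(1.05 * high, lr, beta)
```

With these values the small-batch plateau sits an order of magnitude below the vanilla threshold $2/η$ and the large-batch plateau almost a factor of two above it, which is why a single momentum-adjusted threshold cannot describe both regimes.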
We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods rely either on reparameterized policy gradients, which are difficult to scale to large generative models, or on rejection sampling, which can be overly conservative when attempting to move beyond the behavior support. In this paper, we propose Value Gradient Flow (VGF), a scalable new paradigm for behavior-regularized RL. VGF casts behavior-regularized RL as an optimal transport problem that maps the reference distribution to the value-induced optimal policy distribution. We solve this transport problem via discrete gradient flow, where value gradients guide particles initialized from the reference distribution. Our analysis shows that VGF imposes regularization implicitly by controlling the transport budget. VGF eliminates explicit policy parameterization while remaining expressive and flexible, which enables adaptive test-time scaling by adjusting the transport budget. Extensive experiments demonstrate that VGF significantly outperforms prior methods, achieving state-of-the-art results on offline RL benchmarks (D4RL, OGBench) and LLM RL tasks. Code and runs can be found at https://ryanxhr.github.io/vgf.
On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally, but existing views of token importance are incomplete. We ask a direct question: which tokens carry the most useful learning signal in OPD? Our answer is that informative tokens come from two regions: positions with high student entropy, and positions with low student entropy plus high teacher--student divergence, where the student is overconfident and wrong. Empirically, student entropy is a strong first-order proxy: retaining $50\%$ of tokens with entropy-based sampling matches or exceeds all-token training while reducing peak memory by up to $47\%$. But entropy alone misses a second important region. When we isolate low-entropy, high-divergence tokens, training on fewer than $10\%$ of all tokens nearly matches full-token baselines, showing that overconfident tokens carry dense corrective signal despite being nearly invisible to entropy-only rules. We organize these findings with TIP (Token Importance in on-Policy distillation), a two-axis taxonomy over student entropy and teacher--student divergence, and give a theoretical explanation for why entropy is useful yet structurally incomplete. This view motivates type-aware token selection rules that combine uncertainty and disagreement. We validate this picture across three teacher--student pairs spanning Qwen3, Llama, and Qwen2.5 on MATH-500 and AIME 2024/2025, and on the DeepPlanning benchmark for long-horizon agentic planning, where Q3-only training on $<20\%$ of tokens surpasses full-token OPD. Our experiments are implemented by extending the OPD repository https://github.com/HJSang/OPSD_OnPolicyDistillation, which supports memory-efficient distillation of larger models under limited GPU budgets.
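The two informative regions can be illustrated with a minimal sketch that assigns a token position to a region from the student and teacher next-token distributions. Several things here are assumptions rather than details from the abstract: the divergence is taken to be KL(teacher || student), the thresholds are arbitrary, and the region labels and function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def kl_divergence(q, p):
    """KL(q || p); assumes p > 0 wherever q > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def token_region(student, teacher, h_thresh=0.5, d_thresh=1.0):
    """Classify one token position on the entropy and divergence axes.

    Thresholds are illustrative placeholders, not values from the paper.
    """
    if entropy(student) >= h_thresh:
        return "high-entropy"
    if kl_divergence(teacher, student) >= d_thresh:
        return "overconfident"  # low entropy, high teacher-student divergence
    return "low-signal"


confident_wrong = token_region([0.97, 0.02, 0.01], [0.05, 0.90, 0.05])
uncertain = token_region([1 / 3, 1 / 3, 1 / 3], [0.05, 0.90, 0.05])
confident_right = token_region([0.97, 0.02, 0.01], [0.97, 0.02, 0.01])
```

An entropy-only rule would drop the first position because the student looks certain there, even though the teacher strongly disagrees; that is the gap the second axis is meant to close.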
Accurate detection and segmentation of glomeruli in kidney tissue are essential for diagnostic applications. Traditional deep learning methods primarily rely on semantic segmentation, which often fails to precisely delineate adjacent glomeruli. To address this challenge, we propose a novel glomerulus detection and segmentation model that emphasises boundary separation. Leveraging pathology foundation models, the proposed U-Net-based architecture incorporates a specialised attention decoder designed to highlight critical regions and improve instance-level segmentation. Experimental evaluations demonstrate that our approach surpasses state-of-the-art methods in both Dice score and Intersection over Union, indicating superior performance in glomerular delineation.
We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.
Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures struggle on these benchmarks. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples that are ten times longer than those handled by the best baseline. We measure the impact of our architecture in the real-world task of decompiling switch statements, which has an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%. Multiple ablation studies show that all components of our architectures are essential.
GUI grounding models report over 85% accuracy on standard benchmarks, yet drop 27-56 percentage points when instructions require spatial reasoning rather than direct element naming. Current benchmarks miss this because they evaluate each screenshot once with a single fixed instruction. We introduce GUI-Perturbed, a controlled perturbation framework that independently varies visual scenes and instructions to measure grounding robustness. Evaluating three 7B models from the same architecture lineage, we find that relational instructions cause systematic accuracy collapse across all models, a 70% browser zoom produces statistically significant degradation, and rank-8 LoRA fine-tuning with augmented data degrades performance rather than improving it. By perturbing along independent axes, GUI-Perturbed isolates which specific capability axes are affected (spatial reasoning, visual robustness, reasoning calibration), providing diagnostic signal that aggregate benchmarks cannot. We release the dataset, augmentation pipeline, and a fine-tuned model.
This paper provides a systematic comparison between Fitted Dynamic Programming (DP), where demand is estimated from data, and Reinforcement Learning (RL) methods in finite-horizon dynamic pricing problems. We analyze their performance across environments of increasing structural complexity, ranging from a single typology benchmark to multi-typology settings with heterogeneous demand and inter-temporal revenue constraints. Unlike simplified comparisons that restrict DP to low-dimensional settings, we apply dynamic programming in richer, multi-dimensional environments with multiple product types and constraints. We evaluate revenue performance, stability, constraint satisfaction behavior, and computational scaling, highlighting the trade-offs between explicit expectation-based optimization and trajectory-based learning.
Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse rewards, weak credit assignment, and limited labeled data. Self-play offers a scalable route to reduce data dependence, but conventional self-play optimizes students only through sparse outcome rewards, leading to low learning efficiency. In this work, we observe that self-play naturally produces a question construction path (QCP) during task generation, an intermediate artifact that captures the reverse solution process. This reveals a new source of privileged information for self-distillation: self-play can itself provide high-quality privileged context for the teacher model in a low-cost and scalable manner, without relying on human feedback or curated privileged information. Leveraging this insight, we propose Privileged Information Self-Play ($π$-Play), a multi-agent self-evolution framework. In $π$-Play, an examiner generates tasks together with their QCPs, and a teacher model leverages QCP as privileged context to densely supervise a student via self-distillation. This design transforms conventional sparse-reward self-play into a dense-feedback self-evolution loop. Extensive experiments show that data-free $π$-Play surpasses fully supervised search agents and improves evolutionary efficiency by 2-3$\times$ over conventional self-play.
Parameter space is not function space for neural network architectures. This fact, investigated as early as the 1990s under terms such as ``reverse engineering'' or ``parameter identifiability'', has led to the natural question of parameter space symmetries\textemdash the study of distinct parameters in neural architectures which realize the same function. Indeed, the quotient space obtained by identifying parameters giving rise to the same function, called the \textit{neuromanifold}, has been shown in some cases to have rich geometric properties, impacting optimization dynamics. Thus far, techniques towards complete classifications have required the analyticity of the activation function, notably excising the important case of ReLU. Here, in contrast, we exploit the non-differentiability of the ReLU activation to provide a complete classification of the symmetries in the shallow case.
Fairness in algorithmic decision-making is often defined in the predictive space, where predictive performance, used as a proxy for decision-maker (DM) utility, is traded off against prediction-based fairness notions, such as demographic parity or equality of opportunity. This perspective, however, ignores how predictions translate into decisions and ultimately into utilities and welfare for both DM and decision subjects (DS), as well as their allocation across socially salient groups. In this paper, we propose a multi-stakeholder framework for fair algorithmic decision-making grounded in welfare economics and distributive justice, explicitly modeling the utilities of both the DM and DS, and defining fairness via a social planner's utility that captures inequalities in DS utilities across groups under different justice-based fairness notions (e.g., Egalitarian, Rawlsian). We formulate fair decision-making as a post-hoc multi-objective optimization problem, characterizing the achievable performance-fairness trade-offs in the two-dimensional utility space of DM utility and the social planner's utility, under different decision policy classes (deterministic vs. stochastic, shared vs. group-specific). Using the proposed framework, we then identify conditions (in terms of the stakeholders' utilities) under which stochastic policies outperform deterministic ones, and empirically demonstrate that simple stochastic policies can yield superior performance-fairness trade-offs by leveraging outcome uncertainty. Overall, we advocate a shift from prediction-centric fairness to a transparent, justice-based, multi-stakeholder approach that supports the collaborative design of decision-making policies.
Reinforcement learning has shown promise for automating power-grid operation tasks such as topology control and congestion management. However, its deployment in real-world power systems remains limited by strict safety requirements, brittleness under rare disturbances, and poor generalization to unseen grid topologies. In safety-critical infrastructure, catastrophic failures cannot be tolerated, and learning-based controllers must operate within hard physical constraints. This paper proposes a safety-constrained hierarchical control framework for power-grid operation that explicitly decouples long-horizon decision-making from real-time feasibility enforcement. A high-level reinforcement learning policy proposes abstract control actions, while a deterministic runtime safety shield filters unsafe actions using fast forward simulation. Safety is enforced as a runtime invariant, independent of policy quality or training distribution. The proposed framework is evaluated on the Grid2Op benchmark suite under nominal conditions, forced line-outage stress tests, and zero-shot deployment on the ICAPS 2021 large-scale transmission grid without retraining. Results show that flat reinforcement learning policies are brittle under stress, while safety-only methods are overly conservative. In contrast, the proposed hierarchical and safety-aware approach achieves longer episode survival, lower peak line loading, and robust zero-shot generalization to unseen grids. These results indicate that safety and generalization in power-grid control are best achieved through architectural design rather than increasingly complex reward engineering, providing a practical path toward deployable learning-based controllers for real-world energy systems.
Functional magnetic resonance imaging (fMRI) is widely used for studying and diagnosing brain disorders, with functional connectivity (FC) matrices providing powerful representations of large-scale neural interactions. However, existing diagnostic models are trained either on a single site or under full multi-site access, making them unsuitable for real-world scenarios where clinical data arrive sequentially from different institutions. This results in limited generalization and severe catastrophic forgetting. This paper presents the first continual learning framework specifically designed for fMRI-based diagnosis across heterogeneous clinical sites. Our framework introduces a structure-aware variational autoencoder that synthesizes realistic FC matrices for both patient and control groups. Built on this generative backbone, we develop a multi-level knowledge distillation strategy that aligns predictions and graph representations between new-site data and replayed samples. To further enhance efficiency, we incorporate a hierarchical contextual bandit scheme for adaptive replay sampling. Experiments on multi-site datasets for major depressive disorder (MDD), schizophrenia (SZ), and autism spectrum disorder (ASD) show that the proposed generative model enhances data augmentation quality, and the overall continual learning framework substantially outperforms existing methods in mitigating catastrophic forgetting. Our code is available at https://github.com/4me808/FORGE.
Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems. For unconstrained optimization, we develop a first-order stochastic trust-region algorithm and show that, under the strong growth condition, it achieves an iteration and stochastic first-order oracle complexity of $O(\varepsilon^{-2} \log(1/\varepsilon))$ for finding an $\varepsilon$-stationary point. For equality-constrained problems, we introduce a quadratic-penalty-based stochastic trust-region method with penalty parameter $μ$, and establish an iteration and oracle complexity of $O(\varepsilon^{-4} \log(1/\varepsilon))$ to reach an $\varepsilon$-stationary point of the penalized problem, corresponding to an $O(\varepsilon)$-approximate KKT point of the original constrained problem. Numerical experiments on deep neural network training and orthogonally constrained subspace fitting demonstrate that the proposed methods achieve performance comparable to well-tuned stochastic baselines, while exhibiting stable optimization behavior and effectively handling hard constraints without manual learning-rate scheduling.
Multimodal Continual Instruction Tuning (MCIT) is essential for sequential task adaptation of Multimodal Large Language Models (MLLMs) but is severely restricted by catastrophic forgetting. While existing literature focuses on the reasoning language backbone, in this work, we expose a critical yet neglected dual-forgetting phenomenon across both perception drift in Cross-modal Projection Space and reasoning collapse in Low-rank Parameter Space. To resolve this, we present \textbf{MAny} (\textbf{M}erge \textbf{Any}thing), a framework that merges task-specific knowledge through \textbf{C}ross-modal \textbf{P}rojection \textbf{M}erging (\textbf{CPM}) and \textbf{L}ow-rank \textbf{P}arameter \textbf{M}erging (\textbf{LPM}). Specifically, CPM recovers perceptual alignment by adaptively merging cross-modal visual representations via visual-prototype guidance, ensuring accurate feature recovery during inference. Simultaneously, LPM eliminates mutual interference among task-specific low-rank modules by recursively merging low-rank weight matrices. By leveraging recursive least squares, LPM provides a closed-form solution that mathematically guarantees an optimal fusion trajectory for reasoning stability. Notably, MAny operates as a training-free paradigm that achieves knowledge merging via efficient CPU-based algebraic operations, eliminating additional gradient-based optimization beyond initial tuning. Our extensive evaluations confirm the superior performance and robustness of MAny across multiple MLLMs and benchmarks. Specifically, on the UCIT benchmark, MAny achieves significant leads of up to 8.57\% and 2.85\% in final average accuracy over state-of-the-art methods across two different MLLMs, respectively.
Supervised Fine-Tuning (SFT) of large language models often suffers from task interference and catastrophic forgetting. Recent approaches alleviate this issue by isolating task-critical parameters during training. However, these methods represent a static solution to a dynamic problem, assuming that parameter importance remains fixed once identified. In this work, we empirically demonstrate that parameter importance exhibits temporal drift over the course of training. To address this, we propose Evolving Parameter Isolation (EPI), a fine-tuning framework that adapts isolation decisions based on online estimates of parameter importance. Instead of freezing a fixed subset of parameters, EPI periodically updates isolation masks using gradient-based signals, enabling the model to protect emerging task-critical parameters while releasing outdated ones to recover plasticity. Experiments on diverse multi-task benchmarks demonstrate that EPI consistently reduces interference and forgetting compared to static isolation and standard fine-tuning, while improving overall generalization. Our analysis highlights the necessity of synchronizing isolation mechanisms with the evolving dynamics of learning diverse abilities.
Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a special case of policy gradient optimization with an extremely sparse implicit reward and unstable inverse-probability weighting, which together lead to single-path dependency, entropy collapse, and gradient explosion. Motivated by this diagnosis, we propose Group Fine-Tuning (GFT), a unified post-training framework that addresses these intrinsic limitations through two mechanisms: Group Advantage Learning, which constructs diverse response groups and derives normalized contrastive supervision to alleviate reward sparsity, and Dynamic Coefficient Rectification, which adaptively bounds inverse-probability weights to stabilize optimization while preserving efficient knowledge injection. Experiments demonstrate that GFT consistently surpasses SFT-based methods and yields policies that integrate more smoothly with subsequent RL training.
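One standard way to see the claimed correspondence, given here as a sketch and not necessarily the derivation used in the paper, is to rewrite the SFT gradient on a reference response $y^\star$ as an on-policy expectation. The symbols $\pi_\theta$, $x$, $y$, and $y^\star$ are notation assumed for this illustration, and the identity requires $\pi_\theta(y^\star \mid x) > 0$.

```latex
\[
\nabla_\theta \log \pi_\theta(y^\star \mid x)
= \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
\left[
\frac{\mathbb{1}[y = y^\star]}{\pi_\theta(y \mid x)}
\, \nabla_\theta \log \pi_\theta(y \mid x)
\right]
\]
```

The indicator $\mathbb{1}[y = y^\star]$ plays the role of an implicit reward that is nonzero on a single response, and the factor $1/\pi_\theta(y \mid x)$ is an inverse-probability weight that grows without bound as the policy assigns less mass to $y^\star$, which matches the sparsity and instability described above.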
Diffusion language models have recently emerged as a leading alternative to standard language models, owing to their support for bidirectional attention and parallel text generation. In this work, we explore variants for their use in speech recognition. Specifically, we introduce a comprehensive guide to incorporating masked diffusion language models (MDLM) and uniform-state diffusion models (USDMs) for rescoring ASR hypotheses. Additionally, we design a new joint-decoding method that combines CTC and USDM by integrating the framewise probability distributions derived from CTC with the labelwise probability distributions computed by USDM at each decoding step, thereby generating new candidates that combine strong language knowledge from USDM and acoustic information from CTC. Our findings reveal that USDM, as well as MDLM, can significantly improve the accuracy of recognized text. We publish all our code and recipes.
Accurate methane sorption prediction across heterogeneous coal ranks requires models that combine thermodynamic consistency, efficient knowledge transfer across data-scarce geological systems, and calibrated uncertainty estimates, capabilities that are rarely addressed together in existing frameworks. We present a physics-informed transfer learning framework that adapts a hydrogen sorption PINN to methane sorption prediction via Elastic Weight Consolidation, coal-specific feature engineering, and a three-phase curriculum that progressively balances transfer preservation with thermodynamic fine-tuning. Trained on 993 equilibrium measurements from 114 independent coal experiments spanning lignite to anthracite, the framework achieves R2 = 0.932 on held-out coal samples, a 227% improvement over pressure-only classical isotherms, while hydrogen pre-training delivers 18.9% lower RMSE and 19.4% faster convergence than random initialization. Five Bayesian uncertainty quantification approaches reveal a systematic divergence in performance across physics-constrained architectures. Monte Carlo Dropout achieves well-calibrated uncertainty at minimal overhead, while deep ensembles, regardless of architectural diversity or initialization strategy, exhibit performance degradation because shared physics constraints narrow the admissible solution manifold. SHAP and ALE analyses confirm that learned representations remain physically interpretable and aligned with established coal sorption mechanisms: moisture-volatile interactions are most influential, pressure-temperature coupling captures thermodynamic co-dependence, and features exhibit non-monotonic effects. These results identify Monte Carlo Dropout as the best-performing UQ method in this physics-constrained transfer learning framework, and demonstrate cross-gas transfer learning as a data-efficient strategy for geological material modeling.
Large language models (LLMs) are prone to generating factually incorrect outputs. Recent work has applied conformal prediction to provide uncertainty estimates and statistical guarantees for the factuality of LLM generations. However, existing approaches are typically not prompt-adaptive, limiting their ability to capture input-dependent variability. As a result, they may filter out too few items (leading to over-coverage) or too many (under-coverage) for a given task or prompt. We propose an adaptive conformal prediction approach that extends conformal score transformation methods to LLMs, with applications to long-form generation and multiple-choice question answering. This enables prompt-dependent calibration, retaining marginal coverage guarantees while improving conditional coverage. In addition, the approach naturally supports selective prediction, allowing unreliable claims or answer choices to be filtered out in downstream applications. We evaluate our approach on multiple white-box models across diverse domains and show that it significantly outperforms existing baselines in terms of conditional coverage.
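For orientation, the non-adaptive baseline that such work builds on can be sketched as standard split conformal prediction for multiple-choice answers. This is a minimal sketch, not the adaptive method proposed above: the function names, the use of "1 minus model probability" as a nonconformity score, and the example values are all assumptions made for illustration.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score.

    calibration_scores are nonconformity scores of the TRUE answers on a
    held-out calibration set (higher means the model was less confident).
    """
    n = len(calibration_scores)
    # Small tolerance guards against floating-point round-up of (n+1)(1-alpha).
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)
    if k > n:
        # Too few calibration points for this alpha: keep every choice.
        return math.inf
    return sorted(calibration_scores)[k - 1]


def prediction_set(choice_scores, threshold):
    """Keep every answer choice whose nonconformity score is within threshold."""
    return [choice for choice, s in choice_scores.items() if s <= threshold]


# Hypothetical calibration scores (e.g. 1 - probability of the correct choice).
calibration = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
tau = conformal_threshold(calibration, alpha=0.2)
print(prediction_set({"A": 0.05, "B": 0.85, "C": 0.8}, tau))
```

Because a single threshold is shared by all prompts, this baseline only guarantees marginal coverage; the prompt-dependent calibration described above is aimed at the conditional behaviour that this sketch does not address.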
Objective: Investigate whether hypnogram 'realism' can be used to guide an unsupervised method for handling arbitrary types of signal degradation in mobile sleep monitoring. Approach: Combining a pretrained, state-of-the-art 'u-sleep' model with a 'discriminator' network, we align features from a target domain with a feature space learned during pretraining. To test the approach, we distort the source domain with realistic signal degradations, to see how well the method can adapt to different types of degradation. We compare the performance of the resulting model with best-case models designed in a supervised manner for each type of transfer. Main Results: Depending on the type of distortion, we find that the unsupervised approach can increase Cohen's kappa by as little as 0.03 and by up to 0.29, and that for all transfers, the method does not decrease performance. However, the approach never quite reaches the estimated theoretical optimal performance, and when tested on a real-life domain mismatch between two sleep studies, the benefit was insignificant. Significance: 'Discriminator-guided fine tuning' is an interesting approach to handling signal degradation for 'in the wild' sleep monitoring, with some promise. In particular, what it says about sleep data in general is interesting. However, more development will be necessary before using it 'in production'.
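Since the results above are reported as changes in Cohen's kappa, a minimal sketch of that statistic may help. The function name and the toy hypnogram labels are assumptions for illustration; the formula is the usual $\kappa = (p_o - p_e)/(1 - p_e)$, with $p_o$ the observed agreement and $p_e$ the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs on which the two scorings agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies of each scoring.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both scorings use one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: reference sleep stages versus a model's predicted stages.
reference = ["W", "N1", "N2", "N2", "N3", "REM", "REM", "W"]
predicted = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
print(round(cohens_kappa(reference, predicted), 3))
```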
Predicting the effects of perturbations in-silico on cell state can identify drivers of cell behavior at scale and accelerate drug discovery. However, modeling challenges remain due to the inherent heterogeneity of single cell gene expression and the complex, latent gene dependencies. Here, we present PRiMeFlow, an end-to-end flow matching based approach to directly model the effects of genetic and small molecule perturbations in the gene expression space. The distribution-fitting approach taken by PRiMeFlow enables it to accurately approximate the empirical distribution of single-cell gene expression, which we demonstrate through extensive benchmarking inside PerturBench. Through ablation studies, we also validate important model design choices such as operating in gene expression space and parameterizing the velocity field with a U-Net architecture. The PRiMeFlow architecture was used as the basis for the model that won the Generalist Prize in the first ARC Virtual Cell Challenge.
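The flow matching objective mentioned above can be illustrated with a generic sketch. This is not the PRiMeFlow implementation: it assumes the common linear interpolation path between a source sample `x0` and a target sample `x1`, uses a plain callable in place of the U-Net velocity field, and all names are hypothetical.

```python
import numpy as np


def flow_matching_loss(velocity_fn, x0, x1, rng):
    """Monte Carlo estimate of a conditional flow matching loss.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1, whose target
    velocity is x1 - x0. velocity_fn(x_t, t) stands in for the learned model.
    """
    # One random time per sample, broadcast over the feature dimension.
    t = rng.uniform(size=(x0.shape[0], 1))
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
control = rng.normal(size=(8, 3))       # stand-in for unperturbed expression
shift = np.array([1.0, -2.0, 0.5])      # stand-in for a perturbation effect
perturbed = control + shift


def constant_field(x, t):
    """Velocity field that always predicts the true shift."""
    return np.broadcast_to(shift, x.shape)


print(flow_matching_loss(constant_field, control, perturbed, rng))
```

In this toy setting the perturbation is a constant shift, so a velocity field that predicts that shift everywhere attains zero loss; real single-cell data would require a learned, input-dependent field.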
New data for the total inclusive helicity-dependent cross section for the proton and deuteron were obtained in the photon energy interval 200-1400 MeV. The experiment was performed at the A2 tagged-photon facility of the Mainz Microtron (MAMI) using a circularly polarized photon beam and longitudinally polarized proton and deuteron targets. The reaction products were detected using the large-acceptance Crystal Ball/TAPS calorimeter, which covers 97% of the full solid angle. These new results, obtained with fine energy binning, significantly expand both the quantity and the quality of the available data for these observables and enable a detailed comparison with state-of-the-art theoretical calculations. From the combination of the results for the deuteron and the proton, important information could also be extracted for the free neutron. Based on these data, and using existing models to evaluate the missing contributions from unmeasured photon energy regions, the validity of the Gerasimov-Drell-Hearn (GDH) sum rule has been verified for the proton, the neutron, and the deuteron. These new data provide a precise experimental benchmark for theoretical models used to study nucleons, both in their free state and when embedded in the nuclear medium.
We present an extraction of unpolarized quark transverse-momentum-dependent parton distribution functions (TMD PDFs) from Drell-Yan data within a Bayesian inference framework, incorporating artificial intelligence at multiple stages of the analysis. Our analysis is performed at ${\rm N^3LO}$ in perturbative QCD combined with ${\rm N^4LL}$ resummation accuracy. We first employ an AI-driven iterative procedure to explore and rank candidate functional forms for the nonperturbative contributions to TMD PDFs at the initial scale, as well as for the Collins-Soper evolution kernel, using $χ^2$ fits and physics constraints. To enable efficient Bayesian inference, we construct a surrogate model for TMD cross sections by training a machine-learning emulator over the parameter space, replacing computationally expensive repeated evaluations and allowing scalable sampling with an affine-invariant Markov Chain Monte Carlo (MCMC) ensemble. Using this framework, we perform a global analysis of Drell-Yan data from fixed-target, RHIC, and LHC experiments and extract TMD PDFs with quantified uncertainties. We compare the results with those obtained using the replica method and highlight differences in the resulting uncertainty estimates.
The STAR experiment at the Relativistic Heavy Ion Collider presents measurements of correlations between charged hadron triggers of high transverse momenta ($7 < p_{\rm T} < 30$ GeV/$c$) with recoiling charged hadrons ($3 < p_{\rm T} < 7$ GeV/$c$) or charged--particle jets ($p_{\rm T, jet} > 8$ GeV/$c$) in event--activity selected O+O collisions at $\sqrt{s_{\mathrm {NN}}}=200$ GeV. Yields of associated hadrons and jets, normalized by the number of trigger hadrons, are suppressed by approximately 20\% in high event activity relative to low event activity collisions, with an absence of suppression excluded with high significance. This suppression corresponds to a shift in $p_{\rm T}$ of $0.70\pm0.15~(\rm stat.)~\pm0.10~(\rm syst.)$ GeV/$c$ for large--radius charged--particle jets ($R=0.5$), quantifying their energy redistribution due to final--state interactions. These measurements provide strong evidence for jet quenching in O+O collisions at $\sqrt{s_\mathrm{NN}}=200$ GeV, offering new insight into quark--gluon plasma formation in small collision systems.
We examine the leading-power fragmentation of fully heavy pentaquarks in high-energy hadronic collisions. To this end, we complete the release of the hadron-structure-oriented PQ5Q1.0 fragmentation functions, by discussing the $P_{5c}$ set and delivering the $P_{5b}$ one. These functions incorporate an improved computation of the initial-scale input for the constituent heavy-quark fragmentation channel, making them particularly suitable for describing both the direct formation of a compact multicharm state and the hadronization from a diquark-antiquark-diquark configuration. For phenomenological applications, we employ the data-validated (sym)JETHAD framework to compute and analyze NLL/NLO$^+$ semi-inclusive production rates of pentaquark-plus-jet systems at the upcoming HL-LHC and the future FCC. This study marks a further step toward connecting hadronic structure, precision QCD, and the emerging physics of exotic matter.
The spectroscopy of $^{11}$Be is explored using the $^{10}$Be$(d,p)$$^{11}$Be transfer reaction performed in inverse kinematics at $9.6\,\mathrm{MeV}/u$ using the Active Target Time Projection Chamber (AT-TPC) inside the SOLARIS solenoid. This experiment is the first attempt at coupling the AT-TPC with SOLARIS to perform a high luminosity transfer reaction measurement without compromising excitation energy and scattering angle resolutions. The angular momentum transfers for states up to $3.40\,\mathrm{MeV}$ are determined from distorted-wave Born approximation analysis of the measured angular distributions, from which the corresponding spectroscopic factors are deduced. These factors are compared with those from various shell model interactions, and those for the $3.40\,\mathrm{MeV}$ state are consistent with a positive parity assignment. Recent \textit{ab initio} no-core configuration interaction (NCCI) calculations with various nucleon-nucleon interactions are presented for the low-lying positive parity states of $^{11}$Be. The excitation energies produced using the Daejeon16 interaction are in good agreement with those found from both this experiment and the literature, thus supporting a positive parity assignment. The $3.40\,\mathrm{MeV}$ state, if assigned a tentative $J^π=3/2^+$, would then correspond to the second excited state of the $K^P=1/2^+$ one-neutron halo ground state rotational band also predicted from such NCCI calculations.
The interaction of neutrons and nuclei at low energies may potentially lead to scattering lengths several orders of magnitude larger than the effective range of the interaction, well beyond the nuclear scale. If such cases existed, they could lead to the observation of the Efimov effect in nuclei, a remarkable universal phenomenon that has been observed only in atoms. The interaction parameters of neutrons scattering off unstable nuclei can be explored in neutron-nucleus systems created after the fast removal of a few nucleons from a slightly heavier beam. The case of the $^{17}$B-$n$ system is considered, and the implications of its potentially huge scattering length on the structure of $^{19}$B as a $^{17}$B-$n$-$n$ Efimov trimer are discussed.
The study of spin polarization of $Λ$ hyperons in ultrarelativistic heavy-ion collisions provides insights into the angular momentum and vortical structure of the quark-gluon plasma (QGP) that may be formed in these collisions. The present study examines the global spin polarization of $Λ$ hyperons using a second-order relativistic viscous hydrodynamic framework that incorporates medium vorticity, shear viscosity, and evolving magnetic fields. It explores thermal vorticity evolution in relativistic heavy-ion collisions and evaluates its value at the decoupling isothermal freeze-out surface. We quantify the contributions of thermal vorticity and magnetic field to the global spin polarization of $Λ$ hyperons. Comparing results with recent ALICE measurements in Pb+Pb collisions at $\sqrt{s_{NN}}$ = 2.76 and 5.02 TeV shows qualitative agreement, offering new insights into the vortical structure of QCD matter. The study also explores the relationship between magnetic and rotational dynamics, with implications for spin polarization at RHIC and LHC energies.
Neutrinoless double beta decay is a hypothetical nuclear transition whose observation would demonstrate that neutrinos are their own antiparticles and that lepton number is not conserved, with far-reaching implications for the origin of neutrino mass and the matter-antimatter imbalance in the Universe. This review examines the theoretical foundations of this process and surveys the principal experimental strategies developed to search for it, focusing on their operating concepts, strengths, and limitations. We summarize the current experimental landscape by presenting the most sensitive results achieved so far and by outlining the complementary approaches pursued by different detection techniques. Finally, we discuss the future direction of the field, emphasizing the technological advances needed to reach substantially better sensitivities and, ultimately, to detect this rare phenomenon.
Comparison of two probability density/mass functions (PDF/PMFs) is ubiquitous in various forms of scientific analysis, including machine learning, optimization problems, and hypothesis tests. A large number of distance metrics have already been proposed and are regularly used for this purpose. In this document, we report a data-driven systematic comparison among a few such metrics. The metrics considered here are the Hellinger distance, Wasserstein distances (1D), the $\sqrt{JS}$ distance, the $L_\infty$ norm, the Kolmogorov-Smirnov distance, and the Fisher-Rao metric. We perform this comparison using electron and photon events from a decaying $^{83}$Kr isotope, collected through an HPGe spectrometer operating under cryo-vacuum conditions. To accomplish this, first, a dimensionless Parameter of Interest (PoI) was established, then PDF/PMFs were generated from the data, and finally the stabilities of the PoI under various criteria, such as sample size, discretization length, and normalizing functions, were studied and the results were summarized. In this report, we also propose a list of properties that a normalizing function should have and utilize them in the comparison.
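The metrics listed above can be written down compactly for two PMFs defined on the same equal-width bins. The sketch below uses textbook definitions, which may differ in normalization from the conventions adopted in the report; the function names, the base-2 logarithm for the Jensen-Shannon divergence, and the unit bin width are assumptions.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance, bounded in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def _kl_bits(p, q):
    """Kullback-Leibler divergence in bits, skipping bins where p is zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = 0.5 * (p + q)
    js = 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)
    return float(np.sqrt(max(js, 0.0)))


def l_infinity(p, q):
    """Largest absolute difference between bin probabilities."""
    return float(np.max(np.abs(p - q)))


def kolmogorov_smirnov(p, q):
    """Largest absolute difference between the two cumulative distributions."""
    return float(np.max(np.abs(np.cumsum(p) - np.cumsum(q))))


def wasserstein_1d(p, q, bin_width=1.0):
    """1D Wasserstein-1 distance for PMFs on a shared equal-width grid."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * bin_width)


def fisher_rao(p, q):
    """Fisher-Rao geodesic distance, 2 * arccos of the Bhattacharyya coefficient."""
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return float(2.0 * np.arccos(bc))


p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.3, 0.2])
for metric in (hellinger, js_distance, l_infinity,
               kolmogorov_smirnov, wasserstein_1d, fisher_rao):
    print(metric.__name__, metric(p, q))
```

Note that these metrics have different ranges (for example $[0, 1]$ for the Hellinger distance and $[0, π]$ for the Fisher-Rao distance in this convention), which is one reason a normalizing function is needed before they can be compared directly.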
A new technique is developed to identify dielectrons (e$^+$e$^-$) with Lorentz boost $γ_\mathrm{L} > 20$ that produce one single merged cluster in the electromagnetic calorimeter of the CMS detector. The identification uses two multivariate models: one for the case where both electron tracks are reconstructed, and another where only one of the tracks is reconstructed. The efficiency is determined using proton-proton collision data collected at a center-of-mass energy of 13 TeV. Boosted J/$ψ$ mesons decaying into e$^+$e$^-$ pairs are used to estimate the efficiency of the model with two tracks, yielding an overall efficiency of 80%. The Z $\to$ $μ^+μ^-γ$ events, where the photon converts into a collimated dielectron, are used for the model with a single track, yielding an efficiency of about 60%. A dedicated energy correction for dielectron candidates is also developed using B$^\pm$ $\to$ J/$ψ$K$^\pm$ $\to$ e$^+$e$^-$K$^\pm$ data.
The BM@N experiment (Baryonic Matter at the Nuclotron) is the first fixed-target experiment at the JINR NICA accelerator complex. In this work, data on the interactions of a carbon-ion beam with kinetic energies of 4.0A~GeV and 4.5A~GeV with C, Al, Cu, and Pb targets are used to measure transverse momentum spectra and rapidity distributions of $Λ$ hyperon yields. The results are compared with the predictions of DCM-SMM, UrQMD, and PHSD transport models and with the $Λ$ yield measurements in other experiments at similar collision energies.
We present a lattice QCD calculation of the electric polarizability of the charged kaon using a four-point function approach, which is the Euclidean analog of low-energy Compton scattering. In the case of the charged kaon, the polarizability is separated into an elastic (Born) term, determined from the charge radius extracted via the kaon electromagnetic form factor, and an inelastic (non-Born) term obtained from the time-integrated difference of four-point correlation functions. Our study employs 500 configurations of Wilson quenched $24^3\times 48$ lattices, and we compute connected diagrams as a proof of principle. From this analysis, we obtain values for the charged kaon electric polarizability of $α_E = (0.988 \pm 0.534) \times 10^{-4}\;\mathrm{fm}^3$ as well as $\langle r_E^2\rangle =0.3303\pm 0.0028\;\mathrm{fm}^2$ for the squared kaon charge radius, after extrapolation to the physical pion mass. The study demonstrates the applicability of the four-point function framework to strange mesons, extends previous four-point function polarizability studies, and provides a foundation for future calculations with increased statistics, dynamical fermions, and improved control of systematic uncertainties.
Spin dependent phenomena in inclusive hadron production have been extensively investigated, yet their microscopic origin and universality across different hadrons are still not fully understood. In particular, it is presently unknown whether antiprotons produced in unpolarized hadronic collisions can acquire a transverse polarization as a result of spin dependent $\bar{p}N$ interactions and nonperturbative hadronization mechanisms. Establishing the presence or absence of such an effect would provide new empirical constraints on the spin structure of the antinucleon-nucleon interaction, which is only weakly constrained by existing data. In this work, we investigate the experimental feasibility of a first dedicated measurement of the transverse polarization of antiprotons produced in proton-nucleus collisions. The polarization is accessed through the left-right asymmetry in elastic $\bar{p}p$ scattering in the Coulomb Nuclear Interference region. Based on detailed Monte Carlo simulations of the proposed experimental setup at the European Organization for Nuclear Research (CERN), we estimate the statistical sensitivity required to detect a certain degree of polarization.
Quasiparticle poisoning following particle impacts poses a significant challenge to the development of fault-tolerant superconducting quantum computers, as a sudden excess of quasiparticles can simultaneously degrade the coherence of multiple qubits across large device arrays. In this work, we present a statistical analysis that models the time evolution of radiation-induced qubit energy relaxation through quasiparticle density dynamics. This study provides insight into quasiparticle loss processes by distinguishing between recombination and trapping decay channels and assessing their respective impact on qubit performance. We precisely measure quasiparticle recombination in multiple transmon qubits and uncover an unexpected dependence of qubit relaxation dynamics on deposited energy. By linking correlated relaxation events across qubits to ballistic phonon propagation, we introduce a statistical localization approach to extract the energy deposited in the substrate, which is in good agreement with Monte Carlo simulation. This work establishes the quantitative framework for using an arbitrary subset of superconducting transmon qubits in a QPU as energy-resolving witness particle detectors.
The precise determination of the parton distribution functions (PDFs) of the proton is an essential ingredient for LHC analyses, including those at the upcoming High-Luminosity LHC. So far, PDFs have been determined from global fits to binned low-dimensional data obtained from unfolded hard-scattering cross section measurements. In this work we demonstrate for the first time the feasibility of neural simulation-based inference (NSBI) for constraining the proton PDFs using a high-dimensional unbinned data set. Exploiting the full statistical power of unbinned data removes the loss of information inherent in the binning procedure. As a proof-of-concept, we determine the gluon PDF from simulated data of top quark pair production at the LHC with $\sqrt{s}=13$ TeV. Taking into account both experimental and theoretical systematic uncertainties in the detector-level features, we demonstrate how the NSBI pipeline achieves significant improvements in precision compared to existing low-dimensional binned analyses. Our results illustrate the potential of unbinned inference to reduce the reliance on coarse approximations of uncertainties and their correlations entering PDF determinations, hence contributing to a new paradigm of unbinned detector-level ML-assisted measurements at the LHC.
We study the radiative decay of the $Z$ boson, $Z \to μ^+μ^-γ$, at the LHC, providing both Standard Model (SM) precision analysis and new physics projections. With detailed analysis of Run-2 and future HL-LHC performances, we demonstrate that this decay mode can be measured with a statistical precision at the sub-percent level. From existing Run-1 data, we extract $\text{Br}^\text{fid}(Z \to μμγ) = (3.34 \pm 0.016)\times 10^{-4}$. We further explore the sensitivity of this channel to axion-like particles (ALPs) and to an anomalous $U(1)_X$ gauge boson coupled to the muon. Both scenarios feature resonant structures in the dimuon invariant mass spectrum within the $Z \to a/X + γ\to μ^+μ^-γ$ final state. Our results show that the radiative $Z$ decay provides a clean and statistically powerful probe of such leptophilic new physics, extending the current collider reach for ALPs and anomalous gauge forces down to $g_X \sim \mathcal{O}(10^{-3})$. This study highlights the potential of rare electroweak gauge boson decays as precision tests of the SM and sensitive probes of new interactions at the LHC.
Glueballs represent a fascinating aspect of the strong interaction in nature. Gluons, which serve as the mediators of the strong interaction, are massless particles, but they possess a property unique to the strong interaction called color charge, which is analogous to electric charge in the electromagnetic interaction. Glueballs are composed of multiple gluons and would be massless without color charges. The interaction of the color charges, however, makes glueballs massive objects. Glueballs thus offer a unique way to study the mass creation of strongly interacting particles.
A precision measurement of the $W$-boson production cross-section is performed using the $W \to μν$ decay channel, based on a sample of proton-proton collision data collected by the LHCb experiment at $\sqrt{s}$ = 13 TeV and corresponding to an integrated luminosity of 5.1 fb$^{-1}$. The cross-section is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. The integrated production cross-sections of $W$ bosons are measured to be $$ \begin{array}{lcl} σ_{W^+ \to μ^+ν} &=& 1754.2 \pm 1.5 \pm 11.9 \pm 35.1\text{ pb} \\ σ_{W^- \to μ^-\barν} &=& 1178.1 \pm 1.3 \pm 9.7 \pm 23.6\text{ pb} \end{array} $$ where uncertainties are statistical, systematic, and due to the luminosity determination, respectively. Results are in good agreement with theoretical predictions at next-to-next-to-leading order in perturbative quantum chromodynamics. This measurement is significantly more precise than previous results in this kinematic regime.
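To see how the three quoted uncertainty components compare, they can be combined in quadrature. This is a rough illustration that assumes the components are independent; the measurement itself may treat correlations differently, and the function name is hypothetical.

```python
import math


def total_uncertainty(*components):
    """Combine independent uncertainty components in quadrature."""
    return math.sqrt(sum(c * c for c in components))


# (central value, statistical, systematic, luminosity), all in pb,
# taken from the cross-sections quoted above.
measurements = {
    "W+": (1754.2, 1.5, 11.9, 35.1),
    "W-": (1178.1, 1.3, 9.7, 23.6),
}

for label, (central, stat, syst, lumi) in measurements.items():
    total = total_uncertainty(stat, syst, lumi)
    print(f"{label}: {central} +/- {total:.1f} pb "
          f"({100 * total / central:.2f}% total, "
          f"{100 * lumi / central:.2f}% from luminosity)")
```

Under this assumption the luminosity term dominates the total for both charges, with the statistical component contributing comparatively little.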
The ultra-rare decay $K^+\toπ^+ν\barν$ is a golden mode in flavor physics. The Standard Model prediction for its branching ratio is below $10^{-10}$. This decay mode is highly sensitive to new physics models at mass scales up to $\mathcal{O}(100\,\mathrm{TeV})$. The NA62 experiment at CERN SPS is designed to measure this decay mode. A preliminary result of the branching ratio measurement using data collected in 2023--2024 is presented. With the new dataset, the NA62 experiment doubled its signal sample while reducing the background in proportion. Combining the data collected in 2016--2024, the branching ratio is measured to be $\mathcal{B}(K^+\toπ^+ν\barν) = \left(9.6^{+1.9}_{-1.8}\right)\times10^{-11}$. The result is compatible with the Standard Model prediction with a precision better than $20\%$.
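The quoted precision can be checked directly from the asymmetric uncertainties on the combined branching ratio. The snippet below is a simple arithmetic sketch with a hypothetical helper name; it reads "precision" as the relative size of each one-sided uncertainty, which is an interpretation rather than a definition given in the text.

```python
def relative_uncertainty(central, err_up, err_down):
    """Return the upward and downward uncertainties relative to the central value."""
    return err_up / central, err_down / central


# Values in units of 1e-11, from the combined 2016--2024 result quoted above.
up, down = relative_uncertainty(9.6, 1.9, 1.8)
print(f"+{100 * up:.1f}% / -{100 * down:.1f}%")
```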
The impact of open-flavor thresholds on the quarkonium spectrum has been a subject of study since the introduction of the Cornell potential and has been quantified through various phenomenological approaches, most notably the $^3P_0$ model. We revisit this problem using the Born--Oppenheimer effective field theory (BOEFT), an effective field theory systematically derived from QCD by exploiting hierarchies of energy scales and symmetries. Within the BOEFT, open-flavor threshold effects emerge from the mixing between quarkonium and tetraquark static potentials sharing the same Born--Oppenheimer quantum numbers. The shapes of the static potentials are constrained by lattice QCD calculations. Furthermore, we account for the distinctive behavior of the BOEFT tetraquark static potentials at short and large distances: at short distances they are repulsive, reflecting the color-octet configuration of the heavy quark-antiquark pair, while at large distances they asymptotically approach heavy-light meson-antimeson thresholds. To quantify threshold effects on the quarkonium spectrum below threshold, we solve a set of coupled Schrödinger equations dictated by the BOEFT, whose only free parameter, the adjoint meson mass, is fixed to the mass of the $χ_{c1}(3872)$ state. These coupled equations are solved both in the spin-isospin averaged threshold limit and, for the first time, including the spin splittings of the physical thresholds. We validate our results by computing the same threshold effects as self-energy corrections to the quarkonium propagator. We compare our predictions with existing experimental data and previous literature. Finally, we provide a field-theoretical interpretation of the pair-creation constant $γ$ appearing in the $^3P_0$ model.
A precision measurement of the muon charge asymmetry from $W$-boson decays in proton-proton collisions at $\sqrt{s}$ = 13 TeV is presented. The analysis utilizes data corresponding to an integrated luminosity of 5.1 fb$^{-1}$, recorded by the LHCb detector during 2016, 2017 and 2018. The asymmetry is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. This result represents the most precise determination of the muon charge asymmetry in the forward region to date, exhibiting excellent agreement with next-to-next-to-leading-order predictions in perturbative quantum chromodynamics.
The RELICS (REactor neutrino LIquid xenon Coherent elastic Scattering) experiment employs a dual-phase liquid xenon time projection chamber to search for Coherent Elastic Neutrino-Nucleus Scattering (CE$ν$NS) induced by reactor neutrinos. To detect these sub-keV nuclear recoils and minimize signal attenuation, it is critical to maintain a sufficiently low impurity concentration in the detector. This work presents a comprehensive purity evolution model developed to describe impurity migration inside the detector. Utilizing measured material outgassing rates as input parameters, the model incorporates non-uniform transport mechanisms of the impurities, including circulation, vaporization, and condensation. The model is validated using data from a dedicated prototype detector. Based on this validated model, projections for the purification performance of the upcoming RELICS-10 and RELICS-50 detectors are provided.
We calculate electromagnetic multipole moments of $Σ$-type strange hidden-charm pentaquarks $P^Σ_{ψs}$ (isospin triplet $Σ^+,Σ^0,Σ^-$) using QCD light-cone sum rules, with six (spin-1/2) and seven (spin-3/2) interpolating currents built from diquark-diquark-antiquark operators. We compute magnetic dipole $μ$ for all channels and, for spin-3/2, electric quadrupole ${\cal Q}$ and magnetic octupole ${\cal O}$ moments (first computation), and give the first quark-flavor decomposition. Scalar diquark currents yield charm-dominated, flavor-insensitive moments ($μ\in[-1.92,-1.21]μ_N$ for spin-1/2, $|μ|\lesssim1.2μ_N$ for spin-3/2), consistent with heavy-quark spin symmetry. Axial-vector diquark currents produce larger, flavor-sensitive moments with sign reversals governed by $e_u/e_d=-2$. For ${\cal Q}$, scalar-diquark currents give oblate deformations ($Q_0\approx-2.0\times10^{-2}{\rm fm}^2$) dominated by charm, while two-axial-vector-diquark currents predict prolate values up to $Q_0=+8.0\times10^{-2}{\rm fm}^2$, with sign reversal for $[su][uc]\bar{c}$ in two currents. Currents with scalar antiquark coupling yield a topology-independent octupole ${\cal O}\approx-0.25\times10^{-3}{\rm fm}^3$, a lattice QCD benchmark. Comparison with constituent quark models identifies four discriminants: $|μ|\gtrsim3μ_N$ in spin-1/2; sign of $μ$ for $[su][uc]\bar{c}$ in spin-3/2; non-zero ${\cal Q}$ (vanishes in $S$-wave molecules); and the ${\cal Q}$-${\cal O}$ sign correlation, probing $1/m_q$ weighting.
A partial wave analysis of the process $ψ(2S)\rightarrowγχ_{c1}, χ_{c1}\rightarrowπ^+π^-η^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $ψ(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $π_{1}(1600)$, is observed for the first time in the charmonium decay of $χ_{c1}\rightarrowπ_{1}^{\pm}(1600)π^{\mp}$, $π_{1}^{\pm}(1600)\rightarrowπ^{\pm}η^{\prime}$ with a statistical significance over $21σ$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[χ_{c1}\rightarrowπ_{1}(1600)^{\pm}π^{\mp} \right] \times \mathcal{B}\left[π_{1}(1600)^{\pm}\rightarrowπ^{\pm}η^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst})~ \right) \times 10^{-4}$.
Future AI-based studies in particle physics will likely start from a foundation model to accelerate training and enhance sensitivity. As a step towards a general-purpose foundation model for particle physics, we investigate whether the OmniLearned foundation model pre-trained on diverse high-$Q^2$ simulated and real $pp$ and $ep$ collisions can be effectively transferred to a few-GeV fixed-target neutrino experiment. We process MINERvA neutrino--nucleus scattering events and evaluate pre-trained models on two types of tasks: regression of available energy and binary classification of charged-current pion final states ($\mathrm{CC1π^{\pm}}$, $\mathrm{CCNπ^{\pm}}$, and $\mathrm{CC1π^{0}}$). Pre-trained OmniLearned models consistently outperform similarly sized models trained from scratch, achieving better overall performance at the same compute budget, as well as achieving better performance at the same number of training steps. These results suggest that particle-level foundation models acquire inductive biases that generalize across large differences in energy scale, detector technology, and underlying physics processes, pointing toward a paradigm of detector-agnostic inference in particle physics.
Vector-like top partners with electric charge $+2/3$ are predicted in many extensions of the Standard Model and are actively searched for at the LHC through their electroweak decays $T\to Wb$, $Zt$, and $Ht$. More general scenarios, however, allow dipole interactions that induce radiative decays $T\to tγ$ and $T\to tg$. We reinterpret precision measurements of top-associated photon production to constrain such dipole operators. This approach provides a complementary probe to traditional resonance searches, which rely on direct reconstruction of heavy states, by instead exploiting distortions in precision observables. Using unfolded differential cross sections for $t\bar{t}γ$ production measured by CMS and the fiducial $t\bar{t}γγ$ cross section reported by ATLAS, we derive constraints on the electromagnetic and chromomagnetic dipole couplings of a vector-like $T$ quark within an effective field theory framework. We present limits in terms of the effective couplings $c_{tγ}$ and $c_{tg}$, as well as the corresponding branching fractions $BR(T \to tγ)$ and $BR(T \to tg)$, for masses in the range $500~\mathrm{GeV} \le m_T \le 2.0~\mathrm{TeV}$. For $m_T = 500~\mathrm{GeV}$, the analysis reaches sensitivity to electromagnetic dipole couplings as small as $c_{tγ} \simeq 0.005~\mathrm{TeV}^{-1}$ in the gluon-dominated scenario $B_γ = 0.1$, while the sensitivity degrades to $O(1)~\mathrm{TeV}^{-1}$ at $m_T = 2.0~\mathrm{TeV}$. We find that the $t\bar{t}γ$ and $t\bar{t}γγ$ measurements provide complementary sensitivity, probing different regions of parameter space and lifting degeneracies between electromagnetic and chromomagnetic dipole interactions. These results demonstrate that precision measurements of top-associated photon final states provide a powerful and complementary probe of vector-like quarks in scenarios where radiative decays dominate.
We present a lattice QCD study of heavy baryons containing charm and bottom quarks, with particular emphasis on the relativistic treatment of all valence quarks. We use $N_f=2+1+1$ HISQ ensembles at the physical point to compute ground-state energies of spin-$3/2^+$ baryons, including singly-, doubly-, and triply-heavy charmed and bottom baryons. This work represents the first investigation of heavy baryons using fully relativistic bottom quarks.
Quantum phase measurements offer a complementary route to axion searches. We show that axion-photon interactions can imprint both Aharonov-Bohm (AB) and Berry phases in experimentally motivated quantum setups. For a coherently oscillating axion dark matter background, the induced effective current generates a time-dependent magnetic flux in an rf-SQUID, leading to a measurable voltage signal through the Josephson phase. For representative benchmarks, this AB phase search reaches the minimum axion-photon coupling $g_{aγγ}^{\mathrm{min}}\sim 7.8\times10^{-14}~\mathrm{GeV}^{-1}$ at axion mass $m_a\sim 10^{-10}~\mathrm{eV}$, with projected sensitivity that can improve on existing limits in that parameter space by roughly one to two orders of magnitude. We also identify a geometric phase observable in a Mach-Zehnder interferometer with an adiabatically rotating magnetic field, providing a proof-of-principle phase-based probe of meV-scale axions even when they do not constitute the dark matter, although the sensitivity to the coupling remains weaker than current bounds for conservative tabletop benchmarks. Extending the analysis to a three-level photon-axion quasiparticle (AQP)-axion system, with the AQP realized in a topological magnetic insulator, we find a potentially measurable THz Berry phase dominated by the AQP sector, furnishing a nontrivial validation of the formalism in a richer coupled system. These setups establish quantum phase observables as a useful new framework for axion searches, with immediate phenomenological promise in superconducting circuits and longer-term potential in quantum-enhanced interferometry.
We present a new method to compute the soft function for the $N$-Jettiness variable for arbitrary $N$ at high perturbative orders in QCD. It is based on the observation that the most singular part of the soft function, the dipole contribution, can be represented by a sum of an analytically calculable inclusive soft function and a remainder. The latter is absent at NLO, is immediately finite at NNLO and can be made finite with the help of simple NLO-like infrared subtractions at N$^3$LO. As a byproduct of this approach, we derive a very simple formula for the tripole contribution to the $N$-Jettiness NNLO soft function, which results in a fast numerical evaluation. We apply this method to compute the $N$-Jettiness soft function at NNLO, and report numerical results for up to five jets for the hadron-collider soft function. We finally outline the prospects for applications at N$^3$LO.
The precise determination of the parton distribution functions (PDFs) of the proton is an essential ingredient for LHC analyses, including those at the upcoming High-Luminosity LHC. So far, PDFs have been determined from global fits to binned low-dimensional data obtained from unfolded hard-scattering cross section measurements. In this work we demonstrate for the first time the feasibility of neural simulation-based inference (NSBI) for constraining the proton PDFs using a high-dimensional unbinned data set. Exploiting the full statistical power of unbinned data removes the loss of information inherent in the binning procedure. As a proof-of-concept, we determine the gluon PDF from simulated data of top quark pair production at the LHC with $\sqrt{s}=13$ TeV. Taking into account both experimental and theoretical systematic uncertainties in the detector-level features, we demonstrate how the NSBI pipeline achieves significant improvements in precision compared to existing low-dimensional binned analyses. Our results illustrate the potential of unbinned inference to reduce the reliance on coarse approximations of uncertainties and their correlations entering PDF determinations, hence contributing to a new paradigm of unbinned detector-level ML-assisted measurements at the LHC.
We study the radiative decay of the $Z$ boson, $Z \to μ^+μ^-γ$, at the LHC, providing both Standard Model (SM) precision analysis and new physics projections. With detailed analysis of Run-2 and future HL-LHC performances, we demonstrate that this decay mode can be measured with a statistical precision at the sub-percentage level. From existing Run-1 data, we extract $\text{Br}^\text{fid}(Z \to μμγ) = (3.34 \pm 0.016)\times 10^{-4}$. We further explore the sensitivity of this channel to axion-like particles (ALPs) and to an anomalous $U(1)_X$ gauge boson coupled to the muon. Both scenarios feature resonant structures in the dimuon invariant mass spectrum within the $Z \to a/X + γ\to μ^+μ^-γ$ final state. Our results show that the radiative $Z$ decay provides a clean and statistically powerful probe of such leptophilic new physics, extending the current collider reach for ALPs and anomalous gauge forces down to $g_X \sim \mathcal{O}(10^{-3})$. This study highlights the potential of rare electroweak gauge boson decays as precision tests of the SM and sensitive probes of new interactions at the LHC.
The term 'neutrinoless' is a cornerstone of modern particle physics, yet it defines a fundamental process by what is missing rather than what is created. We trace the origins of this privative neologism to a 1953 experimental claim and show how a 'sociology of suspicion' transformed Ettore Majorana's affirmative ontology into an agnostic shorthand. By examining this linguistic shift, we argue that our current terminology may obscure the profound physical meaning of the search. Reclaiming the language of 'matter creation' is not merely a semantic choice, but a timely conceptual shift to bridge the gap between experimental caution and the radical character of the laws of nature we aim to uncover.
Glueballs represent a fascinating aspect of the strong interaction in nature. Gluons, which serve as the mediators of the strong interaction, are massless particles, but they possess a property unique to the strong interaction called color charge, which is analogous to electric charge in the electromagnetic interaction. Glueballs are composed of multiple gluons and would be massless without color charges. The interaction of the color charges, however, makes glueballs massive objects. Glueballs thus offer a unique way to study the mass creation of strongly interacting particles.
Charged lepton-flavor violation is a null-test frontier of the Standard Model and a direct probe of physics beyond it. We present a global effective field theory (EFT) analysis across FCC-ee, ILC, CLIC, HL-LHC, HE-LHC, and muon colliders at 3 and 10 TeV, with operator identification as the primary target rather than exclusion reach alone. The analysis combines low-energy constraints, collider differential observables, and Dalitz-level $μ\to 3e$ information in a common profile-likelihood framework. Key hadron-collider and muon-collider signal/background samples are generated at event level and propagated through Delphes detector simulation, while clean $e^+e^-$ benchmark channels are modeled with CDR-calibrated parametric response. We include one-loop renormalization-group (RG) running and operator mixing between UV matching and measurement scales, finding 10--30\% shifts in selected operator-correlation entries when comparing tree-level and RG-evolved coefficient mappings at multi-TeV matching scales. Polarization asymmetries are used to separate $c_{H\ell}$ and $c_{He}$ directions, and UV discrimination is quantified with Bayes factors for benchmark leptoquark and heavy-neutral-lepton hypotheses. The full code chain for event generation, detector response, inference, and figure reproduction is provided.
The temporal and spatial coincidence between the gravitational wave (GW) event GW190425 and the fast radio burst (FRB) event FRB 20190425A raises the intriguing possibility of a physical connection between the two. The widely discussed scenario invoking the collapse of a supermassive neutron star as the merger product suffers from an inconsistency between the model prediction and the measured inclination angle of the system. Here, we propose a novel physical mechanism to account for the association. We envisage a magnetar located about 2.5 light hours away from the binary neutron star merger site. The kilohertz GWs generated by the merger are converted into kilohertz electromagnetic (EM) radiation via the Gertsenshtein-Zeldovich (GZ) effect near the magnetar. Subsequent inverse Compton scattering of the kilohertz EM waves by relativistic particles generates the observed gigahertz FRB emission. Our calculation reveals that, with appropriate parameter choices, the properties of FRB 20190425A can be reproduced.
We extend the $SU(3)_C \times SU(3)_L \times U(1)_X$ model with neutral leptons (331LHN) by introducing scalar leptoquarks. We determine the particle content of the leptoquark multiplets and their Yukawa interactions with fermions. We find that a singlet leptoquark can fully account for the $4.2σ$ discrepancy in the muon anomalous magnetic moment $Δa_μ^{2021}$. The corresponding leptoquark mass is constrained to be $m_S \gtrsim 1.8$~TeV, consistent with current LHC bounds. We further consider the updated $Δa_μ^{2025}$ based on recent lattice QCD results, which strengthens the lower bound to $m_S \gtrsim 6$~TeV. Combining $Δa_μ$ with low-energy leptonic observables, including charged lepton flavor violation and the $μ$--$e$ conversion rate, we constrain the viable parameter space. The allowed leptoquark Yukawa couplings exhibit a normal hierarchical pattern under all constraints. We also investigate the collider phenomenology of the singlet leptoquark, showing that its QCD-driven pair production leads to suppressed signal rates at the LHC for multi-TeV masses, while future hadron colliders can significantly extend the discovery reach.
The experimental observation of collective behaviour in proton-proton and proton-nucleus collisions poses a fundamental theoretical question regarding the proper characterization of the initial state underlying hydrodynamic evolution. While relativistic hydrodynamics requires an initial condition (IC) characterized by an entropy current, corresponding to a maximally mixed state, the microscopic description of the proton is based on inherently quantum objects, which are projections of pure states. We show that the appropriate matching between the proton wave function and classical hydrodynamics emerges from the coarse-graining of its phase-space distribution, quantified by the Wehrl-like entropy. This entropy provides a semi-classical, positive-definite measure of the density of accessible microstates at a given resolution scale, and therefore constitutes the appropriate quantity to characterize entropy deposition in small collision systems.
We study graviton production from an oscillating inflaton condensate during reheating by systematically comparing Boltzmann and Bogoliubov descriptions for inflaton potentials of the form $V(φ)\proptoφ^n$ around the minimum. The Bogoliubov framework provides a unified description of graviton production, capturing both perturbative and non-perturbative effects across short and long wavelengths, whereas the Boltzmann approach is restricted to perturbative production at short wavelengths. For the quadratic case ($n=2$), we find that the two approaches yield identical graviton spectra at short wavelengths, indicating that the Boltzmann treatment fully captures perturbative gravitational production in this regime. For steeper potentials ($n>2$), however, we identify a sizable contribution arising from the non-adiabatic transition between inflation and reheating. This component is naturally incorporated in the Bogoliubov formalism but absent in the Boltzmann description, and we show that it is important over a broad range of momenta. We derive analytic approximations within both frameworks that clarify the physical origin and scaling behavior of the spectrum. Our results delineate the regime of validity of Boltzmann approaches and show that, for steeper inflaton potentials, graviton production is governed by non-adiabatic transition dynamics for which the Bogoliubov formalism provides the most appropriate description.
The production of two isolated photons in high-energy hadron collisions poses a challenge to perturbative QCD because of large corrections through next-to-next-to-leading order (NNLO). We present novel next-to-next-to-next-to-leading order ($\text{N}^3$LO) predictions and finally demonstrate perturbative convergence for this process. We discuss the considerable computational challenges and phenomenological results for the Large Hadron Collider.
The impact of open-flavor thresholds on the quarkonium spectrum has been a subject of study since the introduction of the Cornell potential and has been quantified through various phenomenological approaches, most notably the $^3P_0$ model. We revisit this problem using the Born--Oppenheimer effective field theory (BOEFT), an effective field theory systematically derived from QCD by exploiting hierarchies of energy scales and symmetries. Within the BOEFT, open-flavor threshold effects emerge from the mixing between quarkonium and tetraquark static potentials sharing the same Born--Oppenheimer quantum numbers. The shapes of the static potentials are constrained by lattice QCD calculations. Furthermore, we account for the distinctive behavior of the BOEFT tetraquark static potentials at short and large distances: at short distances they are repulsive, reflecting the color-octet configuration of the heavy quark-antiquark pair, while at large distances they asymptotically approach heavy-light meson-antimeson thresholds. To quantify threshold effects on the quarkonium spectrum below threshold, we solve a set of coupled Schrödinger equations dictated by the BOEFT, whose only free parameter, the adjoint meson mass, is fixed to the mass of the $χ_{c1}(3872)$ state. These coupled equations are solved both in the spin-isospin averaged threshold limit and, for the first time, including the spin splittings of the physical thresholds. We validate our results by computing the same threshold effects as self-energy corrections to the quarkonium propagator. We compare our predictions with existing experimental data and previous literature. Finally, we provide a field-theoretical interpretation of the pair-creation constant $γ$ appearing in the $^3P_0$ model.
We review the role of primordial black holes (PBHs) for illuminating the dark ages of the cosmological evolution and as dark matter (DM) candidates. We elucidate the role of phase transitions for primordial black hole formation in the early Universe and focus our attention on the cosmological QCD phase transition within a recent microscopical model. We explore the impact of physics beyond the Standard Model (SM) on the cosmic equation of state and the probability distribution for the formation of PBHs which serve as candidates for DM and contribute to present-day binary black-hole merger events.
The subprocess $nn\to ppe^-e^-$ is a key ingredient in the interpretation of nuclear neutrinoless double-beta decay. Intermediate $Δ$ resonances may provide additional enhancements to this transition. We take a first step toward a $Δ$-full description of $nn\to ppe^-e^-$ by investigating the neutrinoless double-beta decay $Δ^- \to p e^-e^-$ in the framework of chiral effective field theory. We systematically derive the long-range contribution from light-Majorana-neutrino exchange through loop diagrams and incorporate the short-range part through counterterms required by renormalization. We predict the pion-mass dependence of the decay amplitude in the kinematic configuration with collinear electrons. Furthermore, to facilitate lattice-QCD matching, we calculate the decay amplitude in the degenerate $Δ$-nucleon mass limit and provide the corresponding long-range prediction.
We calculate electromagnetic multipole moments of $Σ$-type strange hidden-charm pentaquarks $P^Σ_{ψs}$ (isospin triplet $Σ^+,Σ^0,Σ^-$) using QCD light-cone sum rules, with six (spin-1/2) and seven (spin-3/2) interpolating currents built from diquark-diquark-antiquark operators. We compute magnetic dipole $μ$ for all channels and, for spin-3/2, electric quadrupole ${\cal Q}$ and magnetic octupole ${\cal O}$ moments (first computation), and give the first quark-flavor decomposition. Scalar diquark currents yield charm-dominated, flavor-insensitive moments ($μ\in[-1.92,-1.21]μ_N$ for spin-1/2, $|μ|\lesssim1.2μ_N$ for spin-3/2), consistent with heavy-quark spin symmetry. Axial-vector diquark currents produce larger, flavor-sensitive moments with sign reversals governed by $e_u/e_d=-2$. For ${\cal Q}$, scalar-diquark currents give oblate deformations ($Q_0\approx-2.0\times10^{-2}{\rm fm}^2$) dominated by charm, while two-axial-vector-diquark currents predict prolate values up to $Q_0=+8.0\times10^{-2}{\rm fm}^2$, with sign reversal for $[su][uc]\bar{c}$ in two currents. Currents with scalar antiquark coupling yield a topology-independent octupole ${\cal O}\approx-0.25\times10^{-3}{\rm fm}^3$, a lattice QCD benchmark. Comparison with constituent quark models identifies four discriminants: $|μ|\gtrsim3μ_N$ in spin-1/2; sign of $μ$ for $[su][uc]\bar{c}$ in spin-3/2; non-zero ${\cal Q}$ (vanishes in $S$-wave molecules); and the ${\cal Q}$-${\cal O}$ sign correlation, probing $1/m_q$ weighting.
Magnetic field amplification is an integral part of the process of particle acceleration at non-relativistic shocks. It is necessary to reach the maximum energies required by observations, especially in supernova remnants, thought to be sources of the bulk of Galactic cosmic rays. Such amplification can be caused by the acoustic instability that develops when small density perturbations interact with the cosmic-ray pressure gradient in the upstream of a cosmic-ray-modified shock. The vorticity induced by the nonlinear development of the instability may lead to turbulence, which amplifies the pre-existing magnetic fields. To study this phenomenon, we use the PLUTO code to carry out 2D (and some 3D) magnetohydrodynamical simulations of the evolution of small density perturbations in the presence of an assigned cosmic-ray pressure gradient. Adopting more realistic values of Mach number and cosmic-ray acceleration efficiency than previously assumed in the literature, we show that the acoustic instability can transform small density perturbations into large nonlinear structures while the fluid crosses the precursor region of a cosmic-ray-modified shock. We study the power spectrum of turbulent magnetic fluctuations that may be important to scatter particles. We comment on the possible constructive interference between acoustic and non-resonant streaming instabilities. We discuss limitations of previous and current numerical investigations in accessing spatial scales where turbulence is expected to turn nonlinear, and outline perspectives for future investigations.
We study a two-component pseudo-Nambu-Goldstone-boson (pNGB) dark matter (DM) model motivated by boosted dark matter (BDM). The model is based on a complex scalar field charged under a dark $\text{U}(1)_V$ gauge symmetry, with a softly broken global $\text{SU}(3)_g$ symmetry that is spontaneously broken. The pNGB nature suppresses DM--nucleon scattering, while the residual $\text{U}(1)_3 \times \text{U}(1)_{T_0}$ symmetry automatically stabilizes the two pNGB DM candidates and allows conversion of the heavier component into the lighter one. A central point is that the mass hierarchy between the heavier and lighter components is controlled by the two independent soft-breaking parameters that split the pNGB multiplet, so an abundant heavier component required for BDM can be obtained without introducing ad hoc hierarchies among independent portal couplings tuned to enable effective conversion. We analyze the relic abundance together with the constraints considered in this work, including Higgs invisible decays and perturbative unitarity, classify the coupled freeze-out dynamics, and assess the resulting BDM scattering cross section and flux.
Observations of ultra-dense substructures in strong lensing systems challenge the standard cosmological model at small scales. Self-interacting dark matter (SIDM), as an alternative to the cold and collisionless dark matter (CDM) of the standard cosmological model, provides a natural mechanism for forming such structures via gravothermal core collapse. We show that strong gravitational lensing of fast radio bursts (FRBs) provides an effective approach to detecting these substructures and probing dark matter self-interactions. Core-collapsed SIDM halos exhibit steeper central density profiles than CDM halos, enhancing the lensing cross section and producing longer time delays between FRB images. We compute lensing properties of core-collapsed subhalos and host halos, including maximal impact parameters and time-delay distributions. We demonstrate that future all-sky monitors, such as BURSTT, SKA2-Low, and SKA2-Mid, which are expected to detect $10^{5}$--$10^{7}$ FRBs over a decade, can measure time-delay distributions with high statistical significance. Modeling collapsed halos with a cored power-law density profile with inner slope $γ=3$ and assuming no excess beyond the singular isothermal sphere lens model, we show that our strategy can probe self-interaction cross section strengths of $σ_{\text{SI}}/m \gtrsim \min\{18,\, 40λ_{\text{sub}}\}\,\text{cm}^2/\text{g}$, where $λ_{\text{sub}}$ parameterizes the collapse time of a subhalo relative to that of the isolated case.
We explore the sensitivity of the $e^{+} e^{-} \to K^+ K^- 2 π^0$ cross section to the magnetic dipole moment (MDM) of the $K^*$ vector meson. We describe the $γ^* \to 2K2π$ vertex using a vector meson dominance model, including the intermediate resonant contributions relevant for energies below 2.4 GeV. Using BaBar data for this process, we show that this observable is indeed sensitive to the MDM of the $K^*$; we obtain a central value for the MDM of $μ_{K^*}=4.5$ and an upper bound of $\barμ_{K^*} = 6.3$, in units of $e/2 m_{K^*}$. We emphasize the need for higher precision data to provide a first data-driven determination of this parameter to confront it with theoretical predictions.
t-SNE has gained popularity as a dimension reduction technique, especially for visualizing data. It is well-known that all dimension reduction techniques may lose important features of the data. We provide a mathematical framework for understanding this loss for t-SNE by establishing a number of results in different scenarios showing how important features of data are lost by using t-SNE.
Adaptive Conformal Inference (ACI) provides distribution-free prediction intervals with asymptotic coverage guarantees for time series under distribution shift. However, ACI only adapts the quantile threshold -- it cannot shift the interval center. When a base forecaster develops persistent bias after a regime change, ACI compensates by widening intervals symmetrically, producing unnecessarily conservative bands. We propose Bias-Corrected ACI (BC-ACI), which augments standard ACI with an online exponentially weighted moving average (EWM) estimate of forecast bias. BC-ACI corrects nonconformity scores before quantile computation and re-centers prediction intervals, addressing the root cause of miscalibration rather than its symptom. An adaptive dead-zone threshold suppresses corrections when estimated bias is indistinguishable from noise, ensuring no degradation on well-calibrated data. In controlled experiments across 688 runs spanning two base models, four synthetic regimes, and three real datasets, BC-ACI reduces Winkler interval scores by 13--17% under mean and compound distribution shifts (Wilcoxon p < 0.001) while maintaining equivalent performance on stationary data (ratio 1.002x). We provide finite-sample analysis showing that coverage guarantees degrade gracefully with bias estimation error.
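To make the mechanism concrete, here is a minimal sketch of how an online bias estimate could be combined with the standard ACI update $α_{t+1} = α_t + γ(α - \mathrm{err}_t)$. It is an illustration under stated assumptions, not the authors' implementation: the function name `bc_aci`, the absolute-residual nonconformity score, the use of the full score history for the quantile, the warm-up length, and the fixed `dead_zone` threshold (the abstract describes an adaptive one) are all hypothetical choices.

```python
import numpy as np

def bc_aci(y, y_hat, alpha=0.1, gamma=0.01, beta=0.05, dead_zone=0.3, warmup=50):
    """Sketch of ACI with an exponentially weighted bias correction.

    Assumptions (not taken from the abstract): absolute-residual scores,
    quantiles over the full score history, and a fixed dead-zone threshold.
    Setting beta=0 disables the correction and gives plain ACI.
    """
    n = len(y)
    alpha_t = alpha          # adaptive miscoverage level
    bias = 0.0               # running estimate of E[y - y_hat]
    scores = []              # past nonconformity scores
    lower = np.full(n, np.nan)
    upper = np.full(n, np.nan)
    for t in range(n):
        # Re-center the forecast only if the bias estimate clears the dead zone.
        center = y_hat[t] + (bias if abs(bias) > dead_zone else 0.0)
        if len(scores) >= warmup:
            level = min(max(1.0 - alpha_t, 0.0), 1.0)
            q = np.quantile(scores, level)
            lower[t], upper[t] = center - q, center + q
            err = float(not (lower[t] <= y[t] <= upper[t]))
            # Standard ACI step: misses lower alpha_t, which widens later intervals.
            alpha_t += gamma * (alpha - err)
        # Score is computed against the (possibly corrected) center.
        scores.append(abs(y[t] - center))
        # Exponentially weighted moving average of the raw forecast error.
        bias = (1.0 - beta) * bias + beta * (y[t] - y_hat[t])
    return lower, upper
```

Calling `bc_aci(y, y_hat, beta=0.0)` on the same series gives a plain-ACI baseline, so the two sets of interval widths can be compared directly after a simulated mean shift.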
Causal representation learning (CRL) aims to identify the underlying latent variables from high-dimensional observations, even when the variables are dependent on each other. We study this problem for latent variables that follow a potentially degenerate Gaussian mixture distribution and that are only observed through the transformation via a piecewise affine mixing function. We provide a series of progressively stronger identifiability results for this challenging setting in which the probability density functions are ill-defined because of the potential degeneracy. For identifiability up to permutation and scaling, we leverage a sparsity regularization on the learned representation. Based on our theoretical results, we propose a two-stage method to estimate the latent variables by enforcing sparsity and Gaussianity in the learned representations. Experiments on synthetic and image data highlight our method's effectiveness in recovering the ground-truth latent variables.
Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.
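As a one-dimensional illustration of the committor and the feedback control it defines, the sketch below assumes overdamped Langevin dynamics $dX_t = -V'(X_t)\,dt + \sqrt{2/β}\,dW_t$ with reactant at $x=a$ and product at $x=b$; the abstract does not specify the dynamics, so this is an assumption. In that setting the committor has the closed form $q(x) = \int_a^x e^{βV} / \int_a^b e^{βV}$, and conditioning on reaching $b$ first adds the drift $(2/β)\,\partial_x \log q$. The function names are hypothetical and the code uses plain quadrature, not the SOC losses proposed in the paper.

```python
import numpy as np

def committor_1d(V, a, b, beta, num=2001):
    """Committor q(x) on [a, b] for 1D overdamped Langevin dynamics (assumed)."""
    x = np.linspace(a, b, num)
    w = np.exp(beta * V(x))
    # Cumulative trapezoid integral of exp(beta * V) from a to x.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))))
    return x, cum / cum[-1]

def control_drift(x, q, beta):
    """Extra drift (2 / beta) * d/dx log q that steers paths toward the product."""
    dq = np.gradient(q, x)
    # q vanishes at the reactant boundary, so the control diverges there.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (2.0 / beta) * dq / q
```

For a symmetric double well such as $V(x) = (x^2-1)^2$ with $a=-1$, $b=1$, the committor rises monotonically from 0 to 1 and equals 1/2 at the barrier top, while the control is largest near the reactant, where unbiased trajectories are least likely to react.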
The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradient descent, appropriately configured ECD dynamics escape strict local minima and converge to a global minimum, making it appealing for machine learning optimization. We present the first analytical study of ECD, focusing on the one-dimensional setting for this first installment. We formalize a stochastic ECD dynamics (sECD) with energy-preserving noise, as well as a quantum analog of the ECD Hamiltonian (qECD), providing the foundation for a quantum algorithm through Hamiltonian simulation. For positive double-well objectives, we compute the expected hitting time from a local to the global minimum. We prove that both sECD and qECD yield exponential speedup over respective gradient descent baselines--stochastic gradient descent and its quantization. For objectives with tall barriers, qECD achieves a further speedup over sECD.
Interference arises when the treatment assigned to one individual affects the outcomes of other individuals. Commonly, individuals are naturally grouped into clusters, and interference occurs only among individuals within the same cluster, a setting referred to as partial interference. We study network causal effects on outcome quantiles in the presence of partial interference. We develop a general nonparametric efficiency theory for estimating these network quantile causal effects, which leads to a nonparametrically efficient estimator. The proposed estimator is consistent and asymptotically normal with parametric convergence rates, while allowing for flexible, data-adaptive estimation of complex nuisance functions. We leverage a three-way cross-fitting procedure that avoids direct estimation of the conditional outcome distribution. Simulations demonstrate adequate finite-sample performance of the proposed estimators, and we apply the methods to a clustered observational study.
Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate uncertainty quantification in existing methods. We introduce the Causal Diffusion Model (CDM), the first denoising diffusion probabilistic approach explicitly designed to generate full probabilistic distributions of counterfactual outcomes under sequential interventions. CDM employs a novel residual denoising architecture with relational self-attention, capturing intricate temporal dependencies and multimodal outcome trajectories without requiring explicit adjustments (e.g., inverse-probability weighting or adversarial balancing) for confounding. In rigorous evaluation on a pharmacokinetic-pharmacodynamic tumor-growth simulator widely adopted in prior work, CDM consistently outperforms state-of-the-art longitudinal causal inference methods, achieving a 15-30% relative improvement in distributional accuracy (1-Wasserstein distance) while maintaining competitive or superior point-estimate accuracy (RMSE) under high-confounding regimes. By unifying uncertainty quantification and robust counterfactual prediction in complex, sequentially confounded settings, without tailored deconfounding, CDM offers a flexible, high-impact tool for decision support in medicine, policy evaluation, and other longitudinal domains.
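The two evaluation metrics named above can be illustrated with a short sketch. It assumes one-dimensional outcomes and equal-size samples, in which case the 1-Wasserstein distance reduces to the mean absolute difference between sorted samples; the helper names are hypothetical and the code is not the evaluation pipeline of the paper.

```python
import numpy as np

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size samples the optimal coupling matches sorted values.
    return float(np.mean(np.abs(x - y)))

def rmse(pred, target):
    """Root-mean-square error of point predictions."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

The first metric compares whole predictive distributions, while the second only scores a single point estimate per outcome.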
The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet, namely the multiclass setting, the Natarajan dimension has long served as an analogue of VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely--Shalev-Shwartz (DS) dimension and, more generally, its extension, the list-DS dimension; these are the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value, replacing the exponential dependence on $\ell$ in the Natarajan-based bound by the optimal polynomial dependence, and improving the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting. This motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al.~(STOC~2023), Hanneke et al.~(COLT~2024), and Brukhim et al.~(NeurIPS~2024).
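For reference, the classical binary bound that this work generalizes can be evaluated directly: a class of VC dimension $d$ restricted to $n$ points has at most $\sum_{i=0}^{d} \binom{n}{i}$ distinct labelings. The sketch below computes only this classical quantity; the sharp DS-dimension bound of the paper is not stated in the abstract and is not reproduced. The function name is an illustrative choice.

```python
from math import comb

def sauer_bound(n_points: int, vc_dim: int) -> int:
    """Classical Sauer-Shelah-Perles bound: sum_{i=0}^{d} C(n, i)."""
    return sum(comb(n_points, i) for i in range(vc_dim + 1))

# Example: 5 points, VC dimension 2 gives 1 + 5 + 10 labelings at most.
bound = sauer_bound(5, 2)
```

When the VC dimension equals the number of points, the bound equals $2^n$, so it places no restriction on the class.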
This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.
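The reweighting idea, generating one sample under a reference law and recovering expectations under a target model by importance sampling, can be sketched in a deliberately simple setting. The example below assumes a standard normal reference law and a target law $N(\mu, 1)$, for which the Radon--Nikodym weight is $\exp(\mu x - \mu^2/2)$; this Gaussian setting and the function name are illustrative assumptions, not the non-Markovian constructions of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fixed synthetic dataset, generated once under the reference law N(0, 1).
reference_sample = rng.standard_normal(200_000)

def reweighted_mean(f, mu):
    """Estimate E[f(X)] for X ~ N(mu, 1) from the fixed N(0, 1) sample."""
    # Radon-Nikodym weight dN(mu, 1)/dN(0, 1) evaluated at each sample point.
    w = np.exp(mu * reference_sample - 0.5 * mu**2)
    return float(np.mean(w * f(reference_sample)))

# Recalibrating mu reuses the same sample; no new trajectories are drawn.
est_a = reweighted_mean(lambda x: x, 0.3)
est_b = reweighted_mean(lambda x: x, 0.5)
```

The estimator becomes noisier as the target parameter moves away from the reference law, which is why the choice of a dominating training law matters.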
We investigate random feature models in which neural networks sampled from a prescribed initialization ensemble are frozen and used as random features, with only the readout weights optimized. Adopting a statistical-physics viewpoint, we study the training, test, and generalization errors beyond the mean-kernel approximation. Since the predictor is a nonlinear functional of the induced random kernel, the ensemble-averaged errors depend not only on the mean kernel but also on higher-order fluctuation statistics. Within an effective field-theoretic framework, these finite-width contributions naturally appear as loop corrections. We derive the loop corrections to the training, test, and generalization errors, obtain their scaling laws, and support the theory with experimental verification.
Adversarial training (AT) is an effective defense for large language models (LLMs) against jailbreak attacks, but performing AT on LLMs is costly. To improve the efficiency of AT for LLMs, recent studies propose continuous AT (CAT) that searches for adversarial inputs within the continuous embedding space of LLMs during AT. While CAT has achieved empirical success, its underlying mechanism, i.e., why adversarial perturbations in the embedding space can help LLMs defend against jailbreak prompts synthesized in the input token space, remains unknown. This paper presents the first theoretical analysis of CAT on LLMs based on in-context learning (ICL) theory. For linear transformers trained with adversarial examples from the embedding space on in-context linear regression tasks, we prove a robust generalization bound that has a negative correlation with the perturbation radius in the embedding space. This clearly explains why CAT can defend against jailbreak prompts from the LLM's token space. Further, the robust bound shows that the robustness of an adversarially trained LLM is closely related to the singular values of its embedding matrix. Based on this, we propose to improve LLM CAT by introducing an additional regularization term, which depends on singular values of the LLM's embedding matrix, into the objective function of CAT. Experiments on real-world LLMs demonstrate that our method can help LLMs achieve a better jailbreak robustness-utility tradeoff. The code is available at https://github.com/fshp971/continuous-adv-icl.
This paper studies Graphical SLOPE for precision matrix estimation, with emphasis on its ability to recover both sparsity and clusters of edges with equal or similar strength. In a fixed-dimensional regime, we establish that the root-$n$ scaled estimation error converges to the unique minimizer of a strictly convex optimization problem defined through the directional derivative of the SLOPE penalty. We also establish convergence of the induced SLOPE pattern, thereby obtaining an asymptotic characterization of the clustering structure selected by the estimator. A comparison with GLASSO shows that the grouping property of SLOPE can substantially improve estimation accuracy when the precision matrix exhibits structured edge patterns. To assess the effect of departures from Gaussianity, we then analyze Gaussian-loss precision matrix estimation under elliptical distributions. In this setting, we derive the limiting distribution and quantify the inflation in variability induced by heavy tails relative to the Gaussian benchmark. We also study TSLOPE, based on the multivariate $t$-loss, and derive its limiting distribution. The results show that TSLOPE offers clear advantages over GSLOPE under heavy-tailed data-generating mechanisms. Simulation evidence suggests that these qualitative conclusions persist in high-dimensional settings, and an empirical application shows that SLOPE-based estimators, especially TSLOPE, can uncover economically meaningful clustered dependence structures.
The deployment of deep neural networks in safety-critical systems necessitates reliable and efficient uncertainty quantification (UQ). A practical and widespread strategy for UQ is repurposing stochastic regularizers as scalable approximate Bayesian inference methods, such as Monte Carlo Dropout (MCD) and MC-DropBlock (MCDB). However, this paradigm remains under-explored for Stochastic Depth (SD), a regularizer integral to the residual-based backbones of most modern architectures. While prior work demonstrated its empirical promise for segmentation, a formal theoretical connection to Bayesian variational inference and a benchmark on complex, multi-task problems like object detection are missing. In this paper, we first provide theoretical insights connecting Monte Carlo Stochastic Depth (MCSD) to principled approximate variational inference. We then present the first comprehensive empirical benchmark of MCSD against MCD and MCDB on state-of-the-art detectors (YOLO, RT-DETR) using the COCO and COCO-O datasets. Our results position MCSD as a robust and computationally efficient method that achieves highly competitive predictive accuracy (mAP), notably yielding slight improvements in calibration (ECE) and uncertainty ranking (AUARC) compared to MCD. We thus establish MCSD as a theoretically-grounded and empirically-validated tool for efficient Bayesian approximation in modern deep learning.
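As a rough illustration of the Monte Carlo idea, the sketch below keeps the random dropping of residual branches active at inference time and summarizes several stochastic forward passes by their mean and variance. The toy residual stack, the `tanh` branch, and all names are assumptions made for illustration; the detectors and the exact sampling scheme used in the paper are not reproduced.

```python
import numpy as np

def mc_stochastic_depth(x, weights, survival_prob, n_samples, rng):
    """Mean and variance over stochastic passes through a toy residual stack."""
    outputs = []
    for _ in range(n_samples):
        h = x
        for W in weights:
            if rng.random() < survival_prob:
                h = h + np.tanh(h @ W)   # residual branch is kept
            # otherwise the block reduces to the identity skip connection
        outputs.append(h)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.var(axis=0)

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]
x = rng.standard_normal((2, 4))
mean, var = mc_stochastic_depth(x, weights, survival_prob=0.8, n_samples=50, rng=rng)
```

The per-output variance across passes serves as the uncertainty signal; it vanishes when every block is always kept.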
The menstrual cycle influences numerous physiological and psychological outcomes, yet standardised, open-source statistical methods for quantifying these cyclic effects remain lacking. We developed mcanalysis, an open-source package in R and Python implementing a Fourier-basis generalised additive model (GAM) for menstrual cycle research. The package provides a complete pipeline: processing period dates, labelling cycle days relative to menstruation onset, filtering physiologically plausible cycles, normalising outcomes to individual means, fitting cyclic GAMs with bootstrap confidence intervals, and identifying turning points to generate phase-specific linear trend estimates. We demonstrate the package on 15 wearable and self-reported outcomes using data from the Juli chronic health management application (N = 2,816 users). Nine of 15 outcomes showed evidence of association with the menstrual cycle (p < 0.05), spanning physiological (HRV p < 0.001, oxygen saturation p = 0.002), sleep (p = 0.003), symptom (migraine p < 0.001, headache p = 0.005), mood (EMA mood p = 0.024, PHQ-8 lack of energy p = 0.008, mania p = 0.041), and activity (hours outside p = 0.019) domains. No tested confounders were significantly associated with cycle-normalised outcomes. mcanalysis provides a standardised, reproducible approach to menstrual cycle analysis for users at all levels of statistical expertise. The package is freely available at https://github.com/kyradelray/mcanalysis, with a no-code web interface at https://kyradelray.shinyapps.io/mcanalysis/.
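The core modelling idea, a smooth curve over the cycle that joins up at the cycle boundary, can be sketched with an unpenalized Fourier-basis least-squares fit. This is not the mcanalysis API: the function names, the phase scaling to $[0, 1)$, and the absence of penalization and bootstrap intervals are all simplifying assumptions.

```python
import numpy as np

def fourier_design(phase, n_harmonics):
    """Design matrix: intercept plus sin/cos pairs; phase is cycle position in [0, 1)."""
    phase = np.asarray(phase, dtype=float)
    cols = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * phase))
        cols.append(np.cos(2 * np.pi * k * phase))
    return np.column_stack(cols)

def fit_cyclic(phase, y, n_harmonics=2):
    """Ordinary least-squares coefficients for the Fourier basis."""
    X = fourier_design(phase, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_cyclic(coef, phase):
    """Evaluate the fitted cyclic curve at the given phases."""
    n_harmonics = (len(coef) - 1) // 2
    return fourier_design(phase, n_harmonics) @ coef
```

Because every basis function has period one, the fitted curve takes the same value at the end of one cycle and the start of the next.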
Large language models (LLMs) can generate survey responses at low cost, but their reliability varies substantially across questions and is unknown before data collection. Deploying LLMs in surveys still requires costly human responses for verification and correction. How should a limited human-labeling budget be allocated across questions in real time? We propose an adaptive allocation algorithm that learns which questions are hardest for the LLM while simultaneously collecting human responses. Each human label serves a dual role: it improves the estimate for that question and reveals how well the LLM predicts human responses on it. The algorithm directs more budget to questions where the LLM is least reliable, without requiring any prior knowledge of question-level LLM accuracy. We prove that the allocation gap relative to the best possible allocation vanishes as the budget grows, and validate the approach on both synthetic data and a real survey dataset with 68 questions and over 2000 respondents. On real survey data, the standard practice of allocating human labels uniformly across questions wastes 10--12% of the budget relative to the optimal; our algorithm reduces this waste to 2--6%, and the advantage grows as questions become more heterogeneous in LLM prediction quality. The algorithm achieves the same estimation quality as traditional uniform sampling with fewer human samples, requires no pilot study, and is backed by formal performance guarantees validated on real survey data. More broadly, the framework applies whenever scarce human oversight must be allocated across tasks where LLM reliability is unknown.
In-context learning enables transformers to adapt to new tasks from a few examples at inference time, while grokking highlights that this generalization can emerge abruptly only after prolonged training. We study task generalization and grokking in in-context learning using a Bayesian perspective, asking what enables the delayed transition from memorization to generalization. Concretely, we consider modular arithmetic tasks in which a transformer must infer a latent linear function solely from in-context examples and analyze how predictive uncertainty evolves during training. We combine approximate Bayesian techniques to estimate the posterior distribution and we study how uncertainty behaves across training and under changes in task diversity, context length, and context noise. We find that epistemic uncertainty collapses sharply when the model groks, making uncertainty a practical label-free diagnostic of generalization in transformers. Additionally, we provide theoretical support with a simplified Bayesian linear model, showing that asymptotically both delayed generalization and uncertainty peaks arise from the same underlying spectral mechanism, which links grokking time to uncertainty dynamics.
We decompose the Kullback--Leibler generalization error (GE) -- the expected KL divergence from the data distribution to the trained model -- of unsupervised learning into three non-negative components: model error, data bias, and variance. The decomposition is exact for any e-flat model class and follows from two identities of information geometry: the generalized Pythagorean theorem and a dual e-mixture variance identity. As an analytically tractable demonstration, we apply the framework to $ε$-PCA, a regularized principal component analysis in which the empirical covariance is truncated at rank $N_K$ and discarded directions are pinned at a fixed noise floor $ε$. Although rank-constrained $ε$-PCA is not itself e-flat, it admits a technical reformulation with the same total GE on isotropic Gaussian data, under which each component of the decomposition takes closed form. The optimal rank emerges as the cutoff $λ_{\mathrm{cut}}^{*} = ε$ -- the model retains exactly those empirical eigenvalues exceeding the noise floor -- with the cutoff reflecting a marginal-rate balance between model-error gain and data-bias cost. A boundary comparison further yields a three-regime phase diagram -- retain-all, interior, and collapse -- separated by the lower Marchenko--Pastur edge and an analytically computable collapse threshold $ε_{*}(α)$, where $α$ is the dimension-to-sample-size ratio. All claims are verified numerically.
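The retention rule described above, keeping exactly those empirical eigenvalues that exceed the noise floor and pinning the discarded directions at $ε$, can be sketched as follows. The function name and the assumption that the data are already centred are illustrative; the closed-form error components and the phase diagram are not reproduced.

```python
import numpy as np

def eps_pca_covariance(X, eps):
    """Model covariance that keeps eigenvalues above eps and pins the rest at eps.

    X has shape (n_samples, n_features) and is assumed to be centred.
    Returns the model covariance and the number of retained directions.
    """
    n = X.shape[0]
    S = X.T @ X / n                          # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(S)
    kept = eigvals > eps                     # cutoff at the noise floor
    model_vals = np.where(kept, eigvals, eps)
    model_cov = (eigvecs * model_vals) @ eigvecs.T
    return model_cov, int(kept.sum())
```

With this rule the retained rank is determined by the data and $ε$ rather than chosen separately.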
Fine-tuning is a widely used strategy for adapting pre-trained models to new tasks, yet its methodology and theoretical properties in high-dimensional nonparametric settings with variable selection have not yet been developed. This paper introduces the fine-tuning factor augmented neural Lasso (FAN-Lasso), a transfer learning framework for high-dimensional nonparametric regression with variable selection that simultaneously handles covariate and posterior shifts. We use a low-rank factor structure to manage high-dimensional dependent covariates and propose a novel residual fine-tuning decomposition in which the target function is expressed as a transformation of a frozen source function and other variables to achieve transfer learning and nonparametric variable selection. This augmented feature from the source predictor allows for the transfer of knowledge to the target domain and reduces model complexity there. We derive minimax-optimal excess risk bounds for the fine-tuning FAN-Lasso, characterizing the precise conditions, in terms of relative sample sizes and function complexities, under which fine-tuning yields statistical acceleration over single-task learning. The proposed framework also provides a theoretical perspective on parameter-efficient fine-tuning methods. Extensive numerical experiments across diverse covariate- and posterior-shift scenarios demonstrate that the fine-tuning FAN-Lasso consistently outperforms standard baselines and achieves near-oracle performance even under severe target sample size constraints, empirically validating the derived rates.
Biosignals exhibit substantial cross-subject and cross-session variability, inducing severe domain shifts that degrade post-deployment performance for small, edge-oriented AI models. On-device adaptation is therefore essential to both preserve user privacy and ensure system reliability. However, existing sub-100 mW MCU-based wearable platforms can only support shallow or sparse adaptation schemes due to the prohibitive memory footprint and computational cost of full backpropagation (BP). In this paper, we propose BioTrain, a framework enabling full-network fine-tuning of state-of-the-art biosignal models under milliwatt-scale power and sub-megabyte memory constraints. We validate BioTrain using both offline and on-device benchmarks on EEG and EOG datasets, covering Day-1 new-subject calibration and longitudinal adaptation to signal drift. Experimental results show that full-network fine-tuning achieves accuracy improvements of up to 35% over non-adapted baselines and outperforms last-layer updates by approximately 7% during new-subject calibration. On the GAP9 MCU platform, BioTrain enables efficient on-device training throughput of 17 samples/s for EEG and 85 samples/s for EOG models within a power envelope below 50 mW. In addition, BioTrain's efficient memory allocator and network topology optimization enable the use of a large batch size, reducing peak memory usage. For fully on-chip BP on GAP9, BioTrain reduces the memory footprint by 8.1x, from 5.4 MB to 0.67 MB, compared to conventional full-network fine-tuning using batch normalization with batch size 8.
Communication in Large Language Model (LLM)-based multi-agent systems is moving beyond discrete tokens to preserve richer context. Recent work such as LatentMAS enables agents to exchange latent messages through full key-value (KV) caches. However, full KV relay incurs high memory and communication cost. We adapt eviction-style KV compression to this setting and introduce Orthogonal Backfill (OBF) to mitigate information loss from hard eviction. OBF injects a low-rank orthogonal residual from discarded KV states into the retained KV states. We evaluate the proposed method against full KV relay on nine standard benchmarks spanning mathematical reasoning, coding, and knowledge-intensive QA. It achieves performance comparable to full KV relay while reducing communication cost by 79.8%--89.4%. OBF further improves performance and achieves the best results on 7 of the 9 benchmarks. This suggests that more information does not necessarily lead to better communication; preserving the most useful information matters more. Our codebase is publicly available at https://github.com/markli404/When-Less-Latent-Leads-to-Better-Relay.
Identifying meaningful feature interactions is a central challenge in building accurate and interpretable models for tabular data. Generalized additive models (GAMs) have shown great success at modeling tabular data, but often rely on heuristic procedures to select interactions, potentially missing higher-order or context-dependent effects. To meet this challenge, we propose TabDistill, a method that leverages tabular foundation models and post-hoc distillation methods. Our key intuition is that tabular foundation models implicitly learn rich, adaptive feature dependencies through large-scale representation learning. Given a dataset, TabDistill first fits a tabular foundation model to the dataset, and then applies a post-hoc interaction attribution method to extract salient feature interactions from it. We evaluate these interactions by then using them as terms in a GAM. Across tasks, we find that interactions identified by TabDistill lead to consistent improvements in downstream GAMs' predictive performance. Our results suggest that tabular foundation models can serve as effective, data-driven guides for interaction discovery, bridging high-capacity models and interpretable additive frameworks.
In electronic health record (EHR) mining, learning high-quality representations of medical concepts (e.g., standardized diagnosis, medication, and procedure codes) is fundamental for downstream clinical prediction. However, robust concept representation learning is hindered by two key challenges: (i) clinically important cross-type dependencies (e.g., diagnosis-medication and medication-procedure relations) are often missing or incomplete in existing ontology resources, limiting the ability to model complex EHR patterns; and (ii) rich clinical semantics are often missing from structured resources, and even when available as text, are difficult to integrate with KG structure for representation learning. To address these challenges, we present CoMed, an LLM-empowered graph learning framework for medical concept representation. CoMed first builds a global knowledge graph (KG) over medical codes by combining statistically reliable associations mined from EHRs with type-constrained LLM prompting to infer semantic relations. It then utilizes LLMs to enrich the KG into a text-attributed graph by generating node descriptions and edge rationales, providing semantic signals for both concepts and their relationships. Finally, CoMed jointly trains a LoRA-tuned LLaMA text encoder with a heterogeneous GNN, fusing text semantics and graph structure into unified concept embeddings. Extensive experiments on MIMIC-III and MIMIC-IV show that CoMed consistently improves prediction performance and serves as an effective plug-in concept encoder for standard EHR pipelines.
Pathology reports serve as the definitive record for breast cancer staging, yet their unstructured format impedes large-scale data curation. While Large Language Models (LLMs) offer semantic reasoning, their deployment is often limited by high computational costs and hallucination risks. This study introduces a parameter-efficient, multi-task framework for automating the extraction of Tumor-Node-Metastasis (TNM) staging, histologic grade, and biomarkers. We fine-tune a Llama-3-8B-Instruct encoder using Low-Rank Adaptation (LoRA) on a curated, expert-verified dataset of 10,677 reports. Unlike generative approaches, our architecture utilizes parallel classification heads to enforce consistent schema adherence. Experimental results demonstrate that the model achieves a Macro F1 score of 0.976, successfully resolving complex contextual ambiguities and heterogeneous reporting formats that challenge traditional extraction methods including rule-based natural language processing (NLP) pipelines, zero-shot LLMs, and single-task LLM baselines. The proposed adapter-efficient, multi-task architecture enables reliable, scalable pathology-derived cancer staging and biomarker profiling, with the potential to enhance clinical decision support and accelerate data-driven oncology research.
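The headline metric, Macro F1, is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. A minimal sketch of that computation follows; the function name and the example labels are illustrative and do not come from the study's dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical stage labels for four reports.
score = macro_f1(["T1", "T1", "T2", "T2"], ["T1", "T2", "T2", "T2"])
```

In the example, the per-class scores are 2/3 for "T1" and 4/5 for "T2", and the macro average weights them equally.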
Modern GPU workloads, especially large language model (LLM) inference, suffer from kernel launch overheads and coarse synchronization that limit inter-kernel parallelism. Recent megakernel techniques fuse multiple operators into a single persistent kernel to eliminate launch gaps and expose inter-kernel parallelism, but struggle to handle dynamic shapes and data-dependent computation in real workloads. We present Event Tensor, a unified compiler abstraction for dynamic megakernels. Event Tensor encodes dependencies between tiled tasks, and enables first-class support for both shape and data-dependent dynamism. Built atop this abstraction, our Event Tensor Compiler (ETC) applies static and dynamic scheduling transformations to generate high-performance persistent kernels. Evaluations show that ETC achieves state-of-the-art LLM serving latency while significantly reducing system warmup overhead.
Neural operators have emerged as fast surrogate models for physics simulations, yet they remain acutely vulnerable to adversarial perturbations, a critical liability for safety-critical digital twin deployments. We present a synergistic defense that combines active learning-based data generation with an input denoising architecture. The active learning component adaptively probes model weaknesses using differential evolution attacks, then generates targeted training data at discovered vulnerability locations while an adaptive smooth-ratio safeguard preserves baseline accuracy. The input denoising component augments the operator architecture with a learnable bottleneck that filters adversarial noise while retaining physics-relevant features. On the viscous Burgers' equation benchmark, the combined approach achieves a 2.04% combined error (1.21% baseline + 0.83% robustness), representing an 87% reduction relative to standard training (15.42% combined) and outperforming both active learning alone (3.42%) and input denoising alone (5.22%). More broadly, our results, combined with cross-architecture vulnerability analysis from prior work, suggest that optimal training data for neural operators is architecture-dependent: because different architectures concentrate sensitivity in distinct input subspaces, uniform sampling cannot adequately cover the vulnerability landscape of all models. These findings have potential implications for the deployment of neural operators in safety-critical energy systems including nuclear reactor monitoring.
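The reported figures can be checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are those quoted in the abstract.

```python
baseline_err = 1.21          # percent, baseline component
robustness_err = 0.83        # percent, robustness component
standard_combined = 15.42    # percent, standard training

combined = baseline_err + robustness_err          # about 2.04 percent
reduction = 1.0 - combined / standard_combined    # relative reduction
reduction_percent = round(100 * reduction)        # rounds to 87
```

The same calculation applied to active learning alone (3.42%) or input denoising alone (5.22%) gives smaller reductions, consistent with the stated ranking.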
High-resolution data in spatial and temporal contexts is imperative for developing climate-resilient cities. Current datasets for monitoring urban parameters are developed primarily using manual inspections, embedded sensing, remote sensing, or standard street-view imagery (RGB). These methods and datasets are often constrained, respectively, by poor scalability, inconsistent spatio-temporal resolutions, overhead views, or low spectral information. We present a novel method and its open implementation: a multi-spectral terrestrial-view dataset that circumvents these limitations. This dataset consists of 17,718 street-level multi-spectral images captured with RGB, near-infrared, and thermal imaging sensors on bikes, across diverse urban morphologies (village, town, small city, and big urban area) in the Netherlands. Strict emphasis is put on data calibration and quality, and we provide the details of our data collection methodology (including the hardware and software details). To the best of our knowledge, Spectrascapes is the first open-access dataset of its kind. Finally, we demonstrate two downstream use cases enabled by this dataset and provide potential research directions in the machine learning, urban planning, and remote sensing domains.
Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word order and attribute binding. This limitation arises from a scarcity of informative samples needed to differentiate subtle semantic variations during contrastive pretraining. Although hard negative mining offers a promising remedy, existing methods lack explicit mechanisms to dictate which linguistic elements undergo modification. Instead of engineering generative architectures, this study establishes lexical concreteness as a fundamental determinant of negative sample efficacy. Modifying highly concrete terms generates more pronounced structural and visual discrepancies, providing a substantially stronger learning signal. Leveraging this principle, ConcretePlant is proposed to systematically isolate and manipulate perceptually grounded concepts. Analysis of the InfoNCE loss further reveals a severe gradient imbalance, where easily distinguishable pairs disproportionately overwhelm the optimization process and restrict the bandwidth available for nuanced learning. To resolve this degradation, the Cement loss is formulated utilizing a margin-based approach. By correlating psycholinguistic scores with sample difficulty, this objective dynamically calibrates the penalization applied to individual training pairs. Comprehensive evaluations substantiate these theoretical claims. The integrated framework, designated as Slipform, achieves state-of-the-art accuracy across diverse compositional evaluation benchmarks, general cross-modal retrieval, and single- and multi-label linear probing.
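For readers unfamiliar with the contrastive objective discussed above, the sketch below implements a standard image-to-text InfoNCE loss over a batch of matched pairs. It shows only the baseline objective; the margin-based Cement loss and the concreteness weighting are not specified in the abstract and are not reproduced. The function name and the temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.07):
    """Image-to-text InfoNCE; row i of each array is assumed to be a matched pair."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matched caption for image i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))
```

Pairs that are already well separated contribute a loss close to zero, whereas a batch whose captions are misaligned with their images is penalized heavily.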
An unsupervised framework for hyperspectral image (HSI) clustering is proposed that incorporates masked deep representation learning with diffusion-based clustering, extending the Spatially-Regularized Superpixel-based Diffusion Learning ($S^2DL$) algorithm. Initially, a denoised latent representation of the original HSI is learned via an unsupervised masked autoencoder (UMAE) model with a Vision Transformer backbone. The UMAE takes spatial context and long-range spectral correlations into account and incorporates an efficient pretraining process via masking that utilizes only a small subset of training pixels. In the next stage, the entropy rate superpixel (ERS) algorithm is used to segment the image into superpixels, and a spatially regularized diffusion graph is constructed using Euclidean and diffusion distances within the compressed latent space instead of the HSI space. The proposed algorithm, Deep Spatially-Regularized Superpixel-based Diffusion Learning ($DS^2DL$), leverages more faithful diffusion distances and subsequent diffusion graph construction that better reflect the intrinsic geometry of the underlying data manifold, improving labeling accuracy and clustering quality. Experiments on Botswana and KSC datasets demonstrate the efficacy of $DS^2DL$.
This paper investigates the problem of data-driven modeling of port-Hamiltonian systems while preserving their intrinsic Hamiltonian structure and stability properties. We propose a novel neural-network-based port-Hamiltonian modeling technique that relaxes the convexity constraint commonly imposed by neural network-based Hamiltonian approximations, thereby improving the expressiveness and generalization capability of the model. By removing this restriction, the proposed approach enables the use of more general non-convex Hamiltonian representations to enhance modeling flexibility and accuracy. Furthermore, the proposed method incorporates information about stable equilibria into the learning process, allowing the learned model to preserve the stability of multiple isolated equilibria rather than being restricted to a single equilibrium as in conventional methods. Two numerical experiments are conducted to validate the effectiveness of the proposed approach and demonstrate its ability to achieve more accurate structure- and stability-preserving learning of port-Hamiltonian systems compared with a baseline method.
t-SNE has gained popularity as a dimension reduction technique, especially for visualizing data. It is well-known that all dimension reduction techniques may lose important features of the data. We provide a mathematical framework for understanding this loss for t-SNE by establishing a number of results in different scenarios showing how important features of data are lost by using t-SNE.
Accurate characterization of subsurface heterogeneity is challenging but essential for applications such as reservoir pressure management, geothermal energy extraction and CO$_2$, H$_2$, and wastewater injection operations. This challenge becomes especially acute in extreme pressure events, which are rarely observed but can strongly affect operational risk. Traditional history matching and inversion techniques rely on expensive full-physics simulations, making it infeasible to handle uncertainty and extreme events at scale. Purely data-driven models often struggle to maintain physics consistency when dealing with sparse observations, complex geology, and extreme events. To overcome these limitations, we introduce a physics-informed machine learning method that embeds a differentiable subsurface flow simulator directly into neural network training. The network infers heterogeneous permeability fields from limited pressure observations, while training minimizes both permeability and pressure losses through the simulator, enforcing physical consistency. Because the simulator is used only during training, inference remains fast once the model is learned. In an initial test, the proposed method reduces the pressure inference error by half compared with a purely data-driven approach. We then extend the test over eight distinct data scenarios, and in every case, our method produces significantly lower pressure inference errors than the purely data-driven model. We also evaluate our method on extreme events, which represent high-consequence data in the tail of the sample distribution. Similar to the bulk distribution, the physics-informed model maintains higher pressure inference accuracy in the extreme event regimes. Overall, the proposed method enables rapid, physics-consistent subsurface inversion for real-time reservoir characterization and risk-aware decision-making.
Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods typically optimize a single objective, such as a layer-wise reconstruction loss or a second-order Taylor approximation of the training loss. We highlight that neither objective alone is consistently the most effective across architectures and sparsity levels. Motivated by this insight, we propose MOONSHOT, a general and flexible framework that extends any single-objective pruning method into a multi-objective formulation by jointly optimizing both the layer-wise reconstruction error and second-order Taylor approximation of the training loss. MOONSHOT acts as a wrapper around existing pruning algorithms. To enable this integration while maintaining scalability to billion-parameter models, we propose modeling decisions and introduce an efficient procedure for computing the inverse Hessian, preserving the efficiency of state-of-the-art one-shot pruners. When combined with state-of-the-art pruning methods on Llama-3.2 and Llama-2 models, MOONSHOT reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves zero-shot mean accuracy across seven classification benchmarks by up to 4.9 points. On Vision Transformers, it improves accuracy on ImageNet-1k by over 5 points at 70% sparsity, and on ResNet-50, it yields a 4-point gain at 90% sparsity.
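To make the 2:4 sparsity pattern mentioned above concrete, the following minimal Python sketch keeps the two largest-magnitude weights in every group of four consecutive weights and zeroes the rest. Magnitude selection is a stand-in chosen purely for illustration; it is not the MOONSHOT objective, and `prune_2_of_4` is a hypothetical name.

```python
def prune_2_of_4(weights):
    """Zero out the two smallest-magnitude entries in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    print(prune_2_of_4([0.1, -3.0, 2.0, 0.5, 1.0, 0.5, -4.0, 0.0]))
```

Exactly half of the weights in each group survive, which is the structure that 2:4 sparse hardware kernels expect.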
Earth Observation (EO) satellite scheduling (deciding which imaging tasks to perform and when) is a well-studied combinatorial optimization problem. Existing methods typically assume that the operational constraint model is fully specified in advance. In practice, however, constraints governing separation between observations, power budgets, and thermal limits are often embedded in engineering artefacts or high-fidelity simulators rather than in explicit mathematical models. We study EO scheduling under \emph{unknown constraints}: the objective is known, but feasibility must be learned interactively from a binary oracle. Working with a simplified model restricted to pairwise separation and global capacity constraints, we introduce Conservative Constraint Acquisition~(CCA), a domain-specific procedure designed to identify justified constraints efficiently in practice while limiting unnecessary tightening of the learned model. Embedded in the \textsc{Learn\&Optimize} framework, CCA supports an interactive search process that alternates optimization under a learned constraint model with targeted oracle queries. On synthetic instances with up to 50~tasks and dense constraint networks, L\&O improves over a no-knowledge greedy baseline and uses far fewer main oracle queries than a two-phase acquire-then-solve baseline (FAO). For $n\leq 30$, the average gap drops from 65--68\% (Priority Greedy) to 17.7--35.8\% using L\&O. At $n{=}50$, where the CP-SAT reference is the best feasible solution found in 120~s, L\&O improves on FAO on average (17.9\% vs.\ 20.3\%) while using 21.3 main queries instead of 100 and about $5\times$ less execution time.
Aerial object detection in UAV imagery presents unique challenges due to the high prevalence of tiny objects, adverse environmental conditions, and strict computational constraints. Standard YOLO-based detectors fail to address these jointly: their minimum detection stride of 8 pixels renders sub-32px objects nearly undetectable, their CIoU loss produces zero gradients for non-overlapping tiny boxes, and their architectures contain significant filter redundancy. We propose DroneScan-YOLO, a holistic system contribution that addresses these limitations through four coordinated design choices: (1) increased input resolution of 1280x1280 to maximize spatial detail for tiny objects, (2) RPA-Block, a dynamic filter pruning mechanism based on lazy cosine-similarity updates with a 10-epoch warm-up period, (3) MSFD, a lightweight P2 detection branch at stride 4 adding only 114,592 parameters (+1.1%), and (4) SAL-NWD, a hybrid loss combining Normalized Wasserstein Distance with size-adaptive CIoU weighting, integrated into YOLOv8's TaskAligned assignment pipeline. Evaluated on VisDrone2019-DET, DroneScan-YOLO achieves 55.3% mAP@50 and 35.6% mAP@50-95, outperforming the YOLOv8s baseline by +16.6 and +12.3 points respectively, improving recall from 0.374 to 0.518, and maintaining 96.7 FPS inference speed with only +4.1% parameters. Gains are most pronounced on tiny object classes: bicycle AP@50 improves from 0.114 to 0.328 (+187%), and awning-tricycle from 0.156 to 0.237 (+52%).
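The motivation for a Wasserstein-based term can be illustrated with a small sketch. It assumes the commonly used form of the Normalized Wasserstein Distance, in which each box `(cx, cy, w, h)` is modeled as a 2D Gaussian and the similarity is `exp(-W2 / C)`. The constant `C = 12.8` and the function names are assumptions for illustration; the size-adaptive weighting of SAL-NWD is not reproduced here.

```python
import math


def iou(a, b):
    """Plain IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance similarity in (0, 1]; c is an assumed constant."""
    w2 = math.sqrt(
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2) ** 2
        + ((a[3] - b[3]) / 2) ** 2
    )
    return math.exp(-w2 / c)


if __name__ == "__main__":
    pred, target = (10.0, 10.0, 4.0, 4.0), (20.0, 10.0, 4.0, 4.0)
    print(iou(pred, target), nwd(pred, target))
```

For two tiny boxes that do not overlap, the plain IoU is exactly zero, whereas the NWD similarity stays strictly positive and still varies with the distance between the boxes.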
Larger language models become simultaneously better and worse at handling contextual information -- better at ignoring false claims, worse at ignoring irrelevant tokens. We formalize this apparent paradox through the first scaling laws for contextual entrainment, the tendency of models to favor tokens that appeared in context regardless of relevance. Analyzing the Cerebras-GPT (111M-13B) and Pythia (410M-12B) model families, we find entrainment follows predictable power-law scaling, but with opposite trends depending on context type: semantic contexts show decreasing entrainment with scale, while non-semantic contexts show increasing entrainment. Concretely, the largest models are four times more resistant to counterfactual misinformation than the smallest, yet simultaneously twice as prone to copying arbitrary tokens. These diverging trends, which replicate across model families, suggest that semantic filtering and mechanical copying are functionally distinct behaviors that scale in opposition -- scaling alone does not resolve context sensitivity, it reshapes it.
Large Language Models (LLMs) are increasingly applied to complex telecommunications tasks, including 3GPP specification analysis and O-RAN network troubleshooting. However, a critical limitation remains: LLM-generated confidence scores are often biased and unreliable, frequently exhibiting systematic overconfidence. This lack of trustworthy self-assessment makes it difficult to verify model outputs and safely rely on them in practice. In this paper, we study confidence calibration in telecom-domain LLMs using the representative Gemma-3 model family (4B, 12B, and 27B parameters), evaluated on TeleQnA, ORANBench, and srsRANBench. We show that standard single-pass, verbalized confidence estimates fail to reflect true correctness, often assigning high confidence to incorrect predictions. To address this, we propose a novel Twin-Pass Chain of Thought (CoT)-Ensembling methodology for improving confidence estimation by leveraging multiple independent reasoning evaluations and aggregating their assessments into a calibrated confidence score. Our approach reduces Expected Calibration Error (ECE) by up to 88% across benchmarks, significantly improving the reliability of model self-assessment. These results highlight the limitations of current confidence estimation practices and demonstrate a practical path toward more trustworthy evaluation of LLM outputs in telecommunications.
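For readers unfamiliar with the metric, Expected Calibration Error can be sketched as follows. This is the standard equal-width binning definition, not necessarily the exact evaluation protocol of the work above; the bin count of 10 and the function name are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: floats in [0, 1]; correct: 0/1 flags of the same length.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # An overconfident model: 90% stated confidence, 50% actual accuracy.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0]))
```

A model that states 0.9 confidence but is right only half the time yields an ECE of about 0.4, which is the kind of systematic overconfidence the abstract describes.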
Meta-learning offers a principled framework for leveraging \emph{task-invariant} priors from related tasks, with which \emph{task-specific} models can be fine-tuned on downstream tasks, even with limited data records. Gradient-based meta-learning (GBML) relies on gradient descent (GD) to adapt the prior to a new task. Although effective, these methods incur high computational overhead that scales linearly with the number of GD steps. To enhance efficiency and scalability, existing methods approximate the gradient of prior parameters (meta-gradient) via truncated backpropagation, yet suffer from large approximation errors. Targeting accurate approximation, this work puts forth binomial GBML (BinomGBML), which relies on a truncated binomial expansion for meta-gradient estimation. This novel expansion incorporates more information into the meta-gradient estimate via efficient parallel computation. When applied to model-agnostic meta-learning (MAML) as a running example, the resultant BinomMAML provably enjoys error bounds that not only improve upon existing approaches, but also decay super-exponentially under mild conditions. Numerical tests corroborate the theoretical analysis and showcase boosted performance with slightly increased computational overhead.
In medical image segmentation, uncertainty estimates are often reported but rarely used to guide decisions. We study the missing step: how uncertainty maps are converted into actionable policies such as accepting, flagging, or deferring predictions. We formulate segmentation as a two-stage pipeline, estimation followed by decision, and show that optimizing uncertainty alone fails to capture most of the achievable safety gains. Using retinal vessel segmentation benchmarks (DRIVE, STARE, CHASE_DB1), we evaluate two uncertainty sources (Monte Carlo Dropout and Test-Time Augmentation) combined with three deferral strategies, and introduce a simple confidence-aware deferral rule that prioritizes uncertain and low-confidence predictions. Our results show that the best method and policy combination removes up to 80 percent of segmentation errors at only 25 percent pixel deferral, while achieving strong cross-dataset robustness. We further show that calibration improvements do not translate to better decision quality, highlighting a disconnect between standard uncertainty metrics and real-world utility. These findings suggest that uncertainty should be evaluated based on the decisions it enables, rather than in isolation.
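The decision step described above can be illustrated with a minimal sketch of uncertainty-ranked deferral: the most uncertain pixels are deferred up to a fixed budget, and the fraction of segmentation errors that falls inside the deferred set is reported. The function name and the flat-list pixel representation are assumptions, and the confidence-aware rule of the work above is not reproduced.

```python
def errors_removed_by_deferral(uncertainty, is_error, defer_fraction):
    """Fraction of errors covered when the most uncertain pixels are deferred.

    uncertainty: per-pixel scores (higher = less certain)
    is_error: per-pixel 0/1 flags (1 = prediction disagrees with ground truth)
    defer_fraction: share of pixels handed off, e.g. 0.25
    """
    n = len(uncertainty)
    n_defer = int(round(defer_fraction * n))
    # Rank pixel indices from most to least uncertain.
    order = sorted(range(n), key=lambda i: uncertainty[i], reverse=True)
    deferred = order[:n_defer]
    total_errors = sum(is_error)
    if total_errors == 0:
        return 0.0
    return sum(is_error[i] for i in deferred) / total_errors


if __name__ == "__main__":
    unc = [0.9, 0.8, 0.1, 0.2]
    err = [1, 0, 0, 1]
    print(errors_removed_by_deferral(unc, err, 0.25))
```

Evaluating this quantity across deferral budgets gives a decision-oriented view of an uncertainty map, as opposed to judging calibration alone.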
Neural models for TCR-pMHC binding prediction are susceptible to shortcut learning: they exploit spurious correlations in training data -- such as peptide length bias or V-gene co-occurrence -- rather than the physical binding interface. This renders predictions brittle under family-held-out and distance-aware evaluation, where such shortcuts do not transfer. We introduce \emph{Counterfactual Invariant Prediction} (CIP), a training framework that generates biologically constrained counterfactual peptide edits and enforces invariance to edits at non-anchor positions while amplifying sensitivity at MHC anchor residues. CIP augments the base classifier with two auxiliary objectives: (1) an invariance loss penalizing prediction changes under conservative non-anchor substitutions, and (2) a contrastive loss encouraging large prediction changes under anchor-position disruptions. Evaluated on a curated VDJdb-IEDB benchmark under family-held-out, distance-aware, and random splits, CIP achieves AUROC 0.831 and counterfactual consistency (CFC) 0.724 under the challenging family-held-out protocol -- a 39.7\% reduction in shortcut index relative to the unconstrained baseline. Ablations confirm that anchor-aware edit generation is the dominant driver of OOD gains, providing a practical recipe for causally-grounded TCR specificity modeling.
Adaptive Conformal Inference (ACI) provides distribution-free prediction intervals with asymptotic coverage guarantees for time series under distribution shift. However, ACI only adapts the quantile threshold -- it cannot shift the interval center. When a base forecaster develops persistent bias after a regime change, ACI compensates by widening intervals symmetrically, producing unnecessarily conservative bands. We propose Bias-Corrected ACI (BC-ACI), which augments standard ACI with an online exponentially weighted moving average (EWM) estimate of forecast bias. BC-ACI corrects nonconformity scores before quantile computation and re-centers prediction intervals, addressing the root cause of miscalibration rather than its symptom. An adaptive dead-zone threshold suppresses corrections when estimated bias is indistinguishable from noise, ensuring no degradation on well-calibrated data. In controlled experiments across 688 runs spanning two base models, four synthetic regimes, and three real datasets, BC-ACI reduces Winkler interval scores by 13--17% under mean and compound distribution shifts (Wilcoxon p < 0.001) while maintaining equivalent performance on stationary data (ratio 1.002x). We provide finite-sample analysis showing that coverage guarantees degrade gracefully with bias estimation error.
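The mechanism can be sketched in a few lines. The sketch assumes absolute-residual nonconformity scores, the standard ACI update `alpha_t += gamma * (alpha - err_t)`, a fixed rather than adaptive dead-zone threshold, and hypothetical parameter names; it illustrates the idea and is not the authors' implementation.

```python
import math


def bc_aci(y_true, y_pred, alpha=0.1, gamma=0.01, lam=0.1, dead_zone=0.0, warmup=10):
    """Return a list of (lower, upper) intervals, one per time step.

    lam is the EWMA weight for the bias estimate; lam=0 recovers plain ACI.
    """
    alpha_t, bias, scores, intervals = alpha, 0.0, [], []
    for y, y_hat in zip(y_true, y_pred):
        # Re-center only when the bias estimate exceeds the dead zone.
        shift = bias if abs(bias) > dead_zone else 0.0
        center = y_hat + shift
        if len(scores) >= warmup:
            level = min(max(1.0 - alpha_t, 0.0), 1.0)
            ordered = sorted(scores)
            k = int(math.ceil(level * (len(ordered) + 1))) - 1
            q = ordered[max(0, min(k, len(ordered) - 1))]
        else:
            q = float("inf")  # no information yet: trivial interval
        lower, upper = center - q, center + q
        intervals.append((lower, upper))
        err = 0.0 if lower <= y <= upper else 1.0
        alpha_t += gamma * (alpha - err)          # ACI threshold update
        scores.append(abs(y - center))            # bias-corrected score
        bias = (1.0 - lam) * bias + lam * (y - y_hat)  # online bias estimate
    return intervals


if __name__ == "__main__":
    truth = [5.0 + 0.1 * ((i % 3) - 1) for i in range(200)]
    forecast = [0.0] * 200  # forecaster with a persistent bias of about 5
    print(bc_aci(truth, forecast)[-1], bc_aci(truth, forecast, lam=0.0)[-1])
```

With a persistently biased forecaster, the corrected variant moves the interval center toward the observations, while the uncorrected variant can only compensate by keeping a wide symmetric band around the biased forecast.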
Anomaly detection aims to identify observations that deviate from expected behavior. Because anomalous events are inherently sparse, most frameworks are trained exclusively on normal data to learn a single reference model of normality. This implicitly assumes that normal behavior can be captured by a single, unconditional reference distribution. In practice, however, anomalies are often context-dependent: a specific observation may be normal under one operating condition, yet anomalous under another. As machine learning systems are deployed in dynamic and heterogeneous environments, these fixed-context assumptions introduce structural ambiguity, i.e., the inability to distinguish contextual variation from genuine abnormality under marginal modeling, leading to unstable performance and unreliable anomaly assessments. While modern sensing systems frequently collect multimodal data capturing complementary aspects of both system behavior and operating conditions, existing methods treat all data streams equally, without distinguishing contextual information from anomaly-relevant signals. As a result, abnormality is often evaluated without explicitly conditioning on operating conditions. We argue that multimodal anomaly detection should be reframed as a cross-modal contextual inference problem, in which modalities play asymmetric roles, separating context from observation, to define abnormality conditionally rather than relative to a single global reference. This perspective has implications for model design, evaluation protocols, and benchmark construction, and we outline open research challenges toward robust, context-aware multimodal anomaly detection.
Analog optical computers promise large efficiency gains for machine learning inference, yet no demonstration has moved beyond small-scale image benchmarks. We benchmark the analog optical computer (AOC) digital twin on mortgage approval classification from 5.84 million U.S. HMDA records and separate three sources of accuracy loss. On the original 19 features, the AOC reaches 94.6% balanced accuracy with 5,126 parameters (1,024 optical), compared with 97.9% for XGBoost; the 3.3 percentage-point gap narrows by only 0.5pp when the optical core is widened from 16 to 48 channels, suggesting an architectural rather than hardware limitation. Restricting all models to a shared 127-bit binary encoding drops every model to 89.4--89.6%, with an encoding cost of 8pp for digital models and 5pp for the AOC. Seven calibrated hardware non-idealities impose no measurable penalty. The three resulting layers of limitation (encoding, architecture, hardware fidelity) locate where accuracy is lost and what to improve next.
Mapping the spatial distribution of species is essential for conservation policy and invasive species management. Species distribution models (SDMs) are the primary tools for this task, serving two purposes: achieving robust predictive performance while providing ecological insights into the driving factors of distribution. However, the increasing complexity of deep learning SDMs has made extracting these insights more challenging. To reconcile these objectives, we propose the first implementation of concept-based Explainable AI (XAI) for SDMs. We leverage the Robust TCAV (Testing with Concept Activation Vectors) methodology to quantify the influence of landscape concepts on model predictions. To enable this, we provide a new open-access landscape concept dataset derived from high-resolution multispectral and LiDAR drone imagery. It includes 653 patches across 15 distinct landscape concepts and 1,450 random reference patches, designed to suit a wide range of species. We demonstrate this approach through a case study of two aquatic insects, Plecoptera and Trichoptera, using two Convolutional Neural Networks and one Vision Transformer. Results show that concept-based XAI helps validate SDMs against expert knowledge while uncovering novel associations that generate new ecological hypotheses. Robust TCAV also provides landscape-level information, useful for policy-making and land management. Code and datasets are publicly available.
Exploratory Landscape Analysis (ELA) provides numerical features for characterizing black-box optimization problems. In high-dimensional settings, however, ELA suffers from sparsity effects, high estimator variance, and the prohibitive cost of computing several feature classes. Dimensionality reduction has therefore been proposed as a way to make ELA applicable in such settings, but it remains unclear whether features computed in reduced spaces still reflect intrinsic properties of the original landscape. In this work, we investigate the robustness of ELA features under dimensionality reduction via Random Gaussian Embeddings (RGEs). Starting from the same sampled points and objective values, we compute ELA features in projected spaces and compare them to those obtained in the original search space across multiple sample budgets and embedding dimensions. Our results show that linear random projections often alter the geometric and topological structure relevant to ELA, yielding feature values that are no longer representative of the original problem. While a small subset of features remains comparatively stable, most are highly sensitive to the embedding. Moreover, robustness under projection does not necessarily imply informativeness, as apparently robust features may still reflect projection-induced artifacts rather than intrinsic landscape characteristics.
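As an illustration of the projection step, the sketch below maps sampled points into a lower-dimensional space with a random Gaussian matrix. It assumes a linear map `z = A x` with i.i.d. entries drawn from N(0, 1/k), where k is the embedding dimension; the scaling, the function name, and the seed handling are assumptions rather than details taken from the work above.

```python
import math
import random


def random_gaussian_embedding(points, target_dim, seed=0):
    """Project each point (a list of floats) to target_dim dimensions."""
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One fixed random matrix is shared by all points.
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [[sum(a * x for a, x in zip(row, p)) for row in matrix] for p in points]


if __name__ == "__main__":
    sample = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]
    print(random_gaussian_embedding(sample, target_dim=2, seed=1))
```

The objective values attached to the sampled points stay unchanged; only the coordinates passed to the feature computation are replaced by their projections.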
Large Language Models (LLMs) rely heavily on Key-Value (KV) caching to minimize inference latency. However, standard KV caches are context-dependent: reusing a cached document in a new context requires recomputing KV states to account for shifts in attention distribution. Existing solutions such as CacheBlend, EPIC, and SAM-KV mitigate this issue by selectively recomputing a subset of tokens; however, they still incur non-negligible computational overhead (FLOPs) and increased Time-to-First-Token (TTFT) latency. In this paper, we propose KV Packet, a recomputation-free cache reuse framework that treats cached documents as immutable ``packets'' wrapped in lightweight trainable soft-token adapters, which are trained via self-supervised distillation to bridge context discontinuities. Experiments on Llama-3.1 and Qwen2.5 demonstrate that the proposed KV Packet method achieves near-zero FLOPs and lower TTFT than recomputation-based baselines, while retaining F1 scores comparable to those of the full recomputation baseline.
Causal representation learning (CRL) aims to identify the underlying latent variables from high-dimensional observations, even when the variables are dependent on each other. We study this problem for latent variables that follow a potentially degenerate Gaussian mixture distribution and that are only observed through the transformation via a piecewise affine mixing function. We provide a series of progressively stronger identifiability results for this challenging setting in which the probability density functions are ill-defined because of the potential degeneracy. For identifiability up to permutation and scaling, we leverage a sparsity regularization on the learned representation. Based on our theoretical results, we propose a two-stage method to estimate the latent variables by enforcing sparsity and Gaussianity in the learned representations. Experiments on synthetic and image data highlight our method's effectiveness in recovering the ground-truth latent variables.
Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.
As Large Language Models (LLMs) are increasingly integrated into agentic workflows, their unpredictability stemming from numerical instability has emerged as a critical reliability issue. While recent studies have demonstrated the significant downstream effects of these instabilities, the root causes and underlying mechanisms remain poorly understood. In this paper, we present a rigorous analysis of how unpredictability is rooted in the finite numerical precision of floating-point representations, tracking how rounding errors propagate, amplify, or dissipate through Transformer computation layers. Specifically, we identify a chaotic "avalanche effect" in the early layers, where minor perturbations trigger binary outcomes: either rapid amplification or complete attenuation. Beyond specific error instances, we demonstrate that LLMs exhibit universal, scale-dependent chaotic behaviors characterized by three distinct regimes: 1) a stable regime, where perturbations fall below an input-dependent threshold and vanish, resulting in constant outputs; 2) a chaotic regime, where rounding errors dominate and drive output divergence; and 3) a signal-dominated regime, where true input variations override numerical noise. We validate these findings extensively across multiple datasets and model architectures.
Many materials show anisotropic light scattering patterns due to the shape and local alignment of their underlying microstructures: surfaces with small elements such as fibers, or the ridges of a brushed metal, are very sparse and require a high spatial resolution to be properly represented as a volume. The acquisition of voxel data from such objects is a time- and memory-intensive task, and most rendering approaches require an additional Level-of-Detail (LoD) data structure to aggregate the visual appearance, as observed from multiple distances, in order to reduce the number of samples computed per pixel (e.g., MIP mapping). In this work we introduce, first, an efficient parallel voxelization method designed to facilitate fast data aggregation at multiple resolution levels and, second, a novel representation based on hierarchical SGGX clustering that provides better accuracy than baseline methods. We validate our approach with a CUDA-based implementation of the voxelizer, tested both on triangle meshes and volumetric fabrics modeled with explicit fibers. Finally, we show the results generated with a path tracer based on the proposed LoD rendering model.
This paper presents HUANet, a constrained deep neural network architecture that unrolls the iterations of the Alternating Direction Method of Multipliers (ADMM) into a trainable neural network for solving constrained convex optimization problems. Existing end-to-end learning methods operate as black-box mappings from parameters to solutions, often lacking explicit optimality principles and failing to enforce constraints. To address this limitation, we unroll ADMM and embed a hard-constrained neural network at each iteration to accelerate the algorithm, where equality constraints are enforced via a differentiable correction stage at the network output. Furthermore, we incorporate first-order optimality conditions as soft constraints during training to promote the convergence of the proposed unrolled algorithm. Extensive numerical experiments are conducted to validate the effectiveness of the proposed architecture for constrained optimization problems.
Large language models can be aligned with human preferences through offline reinforcement learning (RL) on small labeled datasets. While single-objective alignment is well-studied, many real-world applications demand the simultaneous optimization of multiple conflicting rewards, e.g. optimizing both catalytic activity and specificity in protein engineering, or helpfulness and harmlessness for chatbots. Prior work has largely relied on linear reward scalarization, but this approach provably fails to recover non-convex regions of the Pareto front. In this paper, instead of scalarizing the rewards directly, we frame multi-objective RL itself as an optimization problem to be scalarized via smooth Tchebysheff scalarization, a recent technique that overcomes the shortcomings of linear scalarization. We use this formulation to derive Smooth Tchebysheff Optimization of Multi-Objective Preferences (STOMP), a novel offline RL algorithm that extends direct preference optimization to the multi-objective setting in a principled way by standardizing the individual rewards based on their observed distributions. We empirically validate STOMP on a range of protein engineering tasks by aligning three autoregressive protein language models on three laboratory datasets of protein fitness. Compared to state-of-the-art baselines, STOMP achieves the highest hypervolumes in eight of nine settings according to both offline off-policy and generative evaluations. We thus demonstrate that STOMP is a powerful, robust multi-objective alignment algorithm that can meaningfully improve post-trained models for multi-attribute protein optimization and beyond.
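To show what the scalarization looks like, the sketch below implements the commonly cited smooth Tchebysheff form for minimization, `mu * log(sum_i exp(w_i * (f_i - z_i) / mu))`, next to the non-smooth maximum it approximates. It is written for losses (a reward can be negated to fit), the smoothing parameter `mu` and the function names are assumptions, and the reward standardization used by STOMP is not reproduced.

```python
import math


def tchebysheff(losses, weights, ideal):
    """Classical (non-smooth) Tchebysheff scalarization: a weighted max."""
    return max(w * (f - z) for f, w, z in zip(losses, weights, ideal))


def smooth_tchebysheff(losses, weights, ideal, mu=0.1):
    """Smooth upper bound on the weighted max via a scaled log-sum-exp."""
    terms = [w * (f - z) / mu for f, w, z in zip(losses, weights, ideal)]
    m = max(terms)  # subtract the max for numerical stability
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))


if __name__ == "__main__":
    losses, weights, ideal = [1.0, 2.0], [0.5, 0.5], [0.0, 0.0]
    print(tchebysheff(losses, weights, ideal))
    print(smooth_tchebysheff(losses, weights, ideal, mu=0.1))
```

The smooth value always lies between the weighted maximum and that maximum plus `mu * log(m)` for `m` objectives, so shrinking `mu` tightens the approximation while keeping the objective differentiable.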
3D Gaussian Splatting (3DGS) has recently enabled highly photorealistic 3D reconstruction from casually captured multi-view images. However, this accessibility raises a privacy concern: publicly available images or videos can be exploited to reconstruct detailed 3D models of scenes or objects without the owner's consent. We present PatchPoison, a lightweight dataset-poisoning method that prevents unauthorized 3D reconstruction. Unlike global perturbations, PatchPoison injects a small high-frequency adversarial patch, a structured checkerboard, into the periphery of each image in a multi-view dataset. The patch is designed to corrupt the feature-matching stage of Structure-from-Motion (SfM) pipelines such as COLMAP by introducing spurious correspondences that systematically misalign estimated camera poses. Consequently, downstream 3DGS optimization diverges from the correct scene geometry. On the NeRF-Synthetic benchmark, inserting a 12 × 12 pixel patch increases reconstruction error by 6.8x in LPIPS, while the poisoned images remain unobtrusive to human viewers. PatchPoison requires no pipeline modifications, offering a practical, "drop-in" preprocessing step for content creators to protect their multi-view data.
The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer--mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.
The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradient descent, appropriately configured ECD dynamics escape strict local minima and converge to a global minimum, making it appealing for machine learning optimization. We present the first analytical study of ECD, focusing on the one-dimensional setting for this first installment. We formalize a stochastic ECD dynamics (sECD) with energy-preserving noise, as well as a quantum analog of the ECD Hamiltonian (qECD), providing the foundation for a quantum algorithm through Hamiltonian simulation. For positive double-well objectives, we compute the expected hitting time from a local to the global minimum. We prove that both sECD and qECD yield exponential speedup over respective gradient descent baselines--stochastic gradient descent and its quantization. For objectives with tall barriers, qECD achieves a further speedup over sECD.
On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the student and teacher should share compatible thinking patterns; and (ii) even with consistent thinking patterns and higher scores, the teacher must offer genuinely new capabilities beyond what the student has seen during training. We validate these findings through weak-to-strong reverse distillation, showing that same-family 1.5B and 7B teachers are distributionally indistinguishable from the student's perspective. Probing into the token-level mechanism, we show that successful OPD is characterized by progressive alignment on high-probability tokens at student-visited states, a small shared token set that concentrates most of the probability mass (97%-99%). We further propose two practical strategies to recover failing OPD: off-policy cold start and teacher-aligned prompt selection. Finally, we show that OPD's apparent free lunch of dense token-level reward comes at a cost, raising the question of whether OPD can scale to long-horizon distillation.
On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting in substantial infrastructure overhead. In this work, we investigate whether on-policy distillation can be performed offline. A natural approach is to precompute teacher log-probabilities once over SFT rollouts and reuse them during training. In practice, however, this offline variant fails to reliably match the performance of standard OPD. To understand this discrepancy, we identify a previously overlooked condition that is critical for any OPD pipeline, which we term teacher consistency. This condition requires that the same teacher model be used for both supervised fine-tuning and OPD. We show that violating teacher consistency introduces an irreducible gradient bias, causing both offline and online OPD to converge to a suboptimal fixed point regardless of training duration. Building on this insight, we propose Lightning OPD, an offline on-policy distillation framework that enforces teacher consistency by precomputing teacher log-probabilities over SFT rollouts. This design eliminates the need for a live teacher server entirely. We further show that, under teacher consistency, Lightning OPD shares the same optimum as standard OPD, with bounded gradient discrepancy and an implicit regularization effect that helps prevent policy drift. Extensive experiments on mathematical reasoning and code generation demonstrate that Lightning OPD achieves state-of-the-art performance with significantly improved efficiency. Starting from an SFT-initialized Qwen3-8B-Base model, Lightning OPD reaches 69.9% on AIME 2024 in just 30 GPU hours, achieving a 4.0x speedup over standard OPD and substantially lowering the barrier to entry for academic research on LLM post-training.
Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate uncertainty quantification in existing methods. We introduce the Causal Diffusion Model (CDM), the first denoising diffusion probabilistic approach explicitly designed to generate full probabilistic distributions of counterfactual outcomes under sequential interventions. CDM employs a novel residual denoising architecture with relational self-attention, capturing intricate temporal dependencies and multimodal outcome trajectories without requiring explicit adjustments (e.g., inverse-probability weighting or adversarial balancing) for confounding. In rigorous evaluation on a pharmacokinetic-pharmacodynamic tumor-growth simulator widely adopted in prior work, CDM consistently outperforms state-of-the-art longitudinal causal inference methods, achieving a 15-30% relative improvement in distributional accuracy (1-Wasserstein distance) while maintaining competitive or superior point-estimate accuracy (RMSE) under high-confounding regimes. By unifying uncertainty quantification and robust counterfactual prediction in complex, sequentially confounded settings, without tailored deconfounding, CDM offers a flexible, high-impact tool for decision support in medicine, policy evaluation, and other longitudinal domains.
Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam, serve as the cornerstone of modern training pipelines. However, large-scale model training, stringent differential privacy requirements, and distributed learning paradigms expose critical limitations in these conventional approaches regarding privacy protection and memory efficiency. To mitigate these bottlenecks, researchers explore second-order optimization techniques to surpass first-order performance ceilings, while zeroth-order methods reemerge to alleviate memory constraints inherent to large-scale training. Despite this proliferation of methodologies, the field lacks a cohesive framework that unifies underlying principles and delineates application scenarios for these disparate approaches. In this work, we retrospectively analyze the evolutionary trajectory of deep learning optimization algorithms and present a comprehensive empirical evaluation of mainstream optimizers across diverse model architectures and training scenarios. We distill key emerging trends and fundamental design trade-offs, pinpointing promising directions for future research. By synthesizing theoretical insights with extensive empirical evidence, we provide actionable guidance for designing next-generation highly efficient, robust, and trustworthy optimization methods. The code is available at https://github.com/APRIL-AIGC/Awesome-Optimizer.
The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet, namely the multiclass setting, the Natarajan dimension has long served as an analogue of VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely--Shalev-Shwartz (DS) dimension, and more generally with its extension, the list-DS dimension -- the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value, replacing the exponential dependence on $\ell$ in the Natarajan-based bound by the optimal polynomial dependence, and improving the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting. This motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al.~(STOC~2023), Hanneke et al.~(COLT~2024), and Brukhim et al.~(NeurIPS~2024).
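For orientation, the classical binary statement that this abstract generalizes can be checked numerically. The sketch below is only an illustration of the well-known VC-dimension bound $\sum_{i \le d} \binom{n}{i}$, not of the paper's DS-dimension result; the function names `sauer_shelah_bound` and `threshold_class` are hypothetical helpers introduced here.

```python
from math import comb


def sauer_shelah_bound(n: int, d: int) -> int:
    """Classical bound: a binary class with VC dimension d, restricted to
    n points, contains at most sum_{i=0}^{d} C(n, i) distinct labelings."""
    return sum(comb(n, i) for i in range(d + 1))


def threshold_class(n: int) -> set:
    """All threshold labelings of n ordered points (point j gets label 1
    iff j >= t). This class has VC dimension 1."""
    return {tuple(1 if j >= t else 0 for j in range(n)) for t in range(n + 1)}


# Thresholds meet the bound with equality for d = 1: both numbers are n + 1.
n = 6
print(len(threshold_class(n)), sauer_shelah_bound(n, 1))
```

Thresholds are a convenient sanity check because they attain the classical bound exactly, which is the sense of tightness the abstract claims for its multiclass and list analogue.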
The most cited calibration result in deep learning -- post-temperature-scaling ECE of 0.012 on CIFAR-100 (Guo et al., 2017) -- is below the statistical noise floor. We prove this is not a failure of the experiment but a law: the minimax rate for estimating calibration error with model error rate $\epsilon$ is $\Theta((L\epsilon/m)^{1/3})$, and no estimator can beat it. This "verification tax" implies that as AI models improve, verifying their calibration becomes fundamentally harder -- with the same exponent in opposite directions. We establish four results that contradict standard evaluation practice: (1) self-evaluation without labels provides exactly zero information about calibration, bounded by a constant independent of compute; (2) a sharp phase transition at $m\epsilon \approx 1$ below which miscalibration is undetectable; (3) active querying eliminates the Lipschitz constant, collapsing estimation to detection; (4) verification cost grows exponentially with pipeline depth at rate $L^K$. We validate across five benchmarks (MMLU, TruthfulQA, ARC-Challenge, HellaSwag, WinoGrande; ~27,000 items) with 6 LLMs from 5 families (8B-405B parameters, 27 benchmark-model pairs with logprob-based confidence), 95% bootstrap CIs, and permutation tests. Self-evaluation non-significance holds in 80% of pairs. Across frontier models, 23% of pairwise comparisons are indistinguishable from noise, implying that credible calibration claims must report verification floors and prioritize active querying once gains approach benchmark resolution.
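The stated rate can be turned into a rough back-of-the-envelope calculator. The sketch below assumes that $L$ is the Lipschitz constant mentioned in result (3) and that $m$ is the number of labeled evaluation samples; neither reading is spelled out in the abstract. It also drops the unknown constants hidden in the $\Theta(\cdot)$, so the outputs indicate scaling only. The function names are hypothetical.

```python
import math


def verification_floor(lipschitz: float, error_rate: float, n_labels: int) -> float:
    """Scaling of the minimax rate (L * eps / m)^(1/3), constants omitted."""
    return (lipschitz * error_rate / n_labels) ** (1.0 / 3.0)


def labels_needed(lipschitz: float, error_rate: float, resolution: float) -> float:
    """Invert the rate: m ~ L * eps / delta^3 labels to resolve a gap delta."""
    return lipschitz * error_rate / resolution ** 3


# Illustrative (made-up) numbers: L = 1, eps = 0.1.
print(verification_floor(1.0, 0.1, 100))   # floor with 100 labels
print(labels_needed(1.0, 0.1, 0.01))       # labels to resolve a gap of 0.01
```

The cubic dependence on the target resolution is the practical point: under these assumptions, resolving a calibration gap ten times smaller requires on the order of a thousand times more labels.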
Traditional fixed-depth architectures scale quality by increasing training FLOPs, typically through increased parameterization, at the expense of a higher memory footprint, or data. A potential alternative is looped architectures, which instead increase FLOPs by sending activations through a block of layers in a loop. While promising, existing recipes for training looped architectures can be unstable, suffering from residual explosion and loss spikes. We address these challenges by recasting looping as a nonlinear time-variant dynamical system over the residual stream. Via a linear approximation to this system, we find that instability occurs in existing looped architectures as a result of large spectral norms in their injection parameters. To address these instability issues, we propose Parcae, a novel stable, looped architecture that constrains the spectral norm of the injection parameters via discretization of a negative diagonal parameterization. As a result, Parcae achieves up to 6.3% lower validation perplexity over prior large-scale looped models. Using our stable looped architecture, we investigate the scaling properties of looping as a medium to improve quality by increasing FLOPs at training and test time. For training, we derive predictable power laws to scale FLOPs while keeping parameter count fixed. Our initial scaling laws suggest that looping and data should be increased in tandem, given a fixed FLOP budget. At test time, we find that Parcae can use looping to scale compute, following a predictable, saturating exponential decay. When scaled up to 1.3B parameters, we find that Parcae improves CORE and Core-Extended quality by 2.99 and 1.18 points when compared to strong Transformer baselines under a fixed parameter and data budget, achieving a relative quality of up to 87.5% of a Transformer twice the size.
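One way to read "discretization of a negative diagonal parameterization" is sketched below. This is an assumption, not the Parcae implementation: a continuous-time diagonal matrix $A = -\mathrm{softplus}(a)$ is strictly negative, so its zero-order-hold discretization $\bar{A} = \exp(\Delta t \, A)$ has entries in $(0, 1)$ and spectral norm below one. The nonlinear layer block is omitted, and all names (`discretize_negative_diagonal`, `run_linear_loop`, `dt`) are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Smooth positive map log(1 + exp(x))."""
    return math.log1p(math.exp(x))


def discretize_negative_diagonal(raw: list, dt: float) -> list:
    """Assumed scheme: A = -softplus(raw) < 0, then A_bar = exp(dt * A).
    Every entry lies in (0, 1), so the diagonal matrix has spectral norm < 1."""
    return [math.exp(-dt * softplus(r)) for r in raw]


def run_linear_loop(a_bar: list, injected: list, n_loops: int) -> list:
    """Linearized loop h <- A_bar * h + e (the nonlinear block is left out)."""
    h = [0.0] * len(a_bar)
    for _ in range(n_loops):
        h = [a * hi + e for a, hi, e in zip(a_bar, h, injected)]
    return h


a_bar = discretize_negative_diagonal([-3.0, 0.0, 4.0], dt=0.5)
state = run_linear_loop(a_bar, injected=[1.0, 1.0, 1.0], n_loops=1000)
print(max(a_bar), state)
```

In this linearized picture the residual state cannot explode: each coordinate stays below the geometric-series limit $e_i / (1 - \bar{a}_i)$ regardless of the number of loops.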
Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of training data can improve efficiency and generalization, but existing methods rely on fixed schedules that do not adapt during training. In this work, we propose Adaptive Data Dropout, a simple framework that dynamically adjusts the subset of training data based on performance feedback. Inspired by self-regulated learning, our approach treats data selection as an adaptive process, increasing or decreasing data exposure in response to changes in training accuracy. We introduce a lightweight stochastic update mechanism that modulates the dropout schedule online, allowing the model to balance exploration and consolidation over time. Experiments on standard image classification benchmarks show that our method reduces effective training steps while maintaining competitive accuracy compared to static data dropout strategies. These results highlight adaptive data selection as a promising direction for efficient and robust training. Code will be released.
This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.
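The off-model idea of generating one sample under a reference law and reweighting it for each target parameter can be illustrated in a toy one-dimensional setting. The sketch below is not the paper's non-Markovian construction: it assumes a standard normal reference law $N(0,1)$, a target law $N(\theta,1)$, and the corresponding Radon--Nikodym weight $\exp(\theta x - \theta^2/2)$. The helper names are hypothetical.

```python
import math
import random


def likelihood_ratio(x: float, theta: float) -> float:
    """Density ratio dN(theta, 1) / dN(0, 1) evaluated at x."""
    return math.exp(theta * x - 0.5 * theta * theta)


def reweighted_mean(samples: list, f, theta: float) -> float:
    """Estimate E_target[f(X)] from reference samples by importance sampling."""
    total = sum(likelihood_ratio(x, theta) * f(x) for x in samples)
    return total / len(samples)


# One fixed synthetic dataset drawn under the reference law N(0, 1).
random.seed(0)
reference_sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Recalibrating theta only changes the weights; no new trajectories are drawn.
for theta in (0.0, 0.25, 0.5):
    estimate = reweighted_mean(reference_sample, lambda x: x * x, theta)
    print(theta, estimate, 1.0 + theta ** 2)  # exact value is 1 + theta^2
```

The estimator's variance grows as the target moves away from the reference law, which is one reason a dominating training law has to be chosen with the admissible parameter range in mind.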
Token-based semantic communication is promising for future wireless networks, as it can compact semantic tokens under very limited channel capacity. However, harsh wireless channels often cause missing tokens, leading to severe distortion that prevents reliable semantic recovery at the receiver. In this article, we propose a token encoding framework for robust semantic recovery (TokCode), which incurs no additional transmission overhead and supports plug-and-play deployment. For efficient token encoder optimization, we develop a sentence-semantic-guided foundation model adaptation algorithm (SFMA) that avoids costly end-to-end training. In simulations of prompt-based generative image transmission, TokCode mitigates semantic distortion and approaches the performance upper bound, even under harsh channels where 40% to 60% of tokens are randomly lost.
Force and torque (F/T) sensing is critical for robot-environment interaction, but physical F/T sensors impose constraints in size, cost, and fragility. To mitigate this, recent studies have estimated force/wrench sensorlessly from robot internal states. While existing methods generally target relatively slow interactions, tasks involving rapid interactions, such as grinding, can induce task-critical high-frequency vibrations, and estimation in such robotic settings remains underexplored. To address this gap, we propose a Frequency-aware Decomposition Network (FDN) for short-term forecasting of vibration-rich wrench from proprioceptive history. FDN predicts spectrally decomposed wrench with asymmetric deterministic and probabilistic heads, modeling the high-frequency residual as a learned conditional distribution. It further incorporates frequency-awareness to adaptively enhance input spectra with learned filtering and impose a frequency-band prior on the outputs. We pretrain FDN on a large-scale open-source robot dataset and transfer the learned proprioception-to-wrench representation to the downstream. On real-world grinding excavation data from a 6-DoF hydraulic manipulator and under a delayed estimation setting, FDN outperforms baseline estimators and forecasters in the high-frequency band and remains competitive in the low-frequency band. Transfer learning provides additional gains, suggesting the potential of large-scale pretraining and transfer learning for robotic wrench estimation. Code and data will be made available upon acceptance.
Multimodal language models (MLLMs) are increasingly paired with vision tools (e.g., depth, flow, correspondence) to enhance visual reasoning. However, despite access to these tool-generated visual cues, MLLMs often fail to benefit from them. Existing approaches typically feed raw tool outputs into the model, but these dense, pixel-level representations are misaligned with the language-native reasoning strengths of LLMs, leading to weak perception and reliance on language priors. We argue that, in problems where vision tools can provide the necessary visual cues, the bottleneck is not more tool calls or larger MLLMs; it is how tool outputs are represented. We introduce Perception Programs (P$^2$), a training-free, model-agnostic method that rewrites tool outputs into compact, structured, language-native summaries that MLLMs can directly parse and reason over. Across six perception-centric tasks in BLINK, P$^2$ consistently yields large improvements over base models and raw tool-augmented baselines. With GPT-5 Mini as the base model, P$^2$ raises its accuracy from 41.35\% to 86.47\% on multi-view reasoning, from 52.42\% to 81.45\% on relative depth, and achieves a 22\% average gain across tasks, setting new state-of-the-art results. Even on smaller MLLMs, e.g., InternVL3.5-4B and Qwen3VL-4B, we observe 15-40\% absolute gains from P$^2$, surpassing prior agentic, supervised, and RL-based tool-use methods without any training or model modifications.
Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal transferability across platforms. In this paper, we introduce TCL, a novel efficient and transferable compiler framework for fast tensor program optimization across diverse hardware platforms to address these challenges. Specifically, TCL is built on three core enablers: (1) the RDU Sampler, a data-efficient active learning strategy that selects only 10% of tensor programs by jointly optimizing Representativeness, Diversity, and Uncertainty, substantially reducing data collection costs while maintaining near-original model accuracy; (2) a new Mamba-based cost model that efficiently captures long-range schedule dependencies while achieving a favorable trade-off between prediction accuracy and computational cost through reduced parameterization and lightweight sequence modeling; and (3) a continuous knowledge distillation framework that effectively and progressively transfers knowledge across multiple hardware platforms while avoiding the parameter explosion and data dependency issues typically caused by traditional multi-task learning. Extensive experiments validate the effectiveness of each individual enabler and the holistic TCL framework. When optimizing a range of mainstream DL models on both CPU and GPU platforms, TCL achieves, on average, 16.8x and 12.48x faster tuning time, and 1.20x and 1.13x lower inference latency, respectively, compared to Tenset-MLP.
Visual tokenizers map high-dimensional raw pixels into a compressed representation for downstream modeling. Beyond compression, tokenizers dictate what information is preserved and how it is organized. A de facto standard approach to video tokenization is to represent a video as a spatiotemporal 3D grid of tokens, each capturing the corresponding local information in the original signal. This requires the downstream model that consumes the tokens, e.g., a text-to-video model, to learn to predict all low-level details "pixel-by-pixel" irrespective of the video's inherent complexity, leading to high learning complexity. We present VideoFlexTok, which represents videos with a variable-length sequence of tokens structured in a coarse-to-fine manner -- where the first tokens (emergently) capture abstract information, such as semantics and motion, and later tokens add fine-grained details. The generative flow decoder enables realistic video reconstructions from any token count. This representation structure allows adapting the token count according to downstream needs and encoding videos longer than the baselines with the same budget. We evaluate VideoFlexTok on class- and text-to-video generative tasks and show that it leads to more efficient training compared to 3D grid tokens, e.g., achieving comparable generation quality (gFVD and ViCLIP Score) with a 5x smaller model (1.1B vs 5.2B). Finally, we demonstrate how VideoFlexTok can enable long video generation without prohibitive computational cost by training a text-to-video model on 10-second 81-frame videos with only 672 tokens, 8x fewer than a comparable 3D grid tokenizer.
The BM@N experiment (Baryonic Matter at the Nuclotron) is the first fixed-target experiment at the JINR NICA accelerator complex. In this work, data on the interactions of a carbon-ion beam with kinetic energies of 4.0A~GeV and 4.5A~GeV with C, Al, Cu, and Pb targets are used to measure transverse momentum spectra and rapidity distributions of $Λ$ hyperon yields. The results are compared with the predictions of DCM-SMM, UrQMD, and PHSD transport models and with the $Λ$ yield measurements in other experiments at similar collision energies.
Observation of the 511-keV positron-annihilation line would be a powerful probe of classical novae, with the primary source of positrons likely from the $β^+$ decay of \textsuperscript{18}F. We have determined the properties of important resonances in $^{19}$Ne which govern the \textsuperscript{18}F($p,α$)\textsuperscript{15}O reaction rate and the production of \textsuperscript{18}F in novae. Measured $α$ and proton angular distributions from states populated in the \textsuperscript{19}F(\textsuperscript{3}He,$t$)\textsuperscript{19}Ne reaction identified six near-threshold proton $s$-wave \textsuperscript{18}F$+p$ ($L_p=0$) states, and the asymptotic normalization of these states was studied using the symmetry-adapted no-core shell model. We have improved our understanding of states contributing to the \textsuperscript{18}F($p,α$)\textsuperscript{15}O reaction rate and show that earlier studies significantly underestimated the uncertainties.
The precise determination of the parton distribution functions (PDFs) of the proton is an essential ingredient for LHC analyses, including those at the upcoming High-Luminosity LHC. So far, PDFs have been determined from global fits to binned low-dimensional data obtained from unfolded hard-scattering cross section measurements. In this work we demonstrate for the first time the feasibility of neural simulation-based inference (NSBI) for constraining the proton PDFs using a high-dimensional unbinned data set. Exploiting the full statistical power of unbinned data avoids the loss of information inherent in the binning procedure. As a proof-of-concept, we determine the gluon PDF from simulated data of top quark pair production at the LHC with $\sqrt{s}=13$ TeV. Taking into account both experimental and theoretical systematic uncertainties in the detector-level features, we demonstrate how the NSBI pipeline achieves significant improvements in precision compared to existing low-dimensional binned analyses. Our results illustrate the potential of unbinned inference to reduce the reliance on coarse approximations of uncertainties and their correlations entering PDF determinations, hence contributing to a new paradigm of unbinned detector-level ML-assisted measurements at the LHC.
The term 'neutrinoless' is a cornerstone of modern particle physics, yet it defines a fundamental process by what is missing rather than what is created. We trace the origins of this privative neologism to a 1953 experimental claim and show how a 'sociology of suspicion' transformed Ettore Majorana's affirmative ontology into an agnostic shorthand. By examining this linguistic shift, we argue that our current terminology may obscure the profound physical meaning of the search. Reclaiming the language of 'matter creation' is not merely a semantic choice, but a timely conceptual shift to bridge the gap between experimental caution and the radical character of the laws of nature we aim to uncover.
In this paper we study axion-like particles (ALPs) with lepton-flavour-violating (LFV) couplings in the mass regime above the muon threshold, $m_a>m_μ$, where the strong bound from the exotic muon decay $μ\to ea$ no longer applies and the decay channel $a\to eμ$ becomes kinematically accessible. In this region, the ALP typically decays promptly, motivating new search strategies based on its production in decays involving virtual muons. We analyse charged-meson and $W$ decays, neutral-current processes such as $Z$ and quarkonium decays, and, when couplings to the third generation are present, LFV $τ$ decays. The subsequent decay $a\to eμ$ leads to striking LFV signatures with negligible Standard Model backgrounds. Combining these production modes with current low-energy constraints, we assess the sensitivity of future high-energy $e^+e^-$ colliders, flavour factories such as Belle II and STCF, fixed-target experiments such as NA62, and proton beam-dump facilities such as SHiP. Overall, our results identify LFV ALP production in meson, gauge-boson, quarkonium and $τ$ decays (with displaced vertices) as a promising and largely unexplored avenue to test ALP interactions with charged leptons above the muon mass threshold.
Neutrons are important final-state particles in neutrino interactions, yet they are not considered or reconstructed in most current neutrino LArTPC physics analyses. In this paper, we present a simulation-based proof-of-concept study of neutron reconstruction in a generic LArTPC detector. Leveraging isolated, MeV-scale energy deposits, or blips, from neutron inelastic scattering, and using realistic blip response from published experimental results, we demonstrate the capability to identify neutrons and to reconstruct the direction and energy of the final-state neutron system in sub-GeV neutrino interactions. We then explore how neutron-related blip attributes can be used to improve physics studies of neutrino interactions, such as enhancing neutrino-antineutrino separation in atmospheric neutrinos and reverse-horn-current beam neutrinos. This simple study provides an initial quantification of LArTPC neutron reconstruction capabilities, which we expect to improve with future advancements in blip reconstruction, identification, and classification algorithms, as well as the modeling of neutrons.
To address the Reactor Antineutrino Anomaly (RAA) observed in neutrino experiments, the Reactor Experiment for Neutrino and Exotics (RENE) has been initiated using a liquid scintillation detector. In this study, we investigate the characteristics of two 20-inch Hamamatsu R12860 photomultiplier tubes (PMTs) intended for installation in the RENE detector. The charge and timing responses of the PMTs were evaluated at both the nominal and target gains expected during actual operation. In particular, gain non-uniformity arising from the large-diameter photocathode with a box-and-line type dynode structure was examined, and the maximum gain variation was measured. The occurrence rate, timing, and charge distributions of late pulses and afterpulses were also investigated to characterize the specific response features of the R12860 PMT. The results reported in this study will aid in the interpretation of signals from the RENE detector and serve as a reference for estimating potential systematic uncertainties in RENE data. Furthermore, these findings are expected to provide valuable information for other experiments employing the same type of PMTs.
We have developed and commissioned an experimental system at ELI-NP, a 10~PW-class laser facility, for searches for axion-like particles (ALPs). The search principle is based on the Four-Wave Mixing (FWM) process in the focal region of two coaxially combined laser beams. The subsystems that control vacuum pressure, area size, spatiotemporal overlap and trigger-event pattern are integrated into the experimental area for 0.1 PW laser output at ELI-NP. The integrated system is dedicated to identifying possible background sources originating from residual atoms and optical elements. The performance and functionality of the subsystems were validated through evaluations of the laser characteristics, their stability, and FWM signal detection. Furthermore, commissioning results for the background studies were obtained with 20 mJ-level laser pulses at a vacuum pressure of $1.3 \times 10^{-7}$ mbar. In conclusion, the integrated experimental system is fully functional as designed and provides a suitable platform for background studies towards the ALP searches, enabling a stepwise scale-up of the laser pulse energy from 20 mJ to the maximum energy of 2.5 J in the 0.1 PW experimental area.
Collider experiments are equipped with trigger systems that rapidly inspect the physics content emerging from collisions to decide whether the resulting products are worth saving for later analysis. One crucial aspect for analyzing the final states originating from the collisions is to process the information produced by charged particles in the innermost detectors to reconstruct the corresponding trajectories. This task is a challenge for the experiments running at the Large Hadron Collider (LHC) at CERN because of the large number of secondary collisions per bunch crossing, the so-called pile-up vertices, giving rise to extremely high hit occupancies in the detector layers close to the beam line. Reconstructing tracks is a combinatorial problem and its processing time strongly depends on the average pile-up per event. The future accelerator-complex upgrade to the High-Luminosity LHC, implying even higher detector occupancies, will result in a considerable growth of the computational cost of the current trigger strategies. To face this issue, a new technique for assisting track reconstruction by filtering out unnecessary detector information is presented and characterized in this work. The algorithm is based on a convolutional-neural-network architecture which can be easily deployed on accelerator cards. The impact of this approach is assessed and future prospects are also discussed.
We present the TQ4Q2.0 fragmentation functions for the production of all-heavy (fully heavy) $S$-wave tetraquarks ($T_{4Q}$) with scalar ($0^{++}$), axial-vector ($1^{+-}$), and tensor ($2^{++}$) quantum numbers in high-energy hadronic collisions. This work extends the previous TQ4Q1.1 framework by incorporating nonconstituent heavy-quark contributions and introducing a replica-based uncertainty-quantification strategy derived from multi-scale variations (MHOUs). The construction follows a nonrelativistic QCD factorization approach, combining gluon- and heavy-quark-initiated fragmentation channels at leading power. Initial-scale inputs are modeled through updated potential-inspired wave functions, while the subsequent DGLAP evolution is performed via the threshold-aware HF-NRevo scheme. A comprehensive systematic analysis of uncertainties is carried out, with contributions from color-composite long-distance matrix elements (LDMEs) and perturbative multiscale inputs. The resulting TQ4Q2.0 grids, publicly released in LHAPDF6 format, provide the first complete phenomenological set for all-heavy exotics, enabling precise studies of all-charm tetraquark production and jet-associated observables within the JETHAD environment. This article completes the high-energy resummation-driven generation of the TQ4Q program and establishes a definitive baseline for future collider-oriented analyses of all-heavy multiquark dynamics.
The production of top quark pairs is one of the most relevant production modes at the LHC, and allows for precise measurements of the properties of this particle. Top quarks are also produced through rarer mechanisms, including the production of multiple top quarks or the associated production of top quarks with electroweak gauge bosons. Although these processes have significantly smaller cross sections, they provide unique sensitivity to the couplings of the top quark and to possible effects of physics beyond the standard model (SM). This contribution reviews recent analyses of rare top quark production performed by the ATLAS and CMS Collaborations.
A method for selecting and/or rejecting leptons from charm semileptonic decays based on the tagging of the secondary vertex using a hadron track is introduced. The method is developed for dimuon Drell-Yan measurements in LHCb using full simulations of proton-proton collisions at $\sqrt{s}=13.6$ TeV. We focus on the invariant mass range between 2.9 and 5 GeV/$c^2$ with single-muon transverse momentum larger than 1 GeV/$c$. A novel strategy is detailed for background rejection, improving the signal-to-background ratio by a factor of $\sim 4$ at an efficiency of 81% with minimal bias on the Drell-Yan signal properties. Moreover, a second approach is presented for the construction of unbiased background-pure samples of single muons from charm decays, achieving a charm efficiency of 21.4% at a Drell-Yan efficiency of 1.1%.
Low Gain Avalanche Diodes are prime candidates for high-resolution timing applications in High Energy Physics, Nuclear science, and several other fields. Operating these devices in high-radiation environments presents various hazards, including the risk of their permanent degradation or destruction caused by effects such as Single Event Burnout. Studies using minimum ionizing particles found a greatly reduced Single Event Burnout risk by operating below a bias voltage corresponding to an average electric field of 12 V/$μ$m. However, as high energy particle colliders produce a wide energy spectrum of radiation, it is crucial to understand this phenomenon and other possible damage mechanisms at energy deposition levels greater than those of minimum ionizing particles. This was achieved by pre-irradiating LGADs and PiN diodes with active thicknesses of 20, 30, and 50 $μ$m up to 1.5 $\times$ 10$^{15}$ $\mathrm{n_{eq}/cm^2}$, and exposing them to beams of protons and heavy ions (C, O, Fe, Au) at the BNL Tandem van de Graaff accelerator. Several mortality categories were observed, defined by different electrical and mechanical damage signatures. This furthers our understanding of permanent radiation damage of silicon devices, crucial towards mitigating Single Event Burnout and other damage mechanisms to safely operate future detectors.
Observation of baryon number violation (BNV) in laboratory experiments would constitute unambiguous evidence for physics beyond the Standard Model. We propose dedicated searches for \textit{apparent} BNV in charm-baryon decays, $Λ_c^+\to M^+ +$ missing energy ($M=π, K$) where the missing energy stems from a resonance. These channels have not been explored experimentally so far, despite the relatively clean environment potentially provided by near $Λ_c^+\overlineΛ_c^-$ threshold production at $e^+e^-$ colliders. Performing state-of-the-art Monte Carlo simulations for the proposed Super Tau-Charm Facility (STCF), we evaluate the signal efficiencies and derive projected model-independent sensitivities under the assumption of negligible background. We further interpret these sensitivities within two theoretical frameworks: a sterile-neutrino-extended low-energy effective field theory ($ν$LEFT) and R-parity-violating (RPV) supersymmetry. With an integrated luminosity of 1 ab$^{-1}$, STCF can probe new-physics scales of several TeV in the $ν$LEFT description and constrain the RPV model parameter $λ''_{212}/m^2_{\tilde{q}}$ down to about $0.1~\mathrm{TeV}^{-2}$. Our results demonstrate that STCF provides a highly competitive opportunity for probing BNV interactions in rare charm-baryon decays.
The first observation of a charmless purely baryonic decay, $Λ_b^{0} \to Λp \bar{p}$, is reported using the full Run 2 LHCb dataset, corresponding to an integrated luminosity of $6.0~\text{fb}^{-1}$. The branching fraction is measured relative to that of the topologically similar normalisation mode $Λ_b^{0} \to ΛK^+K^-$. A simultaneous fit to the long- and downstream-track categories yields a signal significance of $5.1σ$ after including systematic uncertainties. The relative branching fraction is measured to be $\left(5.13 \pm 1.28_{\rm stat} \pm 0.27_{\rm syst}\right)\times 10^{-2}$ in the region $m(h^+h^-)< 2.85$ GeV.
The inclusive production of the $η_c(1S)$, $η_c(2S)$ and $χ_{c}$ charmonium states in $b$-hadron decays is studied with LHCb Run~2 data, corresponding to an integrated luminosity of $5.9~\text{fb}^{-1}$, using charmonia decays to $φφ$ pairs. The production branching fractions of the $χ_{c}(1P)$ states in $b$-hadron decays are measured, using $b \to η_c(1S) (\to φφ) X$ as a normalisation channel, with $X$ indicating any additional particles. The results are \begin{align*} &{\cal{B}} (b \to χ_{c0} X) = (1.34 \pm 0.13 \pm 0.06 \pm 0.37) \times 10^{-3}, \\ &{\cal{B}} (b \to χ_{c1} X) = (1.58 \pm 0.12 \pm 0.09 \pm 0.44) \times 10^{-3}, \\ &{\cal{B}} (b \to χ_{c2} X) = (0.55 \pm 0.08 \pm 0.05 \pm 0.15) \times 10^{-3}, \end{align*} where the first uncertainty is statistical, the second systematic and the last is due to the limited knowledge of externally measured branching fractions. The production branching fraction of $η_c(2S)$ times the branching fraction of its decay into $φφ$ is measured as ${\cal{B}} (b \to η_c(2S) X) \times {\cal{B}} (η_c(2S) \to φφ) = (4.0 \pm 0.6 \pm 0.6 \pm 1.1) \times 10^{-7}$. Furthermore, the mass of the $η_c(1S)$ state is measured to be $M_{η_c(1S)} = 2984.1 \pm 0.5 \pm 0.5$ MeV with the best precision to date.
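As a reading aid for the three-component uncertainties quoted above, the following minimal sketch combines them in quadrature. It assumes the statistical, systematic and external sources are uncorrelated, which the abstract does not state; the names `results` and `total_uncertainty` are illustrative only.

```python
import math

# Central value and (stat, syst, external) uncertainties in units of 1e-3,
# copied from the abstract above.
results = {
    "chi_c0": (1.34, 0.13, 0.06, 0.37),
    "chi_c1": (1.58, 0.12, 0.09, 0.44),
    "chi_c2": (0.55, 0.08, 0.05, 0.15),
}


def total_uncertainty(stat, syst, ext):
    """Quadrature sum. ASSUMPTION: the three sources are uncorrelated."""
    return math.sqrt(stat**2 + syst**2 + ext**2)


# Combined uncertainty per state, in units of 1e-3.
totals = {name: total_uncertainty(*vals[1:]) for name, vals in results.items()}

for name, vals in results.items():
    central = vals[0]
    print(f"B(b -> {name} X) = ({central:.2f} +/- {totals[name]:.2f}) x 10^-3")
```

Under this assumption the combined uncertainty is driven almost entirely by the external branching fractions, which is the largest of the three components for every state.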
We searched for proton decay via $p \to e^{+}π^{0}π^{0}$ and $p \to μ^{+}π^{0}π^{0}$ in 0.401 megaton-years of data collected in all pure water detector phases of Super-Kamiokande (SK) I-V. A theoretical study predicts proton decay rates without assuming a particular grand unified theory and suggests that three-body proton decays involving two pions can have decay rates comparable to those of $p \to e^{+}π^{0}$ and $p \to μ^{+}π^{0}$. This is the first search for proton decay into a charged anti-lepton and two neutral pions in SK. One data candidate event was found for each of the two decay modes, which is consistent with the expected atmospheric neutrino background. We set lower limits on the lifetime of $τ/B(p \to e^{+}π^{0}π^{0}) > 7.2 \times 10^{33}$ years and $τ/B(p \to μ^{+}π^{0}π^{0}) > 4.5 \times 10^{33}$ years at 90 $\%$ confidence level. These limits are more than one order of magnitude higher than those of the previous experiment.
We propose that the enhanced Higgs quartic coupling required by radiatively broken electroweak symmetry (RBEWS) emerges naturally from SO(10) grand unification. Our previous analysis demonstrated that a coupling enhancement factor $k = λ_{\rm enhanced}/λ_{\rm SM}$ leads to absolute vacuum stability with a UV Landau pole near the GUT scale for $k \gtrsim 1.03$. The RBEWS prediction $e_{125} = 7.2$ of Steele and Wang, when properly translated from the Coleman-Weinberg scheme at the electroweak VEV to the $\overline{\rm MS}$ scheme at $M_t$ via scheme conversion and scale-dependent ratio evolution, yields $k(M_t) \approx 6.0$--$6.4$, corresponding to a UV pole at $Λ_{\rm UV} \sim {1.5\text{--}2} \times 10^{16}$~GeV -- remarkably close to the GUT scale $M_{\rm GUT} \sim 2 \times 10^{16}$~GeV. We argue this coincidence is not accidental: the UV pole signals the scale where the Standard Model effective description must be embedded into the full SO(10) structure. We derive threshold corrections from SO(10) scalar sectors containing $\mathbf{10}_H$, $\mathbf{\overline{126}}_H$, and $\mathbf{45}_H$ representations, showing that portal couplings between the light Higgs doublet and heavy GUT scalars can generate enhancement factors of order $k \sim 5$--$10$ at the matching scale. The Coleman-Weinberg mechanism operating within a classically scale-invariant GUT scalar potential provides a dynamical origin for both RBEWS and the hierarchy between $M_{\rm GUT}$ and the electroweak scale.
Recently, the BESIII Collaboration reported the first observation of the decays $χ_{cJ} \to ηηη^\prime$ in order to search for the $1^{-+}$ exotic state $η_1(1855)$. A partial wave analysis of the $ηη^\prime$ invariant mass spectrum shows no significant signal for the $η_1(1855)$. In this work, using an effective Lagrangian approach, we investigate the processes $χ_{cJ} \to ηηη^\prime$ via box and triangle loops involving charmed mesons and the scalar meson $f_0(1500)$. Our calculations reproduce the experimental branching fractions of $χ_{cJ} \to ηηη^\prime$ well. Furthermore, we present predictions for the invariant mass spectra of $ηη^\prime$ and $ηη$ produced in the $χ_{c1}$ decay, which are overall consistent with the BESIII measurements. In the present model, the decay $χ_{c1} \to ηηη^\prime$ is dominated by the triangle and box loop contributions. The consistency between our theoretical results and the BESIII measurements sheds light on the underlying mechanism of $χ_{cJ}$ decays into light mesons and might help to explain the absence of the $η_1(1855)$ signal in the decay channels $χ_{cJ} \to ηηη^\prime$.
We study the lifetimes and inclusive semileptonic decay widths of doubly heavy baryons within the framework of heavy quark expansion. Our analysis includes next-to-leading-order corrections to the dimension-3, -5, and -6 operators, together with the leading dimension-7 contributions, while the nonperturbative matrix elements are evaluated in a bag model with translationally improved baryon wave functions. We obtain $( τ_{Ξ_{cc}^{++}} , τ_{Ξ_{cc}^{+}} , τ_{Ω_{cc}^{+}} ) = ( 2.67 \pm 0.94,\, 0.47 \pm 0.08,\, 1.79 \pm 0.62 ) \times 10^{-13}\,{\rm s}$ and $( τ_{Ξ_{bb}^{0}} , τ_{Ξ_{bb}^{-}} , τ_{Ω_{bb}^{-}} ) = ( 0.75 \pm 0.11,\, 0.92 \pm 0.15,\, 0.93 \pm 0.15 ) \times 10^{-12}\,{\rm s}$, where the uncertainties arise from the heavy quark pole masses and the hadronic scale adopted in the quark model. Hence, the lifetime hierarchy patterns are $τ(Ξ_{cc}^{++})>τ(Ω_{cc}^+)>τ(Ξ_{cc}^+)$ and $τ(Ω_{bb}^{-})\simτ(Ξ_{bb}^-)>τ(Ξ_{bb}^0)$ for doubly charmed and doubly bottom baryons, respectively. The $W$-exchange contribution plays a crucial role in generating the large lifetime splitting in the doubly charmed sector and remains phenomenologically important for doubly bottom baryons. In addition to the total lifetimes, we calculate the separate nonleptonic and semileptonic contributions, which allow us to trace the pattern of spectator effects in each baryon channel. We also evaluate the inclusive semileptonic decay widths and the decay width asymmetries, which provide complementary probes of the underlying decay mechanisms.
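The hierarchy statements above can be checked directly against the quoted central values. The sketch below is only a consistency check on the numbers in the abstract; the dictionary keys and the helper `hierarchy` are illustrative names, and treating "$\sim$" as agreement within the quoted uncertainty is an assumption made here.

```python
# Central values and uncertainties in seconds, copied from the abstract above.
lifetimes = {
    "Xi_cc++": (2.67e-13, 0.94e-13),
    "Xi_cc+": (0.47e-13, 0.08e-13),
    "Omega_cc+": (1.79e-13, 0.62e-13),
    "Xi_bb0": (0.75e-12, 0.11e-12),
    "Xi_bb-": (0.92e-12, 0.15e-12),
    "Omega_bb-": (0.93e-12, 0.15e-12),
}


def hierarchy(names):
    """Return the baryon names sorted from longest to shortest central lifetime."""
    return sorted(names, key=lambda n: lifetimes[n][0], reverse=True)


charm_order = hierarchy(["Xi_cc++", "Xi_cc+", "Omega_cc+"])
bottom_order = hierarchy(["Xi_bb0", "Xi_bb-", "Omega_bb-"])

# Size of the splitting in the doubly charmed sector (ratio of central values).
charm_splitting = lifetimes["Xi_cc++"][0] / lifetimes["Xi_cc+"][0]

# ASSUMPTION: "~" is read as "central values differ by less than one quoted uncertainty".
bottom_gap = abs(lifetimes["Omega_bb-"][0] - lifetimes["Xi_bb-"][0])
bottom_degenerate = bottom_gap < lifetimes["Xi_bb-"][1]

print(" > ".join(charm_order))
print(" > ".join(bottom_order))
print(f"tau(Xi_cc++)/tau(Xi_cc+) = {charm_splitting:.1f}")
```

The ratio of roughly 5.7 between the longest and shortest doubly charmed lifetimes is the large splitting that the abstract attributes to $W$ exchange.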
We show that the $\mathrm{SU}(3)_C \times \mathrm{SU}(3)_L \times \mathrm{U}(1)_N$ model with right-handed neutrinos naturally accommodates a viable sub-MeV dark matter (DM) candidate, realized as a pseudo-Goldstone boson that acquires a tiny mass through gravitational effects. The observed relic abundance is obtained via freeze-in in a low-reheating-temperature scenario, without requiring tiny couplings. The model operates at the TeV scale and remains testable at current and future collider experiments.
We show that, in the collinear regime, the fixed--$j$ holographic double deeply virtual Compton scattering (DDVCS) amplitude contains the same hypergeometric hard kernel as the $\pm$-basis Wilson coefficients of perturbative QCD. Starting from the $t$--channel Witten diagram, we derive the closed-string fixed--$j$ amplitude and obtain the even-spin open-string channel by a parallel replacement rule. After holographic collinear factorization, the upper photon vertex is universal and model independent: in the conformal limit it depends only on the pure-AdS bulk wave functions of the two virtual photons and yields an exact Gauss hypergeometric function of $η^2/ξ^2$. The Mellin exponent $δ_X(j)=j+Δ_X(j)-2=2j+γ_X(j)$ is fixed by Witten-diagram $z$-power counting, while all infrared model dependence is isolated in lower hadronic conformal moments. Comparing with the singlet vector Compton form factor in the conformal operator product expansion, we find that at a single matching scale $Q=μ=μ_0=μ_\ast$ the open channel matches the $(+)$ eigenchannel and the closed channel matches the protected $(-)$ eigenchannel. The sharpest anchor is the first physical even moment $j=2$, together with the distinct $\sqrt{j-1}$ and $\sqrt{j-2}$ branch-point structure of the open and closed trajectories. Logarithmic running deforms only the scale dependence, not the channel dictionary. The result is a fixed--$j$, fixed-scale structural matching statement for holographic DDVCS/DVCS, not a claim of all-scale equality or a global fit.
Holographic QCD reproduces the leading short-distance vector-current two-point function in vacuum, fixing the bulk gauge coupling by matching the logarithmic $Q^2$ dependence of the boundary current correlator. We show that this vacuum matching extends to the off-forward hadronic current-current correlator relevant for DDVCS/DVCS. Starting from the fixed-$j$ $t$-channel Witten diagram, we derive a factorized holographic Compton amplitude whose ultraviolet photon vertex is universal and model independent, while all infrared sensitivity is isolated in hadronic conformal moments. In the conformal limit this upper vertex depends only on the pure-AdS bulk wave functions of the virtual photons and yields an exact Gauss hypergeometric kernel. In the collinear window and at a single matching scale $Q=μ=μ_0=μ_\ast$, this kernel matches exactly the $\pm$-basis Wilson coefficients of the singlet conformal operator product expansion in perturbative QCD. The channel dictionary is fixed dynamically: the closed-string branch matches the protected $(-)$ eigenchannel, while the open-string branch matches the unprotected $(+)$ eigenchannel, with the first physical even moment $j=2$ and the distinct $\sqrt{j-2}$ versus $\sqrt{j-1}$ branch points providing the sharpest anchor. The result is therefore an exact fixed-scale matching statement for the hadronic current-current correlator in the fixed-$j$ channel. It identifies the holographic DDVCS/DVCS amplitude as a hadronic generalization of the familiar vacuum current-correlator matching.
We illustrate, via a simplified model, a scenario in which the baryon asymmetry and, possibly, the dark matter component of the Universe are simultaneously generated by the decay of a WIMP-like mother particle, in turn produced non-thermally during an epoch of Early Matter domination. We first consider the standard evolution of the Universe and introduce TeV-scale BSM particles, finding that this paradigm cannot produce enough baryon asymmetry. This deficiency can be resolved by considering a non-standard scenario, with a matter-dominated phase prior to radiation domination. Finally, we include a dark matter candidate, which is non-thermally produced during the Early Matter domination. Our results demonstrate an interesting common origin of baryon asymmetry and dark matter, with the particle masses lying within the collider-detectable range, thanks to the presence of non-standard evolution in the early Universe.
We present a hybrid study that combines a concise review of scalar-field cosmology with new analytic developments that integrate averaging reductions for oscillatory regimes with dynamical-systems techniques. For oscillatory fields, we derive an averaging reduction that yields an effective slow system whose time averages control dissipation; introducing uniform derivative bounds, Barbalat/LaSalle arguments, and a finite-dimensional center/stable manifold reduction, we carry out late-time analysis of the models. We prove persistence of equilibria, decay estimates, and local invariant manifolds under small $C^k$ perturbations of $χ(φ)$ and $G(a)$, quantify how averaged dissipation lifts to the full oscillatory dynamics with an $\mathcal{O}(H)$ error, and provide numerical examples. In addition to asymptotic reductions, we obtain exact quadrature solutions in general relativistic, anisotropic, and brane-world settings, yielding closed-form expressions for $t(a)$, $φ(a)$, and $H(a)$ and enabling analytic computation of inflationary observables.
Energy correlators offer a clean probe of quantum chromodynamics, serving as an ideal laboratory to rigorously investigate non-perturbative power corrections. The recent discovery that linear corrections exhibit a universal anomalous scaling points to a deep, underlying theoretical structure. We uncover the quantum field-theoretic origin of this phenomenon in the energy-energy correlator using light-ray operators. Through an explicit loop calculation, we derive the one-loop anomalous dimension, revealing that the dijet operator must be combined with a specific triple-jet component. This provides a first-principles framework that connects operator theory with high-precision collider phenomenology.
The $U(1)_A$ symmetry of the massless QCD Lagrangian is explicitly broken by the axial anomaly, but it may be effectively restored at finite temperature. Determining the temperature at which this occurs is important for understanding the chiral transition and the structure of the QCD phase diagram. A commonly used probe of effective $U(1)_A$ restoration is the degeneracy of flavour non-singlet pseudoscalar and scalar susceptibilities. Using anisotropic lattice QCD ensembles with Wilson-clover fermions generated by the \textsc{Fastsum} collaboration, we study this degeneracy through hadronic correlation functions over a wide range of temperatures. The fine temporal resolution of our Generation 3 ensembles allows us to determine the temperature at which the pseudoscalar and scalar channels become degenerate. We find evidence for the effective restoration of $U(1)_A$ symmetry at $T_{U(1)_A}=319(22)$ MeV, well above the chiral crossover temperature.
We present GreyRing, a new model for the post-merger signal in black-hole binary coalescences based on the greybody factor of the remnant. The model accurately reproduces the full frequency-domain ringdown signal of a large set of comparable-mass, aligned-spin numerical relativity waveforms, achieving mismatches of order ${\cal O}(10^{-6})$ for the dominant $(\ell,m)=(2,2)$ mode, and typically outperforming state-of-the-art time-domain models. Building on this model, we introduce a novel consistency test of strong gravity based on the greybody factor: the remnant mass and spin inferred from GreyRing can be compared with those obtained through standard black hole spectroscopy. This agnostic test relies exclusively on the post-merger signal and does not require the inclusion of overtones or the choice of very early ringdown starting times, combining the advantages of inspiral-merger-ringdown consistency tests and traditional black hole spectroscopy. We apply the test to GW250114 and find that the remnant mass and spin inferred from GreyRing are consistent with those measured from the full signal. Remarkably, the inferred parameters can be measured with a precision comparable to, or slightly better than, that achieved with standard black-hole spectroscopy. Our greybody-factor waveform model allows for new precision tests of strong gravity using the ringdown signal.
In this paper we study axion-like particles (ALPs) with lepton-flavour-violating (LFV) couplings in the mass regime above the muon threshold, $m_a>m_μ$, where the strong bound from the exotic muon decay $μ\to ea$ no longer applies and the decay channel $a\to eμ$ becomes kinematically accessible. In this region, the ALP typically decays promptly, motivating new search strategies based on its production in decays involving virtual muons. We analyse charged-meson and $W$ decays, neutral-current processes such as $Z$ and quarkonium decays, and, when couplings to the third generation are present, LFV $τ$ decays. The subsequent decay $a\to eμ$ leads to striking LFV signatures with negligible Standard Model backgrounds. Combining these production modes with current low-energy constraints, we assess the sensitivity of future high-energy $e^+e^-$ colliders, flavour factories such as Belle II and STCF, fixed-target experiments such as NA62, and proton beam-dump facilities such as SHiP. Overall, our results identify LFV ALP production in meson, gauge-boson, quarkonium and $τ$ decays (with displaced vertices) as a promising and largely unexplored avenue to test ALP interactions with charged leptons above the muon mass threshold.
We systematically study the spin correlations and quantum entanglement in transversely polarized electron-positron collisions. We find that the $s$-channel QED process $e^-e^+\to f\bar f$ produces a maximally entangled state in the entire phase space when the initial beams are transversely polarized, while the quantum magic varies in different phase space points for the maximally entangled Bell states. For electroweak processes, the spin configuration of final states depends on chiral couplings, and the entanglement is also greatly enhanced by transverse polarization as in the QED process. For Bhabha scattering with additional $t$-channel contributions, the transverse polarization still increases the final state entanglement, although with some dilution. The sensitive dependence of final spin states on the transverse polarization makes the beam polarization a powerful tool for generating and controlling quantum entanglement in collider experiments, opening up new opportunities for quantum information studies at high-energy colliders.
We calculate the spin density matrix of a back-to-back quark-antiquark pair inclusively produced in electron-nucleus scattering, taking into account the gluon saturation effect and the linearly polarized gluon distribution. We then investigate concurrence and stabilizer Rényi entropy, quantifying entanglement, Bell-nonlocality, and magic. We find that the linearly polarized gluon distribution tends to enhance the entanglement of a heavy quark pair when the total and relative transverse momenta of the pair are orthogonal.
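To make the concurrence used above as an entanglement measure concrete, here is a minimal sketch for the special case of a pure two-qubit state $a|00\rangle+b|01\rangle+c|10\rangle+d|11\rangle$, for which $C = 2|ad-bc|$. This is a textbook simplification and not the calculation of the abstract, which works with a spin density matrix (in general a mixed state, requiring the full Wootters construction). The function name `concurrence_pure` is illustrative.

```python
import math


def concurrence_pure(state):
    """Concurrence C = 2|ad - bc| of a pure two-qubit state (a, b, c, d).

    ASSUMPTION: the state is pure; mixed states need the Wootters formula
    applied to the density matrix instead.
    """
    norm = math.sqrt(sum(abs(x) ** 2 for x in state))
    a, b, c, d = (x / norm for x in state)  # normalise before evaluating
    return 2 * abs(a * d - b * c)


# Maximally entangled Bell state (|00> + |11>)/sqrt(2).
bell = (1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2))
# Unentangled product state |00>.
product = (1, 0, 0, 0)

c_bell = concurrence_pure(bell)
c_product = concurrence_pure(product)
print(c_bell, c_product)
```

The concurrence ranges from 0 for a separable state to 1 for a maximally entangled one, so the two inputs above mark the endpoints of the scale.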
Hard scattering events in high-energy collisions produce highly virtual partons that subsequently fragment into collimated hadronic cascades. When such partonic showers evolve in a QCD medium, as in deep-inelastic scattering or heavy-ion collisions, the resulting multi-particle distributions encode information about the surrounding matter. Decades of theoretical developments have led to a consistent and order-by-order improvable perturbative description of the shower. This description, however, needs non-perturbative input that encodes the structure of the hadronic matter. The determination of such input remains challenging within conventional computational approaches, thereby limiting the applicability of the approach. In this work, we develop a framework that employs quantum simulation techniques to compute multi-particle processes in such environments by mapping partonic cross-sections to quantum circuits. As benchmarks, we analyze dipole formation and the QCD antenna radiation pattern at leading order in the strong coupling constant, comparing the results with analytic estimates in simplified limits. The quantum circuit formulation introduced here naturally extends to higher perturbative orders and enables amplitude-level computations in complex matter backgrounds. This provides a systematic foundation for applying quantum information science methods to study multi-particle dynamics in QCD media.
False-vacuum decay between two morphologically distinct supersolid phases via bubble nucleation is studied in a uniform dipolar gas confined to the plane. Starting from a metastable honeycomb state, the formation of stripe phase domains is simulated numerically by means of a stochastic projected extended Gross-Pitaevskii equation. The speed of bubble growth is analyzed in relation to the multiple speeds of sound of the supersolid, and is found to be set by the slowest of these sounds. The vacuum decay rate is numerically extracted and compared against a minimal effective model for the Coleman bounce solution connecting the two supersolid orders. Our results establish dipolar supersolids as a novel and versatile platform for studying false-vacuum decay. This setting offers a rich structure of metastable states and collective excitations that come into play in the decay. Furthermore, here, in contrast to previous studies, bubble formation occurs directly in the real-space density and can be probed with \textit{in situ} imaging.
In this work, we study the motion of massive test particles and the gravitational--wave emission associated with periodic trajectories around a magnetically charged black hole immersed in a \textit{Hernquist} dark matter halo. We begin by analyzing the effective potential and the conditions for stable motion, with particular attention to the marginally bound radius and the innermost stable circular orbit. Our results show that the dark matter parameters, namely the halo density and scale radius, enlarge the allowed region and generally shift the relevant characteristic radii and angular momenta toward larger values. In contrast, the magnetic charge partially counterbalances this behavior. We then examine periodic trajectories through the rational number $q$, which characterizes the relation between the azimuthal and radial frequencies, and construct representative zoom--whirl configurations together with their precessing counterparts. Finally, we investigate the imprints of dark matter and magnetic monopole charge on the gravitational--wave polarizations in the extreme mass--ratio regime.
Sommerfeld-enhanced annihilation cross sections in the presence of nearly zero-energy bound states can become so large that perturbative partial-wave unitarity appears to be violated. Previous literature incorporated the short-distance annihilation potential self-consistently into the computation of the Schrödinger wave function at the origin, leading to the unitarization of the Sommerfeld effect in vacuum. We employ non-relativistic effective field theory methods and the Keldysh-Schwinger formalism to additionally include pair-creation effects in the self-consistent computation of four-point correlation functions, which renders the unitarization temperature dependent. Up to small thermal corrections in the non-relativistic and dilute regime of the pairs, we confirm the previous results based on the Schrödinger equation approach for scattering states in vacuum. For the first time, we analyze bound-state contributions beyond their leading decay via annihilation. Interestingly, our self-consistent computation of the four-point correlation function shows that bound states remain on-shell in their out-of-equilibrium decay, even though their spectral functions take the form of Breit-Wigner distributions due to finite decay widths. While this may appear paradoxical, it aligns with expectations from earlier results based on exact analytic solutions of the Kadanoff-Baym equations for a decaying elementary particle in a thermal environment.
The evolution of a charged lepton in the field of an electromagnetic plane wave can be described as a superposition of Volkov states. Here we demonstrate that imposing specific momentum correlations among Volkov states produces a spatiotemporally structured wavepacket whose probability-density peak travels at an arbitrary, tailored velocity. This velocity can be chosen independently of both the field amplitude and the velocity expectation value. The imposed momentum correlations modify the expectation-value trajectory, providing a measurable signature of the arbitrary velocity within a physical observable.
We study in QCD the $\overline{\mathrm{MS}}$ renormalization of three-quark operators with up to two covariant derivatives, which are related to $N=0,1,2$ Mellin moments of baryonic light-cone distribution amplitudes. Apart from general three-quark operators, we also consider those corresponding to spin 3/2 and 1/2 states. We present in analytic form the renormalization constants and anomalous dimensions of these operators through three loops, confirming previous two- and three-loop results for $N=0$. Furthermore, we evaluate through two loops their amputated four-point Green's functions with RI${}^\prime$/MOM four-momentum assignment, which are required for the matching of lattice results with perturbative calculations. We work in linear covariant gauge and find the anomalous dimensions to be gauge independent as expected.
We propose a radiative Dirac neutrino mass model stabilized by a non-invertible fusion rule originating from a $Z_3 \times Z_3'$ gauging. The imposed symmetry forbids tree-level Yukawa couplings and ensures that neutrino masses are generated only at the one-loop level through the exchange of exotic fermions and inert scalars. This minimal framework simultaneously accommodates neutrino masses and mixings consistent with current oscillation data, while providing a viable dark matter candidate. We analyze lepton flavor violating processes and lepton anomalous magnetic moments, finding that all contributions remain well below present experimental bounds. In the dark matter sector, the bosonic singlet emerges as a promising candidate with relic density compatible with cosmological observations, whereas the fermionic option is strongly disfavored due to suppressed annihilation cross sections. Our study demonstrates that non-invertible fusion rules can serve as a powerful organizing principle for constructing minimal and phenomenologically consistent extensions of the Standard Model, linking neutrino physics and dark matter within a unified radiative framework.
We present an updated analysis of the gigaelectronvolt (GeV) gamma-ray emission from the shell-type supernova remnant (SNR) RX J0852.0-4622 (Vela Jr) using 15 yr of Fermi Large Area Telescope (Fermi-LAT) data. We quantitatively model the GeV morphology and find that it is best described by the masked H.E.S.S. shell template, indicating that the embedded pulsar wind nebula (PWN) contributes little to the GeV flux. The 0.1-500 GeV spectrum is well fitted by a hard power law with a photon index of $1.77 \pm 0.03$ and connects smoothly to the teraelectronvolt (TeV) spectrum, confirming previous results with improved precision. We further construct an independent eROSITA shell template and derive the 1-5 keV X-ray spectral energy distribution (SED) of the whole remnant, which provides new constraints on the synchrotron emission. We model the multi-wavelength (MWL) SED with a pure leptonic model and a hybrid lepton-hadron model. While the pure leptonic model reproduces the overall broadband shape, the hybrid model provides a better statistical description of the same dataset, supporting a mixed-origin picture in which the hadronic contribution is mainly relevant in the GeV band and the TeV emission remains predominantly leptonic.
Vector-like leptons are non-chiral, colorless fermions from new physics beyond the Standard Model, appearing in many theoretical extensions. We investigate the prospect for detecting the single production of a singlet vector-like lepton that mixes with the $τ$ lepton at the Large Hadron Collider. The corresponding final states are classified as the three- and four-lepton search channels. The machine learning algorithm XGBoost is employed to enhance signal-background discrimination. Our analysis indicates that, at $\sqrt{s} = 14~\mathrm{TeV}$ with an integrated luminosity of $3000~\mathrm{fb}^{-1}$, the expected $2σ$ exclusion limits in the three- and four-lepton channels can reach vector-like lepton masses up to $620~\mathrm{GeV}$ and $490~\mathrm{GeV}$, respectively. These findings demonstrate that machine learning techniques can substantially improve the sensitivity of collider searches for vector-like leptons.
We review the constraints on baryon inhomogeneities derived from measurements of the deuterium abundance, $D/H$, and apply them to a range of baryogenesis models. In particular, we derive bounds on electroweak baryogenesis as well as on more exotic scenarios. Our results show that, across most of the relevant parameter space, electroweak baryogenesis remains largely unconstrained by current and foreseeable $D/H$ measurements. By contrast, the constraints on alternative scenarios are significantly stronger and can exclude regions of parameter space that would otherwise remain viable.
Based on a global fit to experimental measurements of the pion electromagnetic form factor and parton distribution functions (PDFs), we report a data-driven determination of the unpolarized quark generalized parton distributions (GPDs) of the pion in the zero-skewness limit ($ξ= 0$). The form factor is parameterized using a flexible functional form constrained by data and embedded into a GPD framework constructed from collinear PDFs and a profile function encoding transverse dynamics. This approach provides a unified description of the pion's electromagnetic structure and its spatial parton distributions. We present the extracted pion GPDs and their impact-parameter-space interpretations, offering new insights into the internal structure of the lightest QCD bound state and providing essential input for future electron-ion collider studies via the Sullivan process, as well as for exclusive $π^+$ electroproduction in the 12~GeV Jefferson Lab program, pion-induced exclusive measurements at COMPASS, proposed pion-beam experiments at AMBER, and phenomenological and lattice investigations of the structure of the meson.
We investigate a realistic non-supersymmetric hybrid inflation model incorporating right-handed neutrinos and assess its viability in light of recent cosmological observations. At tree level, the inflaton potential yields a blue-tilted scalar spectrum, which is disfavored by current data from Planck and ACT that instead support a red tilt. We show that including one-loop quantum corrections, arising from generic couplings required for reheating, significantly modifies the potential, flattening it at large field values. This leads to a red-tilted spectral index ($n_s < 1$) and a suppressed tensor-to-scalar ratio $r$, both consistent with observational constraints. To ensure theoretical control, we focus on sub-Planckian field values, where the effective field theory description remains valid. The coupling of the inflaton to right-handed neutrinos naturally facilitates efficient reheating and enables the generation of the baryon asymmetry via non-thermal leptogenesis. We further explore the model's parameter space using a multi-output random forest classifier, achieving prediction accuracies in the range of $87.5\%$ to $98.9\%$. Our analysis shows that approximately $15\%$ of the parameter space satisfies at least one current experimental constraint, underscoring the essential role of quantum corrections in reconciling particle physics models with precision cosmology, and highlighting the effectiveness of machine learning techniques in probing complex theoretical frameworks.
We investigate the nonlinear response of flow harmonics $v_2,v_4$ to initial-state eccentricities $ε_2,ε_4$ within the Gubser-flow framework. By extending the perturbative solutions of Gubser flow, we derive analytic nonlinear response relations connecting the eccentricities $ε_2,ε_4$ to the flow harmonics $v_2,v_4$. Our results reproduce the well-known result $v_4/v_2^2 \to 1/2$ in the limit of large transverse momentum $p_T$. Furthermore, we study the effects of a mismatch between the participant and reaction planes. We find that the conventional nonlinear response coefficients acquire an additional factor determined by the participant-plane angles, which is often approximated as statistical noise driven by event-by-event fluctuations. This factor can modify not only the strength but even the sign of the effective nonlinear response coefficient, making it sensitive to the initial configuration of the colliding nuclei. Our study provides new analytical insight into the origin of collective phenomena in relativistic heavy-ion collisions.
In fermionic liquids, the Fermi surface is topologically stable,\cite{Volovik2003} which is at the origin of the applicability of the Landau theory of the Fermi liquid (LFL). The LFL exists under a special condition: the Green's function must have a pole with nonzero residue $Z$. Otherwise one has a non-Landau Fermi liquid (NLFL), such as the Luttinger liquid, which is described by the same topological invariant. It appears that in general this topological invariant is a property of the fermionic particle, i.e. the particle charge (or the electric charge of the electron) is equivalent to the topological charge of the fermion. The conservation of the fermionic charge is equivalent to the conservation of the topological charge. We consider the application of this topological charge to the Landau theory of Fermi liquids. We also consider the application to non-Fermi liquids and crystalline insulators in relation to the Luttinger theorem.
We investigate how the transition density \(ρ_{tr}\) affects hybrid constructions of the neutron-star equation of state (EoS) in which a nucleonic description at low densities is matched to a model-agnostic high-density extension based on a speed-of-sound parametrization. Using four representative nucleonic models--Taylor expansion, \(\frac{n}{3}\) expansion, Skyrme, and relativistic mean-field--built from identical nuclear matter parameters, we isolate the impact of the low-density EoS and the transition density on neutron star observables. We find that, within the present smooth-matching prescription, neutron star properties such as radii and tidal deformabilities retain significant sensitivity to the choice of low-density EoS for commonly adopted transition densities around \(ρ_{tr} \approx 2ρ_0\), even when the same high-density parametrization is employed. This residual dependence arises from differences in the matching conditions at \(ρ_{tr}\), which propagate into the high-density extension, so different low-density inputs lead to different effective high-density EoSs. These findings are robust across two distinct speed-of-sound parametrizations. Quantitatively, the model spread in radius and tidal deformability at $1.4\,M_\odot$ exceeds the current observational uncertainty by factors of $\sim 1.8$ and $\sim 1.4$ at $ρ_{\mathrm{tr}} \approx 2ρ_0$, whereas these factors reduce to $\sim 1.05$ and $\sim 0.4$ at $ρ_{\mathrm{tr}} = ρ_0$. Lowering the transition density, therefore, systematically diminishes the spread among models and leads to more consistent predictions. Our results demonstrate that the widely used choice \(ρ_{tr} \approx 2ρ_0\) does not guarantee model independence in hybrid EoS constructions, and should be treated as an explicit source of systematic uncertainty when inferring dense matter properties from neutron star observations.
We investigate resonant leptogenesis in a two-triplet Type-II seesaw framework and demonstrate a coherent and predictive connection between neutrino mass generation, baryogenesis, and charged lepton flavor violation (LFV). In the presence of quasi-degenerate scalar triplets, self-energy effects induce a resonant enhancement of the CP asymmetry, enabling successful baryogenesis at the TeV scale. We construct Yukawa couplings consistent with neutrino oscillation data and perform a comprehensive numerical analysis by solving the Boltzmann equations across a wide parameter space. We find that viable solutions arise only within a restricted region characterized by near-resonant mass splittings and moderate-to-strong washout. In this regime, successful leptogenesis is achieved through resonant enhancement, which compensates for suppressed Yukawa couplings. A key prediction of the framework is that the allowed parameter space dynamically favors small Yukawa couplings, leading to strongly suppressed LFV rates. The near-absence of observable LFV signals therefore emerges as a direct consequence of the dynamics responsible for baryogenesis. Our results highlight a distinctive feature of the two-triplet Type-II scenario: the simultaneous realization of resonant enhancement and LFV suppression within a unified and testable framework.
We propose that the enhanced Higgs quartic coupling required by radiatively broken electroweak symmetry (RBEWS) emerges naturally from SO(10) grand unification. Our previous analysis demonstrated that a coupling enhancement factor $k = λ_{\rm enhanced}/λ_{\rm SM}$ leads to absolute vacuum stability with a UV Landau pole near the GUT scale for $k \gtrsim 1.03$. The RBEWS prediction $e_{125} = 7.2$ of Steele and Wang, when properly translated from the Coleman-Weinberg scheme at the electroweak VEV to the $\overline{\rm MS}$ scheme at $M_t$ via scheme conversion and scale-dependent ratio evolution, yields $k(M_t) \approx 6.0$--$6.4$, corresponding to a UV pole at $Λ_{\rm UV} \sim {1.5\text{--}2} \times 10^{16}$~GeV -- remarkably close to the GUT scale $M_{\rm GUT} \sim 2 \times 10^{16}$~GeV. We argue this coincidence is not accidental: the UV pole signals the scale where the Standard Model effective description must be embedded into the full SO(10) structure. We derive threshold corrections from SO(10) scalar sectors containing $\mathbf{10}_H$, $\mathbf{\overline{126}}_H$, and $\mathbf{45}_H$ representations, showing that portal couplings between the light Higgs doublet and heavy GUT scalars can generate enhancement factors of order $k \sim 5$--$10$ at the matching scale. The Coleman-Weinberg mechanism operating within a classically scale-invariant GUT scalar potential provides a dynamical origin for both RBEWS and the hierarchy between $M_{\rm GUT}$ and the electroweak scale.
Recently, the BESIII Collaboration reported the first observation of the decays $χ_{cJ} \to ηηη^\prime$ in a search for the $1^{-+}$ exotic state $η_1(1855)$. A partial wave analysis of the $ηη^\prime$ invariant mass spectrum shows no significant signal for the $η_1(1855)$. In this work, we use an effective Lagrangian approach to investigate the processes $χ_{cJ} \to ηηη^\prime$ via box and triangle loops involving charmed mesons and the scalar meson $f_0(1500)$. Our calculations reproduce the experimental branching fractions of $χ_{cJ} \to ηηη^\prime$ well. Furthermore, we present predictions for the invariant mass spectra of $ηη^\prime$ and $ηη$ produced in the $χ_{c1}$ decay, which appear consistent overall with the BESIII measurements. In the present model, the decay $χ_{c1} \to ηηη^\prime$ is dominated by the triangle and box loop contributions. The consistency between our theoretical results and the BESIII measurements sheds light on the underlying mechanism of $χ_{cJ}$ decays into light mesons and may help to explain the absence of the $η_1(1855)$ signal in the decay channels $χ_{cJ} \to ηηη^\prime$.
We study the lifetimes and inclusive semileptonic decay widths of doubly heavy baryons within the framework of heavy quark expansion. Our analysis includes next-to-leading-order corrections to the dimension-3, -5, and -6 operators, together with the leading dimension-7 contributions, while the nonperturbative matrix elements are evaluated in a bag model with translationally improved baryon wave functions. We obtain $( τ_{Ξ_{cc}^{++}} , τ_{Ξ_{cc}^{+}} , τ_{Ω_{cc}^{+}} ) = ( 2.67 \pm 0.94,\, 0.47 \pm 0.08,\, 1.79 \pm 0.62 ) \times 10^{-13}\,{\rm s}$ and $( τ_{Ξ_{bb}^{0}} , τ_{Ξ_{bb}^{-}} , τ_{Ω_{bb}^{-}} ) = ( 0.75 \pm 0.11,\, 0.92 \pm 0.15,\, 0.93 \pm 0.15 ) \times 10^{-12}\,{\rm s}$, where the uncertainties arise from the heavy quark pole masses and the hadronic scale adopted in the quark model. Hence, the lifetime hierarchy patterns are $τ(Ξ_{cc}^{++})>τ(Ω_{cc}^+)>τ(Ξ_{cc}^+)$ and $τ(Ω_{bb}^{-})\simτ(Ξ_{bb}^-)>τ(Ξ_{bb}^0)$ for doubly charmed and doubly bottom baryons, respectively. The $W$-exchange contribution plays a crucial role in generating the large lifetime splitting in the doubly charmed sector and remains phenomenologically important for doubly bottom baryons. In addition to the total lifetimes, we calculate the separate nonleptonic and semileptonic contributions, which allow us to trace the pattern of spectator effects in each baryon channel. We also evaluate the inclusive semileptonic decay widths and the decay width asymmetries, which provide complementary probes of the underlying decay mechanisms.
We present a self-consistent framework for heavy-quark transport in the quark-gluon plasma across the QCD crossover region. By synthesizing perturbative and non-perturbative interactions into a unified interaction kernel, we circumvent the traditional reliance on arbitrary soft-hard momentum separation scales. The interaction is governed by an in-medium effective potential, incorporating short-range Yukawa screening and long-range confining string contributions, both rigorously constrained by the latest lattice QCD data. Our results reveal that the non-perturbative string tension is indispensable for capturing the extreme opacity of the medium near the critical temperature $T_c$. Specifically, our model predicts a spatial diffusion coefficient of $2πT D_s \approx 0.5 \sim 1.7$, demonstrating a striking quantitative agreement with the recent lattice QCD extractions. Ultimately, our results provide a robust dynamical interpretation of the strong heavy-quark coupling near the QCD crossover and offer a unified framework for describing heavy-flavor transport in hot and dense QCD matter.
Monitoring binomial proportions across multiple independent streams is a critical challenge in Statistical Process Control (SPC), with applications from manufacturing to cybersecurity. While EWMA charts offer sensitivity to small shifts, existing implementations rely on asymptotic variance approximations that fail during early-phase monitoring. We introduce a Cumulative Standardized Binomial EWMA (CSB-EWMA) chart that overcomes this limitation by deriving the exact time-varying variance of the EWMA statistic for binary multiple-stream data, enabling adaptive control limits that ensure statistical rigor from the first sample. Through extensive simulations, we identify optimal smoothing (λ) and limit (L) parameters to achieve target in-control average run length (ARL0) of 370 and 500. The CSB-EWMA chart demonstrates rapid shift detection across both ARL0 targets, with out-of-control average run length (ARL1) dropping to 3-7 samples for moderate shifts (δ=0.2), and exhibits exceptional robustness across different data distributions, with low ARL1 Coefficients of Variation (CV < 0.10 for small shifts) for both ARL0 = 370 and 500. This work provides practitioners with a distribution-free, sensitive, and theoretically sound tool for early change detection in binomial multiple-stream processes.
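The role of a time-varying variance in EWMA control limits can be illustrated with the textbook single-stream EWMA recursion. The sketch below is not the CSB-EWMA statistic itself, whose multiple-stream form is not given in the abstract; it assumes standardized binomial counts with in-control variance 1, and the function names and the parameter values `lam=0.1`, `L=2.7` are illustrative choices only.

```python
import math

def ewma_exact_variance(lam, t, sigma2=1.0):
    """Exact variance of Z_t = lam*U_t + (1-lam)*Z_{t-1}, Z_0 = 0, for
    independent U_t with variance sigma2 (standard EWMA result)."""
    return sigma2 * lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))

def ewma_first_signal(counts, n, p0, lam=0.1, L=2.7):
    """Return the 1-based index of the first out-of-control signal, or None.

    counts: number of successes per sample, each out of n trials.
    p0:     in-control proportion (assumed known here).
    lam, L: smoothing and limit-width parameters (illustrative values).
    """
    sd = math.sqrt(n * p0 * (1.0 - p0))
    z = 0.0
    for t, x in enumerate(counts, start=1):
        u = (x - n * p0) / sd                 # standardized binomial count
        z = lam * u + (1.0 - lam) * z         # EWMA update
        limit = L * math.sqrt(ewma_exact_variance(lam, t))
        if abs(z) > limit:                    # time-varying control limit
            return t
    return None
```

At `t = 1` the exact variance equals `lam**2 * sigma2`, far below the asymptotic value `lam / (2 - lam) * sigma2`, which is why limits based on the asymptotic variance are too wide during early monitoring.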
This work is concerned with the continuum limit of a graph-based data visualization technique called the t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications, but is still poorly understood from a theoretical standpoint. The t-SNE algorithm produces visualizations by minimizing the Kullback-Leibler divergence between similarity matrices representing the high dimensional data and its low dimensional representation. We prove that, after a natural rescaling and in applicable parameter regimes, the Kullback-Leibler divergence is consistent with a continuum variational problem as the number of data points $n \to \infty$ and the similarity graph remains sparse. The continuum problem involves a non-convex gradient regularization term and a penalty on the magnitude of the probability density function in the visualization space. These two terms represent the continuum limits of the attraction and repulsion forces in the t-SNE algorithm. Due to the lack of convexity in the continuum variational problem, the question of well-posedness is only partially resolved. We show that when both dimensions are $1$, the problem admits a unique smooth minimizer, along with an infinite number of discontinuous minimizers (interpreted in a relaxed sense). This aligns well with the empirically observed ability of t-SNE to separate data in seemingly arbitrary ways in the visualization. The energy is also very closely related to the famously ill-posed Perona-Malik equation, which is used for denoising and simplifying images. We present numerical results validating the continuum limit, provide some preliminary results about the delicate nature of the limiting energetic problem in higher dimensions, and highlight several problems for future work.
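The discrete objective whose continuum limit is studied above can be written down in a few lines. The sketch below uses the standard t-SNE low-dimensional similarities (a Student-t kernel normalized over all pairs) and takes the high-dimensional similarity matrix `P` as given; the perplexity-based construction of `P`, the rescaling, and the sparse-graph regime of the paper are not reproduced, and the function names are illustrative.

```python
import math

def tsne_low_dim_similarities(ys):
    """Standard t-SNE similarities q_ij proportional to 1 / (1 + |y_i - y_j|^2),
    normalized over all ordered pairs i != j; the diagonal is zero."""
    n = len(ys)
    w = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(ys[i], ys[j]))
                w[i][j] = 1.0 / (1.0 + d2)
                total += w[i][j]
    return [[w[i][j] / total for j in range(n)] for i in range(n)]

def kl_divergence(P, Q):
    """KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij), skipping zero entries of P."""
    return sum(
        p * math.log(p / q)
        for row_p, row_q in zip(P, Q)
        for p, q in zip(row_p, row_q)
        if p > 0.0
    )

# Three embedded points in the plane and a uniform target similarity matrix.
EMBEDDING = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Q = tsne_low_dim_similarities(EMBEDDING)
P_UNIFORM = [[0.0 if i == j else 1.0 / 6.0 for j in range(3)] for i in range(3)]
LOSS = kl_divergence(P_UNIFORM, Q)
```

In this objective the attraction between points comes from the `-p_ij * log(w_ij)` terms, while the repulsion comes from the logarithm of the normalizing sum, which is the split the abstract refers to.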
Modern data analyses frequently encounter settings where samples of variables are contaminated by measurement error. Ignoring measurement noise can substantially degrade statistical inference, while existing correction techniques are often computationally costly and inefficient. Recent advances in kernel methods, particularly those based on Maximum Mean Discrepancy (MMD), have enabled flexible, distribution-free inference, yet typically assume precise data and overlook contamination by measurement error. In this work, we introduce a novel framework for inference with samples corrupted by potentially heteroscedastic noise from a known distribution. Central to our approach is the convolutional MMD (convMMD), which compares distributions after noise convolution and retains metric validity under standard kernel conditions. We establish finite-sample deviation bounds that are unaffected by measurement error and prove an equivalence between testing under noise and kernel smoothing. Leveraging these insights, we introduce a convMMD-based estimator for inference with noisy, heteroscedastic observations. We establish its consistency and asymptotic normality, and provide an efficient implementation using stochastic gradient descent. We demonstrate the practical effectiveness of our approach through simulations and applications in astronomy and social sciences.
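For readers unfamiliar with the base quantity, the following is a minimal sketch of the ordinary (biased, V-statistic) squared MMD between two one-dimensional samples with a Gaussian kernel. It is not the convMMD of the abstract: the noise-convolution step is omitted, and the kernel choice, the bandwidth, and the function names are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when the two samples coincide and grows as they separate, which is the property a noise-aware variant has to preserve after convolution with the error distribution.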
We study offline-online reinforcement learning in linear mixture Markov decision processes (MDPs) under environment shift. In the offline phase, data are collected by an unknown behavior policy and may come from a mismatched environment, while in the online phase the learner interacts with the target environment. We propose an algorithm that adaptively leverages offline data. When the offline data are informative, either due to sufficient coverage or small environment shift, the algorithm provably improves over purely online learning. When the offline data are uninformative, it safely ignores them and matches the online-only performance. We establish regret upper bounds that explicitly characterize when offline data are beneficial, together with nearly matching lower bounds. Numerical experiments further corroborate our theoretical findings.
We study signal propagation at initialization in transformers through the averaged partial Jacobian norm (APJN), a measure of gradient amplification across layers. We extend APJN analysis to transformers with bidirectional attention and permutation-symmetric input token configurations by deriving recurrence relations for activation statistics and APJNs across layers. Our theory predicts how attention modifies the asymptotic behavior of the APJN at large depth and matches APJNs measured in deep vision transformers. The criticality picture known from residual networks carries over to transformers: the pre-LayerNorm architecture exhibits power-law APJN growth, whereas transformers with LayerNorm replaced by elementwise $\tanh$-like nonlinearities have stretched-exponential APJN growth, indicating that the latter are subcritical. Applied to Dynamic Tanh (DyT) and Dynamic erf (Derf) transformers, the theory explains why these architectures can be more sensitive to initialization and optimization choices and require careful tuning for stable training.
Deep learning underpins a wide range of applications in MRI, including reconstruction, artifact removal, and segmentation. However, progress has been driven largely by public datasets focused on brain and knee imaging, shaping how models are trained and evaluated. As a result, careful studies of the reliability of these models across diverse anatomical settings remain limited. In this work, we introduce MosaicMRI, a large and diverse collection of fully sampled raw musculoskeletal (MSK) MR measurements designed for training and evaluating machine-learning-based methods. MosaicMRI is the largest open-source raw MSK MRI dataset to date, comprising 2,671 volumes and 80,156 slices. The dataset offers substantial diversity in volume orientation (e.g., axial, sagittal), imaging contrasts (e.g., PD, T1, T2), anatomies (e.g., spine, knee, hip, ankle, and others), and numbers of acquisition coils. Using VarNet as a baseline for the accelerated reconstruction task, we perform a comprehensive set of experiments to study scaling behavior with respect to both model capacity and dataset size. Interestingly, models trained on the combined anatomies significantly outperform anatomy-specific models in low-sample regimes, highlighting the benefits of anatomical diversity and the presence of exploitable cross-anatomical correlations. We further evaluate robustness and cross-anatomy generalization by training models on one anatomy (e.g., spine) and testing them on another (e.g., knee). Notably, we identify groups of body parts (e.g., foot and elbow) that generalize well with each other, and highlight that performance under domain shifts depends on training set size, anatomy, and protocol-specific factors.
We study the problem of identifying change points in high-dimensional generalized linear models, and propose an approach based on sample-weighted empirical risk minimization. Our method, Weighted ERM, encodes priors on the change points via weights assigned to each sample, to obtain weighted versions of standard estimators such as M-estimators and maximum-likelihood estimators. Under mild assumptions on the data, we obtain a precise asymptotic characterization of the performance of our method for general Gaussian designs, in the high-dimensional limit where the number of samples and covariate dimension grow proportionally. We show how this characterization can be used to efficiently construct a posterior distribution over change points. Numerical experiments on both simulated and real data illustrate the efficacy of Weighted ERM compared to existing approaches, demonstrating that sample weights constructed with weakly informative priors can yield accurate change point estimators. Our method is implemented as an open-source package, weightederm, available in Python and R.
We consider the problem of clustering nested or hierarchical data, where observations are grouped and there are both group-level and observation-level variables. In our motivating OneK1K dataset, observations consist of single-cell RNA-sequencing (scRNA-seq) data from 982 individuals (groups), totaling 1.27 million cells (observations), along with individual-specific genotype data. This type of data would enable the identification of cell types and the investigation of how genetic variations among individuals influence differences in cell-type profiles. Our goal, therefore, is to jointly cluster cells and individuals to capture the heterogeneity across both levels using cell-specific gene expressions as well as individual-specific genotypes. However, existing grouped clustering methods do not incorporate group-level variables, thereby limiting their ability to capture the heterogeneity of genotypes in our motivating application. To address this, we propose the Nested Atoms Model (NAM), a new Bayesian nonparametric approach that enables the desired two-layered clustering, accounting for both group-level and observation-level variables. To scale NAM for high-dimensional data, we develop a fast variational Bayesian inference algorithm. Simulations show that NAM outperforms existing methods that ignore group-level variables. Applied to the OneK1K dataset, NAM identifies clusters of genetically similar individuals with homogeneous cell-type profiles. The resulting cell clusters align with known immune cell types based on differential gene expression, underscoring the ability of NAM to capture nested heterogeneity and provide biologically meaningful insights.
Measurement-based quantum computation (MBQC) is a framework for quantum information processing in which a computational task is carried out through one-qubit measurements on a highly entangled resource state. Due to the indeterminacy of the outcomes of a quantum measurement, the random outcomes of these operations, if not corrected, yield a variational quantum channel family. Traditionally, this randomness is corrected through classical processing in order to ensure deterministic unitary computations. Recently, variational measurement-based quantum computation (VMBQC) has been introduced to exploit this measurement-induced randomness to gain an advantage in generative modeling. A limitation of this approach is that the corresponding channel model has twice as many parameters compared to the unitary model, scaling as $N \times D$, where $N$ is the number of logical qubits (width) and $D$ is the depth of the VMBQC model. This can often make optimization more difficult and may lead to poorly trainable models. In this paper, we present a restricted VMBQC model that extends the unitary setting to a channel-based one using only a single additional trainable parameter. We show, both numerically and algebraically, that this minimal extension is sufficient to generate probability distributions that cannot be learned by the corresponding unitary model.
In optimization problems, some variable subsets may have a joint non-linear or non-monotonic influence on the function value. Therefore, knowledge of variable dependencies may be crucial for effective optimization, and many state-of-the-art optimizers leverage it to improve performance. However, some real-world problem instances may be subject to noise of various origins. In such cases, variable dependencies relevant to optimization may be hard or impossible to detect using dependency checks that suffice for noise-free problems, rendering highly effective operators, e.g., Partition Crossover (PX), useless. Therefore, we use Statistical Linkage Learning (SLL) to decompose problems with noise and propose a new SLL-dedicated mask construction algorithm. We prove that if the quality of the SLL-based decomposition is sufficiently high, the proposed clustering algorithm yields masks equivalent to PX masks for the noise-free instances. The experiments show that the optimizer using the proposed mechanisms remains equally effective regardless of the noise level and outperforms state-of-the-art optimizers on problems with high noise.
Artificial intelligence (AI) is moving increasingly beyond prediction to support decisions in complex, uncertain, and dynamic environments. This shift creates a natural intersection with operations research and management sciences (OR/MS), which have long offered conceptual and methodological foundations for sequential decision-making under uncertainty. At the same time, recent advances in deep learning, including feedforward neural networks, LSTMs, transformers, and deep reinforcement learning, have expanded the scope of data-driven modeling and opened new possibilities for large-scale decision systems. This tutorial presents an OR/MS-centered perspective on deep learning for sequential decision-making under uncertainty. Its central premise is that deep learning is valuable not as a replacement for optimization, but as a complement to it. Deep learning brings adaptability and scalable approximation, whereas OR/MS provides the structural rigor needed to represent constraints, recourse, and uncertainty. The tutorial reviews key decision-making foundations, connects them to the major neural architectures in modern AI, and discusses leading approaches to integrating learning and optimization. It also highlights emerging impact in domains such as supply chains, healthcare and epidemic response, agriculture, energy, and autonomous operations. More broadly, it frames these developments as part of a wider transition from predictive AI toward decision-capable AI and highlights the role of OR/MS in shaping the next generation of integrated learning--optimization systems.
As generative models enable rapid creation of high-fidelity images, societal concerns about misinformation and authenticity have intensified. A promising remedy is multi-bit image watermarking, which embeds a multi-bit message into an image so that a verifier can later detect whether the image is generated by someone and further identify the source by decoding the embedded message. Existing approaches often fall short in capacity, resilience to common image distortions, and theoretical justification. To address these limitations, we propose ADD (Add, Dot, Decode), a multi-bit image watermarking method with two stages: learning a watermark to be linearly combined with the multi-bit message and added to the image, and decoding through inner products between the watermarked image and the learned watermark. On the standard MS-COCO benchmark, we demonstrate that for the challenging task of 48-bit watermarking, ADD achieves 100\% decoding accuracy, with performance dropping by at most 2\% under a wide range of image distortions, substantially smaller than the 14\% average drop of state-of-the-art methods. In addition, ADD achieves substantial computational gains, with 2-fold faster embedding and 7.4-fold faster decoding than the fastest existing method. We further provide a theoretical analysis explaining why the learned watermark and the corresponding decoding rule are effective.
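The add-then-dot-then-decode structure described above can be sketched on a toy four-pixel "image". Everything concrete here is an assumption for illustration: in ADD the watermark is learned, whereas the sketch uses hand-picked orthonormal patterns that happen to be orthogonal to the constant host image, and the strength value and function names are hypothetical.

```python
def embed(image, watermarks, bits, strength=0.1):
    """Add a linear combination of watermark patterns to the image.
    Bit 1 maps to coefficient +1 and bit 0 maps to coefficient -1."""
    out = list(image)
    for bit, pattern in zip(bits, watermarks):
        sign = 1.0 if bit else -1.0
        for j, value in enumerate(pattern):
            out[j] += strength * sign * value
    return out

def decode(image, watermarks):
    """Recover each bit from the sign of the inner product between the
    (watermarked) image and the corresponding pattern."""
    return [
        1 if sum(x * w for x, w in zip(image, pattern)) > 0.0 else 0
        for pattern in watermarks
    ]

# Toy host image and two orthonormal patterns (stand-ins for a learned watermark).
IMAGE = [0.5, 0.5, 0.5, 0.5]
WATERMARKS = [
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]
MESSAGE = [1, 0]
DECODED = decode(embed(IMAGE, WATERMARKS, MESSAGE), WATERMARKS)
```

With orthonormal patterns the inner product with pattern `i` returns the host image's own projection plus `strength` times the signed bit, so decoding succeeds whenever the embedded term dominates the host projection. Handling real images, where that projection is not zero, is presumably what the learning stage addresses.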
Diffusion-based models on continuous spaces have seen substantial recent progress through the mathematical framework of gradient flows, leveraging the Wasserstein-2 (${W}_2$) metric via the Jordan-Kinderlehrer-Otto (JKO) scheme. Despite the increasing popularity of diffusion models on discrete spaces using continuous-time Markov chains, a parallel theoretical framework based on gradient flows has remained elusive due to intrinsic challenges in translating the ${W}_2$ distance directly into these settings. In this work, we propose the first computational approach addressing these challenges, leveraging an appropriate metric $W_K$ on the simplex of probability distributions, which enables us to interpret widely used discrete diffusion paths, such as the discrete heat equation, as gradient flows of specific free-energy functionals. Through this theoretical insight, we introduce a novel methodology for learning diffusion dynamics over discrete spaces, which recovers the underlying functional directly by leveraging first-order optimality conditions for the JKO scheme. The resulting method optimizes a simple quadratic loss, trains extremely fast, does not require individual sample trajectories, and only needs a numerical preprocessing computing $W_K$-geodesics. We validate our method through extensive numerical experiments on synthetic data, showing that we can recover the underlying functional for a variety of graph classes.
Conformal selection (CS) uses calibration data to identify test inputs whose unobserved outcomes are likely to satisfy a pre-specified minimal quality requirement, while controlling the false discovery rate (FDR). Existing methods fix the target FDR level before observing data, which prevents the user from adapting the balance between the number of selected test inputs and FDR to downstream needs and constraints based on the available data. For example, in genomics or neuroimaging, researchers often inspect the distribution of test statistics, and decide how aggressively to pursue candidates based on observed evidence strength and available follow-up resources. To address this limitation, we introduce post-hoc CS (PH-CS), which generates a path of candidate selection sets, each paired with a data-driven false discovery proportion (FDP) estimate. PH-CS lets the user select any operating point on this path by maximizing a user-specified utility, arbitrarily balancing selection size and FDR. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS is proved to provide a finite-sample post-hoc reliability guarantee whereby the ratio between estimated FDP level and true FDP is, on average, upper bounded by $1$, so that the average estimated FDP is, to first order, a valid upper bound on the true FDR. PH-CS is extended to control quality defined in terms of a general risk. Experiments on synthetic and real-world datasets demonstrate that, unlike CS, PH-CS can consistently satisfy user-imposed utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.
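The e-BH procedure that PH-CS builds on can be sketched in a few lines. The sketch below covers only the standard e-BH rule at a fixed level `alpha` (reject the $k^*$ largest e-values, where $k^*$ is the largest $k$ whose $k$-th largest e-value is at least $n/(\alpha k)$); the construction of conformal e-variables and the post-hoc path of PH-CS are not reproduced, and the function name `e_bh` is a hypothetical choice.

```python
import numpy as np

def e_bh(e_values, alpha):
    """Return the sorted indices rejected by the e-BH procedure at level alpha."""
    e = np.asarray(e_values, dtype=float)
    n = e.size
    order = np.argsort(-e)                 # indices sorted by decreasing e-value
    k = np.arange(1, n + 1)
    ok = e[order] >= n / (alpha * k)       # does the k-th largest e-value clear n/(alpha*k)?
    if not ok.any():
        return np.array([], dtype=int)
    k_star = k[ok].max()                   # largest k that passes the threshold
    return np.sort(order[:k_star])
```

For example, with e-values `[50, 1, 30, 0.5, 12]` and `alpha = 0.2`, the thresholds are $25/k$, the three largest e-values pass, and indices 0, 2, and 4 are selected.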
Feature importance methods using unrestricted permutations are flawed due to extrapolation errors; such errors appear in all non-trivial variable importance approaches. We propose three new approaches: conditional model reliance and Knockoffs with Gaussian transformation, and restricted ALE plot designs. Theoretical and numerical results show our strategies reduce/eliminate extrapolation.
We analyze two widely used local attribution methods, Local Shapley Values and LIME, which aim to quantify the contribution of a feature value $x_i$ to a specific prediction $f(x_1, \dots, x_p)$. Despite their widespread use, we identify fundamental limitations in their ability to reliably detect locally important features, even under ideal conditions with exact computations and independent features. We argue that a sound local attribution method should not assign importance to features that neither influence the model output (e.g., features with zero coefficients in a linear model) nor exhibit statistical dependence with functionality-relevant features. We demonstrate that both Local SV and LIME violate this fundamental principle. To address this, we propose R-LOCO (Regional Leave Out COvariates), which bridges the gap between local and global explanations and provides more accurate attributions. R-LOCO segments the input space into regions with similar feature importance characteristics. It then applies global attribution methods within these regions, deriving an instance's feature contributions from its regional membership. This approach delivers more faithful local attributions while avoiding local explanation instability and preserving instance-specific detail often lost in global methods.
Changes in input distribution can induce shifts in the average predictions of machine learning models. Such prediction shifts may impact downstream business outcomes (e.g. a bank's loan approval rate), so understanding their causes can be crucial. We propose \ours{}: a Shapley value method for attributing prediction shifts to changes in the conditional probabilities of interpretable subgroups of data, where these subgroups are defined by the structure of decision trees. We initially apply this method to single decision trees, providing exact explanations based on conditional probability changes at split nodes. Next, we extend it to tree ensembles by selecting the most explanatory tree and accounting for residual effects. Finally, we propose a model-agnostic variant using surrogate trees grown with a novel objective function, allowing application to models like neural networks. While exact computation can be intensive, approximation techniques enable practical application. We show that \ours{} provides simple, faithful, and near-complete explanations of prediction shifts across model classes, aiding model monitoring in dynamic environments.
Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.
We develop parameter-free algorithms for unconstrained online learning with regret guarantees that scale with the gradient variation $V_T(u) = \sum_{t=2}^T \|\nabla f_t(u)-\nabla f_{t-1}(u)\|^2$. For $L$-smooth convex losses, we provide fully-adaptive algorithms achieving regret of order $\widetilde{O}(\|u\|\sqrt{V_T(u)} + L\|u\|^2+G^4)$ without requiring prior knowledge of the comparator norm $\|u\|$, the Lipschitz constant $G$, or the smoothness $L$. The update in each round can be computed efficiently via a closed-form expression. Our results extend to dynamic regret and have immediate implications for the stochastically-extended adversarial (SEA) model, where they significantly improve upon the previous best-known result [Wang et al., 2025].
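To make the gradient-variation quantity $V_T(u)$ in the abstract above concrete, the following sketch evaluates it for a sequence of loss gradients at a fixed comparator. The helper name `gradient_variation` and the quadratic example losses are illustrative assumptions; the algorithms themselves are not reproduced here.

```python
import numpy as np

def gradient_variation(grad_fns, u):
    """V_T(u) = sum over t >= 2 of ||grad f_t(u) - grad f_{t-1}(u)||^2."""
    grads = [np.asarray(g(u), dtype=float) for g in grad_fns]
    return float(sum(np.sum((b - a) ** 2) for a, b in zip(grads[:-1], grads[1:])))

def quadratic_grad(center):
    """Gradient of the illustrative loss f(x) = 0.5 * ||x - center||^2."""
    c = np.asarray(center, dtype=float)
    return lambda x: np.asarray(x, dtype=float) - c
```

For these quadratic losses the gradient difference between rounds equals the shift of the center, so $V_T(u)$ does not depend on $u$ and simply accumulates the squared center movements.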
This paper reorganizes the current manuscript around the DPO versus DDO-RM preference-optimization project and focuses on two parts: the algorithmic view and the preliminary held-out benchmark. The benchmark asks a narrow question: even in a minimal pairwise chosen-versus-rejected setting, can a reward-guided decision-distribution update outperform a direct pairwise objective? We compare Direct Preference Optimization (DPO) against DDO-RM on EleutherAI/pythia-410m using HuggingFaceH4/ultrafeedback\_binarized, evaluate on the held-out test\_prefs split, and report results for seeds 42, 13, and 3407. Algorithmically, DDO-RM treats each prompt as a finite decision problem over candidate responses. Instead of optimizing only a binary chosen-rejected relation, it forms a policy distribution over candidates, centers reward-model scores under that distribution, and distills a reward-guided target distribution back into the policy. In the current public benchmark, DDO-RM improves mean pair accuracy from 0.5238 to 0.5602, AUC from 0.5315 to 0.5382, and mean margin from 0.1377 to 0.5353 relative to DPO. These are encouraging but still preliminary results: the study covers one model family, one dataset, one held-out evaluation split, and three seeds.
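The three steps named in the DDO-RM description above (policy distribution over candidates, reward centering under that distribution, distillation of a reward-guided target) might look roughly as follows. The abstract does not give the exact target or loss, so the exponential tilt with a `temperature` and the KL distillation loss used here are assumptions, as are all function names.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def reward_guided_target(policy_logits, rewards, temperature=1.0):
    """Return (p, q): the policy distribution and an assumed reward-tilted target."""
    p = softmax(policy_logits)                       # policy distribution over candidates
    r = np.asarray(rewards, dtype=float)
    centered = r - np.dot(p, r)                      # center rewards under the policy
    q = p * np.exp(centered / temperature)           # assumption: exponential tilt of p
    return p, q / q.sum()

def distillation_loss(p, q):
    """Assumption: KL(q || p) as the loss that pulls the policy toward the target."""
    return float(np.sum(q * (np.log(q) - np.log(p))))
```

In the pairwise chosen-versus-rejected setting, the candidate set has two entries, and the target shifts probability mass toward the response with the higher reward-model score.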
K-means clustering is a workhorse of unsupervised learning, but it is notoriously brittle to outliers, distribution shifts, and limited sample sizes. Viewing k-means as Lloyd--Max quantization of the empirical distribution, we develop a distributionally robust variant that protects against such pathologies. We posit that the unknown population distribution lies within a Wasserstein-2 ball around the empirical distribution. In this setting, one seeks cluster centers that minimize the worst-case expected squared distance over this ambiguity set, leading to a minimax formulation. A tractable dual yields a soft-clustering scheme that replaces hard assignments with smoothly weighted ones. We propose an efficient block coordinate descent algorithm with provable monotonic decrease and local linear convergence. Experiments on standard benchmarks and large-scale synthetic data demonstrate substantial gains in outlier detection and robustness to noise.
Generalized linear mixed-effects models (GLMMs) are widely used to analyze grouped and hierarchical data. In a GLMM, each response is assumed to follow an exponential-family distribution where the natural parameter is given by a linear function of observed covariates and a latent group-specific random effect. Since exact marginalization over the random effects is typically intractable, model parameters are estimated by maximizing an approximate marginal likelihood. In this paper, we replace the linear function with neural networks. The result is a more flexible model, the neural generalized mixed-effects model (NGMM), which captures complex relationships between covariates and responses. To fit NGMM to data, we introduce an efficient optimization procedure that maximizes the approximate marginal likelihood and is differentiable with respect to network parameters. We show that the approximation error of our objective decays at a Gaussian-tail rate in a user-chosen parameter. On synthetic data, NGMM improves over GLMMs when covariate-response relationships are nonlinear, and on real-world datasets it outperforms prior methods. Finally, we analyze a large dataset of student proficiency to demonstrate how NGMM can be extended to more complex latent-variable models.
Data leakage remains a recurrent source of optimistic bias in biomedical machine learning studies. Standard row-wise cross-validation and globally estimated preprocessing steps are often inappropriate for data with repeated measurements, study-level heterogeneity, batch effects, or temporal dependencies. This paper describes bioLeak, an R package for constructing leakage-aware resampling workflows and for auditing fitted models for common leakage mechanisms. The package provides leakage-aware split construction, train-fold-only preprocessing, cross-validated model fitting, nested hyperparameter tuning, post hoc leakage audits, and HTML reporting. The implementation supports binary classification, multiclass classification, regression, and survival analysis, with task-specific metrics and S4 containers for splits, fits, audits, and inflation summaries. The simulation artifacts show how apparent performance changes under controlled leakage mechanisms, and the case study illustrates how guarded and leaky pipelines can yield materially different conclusions on multi-study transcriptomic data. The emphasis throughout is on software design, reproducible workflows, and interpretation of diagnostic output.
We present XANE(3), a physics-based E(3)-equivariant graph neural network for predicting X-ray absorption near-edge structure (XANES) spectra directly from atomic structures. The model combines tensor-product message passing with spherical harmonic edge features, absorber-query attention pooling, custom equivariant layer normalization, adaptive gated residual connections, and a spectral readout based on a multi-scale Gaussian basis with an optional sigmoidal background term. To improve line-shape fidelity, training is performed with a composite objective that includes pointwise spectral reconstruction together with first- and second-derivative matching terms. We evaluate the model on a dataset of 5,941 FDMNES simulations of iron oxide surface facets and obtain a spectrum mean squared error of $1.0 \times 10^{-3}$ on the test set. The model accurately reproduces the main edge structure, relative peak intensities, pre-edge features, and post-edge oscillations. Ablation studies show that the derivative-aware objective, custom equivariant normalization, absorber-conditioned attention pooling, adaptive gated residual mixing, and global background term each improve performance. Interestingly, a capacity-matched scalar-only variant achieves comparable pointwise reconstruction error but reduced derivative-level fidelity, indicating that explicit tensorial channels are not strictly required for low intensity error on this dataset, although they remain beneficial for capturing finer spectral structure. These results establish XANE(3) as an accurate and efficient surrogate for XANES simulation and offer a promising route toward accelerated spectral prediction, ML-assisted spectroscopy, and data-driven materials discovery.
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations. Project page, code, and dataset available at https://maveryn.github.io/vlm-fix/.
Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation, compromising serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a novel framework inspired by speculative decoding. SOLARIS proactively precomputes user-item interaction embeddings by predicting which user-item pairs are likely to appear in future requests and asynchronously generating their foundation model representations ahead of time. This approach decouples the costly foundation model inference from the latency-critical serving path, enabling real-time knowledge transfer from models previously considered too expensive for online use. Deployed across Meta's advertising system serving billions of daily requests, SOLARIS achieves a 0.67% gain in revenue-driving top-line metrics, demonstrating its effectiveness at scale.
We present parameter-interpolated dynamic mode decomposition (piDMD), a parametric reduced-order modeling framework that embeds known parameter-affine structure directly into the DMD regression step. Unlike existing parametric DMD methods, which interpolate modes, eigenvalues, or reduced operators and can be fragile with sparse training data or multi-dimensional parameter spaces, piDMD learns a single parameter-affine Koopman surrogate reduced-order model (ROM) across multiple training parameter samples and predicts at unseen parameter values without retraining. We validate piDMD on fluid flow past a cylinder, electron beam oscillations in transverse magnetic fields, and virtual cathode oscillations -- the latter two being simulated using an electromagnetic particle-in-cell (EMPIC) method. Across all benchmarks, piDMD achieves accurate long-horizon predictions and improved robustness over state-of-the-art interpolation-based parametric DMD baselines, with fewer training samples and in multi-dimensional parameter spaces.
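One way to read "embedding parameter-affine structure into the DMD regression step" is a single least-squares fit of $x_{k+1} = (A_0 + \mu A_1)\,x_k$ over all training parameter samples. The sketch below assumes a scalar parameter $\mu$, full-state (unreduced) snapshots, and hypothetical function names; it is a simplified illustration and not the piDMD algorithm itself.

```python
import numpy as np

def fit_affine_dmd(snapshots, next_snapshots, params):
    """Fit A0, A1 such that next ~= (A0 + mu * A1) @ current for every training mu."""
    # Stack [X; mu * X] so that [A0 A1] @ [X; mu * X] = A0 X + mu A1 X.
    Z = np.hstack([np.vstack([X, mu * X]) for X, mu in zip(snapshots, params)])
    Y = np.hstack(next_snapshots)
    A = Y @ np.linalg.pinv(Z)              # least-squares solution for [A0 A1]
    n = Y.shape[0]
    return A[:, :n], A[:, n:]

def predict(A0, A1, mu, x0, steps):
    """Roll the fitted model forward at a (possibly unseen) parameter value mu."""
    A = A0 + mu * A1
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(A @ traj[-1])
    return np.stack(traj, axis=1)          # shape (state_dim, steps + 1)
```

Because one operator pair is fitted jointly, prediction at a new parameter value only requires forming `A0 + mu * A1`, with no interpolation between separately trained models.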
We introduce compute-grounded reasoning (CGR), a design paradigm for spatial-aware research agents in which every answerable sub-problem is resolved by deterministic computation before a language model is asked to generate. Spatial Atlas instantiates CGR as a single Agent-to-Agent (A2A) server that handles two challenging benchmarks: FieldWorkArena, a multimodal spatial question-answering benchmark spanning factory, warehouse, and retail environments, and MLE-Bench, a suite of 75 Kaggle machine learning competitions requiring end-to-end ML engineering. A structured spatial scene graph engine extracts entities and relations from vision descriptions, computes distances and safety violations deterministically, then feeds computed facts to large language models, thereby avoiding hallucinated spatial reasoning. Entropy-guided action selection maximizes information gain per step and routes queries across a three-tier frontier model stack (OpenAI + Anthropic). A self-healing ML pipeline with strategy-aware code generation, a score-driven iterative refinement loop, and a prompt-based leak audit registry round out the system. We evaluate across both benchmarks and show that CGR yields competitive accuracy while maintaining interpretability through structured intermediate representations and deterministic spatial computations.
Monitoring binomial proportions across multiple independent streams is a critical challenge in Statistical Process Control (SPC), with applications from manufacturing to cybersecurity. While EWMA charts offer sensitivity to small shifts, existing implementations rely on asymptotic variance approximations that fail during early-phase monitoring. We introduce a Cumulative Standardized Binomial EWMA (CSB-EWMA) chart that overcomes this limitation by deriving the exact time-varying variance of the EWMA statistic for binary multiple-stream data, enabling adaptive control limits that ensure statistical rigor from the first sample. Through extensive simulations, we identify optimal smoothing (λ) and limit (L) parameters to achieve target in-control average run length (ARL0) of 370 and 500. The CSB-EWMA chart demonstrates rapid shift detection across both ARL0 targets, with out-of-control average run length (ARL1) dropping to 3-7 samples for moderate shifts (δ=0.2), and exhibits exceptional robustness across different data distributions, with low ARL1 Coefficients of Variation (CV < 0.10 for small shifts) for both ARL0 = 370 and 500. This work provides practitioners with a distribution-free, sensitive, and theoretically sound tool for early change detection in binomial multiple-stream processes.
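The contrast between asymptotic and exact time-varying EWMA variance mentioned above can be illustrated with the textbook single-stream formula $\mathrm{Var}(Z_t) = \sigma^2 \frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]$ for independent samples. The sketch below applies it to sample proportions with in-control value `p0` and sample size `n`; it is not the CSB-EWMA derivation for multiple streams, and the function names are hypothetical.

```python
import numpy as np

def ewma_limits(p0, n, lam, L, num_samples):
    """Time-varying lower and upper control limits for an EWMA of sample proportions."""
    t = np.arange(1, num_samples + 1)
    sigma2 = p0 * (1.0 - p0) / n                       # variance of one sample proportion
    var_t = sigma2 * lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
    half_width = L * np.sqrt(var_t)
    return p0 - half_width, p0 + half_width

def ewma(proportions, p0, lam):
    """EWMA statistic Z_t = lam * X_t + (1 - lam) * Z_{t-1}, started at Z_0 = p0."""
    z, out = p0, []
    for x in proportions:
        z = lam * x + (1.0 - lam) * z
        out.append(z)
    return np.array(out)
```

At $t=1$ the exact variance reduces to $\lambda^2\sigma^2$, which is much smaller than the asymptotic value $\sigma^2\lambda/(2-\lambda)$; limits based on the asymptotic value are therefore too wide in early monitoring.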
Designing robust reinforcement learning (RL) agents in the presence of imperfect reward signals remains a core challenge. In practice, agents are often trained with proxy rewards that only approximate the true objective, leaving them vulnerable to reward hacking, where high proxy returns arise from unintended or exploitative behaviors. Recent work formalizes this issue using r-correlation between proxy and true rewards, but existing methods like occupancy-regularized policy optimization (ORPO) optimize against a fixed proxy and do not provide strong guarantees against broader classes of correlated proxies. In this work, we formulate reward hacking as a robust policy optimization problem over the space of all r-correlated proxy rewards. We derive a tractable max-min formulation, where the agent maximizes performance under the worst-case proxy consistent with the correlation constraint. We further show that when the reward is a linear function of known features, our approach can be adapted to incorporate this prior knowledge, yielding both improved policies and interpretable worst-case rewards. Experiments across several environments show that our algorithms consistently outperform ORPO in worst-case returns, and offer improved robustness and stability across different levels of proxy-true reward correlation. These results show that our approach provides both robustness and transparency in settings where reward design is inherently uncertain. The code is available at https://github.com/ZixuanLiu4869/reward_hacking.
Traditional machine learning depends on high-precision arithmetic and near-ideal hardware assumptions, which is increasingly challenged by variability in aggressively scaled semiconductor devices. Compute-in-memory (CIM) architectures alleviate data-movement bottlenecks and improve energy efficiency yet introduce nonlinear distortions and reliability concerns. We address these issues with a hardware-aware optimization framework based on Hyperdimensional Computing (HDC), systematically compensating for non-ideal similarity computations in CIM. Our approach formulates encoding as an optimization problem, minimizing the Frobenius norm between an ideal kernel and its hardware-constrained counterpart, and employs a joint optimization strategy for end-to-end calibration of hypervector representations. Experimental results demonstrate that our method when applied to QuantHD achieves 84\% accuracy under severe hardware-induced perturbations, a 48\% increase over naive QuantHD under the same conditions. Additionally, our optimization is vital for graph-based HDC reliant on precise variable-binding for interpretable reasoning. Our framework preserves the accuracy of RelHD on the Cora dataset, achieving a 5.4$\times$ accuracy improvement over naive RelHD under nonlinear environments. By preserving HDC's robustness and symbolic properties, our solution enables scalable, energy-efficient intelligent systems capable of classification and reasoning on emerging CIM hardware.
The tumor microenvironment (TME) plays a central role in cancer progression, treatment response, and patient outcomes, yet large-scale, consistent, and quantitative TME characterization from routine hematoxylin and eosin (H&E)-stained histopathology remains scarce. We introduce OpenTME, an open-access dataset of pre-computed TME profiles derived from 3,634 H&E-stained whole-slide images across five cancer types (bladder, breast, colorectal, liver, and lung cancer) from The Cancer Genome Atlas (TCGA). All outputs were generated using Atlas H&E-TME, an AI-powered application built on the Atlas family of pathology foundation models, which performs tissue quality control, tissue segmentation, cell detection and classification, and spatial neighborhood analysis, yielding over 4,500 quantitative readouts per slide at cell-level resolution. OpenTME is available for non-commercial academic research on Hugging Face. We will continue to expand OpenTME over time and anticipate it will serve as a resource for biomarker discovery, spatial biology research, and the development of computational methods for TME analysis.
Robust explanations are increasingly required for user trust in enterprise NLP, yet pre-deployment validation is difficult in the common case of black-box deployment (API-only access) where representation-based explainers are infeasible and existing studies provide limited guidance on whether explanations remain stable under real user noise, especially when organizations migrate from encoder classifiers to decoder LLMs. To close this gap, we propose a unified black-box robustness evaluation framework for token-level explanations based on leave-one-out occlusion, and operationalize explanation robustness with top-token flip rate under realistic perturbations (swap, deletion, shuffling, and back-translation) at multiple severity levels. Using this protocol, we conduct a systematic cross-architecture comparison across three benchmark datasets and six models spanning encoder and decoder families (BERT, RoBERTa, Qwen 7B/14B, Llama 8B/70B; 64,800 cases). We find that decoder LLMs produce substantially more stable explanations than encoder baselines (73% lower flip rates on average), and that stability improves with model scale (44% gain from 7B to 70B). Finally, we relate robustness improvements to inference cost, yielding a practical cost-robustness tradeoff curve that supports model and explanation selection prior to deployment in compliance-sensitive applications.
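The two ingredients of the protocol above, leave-one-out occlusion and top-token flip rate, can be sketched against any black-box scoring function. The sketch assumes the flip is judged by comparing top-token strings (the abstract does not specify this), and `toy_score` with its lexicon is a hypothetical stand-in for an API-only classifier.

```python
def loo_top_token(score, tokens):
    """Return the token whose removal lowers the black-box score the most."""
    base = score(tokens)
    drops = [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]
    best = max(range(len(tokens)), key=drops.__getitem__)  # first index on ties
    return tokens[best]

def top_token_flip_rate(score, originals, perturbed):
    """Fraction of inputs whose top-attributed token changes after perturbation."""
    flips = sum(
        loo_top_token(score, o) != loo_top_token(score, p)
        for o, p in zip(originals, perturbed)
    )
    return flips / len(originals)

# Hypothetical black box: a tiny lexicon scorer used only for illustration.
LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.5}

def toy_score(tokens):
    return sum(LEXICON.get(t, 0.0) for t in tokens)
```

Only calls to `score` are needed, which matches the API-only setting: no gradients or internal representations are accessed.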
The analysis of DNA sequences has become critical in numerous fields, from evolutionary biology to understanding gene regulation and disease mechanisms. While deep neural networks can achieve remarkable predictive performance, they typically operate as black boxes. Contrasting these black boxes, axis-aligned decision trees offer a promising direction for interpretable DNA sequence analysis, yet they suffer from a fundamental limitation: considering individual raw features in isolation at each split limits their expressivity, which results in prohibitive tree depths that hinder both interpretability and generalization performance. We address this challenge by introducing DEFT, a novel framework that adaptively generates high-level sequence features during tree construction. DEFT leverages large language models to propose biologically-informed features tailored to the local sequence distributions at each node and to iteratively refine them with a reflection mechanism. Empirically, we demonstrate that DEFT discovers human-interpretable and highly predictive sequence features across a diverse range of genomic tasks.
Block-wise diffusion language models (DLMs) generate multiple tokens in any order, offering a promising alternative to the autoregressive decoding pipeline. However, they still remain bottlenecked by memory-bound attention in long-context scenarios. Naive sparse attention fails on DLMs due to a KV Inflation problem, where different queries select different prefix positions, making the union of accessed KV pages large. To address this, we observe that between consecutive denoising steps, only a small fraction of active tokens exhibit significant hidden-state changes, while the majority of stable tokens remain nearly constant. Based on this insight, we propose LOSA (Locality-aware Sparse Attention), which reuses cached prefix-attention results for stable tokens and applies sparse attention only to active tokens. This substantially shrinks the number of KV indices that must be loaded, yielding both higher speedup and higher accuracy. Across multiple block-wise DLMs and benchmarks, LOSA preserves near-dense accuracy while significantly improving efficiency, achieving up to +9 points in average accuracy at aggressive sparsity levels while maintaining 1.54x lower attention density. It also achieves up to 4.14x attention speedup on RTX A6000 GPUs, demonstrating the effectiveness of the proposed method.
Deep learning models may converge to suboptimal solutions despite strong validation accuracy, masking an optimization failure we term Trajectory Deviation. This is because as training proceeds, models can abandon high generalization states for specific data sub-populations, thus discarding previously learned latent features without triggering classical overfitting signals. To address this problem we introduce VISTA, an online self-distillation framework that enforces consistency along the optimization trajectory. Using a validation-informed Marginal Coverage score, VISTA identifies expert anchors, which are earlier model states that retain specialized competence over distinct data regions. A coverage-weighted ensemble of these anchors is integrated online during training, regularizing the loss landscape and preserving mastered knowledge. When evaluated across multiple benchmarks, VISTA demonstrates improved robustness and generalization over standard training and prior self-distillation methods, while a lightweight implementation reduces storage overhead by 90% without performance loss.
This work is concerned with the continuum limit of a graph-based data visualization technique called the t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications, but is still poorly understood from a theoretical standpoint. The t-SNE algorithm produces visualizations by minimizing the Kullback-Leibler divergence between similarity matrices representing the high-dimensional data and its low-dimensional representation. We prove that, after a natural rescaling and in applicable parameter regimes, the Kullback-Leibler divergence is consistent, as the number of data points $n \to \infty$ and the similarity graph remains sparse, with a continuum variational problem that involves a non-convex gradient regularization term and a penalty on the magnitude of the probability density function in the visualization space. These two terms represent the continuum limits of the attraction and repulsion forces in the t-SNE algorithm. Due to the lack of convexity in the continuum variational problem, the question of well-posedness is only partially resolved. We show that when both dimensions are $1$, the problem admits a unique smooth minimizer, along with an infinite number of discontinuous minimizers (interpreted in a relaxed sense). This aligns well with the empirically observed ability of t-SNE to separate data in seemingly arbitrary ways in the visualization. The energy is also very closely related to the famously ill-posed Perona-Malik equation, which is used for denoising and simplifying images. We present numerical results validating the continuum limit, provide some preliminary results about the delicate nature of the limiting energetic problem in higher dimensions, and highlight several problems for future work.
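The discrete objective whose continuum limit is studied above can be written down directly. The sketch below computes the Kullback-Leibler divergence between a high-dimensional similarity matrix and the Student-t similarity matrix of an embedding. As a simplifying assumption it uses a fixed-bandwidth Gaussian kernel for the high-dimensional side instead of the per-point perplexity calibration and sparse graph of standard t-SNE; the function names are hypothetical.

```python
import numpy as np

def _sq_dists(A):
    return np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=-1)

def gaussian_similarities(X, sigma=1.0):
    """Assumption: fixed-bandwidth Gaussian kernel, normalized over all pairs."""
    w = np.exp(-_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return w / w.sum()

def student_t_similarities(Y):
    """Low-dimensional similarities with the heavy-tailed kernel 1 / (1 + d^2)."""
    w = 1.0 / (1.0 + _sq_dists(Y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over pairs with P > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```

Expanding the logarithm separates a term that rewards placing similar points close together (attraction) from the normalization term of the low-dimensional kernel (repulsion), the two forces whose limits the abstract describes.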
We resolve a long-standing open question about the existence of a constant-factor approximation algorithm for the average-case \textsc{Decision Tree} problem with uniform probability distribution over the hypotheses. We answer the question in the affirmative by providing a simple polynomial-time algorithm with approximation ratio of $\frac{2}{1-\sqrt{(e+1)/(2e)}}+\varepsilon<11.57$. This improves upon the best-known greedy algorithm, which achieves an $O(\log n/{\log\log n})$-approximation. The first key ingredient in our analysis is the use of a decomposition technique known from problems related to \textsc{Hierarchical Clustering} [SODA '17, WALCOM '26], which allows us to decompose the optimal decision tree into a series of objects called separating subfamilies. The second crucial idea is to reduce the subproblem of finding a \textsc{Separating Subfamily} to an instance of the \textsc{Maximum Coverage} problem. To do so, we analyze the properties of cutting cliques into small pieces, which represent pairs of hypotheses to be separated. This allows us to obtain a good approximation for the \textsc{Separating Subfamily} problem, which then enables the design of the approximation algorithm for the original problem.
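The stated bound on the approximation ratio can be checked numerically. The short sketch below evaluates the closed-form constant from the abstract; the function name and the `eps` argument are illustrative.

```python
import math

def approximation_ratio(eps=0.0):
    """Evaluate 2 / (1 - sqrt((e + 1) / (2e))) + eps."""
    return 2.0 / (1.0 - math.sqrt((math.e + 1.0) / (2.0 * math.e))) + eps
```

Since $(e+1)/(2e) = \tfrac12 + \tfrac{1}{2e} \approx 0.684$ and its square root is about $0.827$, the constant is roughly $2/0.173 \approx 11.56$, which leaves room for a small $\varepsilon$ below $11.57$.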
Predicting the functional impact of single amino acid substitutions (SAVs) is central to understanding genetic disease and engineering therapeutic proteins. While protein language models and structure-based methods have achieved strong performance on this task, they systematically neglect protein dynamics; residue flexibility, correlated motions, and allosteric coupling are well-established determinants of mutational tolerance in structural biology, yet have not been incorporated into supervised variant effect predictors. We present TriFit, a multimodal framework that integrates sequence, structure, and protein dynamics through a four-expert Mixture-of-Experts (MoE) fusion module with trimodal cross-modal contrastive learning. Sequence embeddings are extracted via masked marginal scoring with ESM-2 (650M); structural embeddings from AlphaFold2-predicted C-alpha geometries; and dynamics embeddings from Gaussian Network Model (GNM) B-factors, mode shapes, and residue-residue cross-correlations. The MoE router adaptively weights modality combinations conditioned on the input, enabling protein-specific fusion without fixed modality assumptions. On the ProteinGym substitution benchmark (217 DMS assays, 696k SAVs), TriFit achieves AUROC 0.897 +/- 0.0002, outperforming all supervised baselines including Kermut (0.864) and ProteinNPT (0.844), and the best zero-shot model ESM3 (0.769). Ablation studies confirm that dynamics provides the largest marginal contribution over pairwise modality combinations, and TriFit achieves well-calibrated probabilistic outputs (ECE = 0.044) without post-hoc correction.
Large language models map semantically related prompts to similar internal representations -- a phenomenon interpretable as attractor-like dynamics. We ask whether the identity document of a persistent cognitive agent (its cognitive_core) exhibits analogous attractor-like behavior. We present a controlled experiment on Llama 3.1 8B Instruct, comparing hidden states of an original cognitive_core (Condition A), seven paraphrases (Condition B), and seven structurally matched controls (Condition C). Mean-pooled states at layers 8, 16, and 24 show that paraphrases converge to a tighter cluster than controls (Cohen's d > 1.88, p < 10^{-27}, Bonferroni-corrected). Replication on Gemma 2 9B confirms cross-architecture generalizability. Ablations suggest the effect is primarily semantic rather than structural, and that structural completeness appears necessary to reach the attractor region. An exploratory experiment shows that reading a scientific description of the agent shifts internal state toward the attractor -- closer than a sham preprint -- distinguishing knowing about an identity from operating as that identity. These results provide representational evidence that agent identity documents induce attractor-like geometry in LLM activation space.
In-context learning (ICL) performance depends critically on which demonstrations are placed in the prompt, yet most existing selectors prioritize heuristic notions of relevance or diversity and provide limited insight into the coverage of a demonstration set. We propose Unseen Coverage Selection (UCS), a training-free, subset-level coverage prior motivated by the principle that a good demonstration set should expose the model to latent clusters unrevealed by the currently selected subset. UCS operationalizes this idea by (1) inducing discrete latent clusters from model-consistent embeddings and (2) estimating the number of unrevealed clusters within a candidate subset via a Smoothed Good--Turing estimator from its empirical frequency spectrum. Unlike previous selection methods, UCS is coverage-based and training-free, and can be seamlessly combined with both query-dependent and query-independent selection baselines via a simple regularized objective. Experiments on multiple intent-classification and reasoning benchmarks with frontier Large Language Models show that augmenting strong baselines with UCS consistently improves ICL accuracy by up to 2-6% under the same selection budget, while also yielding insights into task- and model-level latent cluster distributions. Code is available at https://github.com/Raina-Xin/UCS.
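To illustrate the kind of quantity a Good--Turing estimator reads off a frequency spectrum, the sketch below computes the spectrum of cluster labels in a subset and the classical, unsmoothed Good--Turing estimate of unseen probability mass (singletons divided by sample size). This is an assumption-laden stand-in: the smoothed estimator and the unrevealed-cluster count used by UCS are not reproduced here, and all names are hypothetical.

```python
from collections import Counter


def frequency_spectrum(cluster_labels):
    """Map r -> number of clusters observed exactly r times in the subset."""
    counts = Counter(cluster_labels)
    return dict(Counter(counts.values()))


def good_turing_missing_mass(cluster_labels):
    """Classical Good-Turing estimate N_1 / N of the unseen probability mass.

    Assumption: plain (unsmoothed) Good-Turing, not the smoothed variant
    referred to in the abstract.
    """
    n = len(cluster_labels)
    if n == 0:
        return 1.0  # nothing observed yet, so everything is unseen
    spectrum = frequency_spectrum(cluster_labels)
    return spectrum.get(1, 0) / n


# Cluster ids of seven selected demonstrations (illustrative data).
subset = ["a", "a", "b", "c", "c", "c", "d"]
print(frequency_spectrum(subset))        # two singletons, one doubleton, one tripleton
print(good_turing_missing_mass(subset))  # 2 / 7
```

A subset with many singleton clusters yields a high estimate, suggesting that further clusters remain unrevealed; a subset dominated by repeated clusters yields a low one.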
Modern large language models generate text autoregressively, producing tokens one at a time. To study the learnability of such systems, Joshi et al. (COLT 2025) introduced a PAC-learning framework for next-token generators, the primitive underlying autoregressive models. In this framework, an unknown next-token generator maps a sequence of tokens to the next token and is iteratively applied for $T$ steps, producing a chain of tokens whose final token constitutes the model's output. The learning task is to learn the input-output mapping induced by this autoregressive process. Depending on the available supervision, training examples may reveal only the final output (End-to-End supervision) or the entire generated chain (Chain-of-Thought supervision). This raises two natural questions: how the sample complexity depends on the generation length $T$, and how much Chain-of-Thought supervision can reduce this dependence. In this work we give a nearly complete answer to both questions by uncovering a taxonomy of how the sample complexity scales with $T$. For End-to-End learning, we show that the landscape is remarkably rich: subject to mild conditions, essentially any growth rate $r(T)$ between constant and linear can arise as the sample complexity, and combined with the linear upper bound of Joshi et al., this yields an essentially complete characterization. In contrast, under Chain-of-Thought supervision we show that the sample complexity is independent of $T$, demonstrating that access to intermediate reasoning steps can eliminate the dependence on the generation length altogether. Our analysis introduces new combinatorial tools, and as corollaries we resolve several open questions posed by Joshi et al. regarding the dependence of learnability on the generation length and the role of Chain-of-Thought supervision.
Bayesian optimization (BO) has demonstrated practicality and effectiveness in many real-world settings for the sequential optimization of expensive black-box functions. Meta-Bayesian optimization (meta-BO) focuses on improving the sample efficiency of BO by making use of information from related tasks. Although meta-BO is sample-efficient when task structure transfers, poor alignment between meta-training and test tasks can cause suboptimal queries to be suggested during online optimization. To this end, we propose a simple meta-BO algorithm that utilizes related-task information when it is determined to be useful, falling back to lookahead otherwise, within a unified framework. We demonstrate competitiveness of our method with existing approaches on function optimization tasks, while retaining strong performance in low task-relatedness regimes where test tasks share limited structure with the meta-training set.
The central goal of active learning is to gather data that maximises downstream predictive performance, but popular approaches have limited flexibility in customising this data acquisition to different downstream problems and losses. We propose a rigorous loss-driven approach to Bayesian active learning that allows data acquisition to directly target the loss associated with a given decision problem. In particular, we show how any loss can be used to derive a unique objective for optimal data acquisition. Critically, we then show that any loss taking the form of a weighted Bregman divergence permits analytic computation of a central component of its corresponding objective, making the approach applicable in practice. In regression and classification experiments with a range of different losses, we find our approach reduces test losses relative to existing techniques.
We study offline-online reinforcement learning in linear mixture Markov decision processes (MDPs) under environment shift. In the offline phase, data are collected by an unknown behavior policy and may come from a mismatched environment, while in the online phase the learner interacts with the target environment. We propose an algorithm that adaptively leverages offline data. When the offline data are informative, either due to sufficient coverage or small environment shift, the algorithm provably improves over purely online learning. When the offline data are uninformative, it safely ignores them and matches the online-only performance. We establish regret upper bounds that explicitly characterize when offline data are beneficial, together with nearly matching lower bounds. Numerical experiments further corroborate our theoretical findings.
We introduce Graph Concept Bottleneck (GCB) as a new paradigm for self-explainable text-attributed graph learning. GCB maps graphs into a subspace, concept bottleneck, where each concept is a meaningful phrase, and predictions are made based on the activation of these concepts. Unlike existing interpretable graph learning methods that primarily rely on subgraphs as explanations, the concept bottleneck provides a new form of interpretation. To refine the concept space, we apply the information bottleneck principle to focus on the most relevant concepts. This not only yields more concise and faithful explanations but also explicitly guides the model to "think" toward the correct decision. We empirically show that GCB achieves intrinsic interpretability with accuracy on par with black-box Graph Neural Networks. Moreover, it delivers better performance under distribution shifts and data perturbations, showing improved robustness and generalizability, benefitting from concept-guided prediction.
In this paper, we propose the geometric algebra-informed neural radiance fields (GAI-NeRF), a novel framework for wireless channel prediction that leverages geometric algebra attention mechanisms to capture ray-object interactions in complex propagation environments. Our approach incorporates global token representations, drawing inspiration from transformer architectures in language and vision domains, to aggregate learned spatial-electromagnetic features and enhance scene understanding. We identify limitations in conventional static ray tracing modules that hinder model generalization and address this challenge through a new ray tracing architecture. This design enables effective generalization across diverse wireless scenarios while maintaining computational efficiency. Experimental results demonstrate that GAI-NeRF achieves superior performance in channel prediction tasks by combining geometric algebra principles with neural scene representations, offering a promising direction for next-generation wireless communication systems. Moreover, GAI-NeRF greatly outperforms existing methods across multiple wireless scenarios. To ensure comprehensive assessment, we further evaluate our approach against multiple benchmarks using newly collected real-world indoor datasets tailored for single-scene downstream tasks and generalization testing, confirming its robust performance in unseen environments and establishing its high efficacy for wireless channel prediction.
Coherent nonlinear wave dynamics are often strongly shaped by a compact set of physically meaningful descriptors of the initial state. Traditional neural operators typically treat the input-output mapping as a largely black-box high-dimensional regression problem, without explicitly exploiting this structured physical context. Common feature-integration strategies usually rely on direct concatenation or FiLM-style affine modulation in hidden latent spaces. Here we introduce a different paradigm, loosely inspired by the complementary roles of state evolution and physically meaningful observables in quantum mechanics: the wave field is learned through a standard DeepONet state pathway, while compact physical descriptors follow a parallel conditioning pathway and act as residual modulation factors on the state prediction. Based on this idea, we develop a Multi-Head Residual-Gated DeepONet (MH-RG), which combines a pre-branch residual modulator, a branch residual gate, and a trunk residual gate with a low-rank multi-head mechanism to capture multiple complementary conditioned response patterns without prohibitive parameter growth. We evaluate the framework on representative benchmarks including highly nonlinear conservative wave dynamics and dissipative trapped dynamics and further perform detailed mechanistic analyses of the learned multi-head gating behavior. Compared with feature-augmented baselines, MH-RG DeepONet achieves consistently lower error while better preserving phase coherence and the fidelity of physically relevant dynamical quantities.
Epileptic seizure detection from EEG signals remains challenging due to the high dimensionality and nonlinear, potentially stochastic, dynamics of neural activity. In this work, we investigate whether features derived from topological data analysis (TDA) can improve the classification of brain states in preictal, ictal and interictal iEEG recordings from epilepsy patients using multichannel data. We analyze data from 55 patients, significantly larger than many previous studies that rely on patient-specific models. Persistence diagrams derived from iEEG signals are vectorized using several TDA representations, including Carlsson coordinates, persistence images, and template functions. To understand how topological representations interact with modern machine learning pipelines, we conduct a large-scale ablation study across multiple iEEG frequency bands, dimensionality reduction techniques, feature representations, and classifier architectures. Our experiments show that dimension-reduced topological representations achieve up to 80\% balanced accuracy for three-class classification. Interestingly, classical machine learning models perform comparably to deep learning models, achieving up to 79.17\% balanced accuracy, suggesting that carefully designed topological features can substantially reduce model complexity requirements. In contrast, pipelines preserving the full multichannel feature structure exhibit severe overfitting due to the high-dimensional feature space. These findings highlight the importance of structure-preserving dimensionality reduction when applying topology-based representations to multichannel neural data.
We introduce INDOTABVQA, a benchmark for evaluating cross-lingual Table Visual Question Answering (VQA) on real-world document images in Bahasa Indonesia. The dataset comprises 1,593 document images across three visual styles (bordered, borderless, and colorful), each containing one or more tables, and 1,593 question-answer sets in four languages: Bahasa Indonesia, English, Hindi, and Arabic. This enables evaluation of Vision-Language Models (VLMs) in both monolingual (Bahasa documents with Bahasa questions) and cross-lingual settings (Bahasa documents with questions in other languages). We benchmark leading open-source VLMs (Qwen2.5-VL, Gemma-3, LLaMA-3.2) and GPT-4o and reveal substantial performance gaps, particularly on structurally complex tables and in low-resource languages. Fine-tuning a compact 3B model and LoRA-finetuning a 7B model on our dataset yields 11.6% and 17.8% improvements in accuracy, respectively. Providing explicit table region coordinates as additional input further improves performance by 4-7%, demonstrating the value of spatial priors for table-based reasoning. Our findings underscore the importance of language-diverse, domain-specific datasets and demonstrate that targeted fine-tuning can significantly enhance VLM performance on specialized document understanding tasks. INDOTABVQA provides a valuable resource for advancing research in cross-lingual, structure-aware document understanding, especially in underrepresented regions of the world. The full dataset is available on Hugging Face at: https://huggingface.co/datasets/NusaBharat/INDOTABVQA
Identifying and understanding the features that a deep network (DN) extracts from its inputs to produce its outputs is a focal point of interpretability research. The Linear Representation Hypothesis (LRH) identifies features in terms of the linear directions formed by the inputs in a DN's latent space. However, the LRH is limited as it abstracts away from individual components (e.g., neurons and layers), is susceptible to identifying spurious features, and cannot be applied across sub-components (e.g., multiple layers). In this paper, we introduce the Linear Centroids Hypothesis (LCH) as a new framework for identifying the features of a DN. The LCH posits that features correspond to linear directions of centroids, which are vector summarizations of the functional behavior of a DN in a local region of its input space. Interpretability studies under the LCH can leverage existing LRH tools, such as sparse autoencoders, by applying them to the DN's centroids rather than to its latent activations. We demonstrate that doing so yields sparser feature dictionaries for DINO vision transformers, which also perform better on downstream tasks. The LCH also inspires novel approaches to interpretability; for example, LCH can readily identify circuits in GPT2-Large. Code for studying the LCH is available at https://github.com/ThomasWalker1/LinearCentroidsHypothesis.
Self-driving laboratories promise to accelerate materials discovery. Yet current automated solid-state synthesis platforms are limited to ambient conditions, thereby precluding their use for air-sensitive materials. Here, we present A-Lab for Glovebox Powder Solid-state Synthesis (A-Lab GPSS), a robotic platform capable of synthesizing and characterizing air-sensitive inorganic materials under strict air-free conditions. By integrating an agentic AI framework into the A-Lab GPSS platform, we structure autonomous experimental design through abductive and inductive reasoning. We deploy this platform to explore the vast compositional space of lithium halide spinel solid-state ionic conductors. Across a synthesis campaign comprising 352 samples with diverse compositions, the system explores a broad chemical space, experimentally realizing 72% of the 171 possible pairwise combinations among the 19 metals considered in this study. Over the course of the campaign, the fraction of compositions exhibiting both good ionic conductivity (> 0.05 mS/cm) and high halide spinel phase purity increases from 1.33% in the first 75 agent-proposed samples to 5.33% in the final 75. Furthermore, by inspecting the AI's reasoning processes, we reveal distinct yet complementary discovery strategies: abductive reasoning interrogates abnormal observations within already explored regions, whereas inductive reasoning expands the search into broader, previously unvisited chemical space. This work establishes a scalable platform for the autonomous discovery of complex, air-sensitive solid-state materials.
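The campaign figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported percentages are rounded values, so the recovered counts are approximate; the helper names are illustrative.

```python
import math


def pairwise_combinations(n_metals):
    """Number of unordered pairs among n_metals elements."""
    return math.comb(n_metals, 2)


def count_from_percent(percent, total):
    """Nearest whole count corresponding to a rounded percentage of total."""
    return round(percent / 100.0 * total)


pairs = pairwise_combinations(19)              # 19 choose 2
realized_pairs = count_from_percent(72, pairs)
first_batch_hits = count_from_percent(1.33, 75)
final_batch_hits = count_from_percent(5.33, 75)
print(pairs, realized_pairs, first_batch_hits, final_batch_hits)
```

Under that rounding assumption, 19 metals give 171 pairs, of which roughly 123 were realized, and the hit rate corresponds to about 1 sample in the first 75 versus about 4 in the final 75.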
Large Foundation Model (LFM) inference is both memory- and compute-intensive, traditionally relying on GPUs. However, the limited availability and high cost have motivated the adoption of high-performance general-purpose CPUs, especially emerging 3D-stacked Static Non-Uniform Cache Architecture (3D S-NUCA) systems. These architectures offer enhanced bandwidth and locality but suffer from severe thermal challenges and uneven cache latencies due to 3D Networks-on-Chip (NoC). Optimal management of thread migration and V/f scaling is non-trivial due to LFM kernel diversity and system heterogeneity. Existing thermal management approaches often rely on oversimplified analytical models and lack adaptability. We propose AILFM, an Active Imitation Learning (AIL)-based scheduling framework that learns near-optimal thermal-aware scheduling policies from Oracle demonstrations with minimal run-time overhead. AILFM accounts for both core-level performance heterogeneity and kernel-specific behavior in LFMs to maintain thermal safety while maximizing performance. Extensive experiments show that AILFM outperforms state-of-the-art baselines and generalizes well across diverse LFM workloads.
Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enabled by data and pipeline parallelism, two techniques that require ultra-high-bandwidth communication. While efficient methods now exist for decentralized data parallelism, pipeline parallelism remains the primary challenge. Recent efforts, such as Subspace Models (SM), have claimed up to 100x activation compression but rely on complex constrained optimization and diverge from true end-to-end training. In this paper, we propose a different approach, based on an architecture designed from the ground up to be native to low-bandwidth communication environments while still applicable to any standard transformer-based architecture. We call this architecture the Residual Bottleneck Model (ResBM); it introduces a residual encoder-decoder bottleneck module across pipeline boundaries that can be trained end-to-end as part of the model's parameters while preserving an explicit low-rank identity path. We show that ResBMs achieve state-of-the-art 128x activation compression without significant loss in convergence rates and without significant memory or compute overhead.
High-fidelity numerical simulation of subsurface flow is computationally intensive, especially for many-query tasks such as uncertainty quantification and data assimilation. Deep learning (DL) surrogates can significantly accelerate forward simulations, yet constructing them requires substantial machine learning (ML) expertise - from architecture design to hyperparameter tuning - that most domain scientists do not possess. Furthermore, the process is predominantly manual and relies heavily on heuristic choices. This expertise gap remains a key barrier to the broader adoption of DL surrogate techniques. For this reason, we present AutoSurrogate, a large-language-model-driven multi-agent framework that enables practitioners without ML expertise to build high-quality surrogates for subsurface flow problems through natural-language instructions. Given simulation data and optional preferences, four specialized agents collaboratively execute data profiling, architecture selection from a model zoo, Bayesian hyperparameter optimization, model training, and quality assessment against user-specified thresholds. The system also handles common failure modes autonomously, including restarting training with adjusted configurations when numerical instabilities occur and switching to alternative architectures when predictive accuracy falls short of targets. In our setting, a single natural-language sentence can be sufficient to produce a deployment-ready surrogate model, with minimal human intervention required at any intermediate stage. We demonstrate the utility of AutoSurrogate on a 3D geological carbon storage modeling task, mapping permeability fields to pressure and CO$_2$ saturation fields over 31 timesteps. Without any manual tuning, AutoSurrogate is able to outperform expert-designed baselines and domain-agnostic AutoML methods, demonstrating strong potential for practical deployment.
Diabetes devices, including Continuous Glucose Monitoring (CGM), Smart Insulin Pens, and Automated Insulin Delivery systems, generate rich time-series data widely used in research and machine learning. However, inconsistent data formats across sources hinder sharing, integration, and analysis. We present DIAX (DIAbetes eXchange), a standardized JSON-based format for unifying diabetes time-series data, including CGM, insulin, and meal signals. DIAX promotes interoperability, reproducibility, and extensibility, particularly for machine learning applications. An open-source repository provides tools for dataset conversion, cross-format compatibility, visualization, and community contributions. DIAX is a translational resource, not a data host, ensuring flexibility without imposing data-sharing constraints. Currently, DIAX is compatible with other standardization efforts and supports major datasets (DCLP3, DCLP5, IOBP2, PEDAP, T1Dexi, Loop), totaling over 10 million patient-hours of data. https://github.com/Center-for-Diabetes-Technology/DIAX
An OS kernel that runs LLM inference internally can read logit distributions before any text is generated -- and act on them as a governance primitive. I present ProbeLogits, a kernel-level operation that performs a single forward pass and reads specific token logits to classify agent actions as safe or dangerous, with zero learned parameters. On a 260-prompt OS action benchmark (9 categories including adversarial attacks), ProbeLogits achieves F1=0.980, Precision=1.000, and Recall=0.960 using a general-purpose 7B model at 4-bit quantization. On ToxicChat (1,000 human-annotated real conversations), it achieves F1=0.790 at default calibration strength $α$=1.0, improving to F1=0.837 at $α$=0.5 -- 89% of Llama Guard 3's F1~0.939 with zero learned parameters. A key design contribution is the calibration strength $α$, which serves as a deployment-time policy knob rather than a learned hyperparameter. By adjusting $α$, the OS can enforce strict policies for privileged operations ($α\geq 0.8$, maximizing recall) or relaxed policies for conversational agents ($α$=0.5, maximizing precision). Contextual calibration improves accuracy from 64.8% to 97.3% on the custom benchmark. I implement ProbeLogits within Anima OS, a bare-metal x86_64 OS written in 80,400 lines of Rust. Because agent actions must pass through kernel-mediated host functions, ProbeLogits enforcement operates below the WASM sandbox boundary, making it significantly harder to circumvent than application-layer classifiers. Each classification costs 65ms on 7B -- fast enough for per-action governance. I also show that treating KV cache as process state enables checkpoint, restore, and fork operations analogous to traditional process management. To my knowledge, no prior system exposes LLM logit vectors as OS-level governance primitives.
Our ability to predict, control, and ultimately understand complex systems rests on discovering the equations that govern their dynamics. Identifying these equations directly from noisy, limited observations has therefore become a central challenge in data-driven science, yet existing library-based sparse regression methods force a compromise between automation, statistical rigor, and computational efficiency. Here we develop Bayesian-ARGOS, a hybrid framework that reconciles these demands by combining rapid frequentist screening with focused Bayesian inference, enabling automated equation discovery with principled uncertainty quantification at a fraction of the computational cost of existing methods. Tested on seven chaotic systems under varying data scarcity and noise levels, Bayesian-ARGOS outperforms two state-of-the-art methods in most scenarios. It surpasses SINDy in data efficiency for all systems and noise tolerance for six out of the seven, with a two-order-of-magnitude reduction in computational cost compared to bootstrap-based ARGOS. The probabilistic formulation additionally enables a suite of standard statistical diagnostics, including influence analysis and multicollinearity detection that expose failure modes otherwise opaque. When integrated with representation learning (SINDy-SHRED) for high dimensional sea surface temperature reconstruction, Bayesian-ARGOS increases the yield of valid latent equations with significantly improved long horizon stability. Bayesian-ARGOS thus provides a principled, automated, and computationally efficient route from scarce and noisy observations to interpretable governing equations, offering a practical framework for equation discovery across scales, from benchmark chaotic systems to the latent dynamics underlying global climate patterns.
Time-series forecasting aims to predict future values by modeling temporal dependencies in historical observations. It is a critical component of many real-world systems, where accurate forecasts improve operational efficiency and help mitigate uncertainty and risk. More recently, machine learning (ML), and especially deep learning (DL)-based models, have gained widespread adoption for time-series forecasting, but they remain vulnerable to adversarial attacks. However, many state-of-the-art attack methods are not directly applicable in time-series settings, where storing complete historical data or performing attacks at every time step is often impractical. This paper proposes an adversarial attack framework for time-series forecasting under an online bounded-buffer setting, leveraging an informed and selective attack strategy. By selectively targeting time steps where the model exhibits high confidence and the expected prediction error is maximal, our framework produces fewer but substantially more effective attacks. Experiments show that our framework can increase the prediction error up to 2.42x, while performing attacks in fewer than 10% of time steps.
Using FlowBoost, a closed-loop deep generative optimization framework for extremal structure discovery, we investigate $\ell^p$-generalizations of the finite free Stam inequality for real-rooted polynomials under finite free additive convolution $\boxplus_n$. At $p=2$, FlowBoost finds the Hermite pair as the unique equality case and reveals the spectral structure of the linearized convolution map at this extremal point. As a result, we conjecture that the singular values of the doubly stochastic coupling matrix $E_n$ on the mean-zero subspace are $\{2^{-k/2}:k=1,\ldots,n-1\}$, independent of $n$. Conditional on this conjecture, we obtain a sharp local stability constant and the finite free CLT convergence rate, both uniform in $n$. We introduce a one-parameter family of $p$-Stam inequalities using $\ell^p$-Fisher information and prove that the Hermite pair itself violates the inequality for every $p>2$, with the sign of the deficit governed by the $\ell^p$-contraction ratio of $E_n$. Systematic computation via FlowBoost supports the conjecture that $p^*\!=2$ is the sharp critical exponent. For $p<2$, the extremal configurations undergo a bifurcation, meaning that they become non-matching pairs with bimodal root structure, converging back to the Hermite diagonal only as $p\to 2^-$. Our findings demonstrate that FlowBoost can be an effective tool of mathematical discovery in infinite-dimensional extremal problems.
Modern machine learning methods have been proposed to detect life in extraterrestrial samples, drawing on their ability to distinguish biotic from abiotic samples based on training models using natural and synthetic organic molecular mixtures. Here we show using Artificial Life that such methods are easily fooled into detecting life with near 100% confidence even if the analyzed sample is not capable of life. This is due to modern machine learning methods' propensity to be easily fooled by out-of-distribution samples. Because extraterrestrial samples are very likely out of the distribution provided by terrestrial biotic and abiotic samples, using AI methods for life detection is bound to yield significant false positives.
While next-token prediction (NTP) has been the standard objective for training language models, it often struggles to capture global structure in reasoning tasks. Multi-token prediction (MTP) has recently emerged as a promising alternative, yet its underlying mechanisms remain poorly understood. In this paper, we study how MTP facilitates reasoning, with a focus on planning. Empirically, we show that MTP consistently outperforms NTP on both synthetic graph path-finding tasks and more realistic reasoning benchmarks, such as Countdown and boolean satisfiability problems. Theoretically, we analyze a simplified two-layer Transformer on a star graph task. We prove that MTP induces a two-stage reverse reasoning process: the model first attends to the end node and then reconstructs the path by tracing intermediate nodes backward. This behavior arises from a gradient decoupling property of MTP, which provides a cleaner training signal compared to NTP. Ultimately, our results highlight how multi-token objectives inherently bias optimization toward robust and interpretable reasoning circuits.
The stable operation of autonomous off-grid photovoltaic systems requires solar forecasting algorithms that respect atmospheric thermodynamics. Contemporary deep learning models consistently exhibit critical anomalies, primarily severe temporal phase lags during cloud transients and physically impossible nocturnal power generation. To resolve this divergence between data-driven modeling and deterministic celestial mechanics, this research introduces the Thermodynamic Liquid Manifold Network. The methodology projects 22 meteorological and geometric variables into a Koopman-linearized Riemannian manifold to systematically map complex climatic dynamics. The architecture integrates a Spectral Calibration unit and a multiplicative Thermodynamic Alpha-Gate. This system synthesizes real-time atmospheric opacity with theoretical clear-sky boundary models, structurally enforcing strict celestial geometry compliance. This completely neutralizes phantom nocturnal generation while maintaining zero-lag synchronization during rapid weather shifts. Validated against a rigorous five-year testing horizon in a severe semi-arid climate, the framework achieves an RMSE of 18.31 Wh/m$^2$ and a Pearson correlation of 0.988. The model strictly maintains a zero-magnitude nocturnal error across all 1826 testing days and exhibits a sub-30-minute phase response during high-frequency optical transients. Comprising exactly 63,458 trainable parameters, this ultra-lightweight design establishes a robust, thermodynamically consistent standard for edge-deployable microgrid controllers.
We study signal propagation at initialization in transformers through the averaged partial Jacobian norm (APJN), a measure of gradient amplification across layers. We extend APJN analysis to transformers with bidirectional attention and permutation-symmetric input token configurations by deriving recurrence relations for activation statistics and APJNs across layers. Our theory predicts how attention modifies the asymptotic behavior of the APJN at large depth and matches APJNs measured in deep vision transformers. The criticality picture known from residual networks carries over to transformers: the pre-LayerNorm architecture exhibits power-law APJN growth, whereas transformers with LayerNorm replaced by elementwise $\tanh$-like nonlinearities have stretched-exponential APJN growth, indicating that the latter are subcritical. Applied to Dynamic Tanh (DyT) and Dynamic erf (Derf) transformers, the theory explains why these architectures can be more sensitive to initialization and optimization choices and require careful tuning for stable training.
The stable operation of off-grid photovoltaic systems requires accurate, computationally efficient solar forecasting. Contemporary deep learning models often suffer from massive computational overhead and physical blindness, generating impossible predictions. This paper introduces the Physics-Informed State Space Model (PISSM) to bridge the gap between efficiency and physical accuracy for edge-deployed microcontrollers. PISSM utilizes a dynamic Hankel matrix embedding to filter stochastic sensor noise by transforming raw meteorological sequences into a robust state space. A Linear State Space Model replaces heavy attention mechanisms, efficiently modeling temporal dependencies for parallel processing. Crucially, a novel Physics-Informed Gating mechanism leverages the Solar Zenith Angle and Clearness Index to structurally bound outputs, ensuring predictions strictly obey diurnal cycles and preventing nocturnal errors. Validated on a multi-year dataset for Omdurman, Sudan, PISSM achieves superior accuracy with fewer than 40,000 parameters, establishing an ultra-lightweight benchmark for real-time off-grid control.
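Two ingredients named in the PISSM abstract above, the Hankel embedding and an output gate tied to the solar zenith angle, can be illustrated generically. The sketch below is not the authors' implementation: `hankel_embed` and `zenith_gate` are hypothetical names, the gate uses only the zenith angle and an assumed clear-sky upper bound, and the Clearness Index mentioned in the abstract is omitted.

```python
def hankel_embed(series, window):
    """Stack lagged windows of a 1-D sequence as rows of a Hankel matrix.

    Entry (i, j) equals series[i + j], so anti-diagonals are constant.
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [list(series[i:i + window]) for i in range(len(series) - window + 1)]


def zenith_gate(raw_prediction, zenith_deg, clear_sky_limit):
    """Bound a raw forecast using solar geometry (illustrative assumption).

    The output is forced to zero when the sun is at or below the horizon
    (zenith >= 90 degrees) and is otherwise clipped to [0, clear_sky_limit].
    """
    if zenith_deg >= 90.0:
        return 0.0
    return min(max(raw_prediction, 0.0), clear_sky_limit)
```

For example, `hankel_embed([1, 2, 3, 4, 5], 3)` yields three overlapping rows of length 3, and `zenith_gate` returns zero for any night-time input regardless of what the upstream model predicts.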
We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on synthetic simulated data improves performance on IPhO (International Physics Olympiad) problems by 5-10 percentage points across model sizes. These results demonstrate that physics simulators can act as scalable data generators, enabling LLMs to acquire deep physical reasoning skills beyond the limitations of internet-scale QA data. Code available at: https://sim2reason.github.io/.
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of inference that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points. We believe these findings help translate mechanistic insights into practical guidance for architectural design.
GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present \textbf{ClawGUI}, an open-source framework addressing these three gaps within a single harness. \textbf{ClawGUI-RL} provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. \textbf{ClawGUI-Eval} enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8\% reproduction against official baselines. \textbf{ClawGUI-Agent} brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, \textbf{ClawGUI-2B} achieves 17.1\% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0\%.
Automation underpins progress across scientific and industrial disciplines. Yet, automating tasks requiring interpretation of abstract visual information remains challenging. For example, crystal alignment strongly relies on humans with the ability to comprehend diffraction patterns. Here we introduce an autonomous system that aligns single crystals without access to crystallography and diffraction theory. Using a model-free reinforcement learning framework, an agent learns to identify and navigate towards high-symmetry orientations directly from Laue diffraction patterns. Despite the absence of human supervision, the agent develops human-like strategies to achieve time-efficient alignment across different crystal symmetry classes. With this, we provide a computational framework for intelligent diffractometers. As such, our approach advances the development of automated experimental workflows in materials science.
We set out to train behavioral dispositions (self-verification, uncertainty acknowledgment, feedback integration) into small language models (0.6B to 2.3B effective parameters) through a four-stage all-MIT distillation pipeline, with follow-on experiments on inference-time attention-head interventions and a frozen-base confidence-gated sidecar. An internal draft reported +33.9-point MCAS and +15.3-point HumanEval gains on a Qwen3-0.6B student; a second-pass sanity check falsified both numbers before publication. The HumanEval delta was a truncation artifact (n_predict=512) that inverted to -8.0 points at n_predict=1024; the MCAS gain disappeared under apples-to-apples scoring. That falsification triggered three subsequent arcs. Across (1) SFT/DPO LoRA on three model families and two domains, (2) inference-time attention-head tempering on o_proj, and (3) a training-free frozen-base sidecar reading the final-token hidden state h_last, we find no operator that moves judge-measured disposition without damaging content or collapsing into stylistic mimicry. The failure is consistent across five models (Qwen3-0.6B, Qwen3-1.7B, Qwen3.5-0.8B, Gemma 4 E2B, and SmolLM2-1.7B-Instruct). A within-distribution cross-validation pass (AUC=0.683) collapsed to chance on fresh prompts (AUC=0.516). We contribute a three-arc negative result with mechanism, a two-failure-mode taxonomy for linear h_last probes, and an honest falsification pipeline that converts the class of false positives we ourselves produced into publishable negatives. As an independent finding, Gemma 4 E2B exhibits near-complete confidence-correctness decoupling on the Chef domain (assertion asymmetry -0.009; the model asserts at 91% regardless of correctness).
Deep learning underpins a wide range of applications in MRI, including reconstruction, artifact removal, and segmentation. However, progress has been driven largely by public datasets focused on brain and knee imaging, shaping how models are trained and evaluated. As a result, careful studies of the reliability of these models across diverse anatomical settings remain limited. In this work, we introduce MosaicMRI, a large and diverse collection of fully sampled raw musculoskeletal (MSK) MR measurements designed for training and evaluating machine-learning-based methods. MosaicMRI is the largest open-source raw MSK MRI dataset to date, comprising 2,671 volumes and 80,156 slices. The dataset offers substantial diversity in volume orientation (e.g., axial, sagittal), imaging contrasts (e.g., PD, T1, T2), anatomies (e.g., spine, knee, hip, ankle, and others), and numbers of acquisition coils. Using VarNet as a baseline for the accelerated reconstruction task, we perform a comprehensive set of experiments to study scaling behavior with respect to both model capacity and dataset size. Interestingly, models trained on the combined anatomies significantly outperform anatomy-specific models in low-sample regimes, highlighting the benefits of anatomical diversity and the presence of exploitable cross-anatomical correlations. We further evaluate robustness and cross-anatomy generalization by training models on one anatomy (e.g., spine) and testing them on another (e.g., knee). Notably, we identify groups of body parts (e.g., foot and elbow) that generalize well with each other, and highlight that performance under domain shifts depends on training set size, anatomy, and protocol-specific factors.
Continuous diffusion has been the foundation of high-fidelity, controllable, and few-step generation of many data modalities such as images. However, in language modeling, prior continuous diffusion language models (DLMs) lag behind discrete counterparts due to the sparse data space and the underexplored design space. In this work, we close this gap with LangFlow, the first continuous DLM to rival discrete diffusion, by connecting embedding-space DLMs to Flow Matching via Bregman divergence, alongside three key innovations: (1) we derive a novel ODE-based NLL bound for principled evaluation of continuous flow-based language models; (2) we propose an information-uniform principle for setting the noise schedule, which motivates a learnable noise scheduler based on a Gumbel distribution; and (3) we revise prior training protocols by incorporating self-conditioning, as we find it improves both likelihood and sample quality of embedding-space DLMs with effects substantially different from discrete diffusion. Putting everything together, LangFlow rivals top discrete DLMs on both the perplexity (PPL) and the generative perplexity (Gen. PPL), reaching a PPL of 30.0 on LM1B and 24.6 on OpenWebText. It even exceeds autoregressive baselines in zero-shot transfer on 4 out of 7 benchmarks. LangFlow provides the first clear evidence that continuous diffusion is a promising paradigm for language modeling. Homepage: https://github.com/nealchen2003/LangFlow
Kullback-Leibler (KL) divergence is a fundamental concept in information theory that quantifies the discrepancy between two probability distributions. In the context of Variational Autoencoders (VAEs), it serves as a central regularization term, imposing structure on the latent space and thereby enabling the model to exhibit generative capabilities. In this work, we present a detailed derivation of the closed-form expression for the KL divergence between Gaussian distributions, a case of particular importance in practical VAE implementations. Starting from the general definition for continuous random variables, we derive the expression for the univariate case and extend it to the multivariate setting under the assumption of diagonal covariance. Finally, we discuss the interpretation of each term in the resulting expression and its impact on the training dynamics of the model.
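The closed-form expressions referred to in the abstract above are standard results: for univariate Gaussians, $\mathrm{KL}\big(\mathcal{N}(\mu_1,\sigma_1^2)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2)\big)=\log\frac{\sigma_2}{\sigma_1}+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}-\frac12$, and for a diagonal Gaussian against a standard normal prior, $\frac12\sum_j\big(\mu_j^2+\sigma_j^2-\log\sigma_j^2-1\big)$. A minimal Python sketch of both follows; the function names are illustrative, and the log-variance parameterization is an assumption reflecting common VAE practice rather than anything stated in the abstract.

```python
import math


def kl_gauss_1d(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) for univariate Gaussians."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
        - 0.5
    )


def kl_diag_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))
```

The second function is the first one applied per dimension with $\mu_2=0$, $\sigma_2=1$, and summed. The $\mu_j^2$ term pulls latent means toward the origin, while $\sigma_j^2-\log\sigma_j^2-1$ is minimized at $\sigma_j=1$.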
We study the spatiotemporal patterns of density fluctuations in $^{16,24}$O and $^{48}$Ca using nuclear interactions from chiral effective field theory and the time-dependent coupled-cluster method. We find that two-particle-two-hole excitations generate small-amplitude fluctuations that are fast, short-ranged and of stochastic character.
We present the TQ4Q2.0 fragmentation functions for the production of all-heavy (fully heavy) $S$-wave tetraquarks ($T_{4Q}$) with scalar ($0^{++}$), axial-vector ($1^{+-}$), and tensor ($2^{++}$) quantum numbers in high-energy hadronic collisions. This work extends the previous TQ4Q1.1 framework by incorporating nonconstituent heavy-quark contributions and introducing a replica-based uncertainty-quantification strategy derived from multi-scale variations (MHOUs). The construction follows a nonrelativistic QCD factorization approach, combining gluon- and heavy-quark-initiated fragmentation channels at leading power. Initial-scale inputs are modeled through updated potential-inspired wave functions, while the subsequent DGLAP evolution is performed via the threshold-aware HF-NRevo scheme. A comprehensive systematic analysis of uncertainties is carried out, with contributions from color-composite long-distance matrix elements (LDMEs) and perturbative multiscale inputs. The resulting TQ4Q2.0 grids, publicly released in LHAPDF6 format, provide the first complete phenomenological set for all-heavy exotics, enabling precise studies of all-charm tetraquark production and jet-associated observables within the JETHAD environment. This article completes the high-energy resummation-driven generation of the TQ4Q program and establishes a definitive baseline for future collider-oriented analyses of all-heavy multiquark dynamics.
A method for selecting and/or rejecting leptons from charm semileptonic decays based on the tagging of the secondary vertex using a hadron track is introduced. The method is developed for dimuon Drell-Yan measurements in LHCb using full simulations in proton-proton collisions at $\sqrt{s}=13.6$ TeV. We focus on the invariant mass range between 2.9 and 5 GeV/$c^2$ with single muon transverse momentum larger than 1 GeV/$c$. A novel strategy is detailed for background rejection, improving the signal-to-background ratio by a factor of $\sim 4$ at an efficiency of 81% with minimal bias on the Drell-Yan signal properties. Moreover, a second approach is presented for the construction of unbiased background-pure samples of single muons from charm decays, achieving a charm efficiency of 21.4% at a Drell-Yan efficiency of 1.1%.
We investigate the nonlinear response of flow harmonics $v_2,v_4$ to initial-state eccentricities $ε_2,ε_4$ within the Gubser-flow framework. By extending the perturbative solutions of Gubser flow, we derive analytic nonlinear response relations connecting the eccentricities $ε_2,ε_4$ to the flow harmonics $v_2,v_4$. Our results reproduce the well-known result $v_4/v_2^2 \to 1/2$ in the limit of large transverse momentum $p_T$. Furthermore, we study the effects of a mismatch between the participant and reaction planes. We find that the conventional nonlinear response coefficients acquire an additional factor determined by the participant-plane angles, which is often approximated as statistical noise driven by event-by-event fluctuations. This factor can modify not only the strength but even the sign of the effective nonlinear response coefficient, making it sensitive to the initial configuration of the colliding nuclei. Our study provides new analytical insight into the origin of collective phenomena in relativistic heavy-ion collisions.