MyArxiv Feed
High Energy Physics - Experimental
☆ Scale-anomaly-induced confining pressure within hadrons
The effect of the QCD scale anomaly on the internal pressure distribution of hadrons is studied based on the trace-traceless decomposition of the energy-momentum tensor. Using recent model-independent results of gravitational form factors as input, the pressure distributions of both pions and nucleons are analyzed in the instant form and the light-front form. It is found that, in all cases, the scale anomaly dominantly generates the confining pressure. This result suggests that the phenomenon is a universal feature, independent of models, types of hadrons, and the choice of form.
comment: 7 pages, 2 figures
☆ Search for $t\bar tt\bar tW$ Production at $\sqrt{s} = 13$ TeV Using a Modified Graph Neural Network at the LHC
The simultaneous production of four top quarks in association with a $W$ boson at $\sqrt{s} = 13$ TeV is a rare SM process with a next-to-leading-order (NLO) cross-section of $6.6^{+2.4}_{-2.6}$ ab \cite{saiel}. Identifying this process in the fully hadronic decay channel is particularly challenging due to overwhelming backgrounds from $t\bar{t}$, $t\bar{t}W$, $t\bar{t}Z$, and triple-top production. This study introduces a modified physics-informed neural network, a hybrid graph neural network (GNN) that enhances event classification. The proposed model integrates graph layers for particle-level features, a custom multi-layer perceptron (MLP) based global stream with a quantum circuit, and cross-attention fusion to combine local and global representations. A physics-informed loss function enforces jet multiplicity constraints derived from event decay dynamics. Benchmarked against conventional methods, the GNN achieves a signal significance $S/\sqrt{S+B}$ of $0.174$ and a ROC-AUC of $0.974$, surpassing the BDT's significance of $0.148$ and ROC-AUC of $0.913$, while XGBoost achieves a significance of $0.149$ and a ROC-AUC of $0.920$. The classification models are trained on Monte Carlo (MC) simulations, with events normalized using cross-section-based reweighting to reflect their expected contributions in a dataset corresponding to $350\;$fb$^{-1}$ of integrated luminosity. This enhanced approach offers a framework for precision event selection at the LHC, leveraging high-dimensional statistical learning and physics-informed inference to tackle fundamental HEP challenges, aligning with ML developments.
comment: 14 pages
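The two quantities quoted in this abstract, the cross-section-based event weight and the significance $S/\sqrt{S+B}$, can be illustrated with a minimal sketch. It assumes the usual normalization weight = cross-section x luminosity / number of generated events; the generated-event count is a hypothetical placeholder, and selection efficiencies and branching fractions are ignored. It is not the authors' code.

```python
import math

def event_weight(cross_section_fb, luminosity_fb_inv, n_generated):
    """Per-event weight that scales an MC sample to its expected yield."""
    return cross_section_fb * luminosity_fb_inv / n_generated

def significance(s, b):
    """Signal significance S / sqrt(S + B) for expected yields S and B."""
    if s + b <= 0:
        return 0.0
    return s / math.sqrt(s + b)

# Values taken from the abstract: 6.6 ab = 6.6e-3 fb and 350 fb^-1.
SIGMA_SIGNAL_FB = 6.6e-3
LUMI_FB_INV = 350.0
# Hypothetical sample size, chosen only for illustration.
N_GEN_SIGNAL = 100_000

weight = event_weight(SIGMA_SIGNAL_FB, LUMI_FB_INV, N_GEN_SIGNAL)
# Expected signal yield before any selection: sigma * L.
expected_signal = weight * N_GEN_SIGNAL
```

With only a few signal events expected before selection, a significance well below one is unsurprising for any classifier.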
Jet substructure measurements elucidating partonic evolution in $p$+$p$ collisions at RHIC
Jets are multiscale objects that connect partons to hadrons, making jet substructure measurements crucial for probing both perturbative and non-perturbative processes in QCD. At STAR, a variety of jet substructure observables, such as SoftDrop groomed splittings and N-Point Energy Correlators (ENC), provide insights into parton evolution and hadronization mechanisms. SoftDrop-groomed observables and ENCs both connect measurement to fundamental QCD at the parton level, allowing for comparisons to first-principles theoretical calculations. Additionally, by also including charge information, as in the charge-weighted ENC, details about the hadronization mechanism can be obtained. In these proceedings, we present preliminary results on measurements of SoftDrop observables and ENCs across different jet momenta and radii in $p$+$p$ collisions at $\sqrt{s}$~=~200~GeV using STAR data.
comment: 6 pages, 7 figures, 25th ZIM\'ANYI SCHOOL WINTER WORKSHOP ON HEAVY ION PHYSICS
☆ Fast prediction of the hydrodynamic QGP evolution in ultra-relativistic heavy-ion collisions using Fourier Neural Operators
Recent research in machine learning has employed neural networks to learn mappings between function spaces on bounded domains, termed ``neural operators''. As such, these operators can provide alternatives to standard numerical methods for partial differential equation (PDE) solutions. In particular, the Fourier Neural Operator (FNO) has been shown to map solutions for classical fluid flow problems with accuracy competitive with traditional PDE solvers and with much greater computing speed. This paper explores the first application of FNOs to model ultra-relativistic hydrodynamic flow of the quark-gluon plasma (QGP) generated in relativistic heavy-ion collisions. The application in ultra-relativistic flow is novel relative to classical flow, due to the hydrodynamic evolution of the QGP occurring in femtometer-scaled explosions characterized by rapid expansion cooling. In this study we investigate the applicability of FNOs as computationally fast alternatives to standard numerical PDE solvers. The FNO predictions are evaluated by comparing to standard PDE solutions, using MUSIC in the JETSCAPE Monte Carlo event generator framework. The performance of calculating established experimental observables for flow and jet quenching using FNOs in the MC framework is also reported.
comment: 15 pages, 11 figures, submitted to Physical Review C, 9 pages of appendices, 9 figures in appendices
☆ Branching ratios and CP asymmetries of $B^0 \to η_c f_0$ in the improved perturbative QCD formalism
Motivated by the idea of a fragmented scalar glueball, we investigate the decays $B^0 \to \eta_c f_0$ within the improved perturbative QCD (iPQCD) framework by including the known next-to-leading order corrections. Here, $B^0$ and $f_0$ denote the neutral $B_{d,s}^0$ mesons and the light scalar mesons $f_0(500, 980, 1370, 1500)$ under the $q\bar q$ assignment. The {\it CP}-averaged branching ratios (BRs) and the {\it CP} asymmetries of $B^0 \to \eta_c f_0$ are evaluated with the $f_0(500)[f_0(1370)]-f_0(980)[f_0(1500)]$ mixing in the quark-flavor basis. For effective comparisons with near-future measurements, we further derive the $B^0 \to \eta_c f_0 (\to \pi^+ \pi^-/K^+ K^-)$ BRs under the narrow-width approximation. The value ${\rm BR}(B_s^0 \to \eta_c f_0(980) (\to \pi^+ \pi^-))= (2.87^{+1.38}_{-1.29}) \times 10^{-4}$ obtained in the iPQCD formalism agrees with the available measurements and predictions within uncertainties. Large BRs of $B_s^0 \to \eta_c f_0(1500) (\to \pi^+ \pi^-/K^+ K^-)$ and large direct {\it CP} asymmetries of $B^0 \to \eta_c f_0(1370, 1500)$ are accessible in the LHCb and Belle-II experiments. Experimental tests of these iPQCD predictions would help us to understand the nature of these light scalars more deeply and could provide evidence for deciphering $f_0(1500)$ as a primary or fragmented scalar glueball.
comment: 18 pages, 6 figures, 4 tables
☆ Subthreshold parameters of $ππ$ scattering revisited
Using the most recent experimental data and lattice QCD calculations of $\pi\pi$ scattering lengths, while employing dispersive representations of the amplitude based on Roy equations, we compute the subthreshold parameters of this process. We use Monte Carlo sampling to numerically model the probability distribution of the results based on all uncertainties in the inputs. We also investigate the dependence of the results on a theoretical correlation between the $\pi\pi$ scattering lengths $a^0_0$ and $a^2_0$, which was previously established in the framework of two-flavour chiral perturbation theory.
comment: 21 pages, 2 figures
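The Monte Carlo propagation of input uncertainties described in this abstract can be sketched generically: draw each input from a distribution, evaluate the derived quantity for every draw, and summarize the resulting sample. The sketch assumes independent Gaussian inputs. The observable is a toy linear combination and the central values and errors are illustrative placeholders, not the paper's subthreshold parameters or inputs.

```python
import random
import statistics

def propagate(func, central, sigma, n_samples=20000, seed=1):
    """Sample independent Gaussian inputs and return (mean, std) of func."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        inputs = [rng.gauss(c, s) for c, s in zip(central, sigma)]
        outputs.append(func(*inputs))
    return statistics.mean(outputs), statistics.stdev(outputs)

def toy_observable(a00, a20):
    """Hypothetical derived quantity: a linear combination of two inputs."""
    return 2.0 * a00 - 5.0 * a20

# Illustrative placeholder inputs (central values and 1-sigma errors).
mean, err = propagate(toy_observable,
                      central=[0.22, -0.044],
                      sigma=[0.005, 0.001])
```

For a linear observable the sampled spread reproduces ordinary quadrature error propagation. Sampling becomes worthwhile when the map is nonlinear or when the inputs are correlated, in which case the independent draws above would be replaced by draws from a joint distribution.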
☆ Latest neutrino results from the FASER experiment and their implications for forward hadron production
The muon puzzle -- an excess of muons relative to simulation predictions in ultra-high-energy cosmic-ray air showers -- has been reported by many experiments. This suggests that forward particle production in hadronic interactions is not fully understood. Some of the scenarios proposed to resolve this predict reduced production of forward neutral pions and enhanced production of forward kaons (or other particles). The FASER experiment at the LHC is located 480 m downstream of the ATLAS interaction point and is sensitive to neutrinos and muons, which are the decay products of forward charged pions and kaons. In this study, the latest measurements of electron and muon neutrino fluxes are presented using the data corresponding to 9.5 $\mathrm{fb^{-1}}$ and 65.6 $\mathrm{fb^{-1}}$ of proton-proton collisions with $\sqrt{s}=13.6~\mathrm{TeV}$ by the FASER$\nu$ and the FASER electronic detector, respectively. These fluxes are compared with predictions from recent hadronic interaction models, including EPOS-LHCr, SIBYLL 2.3e, and QGSJET 3. The predictions are generally consistent with the measured fluxes from FASER, although some discrepancies appear in certain energy bins. More precise flux measurements with additional data will follow soon, enabling validation of pion, kaon, and charm meson production with finer energy binning, reduced uncertainties, and multi-differential analyses.
comment: 10 pages, 2 figures, Presented to the 39th International Cosmic Ray Conference (ICRC2025)
☆ Quantum-enhanced dark matter detection using Schrödinger cat states
Quantum metrology enables sensitive dark matter detection, particularly using nonclassical states, such as Schr\"odinger cat states featuring sub-Planck interference structures in microwave cavities. Here, we report the first experimental application of four-component Schr\"odinger cat states within a high-quality superconducting microwave cavity to detect dark photons, a potential dark matter candidate. We demonstrate an 8.1-fold enhancement in the signal photon rate and constrain the dark photon kinetic mixing angle to an unprecedented $\epsilon < 7.32 \times 10^{-16}$ near 6.44~GHz (26.6~$\mu$eV). By employing a parametric sideband drive to actively tune the cavity frequency, we achieve dark photon searches and background subtraction across multiple frequency bins, yielding a sensitivity at the $10^{-16}$ level within a 100~kHz bandwidth. Our Schr\"odinger's cat-assisted detection (SCaD) scheme demonstrates a substantial improvement over previous results, promising potential implications in quantum-enhanced searches for new physics.
comment: 20 pages, 17 figures, 5 tables
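The correspondence quoted in this abstract between the cavity frequency (6.44 GHz) and the dark photon mass (26.6 $\mu$eV) follows from $E = hf$. A minimal check, using only the Planck constant in eV s:

```python
# Planck constant in eV*s (from the exact SI definition, rounded).
PLANCK_EV_S = 4.135667696e-15

def photon_energy_uev(freq_hz):
    """Photon energy E = h * f, returned in micro-electronvolts."""
    return PLANCK_EV_S * freq_hz * 1e6

energy_uev = photon_energy_uev(6.44e9)  # cavity frequency from the abstract
```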
☆ Revealing chiral-odd two-meson generalized distribution amplitudes in $e^- e^+ \to (ππ) (ππ)$ reactions
We demonstrate that chiral-odd dimeson generalized distribution amplitudes (CO-GDAs), nonperturbative objects encoding the transition of a quark-antiquark pair into two mesons, can be accessed in high-energy $e^- e^+$ annihilation into two meson pairs, each with a relatively low invariant mass. While chiral-even GDAs contribute to the leading one-photon amplitude, the chiral-odd sector enters via two-photon exchange. We show that the interference between these amplitudes leads to measurable effects at BES III or future tau-charm factories. This work opens a direct path to experimentally probing the long-missing chiral-odd sector of meson structure, specifically the anomalous tensorial magnetic moment of spin-zero mesons such as the pion.
comment: 6 pages, 4 figures
☆ Review on recent results of J/$ψ$ production at STAR
Studying the production of J/$\psi$ (a bound state of a charm and an anti-charm quark) in proton-proton collisions gives an opportunity to test quantum chromodynamics (QCD) calculations, as the production of J/$\psi$ involves both perturbative and non-perturbative processes. However, theoretical calculations are still unable to fully and simultaneously explain experimental results, such as polarization and $p_\text{T}$ spectra measured in different kinematic regimes and at different collision energies. More studies are needed to investigate the J/$\psi$ production mechanism. In heavy-ion collisions, charmonia can be used to study the properties of the medium, as they are expected to dissociate in the medium when the Debye radius, inversely proportional to the medium temperature, becomes smaller than their size. Other competing effects, such as recombination, have also been found to modify the observed J/$\psi$ yield in heavy-ion collisions. We review recent measurements of J/$\psi$ production in proton-proton and heavy-ion collisions at various collision energies measured with the STAR experiment at RHIC. The data are compared with recent model calculations of charmonium production.
☆ Kilo-scale point-source inference using Parametric Cataloging
The estimation of the number of point sources in the sky is one of the oldest problems in astronomy, yet an easy and efficient method for estimating the uncertainty on these counts is still an open problem. Probabilistic cataloging solves the general point-source inference problem, but the trans-dimensional nature of the inference method requires a bespoke approach that is difficult to scale. Here it is shown that probabilistic cataloging can be performed in a fixed-dimensional framework called Parametric Cataloging under mild assumptions on some of the priors. The method requires only a simple reparameterization of the flux coordinates, yielding an accessible method that can be implemented in most probabilistic programming environments. As the parameter space is fixed-dimensional, off-the-shelf gradient-based samplers can be employed, which allows the method to scale to tens of thousands of sources.
comment: 7 pages, 4 figures
☆ Production of Jets at STAR
Jets serve as an important tool to probe QCD both in the vacuum and in the hot and dense medium. The STAR experiment at RHIC plays a key role in studying QCD phenomena across different collision systems ($p$+$p$, $p$+A, A+A), offering access to a kinematic regime that complements that of the LHC. Building on recent jet and event activity studies at STAR, we present recent measurements on charged-particle jets at $\sqrt{s_{\mathrm{NN}}}~=~200$ GeV. In $p$+Au collisions, we explore event activity (EA) measured in the Au-going direction and its correlation with particle production at mid-rapidity. While soft particle production increases with EA, high-$p_{\mathrm{T}}$ jets are found to be inversely related to EA. Ratios of $p_{\mathrm{T}}$ imbalance and azimuthal dijet separation between high- and low-EA events show no significant differences, suggesting no strong evidence of jet quenching in high-EA $p$+Au collisions. In Au+Au collisions, we report semi-inclusive measurements of jets recoiling from $\gamma$ and $\pi^0$ triggers, using mixed-event techniques to subtract background and study jet suppression, intra-jet broadening, and acoplanarity. Additionally, we present inclusive charged-particle jet spectra corrected for background fluctuations, extending the kinematic reach of previous measurements. These results provide crucial insight into the modification of jets in the medium and contribute to a deeper understanding of QCD in heavy-ion collisions.
comment: 8 pages, 5 figures, proceedings from 24th Zimanyi School Winter Workshop On Heavy Ion Physics
☆ Probing Cosmic Ray Composition and Muon-philic Dark Matter via Muon Tomography
This work presents a novel cosmic-ray scattering experiment employing a Resistive Plate Chamber (RPC) muon tomography system. By introducing the scattering angle between incident and outgoing cosmic-ray tracks as a key observable, this approach enables simultaneous studies of secondary cosmic-ray composition and searches for new physics. During a 63-day campaign, 1.18 million cosmic-ray scattering events were recorded and analyzed. By performing combined template fits to the observed angular distribution, particle abundances are measured, for example resolving the electron component at $\sim 2\%$ precision. Furthermore, constraints are established on elastic muon-dark matter (DM) scattering cross-sections for muon-philic dark matter. At the 95\% confidence level, the limit reaches $1.62 \times 10^{-17}~\rm{cm}^{2}$ for 1 GeV slow DM, demonstrating sensitivity to light, muon-coupled slow DM.
comment: PKMu Experiment Project-1 with Cosmic Muons, 6 pages, 4 figures
♻ ☆ Data-parallel leading-order event generation in MadGraph5_aMC@NLO
The CUDACPP plugin for MadGraph5_aMC@NLO aims to accelerate leading-order tree-level event generation by providing the MadEvent event generator with data-parallel helicity amplitudes. These amplitudes are written in templated C++ and CUDA, allowing them to be compiled for CPUs supporting SSE4, AVX2, and AVX-512 instruction sets as well as CUDA- and HIP-enabled GPUs. Using SIMD instruction sets, CUDACPP-generated amplitude routines are shown to speed up linearly with SIMD register size, and GPU offloading is shown to provide acceleration beyond that of SIMD instructions. Additionally, the resulting speed-up in event generation perfectly aligns with predictions from measured runtime fractions spent in amplitude routines, and proper GPU utilisation can speed up high-multiplicity QCD processes by an order of magnitude when compared to optimal CPU usage in server-grade CPUs.
comment: 40 pages, 22 figures
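Predicting the overall speed-up from the runtime fraction spent in amplitude routines is, in its simplest form, an application of Amdahl's law. The sketch below assumes that only the amplitude fraction is accelerated and that the rest of the runtime is unchanged. The fractions and factors in the usage lines are hypothetical, not measurements from the paper.

```python
def overall_speedup(amplitude_fraction, amplitude_speedup):
    """Amdahl's law: total speed-up when only one runtime fraction is accelerated."""
    if not 0.0 <= amplitude_fraction <= 1.0:
        raise ValueError("amplitude_fraction must lie in [0, 1]")
    if amplitude_speedup <= 0.0:
        raise ValueError("amplitude_speedup must be positive")
    serial_part = 1.0 - amplitude_fraction
    return 1.0 / (serial_part + amplitude_fraction / amplitude_speedup)

# Hypothetical example: 90% of the runtime in amplitudes, 8x faster amplitudes.
simd_like = overall_speedup(0.9, 8.0)
# The unaccelerated part caps the gain at 1 / (1 - fraction), here 10x.
gpu_like = overall_speedup(0.9, 1000.0)
```

The cap explains why large gains are expected mainly for high-multiplicity processes, where amplitude evaluation dominates the runtime.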
♻ ☆ Exploring ultra-high energy neutrino experiments through the lens of the transport equation
We develop a first-principles formalism, based on the transport equation in the line-of-sight approximation, to link the expected number of muons at neutrino telescopes to the flux of neutrinos at the Earth's surface. We compute the distribution of muons inside Earth, arising from the up-scattering of neutrinos close to the detector, as well as from the decay of taus produced farther away. This framework allows one to account for systematic uncertainties, as well as to clarify the assumptions behind definitions commonly used in the literature, such as the effective area. We apply this formalism to analyze the high-energy muon event recorded by KM3NeT, with a reconstructed energy of $ 120^{+110}_{-60} \, \mathrm{PeV}$ and an elevation angle of $\left(0.54\pm 2.4\right)^\circ$, in comparison with the non-observation of similar events by IceCube. We find a $3.1\,\sigma$ tension between the two experiments, assuming a diffuse neutrino source with a power-law energy dependence. Combining both datasets leads to a preference for a very low number of expected events at KM3NeT, in stark contrast to the observed data. The tension increases both in the case of a diffuse source peaking at the KM3NeT energy and of a steady point source, whereas a transient source may reduce the tension down to $1.6\,\sigma$. The formalism allows one to treat potential beyond-the-Standard-Model sources of muons, and we speculate on this possibility to explain the tension.
comment: 47 pages, 16 figures, 3 tables
♻ ☆ On the Simulation of Hidden Parton Showers in the Conformal Window
We consider confining Hidden Valley/Dark Sector theories containing many dark quark flavors. These theories are in the ``conformal window'': they reach an infrared fixed point when their quarks are massless, and have unfamiliar confinement when the quark masses are non-zero but small. Their jets of hidden hadrons may be quite different from those familiar from QCD, but their details cannot currently be simulated even qualitatively. This is partly due to the use of approximations to the two-loop running coupling in existing event generators' parton showers, which are not broadly applicable across the conformal window. We argue that the exact two-loop running coupling, and a corresponding Sudakov factor employing that coupling, must be implemented in simulation packages in order to allow phenomenological studies of these theories.
comment: 24 pages, 8 figures, version accepted for publication
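The infrared fixed point mentioned in this abstract can be illustrated by integrating the two-loop beta function numerically. The sketch assumes an SU($N$) gauge theory with $n_f$ massless fundamental flavors and the convention $d\alpha/d\ln\mu^2 = -(b_0/4\pi)\,\alpha^2 - (b_1/(4\pi)^2)\,\alpha^3$. It uses a plain Runge-Kutta integrator and is not the closed-form running coupling or the Sudakov factor the paper argues for.

```python
import math

def beta_coefficients(n_colors, n_flavors):
    """One- and two-loop beta coefficients for SU(N) with fundamental flavors."""
    b0 = 11.0 / 3.0 * n_colors - 2.0 / 3.0 * n_flavors
    b1 = (34.0 / 3.0 * n_colors**2
          - (13.0 / 3.0 * n_colors - 1.0 / n_colors) * n_flavors)
    return b0, b1

def fixed_point(n_colors, n_flavors):
    """Two-loop infrared fixed point alpha*, or None if it does not exist."""
    b0, b1 = beta_coefficients(n_colors, n_flavors)
    if b0 > 0.0 and b1 < 0.0:
        return -4.0 * math.pi * b0 / b1
    return None

def run_coupling(alpha_start, log_mu2_span, n_colors, n_flavors, n_steps=3000):
    """Evolve alpha over a span in ln(mu^2) with RK4 (negative span = toward the IR)."""
    b0, b1 = beta_coefficients(n_colors, n_flavors)

    def beta(a):
        return -(b0 / (4.0 * math.pi)) * a**2 - (b1 / (4.0 * math.pi) ** 2) * a**3

    h = log_mu2_span / n_steps
    a = alpha_start
    for _ in range(n_steps):
        k1 = beta(a)
        k2 = beta(a + 0.5 * h * k1)
        k3 = beta(a + 0.5 * h * k2)
        k4 = beta(a + h * k3)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a

alpha_star = fixed_point(3, 12)               # SU(3) with 12 flavors
alpha_ir = run_coupling(0.1, -150.0, 3, 12)   # run far into the infrared
```

In this example the coupling grows toward the infrared and then levels off at the fixed point instead of diverging, which is the qualitative behavior a parton shower in the conformal window has to reproduce.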
♻ ☆ Testing New Physics in Oscillations at a Neutrino Factory
A neutrino factory is a potential successor to the upcoming generation of neutrino oscillation experiments and a possible precursor to next-generation muon colliders. Such a machine would provide a well-characterized beam of $\nu_\mu$, $\bar\nu_\mu$, $\nu_e$, and $\bar\nu_e$ neutrinos with comparable statistics. Here we show the sensitivity of a neutrino factory to new oscillation physics scenarios such as vector neutrino non-standard interactions and CPT violation. We study two different potential setups for a neutrino factory with different assumptions on charge identification in the far detector. We find that 10 years of a neutrino factory combined with 10 years of DUNE can improve over most of the current constraints on these scenarios and even over forecasted constraints by 20 years of DUNE. Additionally, we find that a neutrino factory can break degeneracies between the standard oscillation parameters and neutrino non-standard interaction parameters present at DUNE.
comment: 29 pages, 23 figures, 1 table, comments welcome! v2: matches published version
♻ ☆ A critical appraisal of tests of locality and of entanglement versus non-entanglement at colliders
It was argued more than 30 years ago that it is not possible to test locality at colliders, due to the inability to directly measure non-commuting observables such as spin components in current collider experiments. Recently, there has been a lot of phenomenological and experimental activity around testing locality via Bell-type experiments or entanglement versus non-entanglement in a collider environment. These results seem to evade the earlier no-go theorem by indirectly measuring spin correlations via their relation to angular correlations between momenta. We perform a careful study of the feasibility of such an approach. We scrutinize the relationship between spin and angular correlations in both quantum mechanics and local hidden variable theories. Our conclusion is that it is currently not possible to perform a logically coherent set of experimental measurements at colliders that would allow one to test locality or entanglement versus non-entanglement. This reaffirms the earlier no-go theorem. We stress that the no-go theorem does not apply to measurements of observables inspired by entanglement and Quantum Information Theory to test the Standard Model of particle physics.
comment: 22 pages, no figures
♻ ☆ Dichroic Filter Characterizations
We present here measurements and characterizations of several dichroic filters, which are being used more commonly in nuclear and particle physics for photon-detection. The measurements were performed on filters immersed in several media: air, water, and LAB-based liquid scintillator. Measurements of transmission and reflection properties were made at various angles of incidence, and the data presented here can be used to develop detailed optical models for detector simulations. We find a modified Bragg's Law is a good model for the shift in transmission edge as a function of angle of incidence, across the various media.
comment: 16 pages, 21 figures, minor clarifications and corrections were made in response to feedback from JINST referees. 13 pages, 20 figures
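The abstract does not give the explicit form of the modified Bragg's law. A common textbook form for interference filters, assumed here purely for illustration, shifts the transmission edge as $\lambda(\theta) = \lambda_0 \sqrt{1 - (n_\mathrm{medium} \sin\theta / n_\mathrm{eff})^2}$, where $n_\mathrm{eff}$ is an effective index of the coating. The numerical values below are hypothetical, not fit results from the paper.

```python
import math

def shifted_edge_nm(edge_normal_nm, angle_deg, n_medium=1.0, n_effective=1.7):
    """Transmission-edge wavelength at a given angle of incidence (assumed model)."""
    s = n_medium * math.sin(math.radians(angle_deg)) / n_effective
    if abs(s) >= 1.0:
        raise ValueError("angle too large for this medium and effective index")
    return edge_normal_nm * math.sqrt(1.0 - s * s)

# Hypothetical filter with a 500 nm edge at normal incidence.
edge_air = shifted_edge_nm(500.0, 45.0, n_medium=1.0)
edge_water = shifted_edge_nm(500.0, 45.0, n_medium=1.33)
```

Under this assumed model the edge moves to shorter wavelengths as the angle grows, and by more in a denser surrounding medium at the same angle.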
♻ ☆ Transverse spin polarization as a novel probe of medium-induced transverse-momentum-broadening effect
The transverse polarization of $\Lambda$ hyperons within unpolarized jets originates from the transverse-momentum-dependent (TMD) fragmentation function $D_{1T}^\perp (z, p_T, \mu^2)$. In the vacuum environment, the QCD evolution of this TMD fragmentation function is governed by the Collins-Soper equation. However, in the presence of the quark-gluon plasma (QGP) medium, the jet-medium interaction induces a transverse-momentum-broadening effect that modifies the QCD evolution. As a result, the transverse spin polarization of $\Lambda$ hyperons in relativistic heavy-ion collisions differs from that in $pp$ collisions. We demonstrate that this difference serves as a sensitive probe for studying jet-medium interaction, offering a novel perspective through the spin degree of freedom.
comment: 14 pages, 6 figures; additional references added; both transverse momentum broadening and energy loss effects are considered
High Energy Physics - Phenomenology
☆ Features of Charged Lepton Flavor Violation in an $A_4$ Symmetric Neutrino Mass Model
Neutrino flavour oscillations imply that there must be charged lepton flavour violation (CLFV) also. Different neutrino mass models predict different patterns of CLFV decays. Neutrino mass generation through standard see-saw mechanisms leads to the prediction that the branching ratios of meson CLFV decays will always be smaller than the corresponding radiative CLFV decays. In this work, we analyse an interesting neutrino mass model, based on $A_4$ symmetry, in which the symmetry and the symmetry-breaking pattern lead the neutrino mixing matrix to be of tri-bimaximal (TBM) form. In this model, we find that the meson CLFV decay amplitudes are not correlated to the corresponding radiative CLFV amplitudes, unlike in the case of see-saw models. The branching ratios of radiative CLFV decays are predicted to be negligibly small in this model, but those of the meson CLFV decays can be large enough to be observable in the near future.
comment: 21 pages, 3 figures, 6 Tables
☆ Updated Constraints from Electric Dipole Moments in the MSSM with R-Parity Violation
We revisit the electric dipole moments (EDMs) of quarks and leptons in the Minimal Supersymmetric Standard Model (MSSM) with trilinear $R$-parity violation (RPV). In this framework, EDMs are induced at the two-loop level via RPV interactions. We perform a comprehensive recalculation of several classes of Barr-Zee type diagrams in a general $R_\xi$ gauge. While we find general agreement with previous analytic results in the literature, our work provides a valuable independent cross-check of the complicated calculations. We also point out some subtleties in the intermediate steps and in the choice of the flavor basis for the numerical evaluation of the expressions. By confronting the theoretical predictions with the latest experimental limits on EDMs, we derive updated constraints on combinations of RPV couplings. We highlight a sharp, testable correlation between the proton and neutron EDM that emerges within the considered class of RPV models, offering a distinctive signature for future EDM experiments.
☆ Chirality structure of vector like new physics operators in charged current transitions
We investigate the cascade decay $B^{*0}_{s} \rightarrow D_s^-(\rightarrow \tau^-\,\bar\nu_{\tau})\,\ell^{+}\,{\nu}_\ell$ induced by flavor-changing charged currents in the context of the Standard Model and in vector-like couplings beyond the Standard Model. We employ the helicity amplitude formalism for the analysis and highlight the role of new vector-like couplings in charged-current interactions. We find, in particular, that while new left-handed chiral vector-like interactions contribute to the branching ratio, they do not affect the forward-backward asymmetry or the angular observables. On the other hand, the right-handed chiral vector-like coupling in this decay contributes to the branching ratio, the forward-backward asymmetry, and the angular observables. We confirm that this difference in behavior between the left- and right-handed NP couplings is a general feature of charged-current processes with a vector meson going to a pseudoscalar at tree level in the effective weak theory by cross-checking with the cascade decays $B^{*+}_c \rightarrow P(\rightarrow P'\,\mu^+\,\nu_{\mu})\,\ell^{+}\,{\nu}_\ell$ where $P$ is $B_s^0$ ($D^0$) and $P'$ is $D_s^{*-}$ ($K^-$).
comment: 14 pages, 9 figures
☆ Anomalous Couplings from the Electroweak Chiral Lagrangian for Off-Shell Higgs in $gg\to Z_L Z_L$
We investigate the production of (longitudinal) $Z$-boson pairs in gluon fusion as a probe of anomalous Higgs couplings. Of particular interest is the kinematic region of large center-of-mass energy, where the Higgs boson is highly off-shell. We employ the electroweak chiral Lagrangian with a light Higgs, which is the most natural effective field theory (EFT) for this process. We demonstrate this by a detailed analysis of the leading and next-to-leading EFT contributions to the amplitude, at leading order in QCD, emphasizing the role of power counting for a systematic application of the EFT. We show that at leading order the new-physics contributions are described by only two parameters, which depend on three EFT couplings. Subleading effects can be expected to be small within the range of validity of the EFT. Phenomenological implications are briefly discussed.
comment: 25 pages, 8 figures
☆ One-Loop Calculations in Effective Field Theories with GoSam-3.0
We present a major update of the one-loop generator GoSam, containing performance improvements as well as new features, in particular functionalities that facilitate calculations beyond the Standard Model in Effective Field Theory frameworks.
comment: 23 pages, 2 figures
☆ Quantum-enhanced dark matter detection using Schrödinger cat states
Quantum metrology enables sensitive dark matter detection, particularly using nonclassical states, such as Schr\"odinger cat states featuring sub-Planck interference structures in microwave cavities. Here, we report the first experimental application of four-component Schr\"odinger cat states within a high-quality superconducting microwave cavity to detect dark photons, a potential dark matter candidate. We demonstrate an 8.1-fold enhancement in the signal photon rate and constrain the dark photon kinetic mixing angle to an unprecedented $\epsilon < 7.32 \times 10^{-16}$ near 6.44~GHz (26.6~$\mu$eV). By employing a parametric sideband drive to actively tune the cavity frequency, we achieve dark photon searches and background subtraction across multiple frequency bins, yielding a sensitivity at the $10^{-16}$ level within a 100~kHz bandwidth. Our Schr\"odinger's cat-assisted detection (SCaD) scheme demonstrates a substantial improvement over previous results, promising potential implications in quantum-enhanced searches for new physics.
comment: 20 pages, 17 figures, 5 tables
☆ Revealing chiral-odd two-meson generalized distribution amplitudes in $e^- e^+ \to (ππ) (ππ)$ reactions
We demonstrate that chiral-odd dimeson generalized distribution amplitudes (CO-GDAs), nonperturbative objects encoding the transition of a quark-antiquark pair into two mesons, can be accessed in high-energy $e^- e^+$ annihilation into two meson pairs, each with a relatively low invariant mass. While chiral-even GDAs contribute to the leading one-photon amplitude, the chiral-odd sector enters via two-photon exchange. We show that the interference between these amplitudes leads to measurable effects at BES III or future tau-charm factories. This work opens a direct path to experimentally probing the long-missing chiral-odd sector of meson structure, specifically the anomalous tensorial magnetic moment of spin-zero mesons such as the pion.
comment: 6 pages, 4 figures
Neural Posterior Estimation of Neutron Star Equations of State
We present a simulation-based inference (SBI) framework to constrain the neutron star (NS) equation of state (EoS) from astrophysical observations of masses, radii and tidal deformabilities, using neural posterior estimation (NPE) with conditional normalising flows (CNF). To ensure that the model conforms with reality, physics-informed constraints are embedded directly into the training loss. This enables efficient, likelihood-free inference of full posterior distributions for key thermodynamic quantities (including pressure, squared speed of sound, and the trace anomaly) conditioned on observational data. Our models are trained on synthetic datasets generated from two agnostic EoS priors: polytropic parametrizations (PT) and Gaussian process (GP) reconstructions. These datasets span various scenarios, including the presence or absence of tidal deformability information and observational noise. Across all settings, the method produces accurate and well-calibrated posteriors, with uncertainties reduced when tidal deformability constraints are included. Furthermore, we find that the behavior of normalized predictive dispersions is strongly correlated with the maximum central density inside NSs, suggesting that the model can indirectly infer this physically meaningful quantity. The approach generalizes well across EoS families and accurately reconstructs derivative quantities such as the polytropic index, demonstrating its robustness and potential for probing dense matter in NS cores.
comment: 18 pages, 18 figures, 1 table
♻ ☆ Data-parallel leading-order event generation in MadGraph5_aMC@NLO
The CUDACPP plugin for MadGraph5_aMC@NLO aims to accelerate leading-order tree-level event generation by providing the MadEvent event generator with data-parallel helicity amplitudes. These amplitudes are written in templated C++ and CUDA, allowing them to be compiled for CPUs supporting SSE4, AVX2, and AVX-512 instruction sets as well as CUDA- and HIP-enabled GPUs. Using SIMD instruction sets, CUDACPP-generated amplitude routines are shown to speed up linearly with SIMD register size, and GPU offloading is shown to provide acceleration beyond that of SIMD instructions. Additionally, the resulting speed-up in event generation aligns with predictions from measured runtime fractions spent in amplitude routines, and proper GPU utilisation can speed up high-multiplicity QCD processes by an order of magnitude when compared to optimal CPU usage in server-grade CPUs.
comment: 40 pages, 22 figures
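The abstract above compares the measured event-generation speed-up with a prediction from the runtime fraction spent in amplitude routines. A prediction of that kind is commonly made with Amdahl's law; the sketch below assumes that form, and the function and parameter names are illustrative rather than taken from the CUDACPP plugin.

```python
def predicted_speedup(amplitude_fraction: float, amplitude_speedup: float) -> float:
    """Overall speed-up when only the amplitude routines get faster (Amdahl's law).

    amplitude_fraction: share of the original runtime spent in amplitude routines.
    amplitude_speedup: factor by which those routines alone are accelerated.
    Both names are hypothetical; they are not part of any MadGraph API.
    """
    if not 0.0 <= amplitude_fraction <= 1.0:
        raise ValueError("amplitude_fraction must lie in [0, 1]")
    if amplitude_speedup <= 0.0:
        raise ValueError("amplitude_speedup must be positive")
    # The part of the runtime that is not accelerated stays unchanged.
    unaccelerated = 1.0 - amplitude_fraction
    return 1.0 / (unaccelerated + amplitude_fraction / amplitude_speedup)
```

Under this model the overall gain saturates at 1 / (1 - amplitude_fraction), which is why processes dominated by amplitude evaluation benefit most from SIMD or GPU offloading.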
♻ ☆ Revisiting the connection of baryon number, lepton number, and operator dimension
The effects of heavy new particles beyond the Standard Model can be conveniently captured through higher-dimensional effective operators. As noted long ago by Weinberg, the amount of baryon and lepton number an operator can carry is intricately connected to its mass dimension. We derive an improved inequality for this connection and compare it to explicit operator constructions up to mass dimension 25. For the effective field theory of Standard Model plus right-handed neutrinos, our relationship is even an equality up to high mass dimension.
comment: 4 pages; to appear in PLB
♻ ☆ Exploring ultra-high energy neutrino experiments through the lens of the transport equation
We develop a first-principles formalism, based on the transport equation in the line-of-sight approximation, to link the expected number of muons at neutrino telescopes to the flux of neutrinos at the Earth's surface. We compute the distribution of muons inside Earth, arising from the up-scattering of neutrinos close to the detector, as well as from the decay of taus produced farther away. This framework allows one to account for systematic uncertainties, as well as to clarify the assumptions behind definitions commonly used in the literature, such as the effective area. We apply this formalism to analyze the high-energy muon event recorded by KM3NeT, with a reconstructed energy of $ 120^{+110}_{-60} \, \mathrm{PeV}$ and an elevation angle of $\left(0.54\pm 2.4\right)^\circ$, in comparison with the non-observation of similar events by IceCube. We find a $3.1\,\sigma$ tension between the two experiments, assuming a diffuse neutrino source with a power-law energy dependence. Combining both datasets leads to a preference for a very low number of expected events at KM3NeT, in stark contrast to the observed data. The tension increases both in the case of a diffuse source peaking at the KM3NeT energy and of a steady point source, whereas a transient source may reduce the tension down to $1.6\,\sigma$. The formalism allows one to treat potential beyond-the-Standard-Model sources of muons, and we speculate on this possibility to explain the tension.
comment: 47 pages, 16 figures, 3 tables
♻ ☆ On the Simulation of Hidden Parton Showers in the Conformal Window
We consider confining Hidden Valley/Dark Sector theories containing many dark quark flavors. These theories are in the ``conformal window'': they reach an infrared fixed point when their quarks are massless, and have unfamiliar confinement when the quark masses are non-zero but small. Their jets of hidden hadrons may be quite different from those familiar from QCD, but their details cannot currently be simulated even qualitatively. This is partly due to the use of approximations to the two-loop running coupling in existing event generators' parton showers, which are not broadly applicable across the conformal window. We argue that the exact two-loop running coupling, and a corresponding Sudakov factor employing that coupling, must be implemented in simulation packages in order to allow phenomenological studies of these theories.
comment: 24 pages, 8 figures, version accepted for publication
♻ ☆ Testing New Physics in Oscillations at a Neutrino Factory
A neutrino factory is a potential successor to the upcoming generation of neutrino oscillation experiments and a possible precursor to next-generation muon colliders. Such a machine would provide a well-characterized beam of $\nu_\mu$, $\bar\nu_\mu$, $\nu_e$, and $\bar\nu_e$ neutrinos with comparable statistics. Here we show the sensitivity of a neutrino factory to new oscillation physics scenarios such as vector neutrino non-standard interactions and CPT violation. We study two different potential setups for a neutrino factory with different assumptions on charge identification in the far detector. We find that 10 years of a neutrino factory combined with 10 years of DUNE can improve over most of the current constraints on these scenarios and even over forecasted constraints by 20 years of DUNE. Additionally, we find that a neutrino factory can break degeneracies between the standard oscillation parameters and neutrino non-standard interaction parameters present at DUNE.
comment: 29 pages, 23 figures, 1 table, comments welcome! v2: matches published version
♻ ☆ Dispersion relation of the neutrino plasma: Unifying fast, slow, and collisional instabilities
In neutrino-dense astrophysical environments, these particles exchange flavor through a coherent weak field, forming a collisionless neutrino plasma with collective flavor dynamics. Instabilities, which grow and affect the environment, may arise from neutrino-neutrino refraction alone (fast limit), vacuum energy splittings caused by masses (slow limit), or neutrino-matter scattering (collisional limit). We present a comprehensive analytical description of the dispersion relation governing these unstable modes. Treating vacuum energy splittings and collision rates as small perturbations, we construct a unified framework for fast, slow, and collisional instabilities. We classify modes into gapped, where collective excitations are already present in the fast limit but rendered unstable by slow or collisional effects, and gapless, which are purely generated by these effects. For each class, we derive approximate dispersion relations for generic energy and angle distributions, which reveal the order of magnitude of the growth rates and the nature of the instabilities without solving directly the dispersion relation. This approach confirms that slow and collisionally unstable waves generally grow much more slowly than they oscillate. Consequently, the common fast-mode approximation of local evolution within small boxes is unjustified. Even for fast modes, neglecting large-distance propagation of growing waves, as usually done, may be a poor approximation. Our unified framework provides an intuitive understanding of the linear phase of flavor evolution across all regimes and paves the way for a quasi-linear treatment of the instability's nonlinear development.
comment: 49 pages, 7 figures; added analytical discussion and numerical validation of narrow slow instabilities
♻ ☆ A critical appraisal of tests of locality and of entanglement versus non-entanglement at colliders
It was argued more than 30 years ago that it is not possible to test locality at colliders, due to the inability to directly measure non-commuting observables such as spin components in current collider experiments. Recently, there has been a lot of phenomenological and experimental activity around testing locality via Bell-type experiments or entanglement versus non-entanglement in a collider environment. These results seem to evade the earlier no-go theorem by indirectly measuring spin correlations via their relation to angular correlations between momenta. We perform a careful study of the feasibility of such an approach. We scrutinize the relationship between spin and angular correlations in both quantum mechanics and local hidden variable theories. Our conclusion is that it is currently not possible to perform a logically coherent set of experimental measurements at colliders that would allow one to test locality or entanglement versus non-entanglement. This reaffirms the earlier no-go theorem. We stress that the no-go theorem does not apply to measurements of observables inspired by entanglement and Quantum Information Theory to test the Standard Model of particle physics.
comment: 22 pages, no figures
♻ ☆ Transverse spin polarization as a novel probe of medium-induced transverse-momentum-broadening effect
The transverse polarization of $\Lambda$ hyperons within unpolarized jets originates from the transverse-momentum-dependent (TMD) fragmentation function $D_{1T}^\perp (z, p_T, \mu^2)$. In the vacuum environment, the QCD evolution of this TMD fragmentation function is governed by the Collins-Soper equation. However, in the presence of the quark-gluon plasma (QGP) medium, the jet-medium interaction induces a transverse-momentum-broadening effect that modifies the QCD evolution. As a result, the transverse spin polarization of $\Lambda$ hyperons in relativistic heavy-ion collisions differs from that in $pp$ collisions. We demonstrate that this difference serves as a sensitive probe for studying jet-medium interaction, offering a novel perspective through the spin degree of freedom.
comment: 14 pages, 6 figures; additional references added; both transverse momentum broadening and energy loss effects are considered
Machine Learning - Statistics
☆ Formal Bayesian Transfer Learning via the Total Risk Prior
In analyses with severe data limitations, augmenting the target dataset with information from ancillary datasets in the application domain, called source datasets, can lead to significantly improved statistical procedures. However, existing methods for this transfer learning struggle to deal with situations where the source datasets are also limited and not guaranteed to be well-aligned with the target dataset. A typical strategy is to use the empirical loss minimizer on the source data as a prior mean for the target parameters, which places the estimation of source parameters outside of the Bayesian formalism. Our key conceptual contribution is to use a risk minimizer conditional on source parameters instead. This allows us to construct a single joint prior distribution for all parameters from the source datasets as well as the target dataset. As a consequence, we benefit from full Bayesian uncertainty quantification and can perform model averaging via Gibbs sampling over indicator variables governing the inclusion of each source dataset. We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system and discuss computational techniques to scale our approach to moderately sized datasets. We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model. Finally, we demonstrate superior predictive performance relative to the frequentist baseline on a genetics application, especially when the source data are limited.
☆ Scaled Beta Models and Feature Dilution for Dynamic Ticket Pricing
A novel approach is presented for identifying distinct signatures of performing acts in the secondary ticket resale market by analyzing dynamic pricing distributions. Using a newly curated, time series dataset from the SeatGeek API, we model ticket pricing distributions as scaled Beta distributions. This enables accurate parameter estimation from incomplete statistical data using a hybrid of quantile matching and the method of moments. Incorporating the estimated $\alpha$ and $\beta$ parameters into Random Forest classifiers significantly improves pairwise artist classification accuracy, demonstrating the unique economic signatures in event pricing data. Additionally, we provide theoretical and empirical evidence that incorporating zero-variance (constant-value) features into Random Forest models acts as an implicit regularizer, enhancing feature variety and robustness. This regularization promotes deeper, more varied trees in the ensemble, improving the bias-variance tradeoff and mitigating overfitting to dominant features. These findings are validated on both the new ticket pricing dataset and the standard UCI ML handwritten digits dataset.
comment: 27 pages, 11 figures, 3 tables
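The abstract above estimates scaled Beta parameters with a hybrid of quantile matching and the method of moments. The sketch below covers only the method-of-moments half, for a Beta distribution rescaled to a known price range; the function name and the assumption that the range endpoints are known are illustrative, not details from the paper.

```python
def scaled_beta_moments(mean: float, variance: float, low: float, high: float):
    """Method-of-moments estimate of (alpha, beta) for a Beta law scaled to [low, high].

    Assumes the support endpoints are known; the paper's hybrid estimator,
    which also uses quantiles, is not reproduced here.
    """
    width = high - low
    if width <= 0.0:
        raise ValueError("high must exceed low")
    # Map the moments back to the unit interval.
    mu = (mean - low) / width
    var = variance / width**2
    if not 0.0 < mu < 1.0:
        raise ValueError("mean must lie strictly inside (low, high)")
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("variance is incompatible with a Beta distribution")
    # Standard Beta moment inversion: alpha + beta = mu(1 - mu)/var - 1.
    total = mu * (1.0 - mu) / var - 1.0
    return mu * total, (1.0 - mu) * total
```

The variance check matters in practice: a Beta distribution on the unit interval cannot have variance at or above mu(1 - mu), so summary statistics that violate it signal that the assumed price range is too narrow.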
☆ DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction
Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and Communications in Medicine (DICOM) files presents a significant barrier to the ethical and secure sharing of imaging datasets. This paper presents a hybrid de-identification framework developed by Impact Business Information Solutions (IBIS) that combines rule-based and AI-driven techniques with rigorous uncertainty quantification for comprehensive PHI/PII removal from both metadata and pixel data. Our approach begins with a two-tiered rule-based system targeting explicit and inferred metadata elements, further augmented by a large language model (LLM) fine-tuned for Named Entity Recognition (NER) and trained on a suite of synthetic datasets simulating realistic clinical PHI/PII. For pixel data, we employ an uncertainty-aware Faster R-CNN model to localize embedded text, extract candidate PHI via Optical Character Recognition (OCR), and apply the NER pipeline for final redaction. Crucially, uncertainty quantification provides confidence measures for AI-based detections to enhance automation reliability and enable informed human-in-the-loop verification to manage residual risks. This uncertainty-aware de-identification framework achieves robust performance across benchmark datasets and regulatory standards, including DICOM, HIPAA, and TCIA compliance metrics. By combining scalable automation, uncertainty quantification, and rigorous quality assurance, our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research.
comment: 15 pages, 6 figures
☆ Optimised Feature Subset Selection via Simulated Annealing
We introduce SA-FDR, a novel algorithm for $\ell_0$-norm feature selection that considers this task as a combinatorial optimisation problem and solves it by using simulated annealing to perform a global search over the space of feature subsets. The optimisation is guided by the Fisher discriminant ratio, which we use as a computationally efficient proxy for model quality in classification tasks. Our experiments, conducted on datasets with up to hundreds of thousands of samples and hundreds of features, demonstrate that SA-FDR consistently selects more compact feature subsets while achieving a high predictive accuracy. This ability to recover informative yet minimal sets of features stems from its capacity to capture inter-feature dependencies often missed by greedy optimisation approaches. As a result, SA-FDR provides a flexible and effective solution for designing interpretable models in high-dimensional settings, particularly when model sparsity, interpretability, and performance are crucial.
comment: 12 pages, 2 figures
☆ Barycentric subspace analysis of network-valued data
Certain data are naturally modeled by networks or weighted graphs, be they arterial networks or mobility networks. When there is no canonical labeling of the nodes across the dataset, we talk about unlabeled networks. In this paper, we focus on the question of dimensionality reduction for this type of data. More specifically, we address the issue of interpreting the feature subspace constructed by dimensionality reduction methods. Most existing methods for network-valued data are derived from principal component analysis (PCA) and therefore rely on subspaces generated by a set of vectors, which we identify as a major limitation in terms of interpretability. Instead, we propose to implement the method called barycentric subspace analysis (BSA), which relies on subspaces generated by a set of points. In order to provide a computationally feasible framework for BSA, we introduce a novel embedding for unlabeled networks where we replace their usual representation by equivalence classes of isomorphic networks with that by equivalence classes of cospectral networks. We then illustrate BSA on simulated and real-world datasets, and compare it to tangent PCA.
☆ Directional Ensemble Aggregation for Actor-Critics
Off-policy reinforcement learning in continuous control tasks depends critically on accurate $Q$-value estimates. Conservative aggregation over ensembles, such as taking the minimum, is commonly used to mitigate overestimation bias. However, these static rules are coarse, discard valuable information from the ensemble, and cannot adapt to task-specific needs or different learning regimes. We propose Directional Ensemble Aggregation (DEA), an aggregation method that adaptively combines $Q$-value estimates in actor-critic frameworks. DEA introduces two fully learnable directional parameters: one that modulates critic-side conservatism and another that guides actor-side policy exploration. Both parameters are learned using ensemble disagreement-weighted Bellman errors, which weight each sample solely by the direction of its Bellman error. This directional learning mechanism allows DEA to adjust conservatism and exploration in a data-driven way, adapting aggregation to both uncertainty levels and the phase of training. We evaluate DEA across continuous control benchmarks and learning regimes - from interactive to sample-efficient - and demonstrate its effectiveness over static ensemble strategies.
☆ Overcoming error-in-variable problem in data-driven model discovery by orthogonal distance regression
Despite the recent proliferation of machine learning methods like SINDy that promise automatic discovery of governing equations from time-series data, there remain significant challenges to discovering models from noisy datasets. One reason is that the linear regression underlying these methods assumes that all noise resides in the training target (the regressand), which is the time derivative, whereas the measurement noise is in the states (the regressors). Recent methods like modified-SINDy and DySMHO address this error-in-variable problem by leveraging information from the model's temporal evolution, but they also impose the equation as a hard constraint, which effectively assumes no error in the regressand. Without relaxation, this hard constraint prevents assimilation of data spanning longer than the Lyapunov time. Instead, the fulfilment of the model equation should be treated as a soft constraint to account for the small yet critical error introduced by numerical truncation. The uncertainties in both the regressor and the regressand invite the use of orthogonal distance regression (ODR). By incorporating ODR with the Bayesian framework for model selection, we introduce a novel method for model discovery, termed ODR-BINDy, and assess its performance against current SINDy variants using the Lorenz63, R\"ossler, and Van der Pol systems as case studies. Our findings indicate that ODR-BINDy consistently outperforms all existing methods in recovering the correct model from sparse and noisy datasets. For instance, our ODR-BINDy method reliably recovers the Lorenz63 equation from data with noise contamination levels of up to 30%.
comment: 28 pages, 12 figures, prepared for the Data-driven systems and control: analysis, modelling, optimisation, and stochasticity collection in the journal Mathematics of Control, Signals, and Systems
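To illustrate the error-in-variable idea in the abstract above, here is a minimal sketch of orthogonal distance regression for a straight line, which minimises perpendicular rather than vertical distances. It assumes equal noise variance in both coordinates and is only the simplest closed-form instance of ODR, not the ODR-BINDy method; the function name is illustrative.

```python
import math

def orthogonal_line_fit(xs, ys):
    """Fit y = slope * x + intercept by minimising perpendicular distances.

    Closed-form total least squares for a line, assuming equal noise
    variance in x and y. Illustrative only; not the paper's algorithm.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0.0:
        # Horizontal, vertical, or isotropic data: not handled in this sketch.
        raise ValueError("degenerate case: zero cross-covariance")
    # Direction of the leading principal axis of the centred point cloud.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy**2)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Ordinary least squares would instead use slope = sxy / sxx, which is biased towards zero when the regressor itself is noisy; that bias is the problem the abstract attributes to regression-based model discovery.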
☆ Optimal Transport Learning: Balancing Value Optimization and Fairness in Individualized Treatment Rules
Individualized treatment rules (ITRs) have gained significant attention due to their wide-ranging applications in fields such as precision medicine, ridesharing, and advertising recommendations. However, when ITRs are influenced by sensitive attributes such as race, gender, or age, they can lead to outcomes where certain groups are unfairly advantaged or disadvantaged. To address this gap, we propose a flexible approach based on optimal transport theory, which is capable of transforming any optimal ITR into a fair ITR that ensures demographic parity. Recognizing the potential loss of value under fairness constraints, we introduce an ``improved trade-off ITR,'' designed to balance value optimization and fairness while accommodating varying levels of fairness through parameter adjustment. To maximize the value of the improved trade-off ITR under specific fairness levels, we propose a smoothed fairness constraint for estimating the adjustable parameter. Additionally, we establish a theoretical upper bound on the value loss for the improved trade-off ITR. We demonstrate the performance of the proposed method through extensive simulation studies and an application to the Next 36 entrepreneurial program dataset.
♻ ☆ Disparate Conditional Prediction in Multiclass Classifiers ICML 2025
We propose methods for auditing multiclass classifiers for fairness under multiclass equalized odds, by estimating the deviation from equalized odds when the classifier is not completely fair. We generalize to multiclass classifiers the measure of Disparate Conditional Prediction (DCP), originally suggested by Sabato & Yom-Tov (2020) for binary classifiers. DCP is defined as the fraction of the population for which the classifier predicts with conditional prediction probabilities that differ from the closest common baseline. We provide new local-optimization methods for estimating the multiclass DCP under two different regimes, one in which the conditional confusion matrices for each protected sub-population are known, and one in which these cannot be estimated, for instance, because the classifier is inaccessible or because good-quality individual-level data is not available. These methods can be used to detect classifiers that likely treat a significant fraction of the population unfairly. Experiments demonstrate the accuracy of the methods. Code is provided at https://github.com/sivansabato/DCPmulticlass.
comment: Published at ICML 2025
♻ ☆ Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage
Conformal prediction is a powerful distribution-free framework for constructing prediction sets with coverage guarantees. Classical methods, such as split conformal prediction, provide marginal coverage, ensuring that the prediction set contains the label of a random test point with a target probability. However, these guarantees may not hold uniformly across different subpopulations, leading to disparities in coverage. Prior work has explored coverage guarantees conditioned on events related to the covariates and label of the test point. We present Kandinsky conformal prediction, a framework that significantly expands the scope of conditional coverage guarantees. In contrast to Mondrian conformal prediction, which restricts its coverage guarantees to disjoint groups -- reminiscent of the rigid, structured grids of Piet Mondrian's art -- our framework flexibly handles overlapping and fractional group memberships defined jointly on covariates and labels, reflecting the layered, intersecting forms in Wassily Kandinsky's compositions. Our algorithm unifies and extends existing methods, encompassing covariate-based group conditional, class conditional, and Mondrian conformal prediction as special cases, while achieving a minimax-optimal high-probability conditional coverage bound. Finally, we demonstrate the practicality of our approach through empirical evaluation on real-world datasets.
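As background for the abstract above, the classical split conformal procedure it cites as the marginal-coverage baseline can be sketched as follows. This is the standard baseline, not the Kandinsky algorithm, and it assumes absolute-residual scores for a regression model; the function names are illustrative.

```python
import math

def conformal_quantile(scores, alpha):
    """Calibration quantile used by split conformal prediction.

    scores: nonconformity scores on a held-out calibration set.
    alpha: target miscoverage level, so coverage is at least 1 - alpha.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def prediction_interval(point_prediction, scores, alpha):
    """Symmetric interval around a point prediction, assuming |y - prediction| scores."""
    radius = conformal_quantile(scores, alpha)
    return point_prediction - radius, point_prediction + radius
```

The guarantee here is marginal over the test point only; conditional variants such as the one in the abstract replace the single quantile with group- or weight-dependent thresholds.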
♻ ☆ Improved Convergence Factor of Windowed Anderson Acceleration for Symmetric Fixed-Point Iterations
This paper studies the commonly utilized windowed Anderson acceleration (AA) algorithm for fixed-point methods, $x^{(k+1)}=q(x^{(k)})$. It provides the first proof that when the operator $q$ is linear and symmetric the windowed AA, which uses a sliding window of prior iterates, improves the root-linear convergence factor over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric Jacobian at a fixed point, a slightly modified AA algorithm is proved to have an analogous root-linear convergence factor improvement over fixed-point iterations. Simulations verify our observations. Furthermore, experiments with different data models demonstrate AA is significantly superior to the standard fixed-point methods for Tyler's M-estimation.
comment: 40 pages, 10 figures
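For readers unfamiliar with the method in the abstract above, here is a minimal sketch of Anderson acceleration with a window of one previous iterate, applied to $x^{(k+1)}=q(x^{(k)})$. It is the textbook scheme, not the slightly modified variant analysed in the paper, and the function name and defaults are illustrative.

```python
def anderson_window1(q, x0, tol=1e-10, max_iter=200):
    """Anderson acceleration with window size one for the fixed point x = q(x).

    q maps a list of floats to a list of floats. Returns (x, iterations).
    Textbook sketch only; not the modified algorithm from the paper.
    """
    def sub(u, v):
        return [a - b for a, b in zip(u, v)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x_prev = list(x0)
    q_prev = q(x_prev)
    r_prev = sub(q_prev, x_prev)      # residual of the previous iterate
    x = list(q_prev)                  # first step is a plain fixed-point step
    for iteration in range(1, max_iter + 1):
        q_x = q(x)
        r = sub(q_x, x)
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        # Choose alpha to minimise ||(1 - alpha) * r + alpha * r_prev||.
        dr = sub(r, r_prev)
        denom = dot(dr, dr)
        alpha = dot(r, dr) / denom if denom > 0.0 else 0.0
        # Mix the two most recent images of q with the same weights.
        x_next = [(1.0 - alpha) * a + alpha * b for a, b in zip(q_x, q_prev)]
        x_prev, q_prev, r_prev = x, q_x, r
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")
```

Setting alpha to zero recovers the plain fixed-point iteration, so the scheme can be read as a one-parameter least-squares correction of each step using the previous residual.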
♻ ☆ TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding
We propose TPP-SD, a novel approach that accelerates Transformer temporal point process (TPP) sampling by adapting speculative decoding (SD) techniques from language models. By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop an efficient sampling framework that leverages a smaller draft model to generate multiple candidate events, which are then verified by the larger target model in parallel. TPP-SD maintains the same output distribution as autoregressive sampling while achieving significant acceleration. Experiments on both synthetic and real datasets demonstrate that our approach produces samples from the same distributions as standard methods, with a 2-6$\times$ speedup. Our ablation studies analyze the impact of hyperparameters such as draft length and draft model size on sampling efficiency. TPP-SD bridges the gap between powerful Transformer TPP models and the practical need for rapid sequence sampling.
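The thinning algorithm that the abstract above relates to speculative decoding can be sketched for the simplest case, an inhomogeneous Poisson process with a bounded intensity. Candidates are proposed from a faster dominating process and then accepted or rejected, which is the structural parallel to drafting and verifying. This is classical thinning, not TPP-SD itself; the example intensity and all names are illustrative.

```python
import math
import random

def example_intensity(t: float) -> float:
    """Illustrative intensity, bounded above by 2.0 for all t."""
    return 1.0 + math.sin(t)

def sample_by_thinning(intensity, intensity_bound, horizon, rng):
    """Sample event times on (0, horizon] by thinning a homogeneous Poisson process.

    intensity_bound must satisfy intensity(t) <= intensity_bound on the interval.
    rng is a random.Random instance, passed in so that runs are reproducible.
    """
    events = []
    t = 0.0
    while True:
        # Propose the next candidate from the dominating homogeneous process.
        t += rng.expovariate(intensity_bound)
        if t > horizon:
            return events
        # Accept the candidate with probability intensity(t) / intensity_bound.
        if rng.random() * intensity_bound <= intensity(t):
            events.append(t)
```

A tighter bound wastes fewer proposals, just as a better draft model raises the acceptance rate in speculative decoding.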
♻ ☆ Neural-ANOVA: Analytical Model Decomposition using Automatic Integration
The analysis of variance (ANOVA) decomposition offers a systematic method to understand the interaction effects that contribute to a specific decision output. In this paper we introduce Neural-ANOVA, an approach to decompose neural networks into the sum of lower-order models using the functional ANOVA decomposition. Our approach formulates a learning problem, which enables fast analytical evaluation of integrals over subspaces that appear in the calculation of the ANOVA decomposition. Finally, we conduct numerical experiments to provide insights into the approximation properties compared to other regression approaches from the literature.
comment: 6 pages, 3 figures, 3 tables, accepted for publication at MLSP 2025
♻ ☆ Proper scoring rules for estimation and forecast evaluation
Proper scoring rules have been a subject of growing interest in recent years, not only as tools for evaluation of probabilistic forecasts but also as methods for estimating probability distributions. In this article, we review the mathematical foundations of proper scoring rules including general characterization results and important families of scoring rules. We discuss their role in statistics and machine learning for estimation and forecast evaluation. Furthermore, we comment on interesting developments of their usage in applications.
♻ ☆ Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion Modeling
Diffusion models are a state-of-the-art generative modeling framework that transforms noise to images via Langevin sampling, guided by the score, which is the gradient of the logarithm of the data distribution. Recent works have shown empirically that the generation quality can be improved when guided by a classifier network, which is typically the discriminator trained in a generative adversarial network (GAN) setting. In this paper, we propose a theoretical framework to analyze the effect of the GAN discriminator on Langevin-based sampling, and show that the IPM-GAN optimization can be seen as one of smoothed score-matching, wherein the scores of the data and the generator distributions are convolved with the kernel function associated with the IPM. The proposed approach serves to unify score-based training and optimization of IPM-GANs. Based on these insights, we demonstrate that closed-form kernel-based discriminator guidance results in improvements (in terms of CLIP-FID and KID metrics) when applied atop baseline diffusion models. We demonstrate these results on the denoising diffusion implicit model (DDIM) and latent diffusion model (LDM) settings on various standard datasets. We also show that the proposed approach can be combined with existing accelerated-diffusion techniques to improve latent-space image generation.
♻ ☆ GrokAlign: Geometric Characterisation and Acceleration of Grokking
A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks -- a method we introduce as GrokAlign -- which we show empirically to induce grokking much sooner than more conventional regularizers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. Accompanying webpage (https://thomaswalker1.github.io/blog/grokalign.html) and code (https://github.com/ThomasWalker1/grokalign).
comment: 23 pages, 11 figures, 3 tables
♻ ☆ Tensor Product Neural Networks for Functional ANOVA Model
Interpretability for machine learning models is becoming more and more important as machine learning models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower dimensional functions (commonly referred to as components), is one of the most popular tools for interpretable AI, and recently, various neural networks have been developed for estimating each component in the functional ANOVA model. However, such neural networks are highly unstable when estimating each component since the components themselves are not uniquely defined. That is, there are multiple functional ANOVA decompositions for a given function. In this paper, we propose a novel neural network which guarantees a unique functional ANOVA decomposition and thus is able to estimate each component stably and accurately. We call our proposed neural network ANOVA Tensor Product Neural Network (ANOVA-TPNN) since it is motivated by the tensor product basis expansion. Theoretically, we prove that ANOVA-TPNN can approximate any smooth function well. Empirically, we show that ANOVA-TPNN provides much more stable estimation of each component, and thus much more stable interpretation, than existing neural networks do when the training data and the initial values of the model parameters vary.
comment: 45 pages
Machine Learning - Computer Science
☆ SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
Concept Bottleneck Models (CBMs) and other concept-based interpretable models show great promise for making AI applications more transparent, which is essential in fields like medicine. Despite their success, we demonstrate that CBMs struggle to reliably identify the correct concepts under distribution shifts. To assess the robustness of CBMs to concept variations, we introduce SUB: a fine-grained image and concept benchmark containing 38,400 synthetic images based on the CUB dataset. To create SUB, we select a CUB subset of 33 bird classes and 45 concepts to generate images which substitute a specific concept, such as wing color or belly pattern. We introduce a novel Tied Diffusion Guidance (TDG) method to precisely control generated images, where noise sharing for two parallel denoising processes ensures that both the correct bird class and the correct attribute are generated. This novel benchmark enables rigorous evaluation of CBMs and similar interpretable models, contributing to the development of more robust methods. Our code is available at https://github.com/ExplainableML/sub and the dataset at http://huggingface.co/datasets/Jessica-bader/SUB.
comment: Accepted at ICCV 2025
☆ XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding
Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands, or even tens of thousands, of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7x speedup without sacrificing generation quality. Our code will be released.
☆ SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model
AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the outcomes of their actions and plans. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any environment, SimuRA overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation. The generalized world model is implemented using an LLM, which can flexibly plan in a wide range of environments using the concept-rich latent space of natural language. Experiments on difficult web browsing tasks show that SimuRA improves the success of flight search from 0% to 32.2%. World-model-based planning, in particular, shows a consistent advantage of up to 124% over autoregressive planning, demonstrating the advantage of world model simulation as a reasoning paradigm. We are excited about the possibility for training a single, general agent model based on LLMs that can act superintelligently in all environments. To start, we make a web-browsing agent built on SimuRA with pretrained LLMs available as a research demo for public testing.
☆ Consensus-Driven Active Model Selection
The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of model selection is traditionally answered by collecting and annotating a validation dataset -- a costly and time-intensive process. We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points that efficiently differentiate the best candidate. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points within a probabilistic framework. The framework uses the consensus and disagreement between models in the candidate pool to guide the label acquisition process, and Bayesian inference to update beliefs about which model is best as more information is collected. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios. CODA outperforms existing methods for active model selection significantly, reducing the annotation effort required to discover the best model by upwards of 70% compared to the previous state-of-the-art. Code and data are available at https://github.com/justinkay/coda.
comment: ICCV 2025 Highlight. 16 pages, 8 figures
☆ Formal Bayesian Transfer Learning via the Total Risk Prior
In analyses with severe data-limitations, augmenting the target dataset with information from ancillary datasets in the application domain, called source datasets, can lead to significantly improved statistical procedures. However, existing methods for this transfer learning struggle to deal with situations where the source datasets are also limited and not guaranteed to be well-aligned with the target dataset. A typical strategy is to use the empirical loss minimizer on the source data as a prior mean for the target parameters, which places the estimation of source parameters outside of the Bayesian formalism. Our key conceptual contribution is to use a risk minimizer conditional on source parameters instead. This allows us to construct a single joint prior distribution for all parameters from the source datasets as well as the target dataset. As a consequence, we benefit from full Bayesian uncertainty quantification and can perform model averaging via Gibbs sampling over indicator variables governing the inclusion of each source dataset. We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system and discuss computational techniques to scale our approach to moderately sized datasets. We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model. Finally, we demonstrate superior predictive performance relative to the frequentist baseline on a genetics application, especially when the source data are limited.
☆ Scaled Beta Models and Feature Dilution for Dynamic Ticket Pricing
A novel approach is presented for identifying distinct signatures of performing acts in the secondary ticket resale market by analyzing dynamic pricing distributions. Using a newly curated, time series dataset from the SeatGeek API, we model ticket pricing distributions as scaled Beta distributions. This enables accurate parameter estimation from incomplete statistical data using a hybrid of quantile matching and the method of moments. Incorporating the estimated $\alpha$ and $\beta$ parameters into Random Forest classifiers significantly improves pairwise artist classification accuracy, demonstrating the unique economic signatures in event pricing data. Additionally, we provide theoretical and empirical evidence that incorporating zero-variance (constant-value) features into Random Forest models acts as an implicit regularizer, enhancing feature variety and robustness. This regularization promotes deeper, more varied trees in the ensemble, improving the bias-variance tradeoff and mitigating overfitting to dominant features. These findings are validated on both the new ticket pricing dataset and the standard UCI ML handwritten digits dataset.
comment: 27 pages, 11 figures, 3 tables
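To make the parameter-estimation step in the ticket pricing abstract more concrete, here is a minimal sketch of the method-of-moments part only, for a Beta distribution rescaled to an interval [lo, hi]. It assumes the interval endpoints, the mean, and the variance are known; the function name and the example numbers are hypothetical, and the quantile-matching half of the authors' hybrid estimator is not reproduced.

```python
def scaled_beta_moments(mean, var, lo, hi):
    """Method-of-moments estimates (alpha, beta) for a Beta law scaled to [lo, hi].

    Assumes X = lo + (hi - lo) * B with B ~ Beta(alpha, beta), and that
    mean and var are the mean and variance of X.
    """
    width = hi - lo
    mu = (mean - lo) / width   # mean of the underlying Beta on [0, 1]
    s2 = var / width ** 2      # variance of the underlying Beta on [0, 1]
    if not 0 < mu < 1 or not 0 < s2 < mu * (1 - mu):
        raise ValueError("moments are not compatible with a Beta distribution")
    common = mu * (1 - mu) / s2 - 1   # equals alpha + beta
    return mu * common, (1 - mu) * common

# Hypothetical example: prices between 10 and 110 following a scaled Beta(2, 5).
alpha, beta = scaled_beta_moments(
    mean=10 + 100 * 2 / 7,
    var=100 ** 2 * 10 / 392,
    lo=10,
    hi=110,
)
print(alpha, beta)
```

The estimated pair could then be appended to a feature vector for a downstream classifier, which is the role the abstract describes for the $\alpha$ and $\beta$ parameters.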
☆ Improving annotator selection in Active Learning using a mood and fatigue-aware Recommender System
This study addresses the challenge of selecting the best annotators for each query in Active Learning (AL), with the objective of minimizing misclassifications. AL recognizes the cost and time involved in acquiring labeled data and reduces the amount of labeled data needed. Nevertheless, annotation errors still need to be reduced so that the expected accuracy is reached as efficiently as possible. Most strategies for forming query-annotator pairs do not consider internal factors that affect productivity, such as mood, attention, motivation, and fatigue levels. This work addresses this gap in the existing literature, not only by considering how internal factors (mood and fatigue levels) influence annotators but also by presenting a new query-annotator pairing strategy that uses a Knowledge-Based Recommendation System (RS). The RS ranks the available annotators, allowing one or more to be chosen to label the queried instance based on their past accuracy values, their mood and fatigue levels, and information about the queried instance. This work builds on existing literature on the influence of mood and fatigue on human performance, simulating annotators in a realistic manner and predicting their performance with the RS. The results show that considering past accuracy values together with mood and fatigue levels reduces both the number of annotation errors made by the annotators and the uncertainty of the model throughout its training, compared to not using internal factors. Accuracy and F1-score values were also better with the proposed approach, although these gains were less substantial than the reductions in annotation errors and model uncertainty. The methodologies and findings presented in this study begin to explore the open challenge of human cognitive factors affecting AL.
☆ Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs
Knowledge graphs (KGs) often contain sufficient information to support the inference of new facts. Identifying logical rules not only improves the completeness of a knowledge graph but also enables the detection of potential errors, reveals subtle data patterns, and enhances the overall capacity for reasoning and interpretation. However, the complexity of such rules, combined with the unique labeling conventions of each KG, can make them difficult for humans to understand. In this paper, we explore the potential of large language models to generate natural language explanations for logical rules. Specifically, we extract logical rules using the AMIE 3.5.1 rule discovery algorithm from the benchmark dataset FB15k-237 and two large-scale datasets, FB-CVT-REV and FB+CVT-REV. We examine various prompting strategies, including zero- and few-shot prompting, the inclusion of variable entity types, and chain-of-thought reasoning. We conduct a comprehensive human evaluation of the generated explanations based on correctness, clarity, and hallucination, and also assess the use of large language models as automatic judges. Our results demonstrate promising performance in terms of explanation correctness and clarity, although several challenges remain for future research. All scripts and data used in this study are publicly available at https://github.com/idirlab/KGRule2NL.
☆ DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction
Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and Communications in Medicine (DICOM) files presents a significant barrier to the ethical and secure sharing of imaging datasets. This paper presents a hybrid de-identification framework developed by Impact Business Information Solutions (IBIS) that combines rule-based and AI-driven techniques, and rigorous uncertainty quantification for comprehensive PHI/PII removal from both metadata and pixel data. Our approach begins with a two-tiered rule-based system targeting explicit and inferred metadata elements, further augmented by a large language model (LLM) fine-tuned for Named Entity Recognition (NER), and trained on a suite of synthetic datasets simulating realistic clinical PHI/PII. For pixel data, we employ an uncertainty-aware Faster R-CNN model to localize embedded text, extract candidate PHI via Optical Character Recognition (OCR), and apply the NER pipeline for final redaction. Crucially, uncertainty quantification provides confidence measures for AI-based detections to enhance automation reliability and enable informed human-in-the-loop verification to manage residual risks. This uncertainty-aware deidentification framework achieves robust performance across benchmark datasets and regulatory standards, including DICOM, HIPAA, and TCIA compliance metrics. By combining scalable automation, uncertainty quantification, and rigorous quality assurance, our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research.
comment: 15 pages, 6 figures
☆ Anomalous Samples for Few-Shot Anomaly Detection
Several anomaly detection and classification methods rely on large amounts of non-anomalous or "normal" samples under the assumption that anomalous data is typically harder to acquire. This hypothesis becomes questionable in Few-Shot settings, where as little as one annotated sample can make a significant difference. In this paper, we tackle the question of utilizing anomalous samples in training a model for binary anomaly classification. We propose a methodology that incorporates anomalous samples in a multi-score anomaly detection score leveraging recent Zero-Shot and memory-based techniques. We compare the utility of anomalous samples to that of regular samples and study the benefits and limitations of each. In addition, we propose an augmentation-based validation technique to optimize the aggregation of the different anomaly scores and demonstrate its effectiveness on popular industrial anomaly detection datasets.
☆ villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Visual-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments including SIMPLER and LIBERO, as well as on two real-world robot setups including gripper and dexterous hand manipulation. We believe the ViLLA paradigm holds significant promise, and that our villa-X provides a strong foundation for future research.
comment: Project page: https://aka.ms/villa-x
♻ ☆ GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.
comment: 51 pages (13 pages for the main text, 9 pages for references, and 29 pages for the appendix)
♻ ☆ Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic
Reinforcement Learning (RL) has been widely used for packet routing in communication networks, but traditional RL methods rely on the Markov assumption that the current state contains all necessary information for decision-making. In reality, internet traffic is non-Markovian, and past states do influence routing performance. Moreover, common deep RL approaches use function approximators, such as neural networks, that do not model the spatial structure in network topologies. To address these shortcomings, we design a network environment with non-Markovian traffic and introduce a spatial-temporal RL (STRL) framework for packet routing. Our approach outperforms traditional baselines by more than 19% during training and 7% for inference despite a change in network topology.
♻ ☆ A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values
Reinforcement learning agents can achieve super-human performance in complex decision-making tasks, but their behaviour is often difficult to understand and explain. This lack of explanation limits deployment, especially in safety-critical settings where understanding and trust are essential. We identify three core explanatory targets that together provide a comprehensive view of reinforcement learning agents: behaviour, outcomes, and predictions. We develop a unified theoretical framework for explaining these three elements of reinforcement learning agents through the influence of individual features that the agent observes in its environment. We derive feature influences by using Shapley values, which collectively and uniquely satisfy a set of well-motivated axioms for fair and consistent credit assignment. The proposed approach, Shapley Values for Explaining Reinforcement Learning (SVERL), provides a single theoretical framework to comprehensively and meaningfully explain reinforcement learning agents. It yields explanations with precise semantics that are not only interpretable but also mathematically justified, enabling us to identify and correct conceptual issues in prior explanations. Through illustrative examples, we show how SVERL produces useful, intuitive explanations of agent behaviour, outcomes, and predictions, which are not apparent from observing agent behaviour alone.
♻ ☆ Intersectional Divergence: Measuring Fairness in Regression
Fairness in machine learning research is commonly framed in the context of classification tasks, leaving critical gaps in regression. In this paper, we propose a novel approach to measure intersectional fairness in regression tasks, going beyond the focus on single protected attributes from existing work to consider combinations of all protected attributes. Furthermore, we contend that it is insufficient to measure the average error of groups without regard for imbalanced domain preferences. Accordingly, we propose Intersectional Divergence (ID) as the first fairness measure for regression tasks that 1) describes fair model behavior across multiple protected attributes and 2) differentiates the impact of predictions in target ranges most relevant to users. We extend our proposal demonstrating how ID can be adapted into a loss function, IDLoss, that satisfies convergence guarantees and has piecewise smooth properties that enable practical optimization. Through an extensive experimental evaluation, we demonstrate how ID allows unique insights into model behavior and fairness, and how incorporating IDLoss into optimization can considerably improve single-attribute and intersectional model fairness while maintaining a competitive balance in predictive performance.
♻ ☆ Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies
This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This design facilitates dynamic, inter-agent communication, allowing agents to explicitly query teammates, thereby efficiently managing the exponential growth of joint-action spaces while ensuring a high degree of collaboration. We further introduce a penalized loss function which promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics (win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps).
comment: 8 pages
♻ ☆ Quantum Transfer Learning for MNIST Classification Using a Hybrid Quantum-Classical Approach
We implement a hybrid quantum-classical model for image classification that compresses MNIST digit images into a low-dimensional feature space and then maps these features onto a 5-qubit quantum state. First, an autoencoder compresses each $28\times28$ image (784 pixels) into a 64-dimensional latent vector, preserving salient features of the digit with minimal reconstruction error. We further reduce the latent representation to 5 principal components using Principal Component Analysis (PCA), to match the 5 available qubits. These 5 features are encoded as rotation angles in a quantum circuit with 5 qubits. The quantum feature map applies single-qubit rotations ($R_y$ gates) proportional to the feature values, followed by a Hadamard gate and a cascade of entangling CNOT gates to produce a non-product entangled state. Measuring the 5-qubit state yields a 32-dimensional probability distribution over basis outcomes, which serves as a quantum-enhanced feature vector for classification. A classical neural network with a softmax output is then trained on these 32-dimensional quantum feature vectors to predict the digit class. We evaluate the hybrid model on the MNIST dataset and compare it to a purely classical baseline that uses the 64-dimensional autoencoder latent features for classification. The results show that the hybrid model can successfully classify digits, demonstrating the feasibility of integrating quantum computing in the classification pipeline, although its accuracy (about 75% on test data) currently falls below the classical baseline (about 98% on the same compressed data).
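The quantum feature map described in this abstract can be sketched as a plain state-vector simulation. The code below is an illustrative reading, not the authors' circuit: it assumes the rotation angle equals the feature value, that a Hadamard gate is applied to every qubit, and that the CNOT cascade links neighbouring qubits (0 to 1, 1 to 2, and so on). All function names are hypothetical, and only the standard library is used.

```python
import math

N_QUBITS = 5  # one qubit per PCA feature, as in the abstract

def apply_single(state, gate, q):
    """Apply a real 2x2 gate to qubit q (qubit 0 is the least significant bit)."""
    new = state[:]
    bit = 1 << q
    for i in range(len(state)):
        if (i & bit) == 0:
            a0, a1 = state[i], state[i | bit]
            new[i] = gate[0][0] * a0 + gate[0][1] * a1
            new[i | bit] = gate[1][0] * a0 + gate[1][1] * a1
    return new

def apply_cnot(state, control, target):
    """Swap the target-bit amplitudes wherever the control bit is 1."""
    new = state[:]
    cbit, tbit = 1 << control, 1 << target
    for i in range(len(state)):
        if (i & cbit) and not (i & tbit):
            new[i], new[i | tbit] = state[i | tbit], state[i]
    return new

def feature_map(features):
    """Map 5 real features to a 32-dimensional measurement distribution."""
    state = [0.0] * (2 ** N_QUBITS)
    state[0] = 1.0  # start in |00000>
    for q, x in enumerate(features):
        c, s = math.cos(x / 2), math.sin(x / 2)
        state = apply_single(state, [[c, -s], [s, c]], q)  # R_y(x)
    h = 1 / math.sqrt(2)
    for q in range(N_QUBITS):  # assumption: Hadamard on every qubit
        state = apply_single(state, [[h, h], [h, -h]], q)
    for q in range(N_QUBITS - 1):  # assumption: nearest-neighbour CNOT cascade
        state = apply_cnot(state, q, q + 1)
    return [a * a for a in state]  # amplitudes are real, so probability = a^2

probs = feature_map([0.1, 0.5, 1.0, 1.5, 2.0])
print(len(probs), sum(probs))
```

Because every gate here has real entries and the initial state is real, the simulation can avoid complex numbers. The resulting 32 probabilities are the kind of vector that would be passed to the classical softmax classifier.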
♻ ☆ GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network
Attributed graph clustering holds significant importance in modern data analysis. However, due to the complexity of graph data and the heterogeneity of node attributes, leveraging graph information for clustering remains challenging. To address this, we propose a novel deep graph clustering model, GCL-GCN, specifically designed to address the limitations of existing models in capturing local dependencies and complex structures when dealing with sparse and heterogeneous graph data. GCL-GCN introduces an innovative Graphormer module that combines centrality encoding and spatial relationships, effectively capturing both global and local information between nodes, thereby enhancing the quality of node representations. Additionally, we propose a novel contrastive learning module that significantly enhances the discriminative power of feature representations. In the pre-training phase, this module increases feature distinction through contrastive learning on the original feature matrix, ensuring more identifiable initial representations for subsequent graph convolution and clustering tasks. Extensive experimental results on six datasets demonstrate that GCL-GCN outperforms 14 advanced methods in terms of clustering quality and robustness. Specifically, on the Cora dataset, it improves ACC, NMI, and ARI by 4.94%, 13.01%, and 10.97%, respectively, compared to the primary comparison method MBN.
comment: The source code for this study is available at https://github.com/YF-W/GCL-GCN
♻ ☆ Disparate Conditional Prediction in Multiclass Classifiers ICML 2025
We propose methods for auditing multiclass classifiers for fairness under multiclass equalized odds, by estimating the deviation from equalized odds when the classifier is not completely fair. We generalize to multiclass classifiers the measure of Disparate Conditional Prediction (DCP), originally suggested by Sabato & Yom-Tov (2020) for binary classifiers. DCP is defined as the fraction of the population for which the classifier predicts with conditional prediction probabilities that differ from the closest common baseline. We provide new local-optimization methods for estimating the multiclass DCP under two different regimes, one in which the conditional confusion matrices for each protected sub-population are known, and one in which these cannot be estimated, for instance, because the classifier is inaccessible or because good-quality individual-level data is not available. These methods can be used to detect classifiers that likely treat a significant fraction of the population unfairly. Experiments demonstrate the accuracy of the methods. Code is provided at https://github.com/sivansabato/DCPmulticlass.
comment: Published at ICML 2025
♻ ☆ Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks
Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. However, direct downloading of these models for fine-tuning on the ground is impeded by privacy concerns and limited bandwidth. Satellite federated learning (FL) offers a solution by enabling model fine-tuning directly on-board satellites and aggregating model updates without data downloading. Nevertheless, for large foundation models, the computational capacity of satellites is insufficient to support effective on-board fine-tuning in traditional satellite FL frameworks. To address these challenges, we propose a satellite-ground collaborative federated fine-tuning framework. The key of the framework lies in how to reasonably decompose and allocate model components to alleviate insufficient on-board computation capabilities. During fine-tuning, satellites exchange intermediate results with ground stations or other satellites for forward propagation and back propagation, which brings communication challenges due to the special communication topology of space transmission networks, such as intermittent satellite-ground communication, short duration of satellite-ground communication windows, and unstable inter-orbit inter-satellite links (ISLs). To reduce transmission delays, we further introduce tailored communication strategies that integrate both communication and computing resources. Specifically, we propose a parallel intra-orbit communication strategy, a topology-aware satellite-ground communication strategy, and a latency-minimalization inter-orbit communication strategy to reduce space communication costs. Simulation results demonstrate significant reductions in training time with improvements of approximately 33%.
Programming Languages
☆ SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy
We introduce a neural network layer API and library for sequence modeling, designed for easy creation of sequence models that can be executed both layer-by-layer (e.g., teacher-forced training) and step-by-step (e.g., autoregressive sampling). To achieve this, layers define an explicit representation of their state over time (e.g., a Transformer KV cache, a convolution buffer, an RNN hidden state), and a step method that evolves that state, tested to give identical results to a stateless layer-wise invocation. This and other aspects of the SequenceLayers contract enable complex models to be immediately streamable, mitigate a wide range of common bugs arising in both streaming and parallel sequence processing, and can be implemented in any deep learning library. A composable and declarative API, along with a comprehensive suite of layers and combinators, streamlines the construction of production-scale models from simple streamable components while preserving strong correctness guarantees. Our current implementations of SequenceLayers (JAX, TensorFlow 2) are available at https://github.com/google/sequence-layers.
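As a rough illustration of the layer-wise versus step-wise contract described in the abstract above, the sketch below uses plain Python rather than the SequenceLayers API. The class `CausalMovingSum`, its methods `initial_state`, `layer`, and `step`, and the helper `run_stepwise` are hypothetical names chosen for this example only; they are not taken from the library.

```python
from collections import deque


class CausalMovingSum:
    """Hypothetical streamable layer: y[t] = x[t] + x[t-1] + ... + x[t-k+1]."""

    def __init__(self, kernel_size: int):
        self.kernel_size = kernel_size

    def initial_state(self) -> deque:
        # Explicit state: a buffer of the last (kernel_size - 1) inputs,
        # zero-filled so early outputs see zeros on the left.
        size = self.kernel_size - 1
        return deque([0.0] * size, maxlen=size)

    def layer(self, xs: list) -> list:
        # Stateless, whole-sequence ("layer-by-layer") evaluation.
        padded = [0.0] * (self.kernel_size - 1) + list(xs)
        return [sum(padded[t:t + self.kernel_size]) for t in range(len(xs))]

    def step(self, x: float, state: deque):
        # One-timestep ("step-by-step") evaluation that evolves the state.
        y = sum(state) + x
        new_state = deque(state, maxlen=state.maxlen)
        new_state.append(x)  # the oldest buffered input falls out
        return y, new_state


def run_stepwise(layer, xs):
    # Drive the step method over a sequence, threading the state through.
    state = layer.initial_state()
    ys = []
    for x in xs:
        y, state = layer.step(x, state)
        ys.append(y)
    return ys


xs = [1.0, 2.0, 3.0, 4.0]
m = CausalMovingSum(kernel_size=3)
# The two execution modes should agree on the same input.
assert m.layer(xs) == run_stepwise(m, xs)
```

The final assertion is the kind of equivalence check the abstract alludes to: the streaming path and the whole-sequence path are compared on identical input.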
☆ Kernel-FFI: Transparent Foreign Function Interfaces for Interactive Notebooks
Foreign Function Interfaces (FFIs) are essential for enabling interoperability between programming languages, yet existing FFI solutions are ill-suited for the dynamic, interactive workflows prevalent in modern notebook environments such as Jupyter. Current approaches require extensive manual configuration, introduce significant boilerplate, and often lack support for recursive calls and object-oriented programming (OOP) constructs, features that are critical for productive, multi-language development. We present Kernel-FFI, a transparent, language-agnostic framework that enables seamless cross-language function calls and object manipulation within interactive notebooks. Kernel-FFI employs source-level transformation to automatically rewrite cross-language invocations, eliminating the need for manual bindings or boilerplate. Kernel-FFI provides robust support for OOP by enabling foreign object referencing and automatic resource management across language boundaries. Furthermore, to address the blocking nature of Jupyter kernels and support recursive and asynchronous foreign calls, we introduce a novel side-channel communication mechanism. Our tool will be open-sourced and available at https://codepod.io/docs/kernel-ffi
☆ NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational Functions
Sparsity detection in black-box functions enables significant computational speedups in gradient-based optimization through Jacobian compression, but existing finite-difference methods suffer from false negatives due to coincidental zero gradients. These false negatives can silently corrupt gradient calculations, leading to difficult-to-diagnose errors. We introduce NaN-propagation, which exploits the universal contamination property of IEEE 754 Not-a-Number floating-point values to trace input-output dependencies through floating-point numerical computations. By systematically contaminating inputs with NaN and observing which outputs become NaN, the method reconstructs conservative sparsity patterns that eliminate false negatives. We demonstrate the approach on an aerospace wing weight model, achieving a 1.52x speedup while detecting dozens of dependencies missed by conventional methods -- a significant improvement since gradient computation is the bottleneck in many optimization workflows. The technique leverages IEEE 754 compliance to work across programming languages and math libraries without modifying existing black-box codes. Advanced strategies including NaN payload encoding enable faster-than-linear time complexity, improving upon existing black-box sparsity detection methods. Practical algorithms are also proposed to mitigate challenges from branching code execution common in engineering applications.
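The contamination idea in the abstract above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `nan_sparsity`, `fd_sparsity`, and the toy black box `example` are assumptions made for this example, and the sketch ignores the branching and payload-encoding refinements the abstract mentions.

```python
import math


def nan_sparsity(f, x0):
    """pattern[i][j] is True if output i became NaN when input j was NaN."""
    n_out = len(f(list(x0)))
    pattern = [[False] * len(x0) for _ in range(n_out)]
    for j in range(len(x0)):
        x = list(x0)
        x[j] = math.nan  # contaminate exactly one input
        for i, yi in enumerate(f(x)):
            pattern[i][j] = math.isnan(yi)
    return pattern


def fd_sparsity(f, x0, h=1e-6):
    """Finite-difference detection: an output 'depends' on input j if it moved."""
    y0 = f(list(x0))
    pattern = [[False] * len(x0) for _ in y0]
    for j in range(len(x0)):
        x = list(x0)
        x[j] += h
        for i, (yi, y0i) in enumerate(zip(f(x), y0)):
            pattern[i][j] = yi != y0i
    return pattern


def example(x):
    # Hypothetical black box: output 0 uses x[0] and x[1]; output 1 uses x[2].
    return [x[0] * x[1], x[2] + 1.0]


x0 = [0.0, 5.0, 1.0]
# At x[0] == 0 the derivative of x[0] * x[1] with respect to x[1] is zero,
# so finite differences miss that dependency, while 0.0 * NaN is still NaN.
print(fd_sparsity(example, x0))
print(nan_sparsity(example, x0))
```

At this evaluation point the finite-difference pattern omits the dependency of the first output on `x[1]`, which is the kind of coincidental-zero false negative the abstract describes.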
♻ ☆ A Compute-Matched Re-Evaluation of TroVE on MATH
Reusing established theorems and formulas is central to mathematical problem solving, serving as essential building blocks for tackling increasingly complex challenges. Recent work, TroVE, argues that code-generating Large Language Models (LLMs) can benefit similarly on the MATH benchmark by inducing and reusing higher-level toolboxes. By allocating computational budget across an ensemble of three modes -- directly generating code, creating tools, and reusing tools -- TroVE claims to outperform a PRIMITIVE baseline that only performs direct generation. However, recent analysis (Berlot-Attwell et al., 2024) casts doubt on these gains, noting that the tools created are often trivial or rarely reused, suggesting that improvements may stem from self-consistency or self-correction. In this work, we re-evaluate TroVE on MATH, analyze the impact of each of its modes, and show that its benefit does not come from these mechanisms, but simply from a higher computational budget spent for TroVE compared to PRIMITIVE. To this end, we also perform a small correction in the original implementation of TroVE's selection mechanism, boosting TroVE's performance on MATH by 3\% in accuracy. After matching for compute, the benefit of TroVE reduces to a marginal improvement of 1\%, suggesting that this toolbox approach does not provide a significant benefit on MATH.
♻ ☆ CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
Large Language Models (LLMs) have demonstrated exceptional performance in code generation tasks and have become indispensable programming assistants for developers. However, existing code generation benchmarks primarily assess the functional correctness of code generated by LLMs in single-turn interactions. They offer limited insight into LLMs' abilities to generate code that strictly follows users' instructions in multi-turn interaction scenarios. In this paper, we introduce CodeIF-Bench, a benchmark for evaluating the instruction-following capabilities of LLMs in interactive code generation. Specifically, CodeIF-Bench incorporates nine types of verifiable instructions aligned with real-world software development requirements, which can be independently and objectively validated through specified test cases, facilitating the evaluation of instruction-following capability in multi-turn interactions. In both Static Conversation and Dynamic Conversation settings, we evaluate the performance of 7 state-of-the-art LLMs and summarize the important factors influencing the instruction-following ability of LLMs in multi-turn interactions, as well as potential directions for improvement.
♻ ☆ CodePod: A Language-Agnostic Hierarchical Scoping System for Interactive Development
Interactive development environments like Jupyter Notebooks enable incremental coding through cells with immediate feedback, but their linear structure and global namespace limit scalability for large software projects. We present CodePod, a hierarchical extension of Jupyter that introduces a novel scoped execution model with formal semantics. Our key contribution is a language-agnostic runtime system that performs source-level transformations to implement hierarchical scoping rules, enabling true incremental evaluation across nested modules without requiring language-specific kernel modifications. We formalize the scoping semantics as a mathematical framework with precise visibility relations and prove key properties including uniqueness of symbol resolution and correctness of the resolution algorithm. A qualitative user study with seven senior developers demonstrates that CodePod enables significant improvements in project scalability compared to Jupyter, with notable reductions in navigation effort. We validate the system's effectiveness on large-scale projects with thousands of lines of code, demonstrating its applicability beyond traditional notebook boundaries. Our tool is open-source and available at https://codepod.io
Machine Learning - Statistics
♻ ☆ AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction
Machine learning interpretation (MLI) has primarily been leveraged to build clinician trust and uncover actionable insights in electronic health records (EHRs). However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets (GOSSIS-1-eICU, WiDS, and MIMIC-IV), AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance and optimized clinical deployment.
comment: 12 pages, 4 figures
♻ ☆ Two-dimensional Parallel Tempering for Constrained Optimization
Sampling Boltzmann probability distributions plays a key role in machine learning and optimization, motivating the design of hardware accelerators such as Ising machines. While the Ising model can in principle encode arbitrary optimization problems, practical implementations are often hindered by soft constraints that either slow down mixing when too strong, or fail to enforce feasibility when too weak. We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT) that addresses this challenge by adding a second dimension of replicas interpolating the penalty strengths. This scheme ensures constraint satisfaction in the final replicas, analogous to low-energy states at low temperature. The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas and eliminates the need to explicitly tune the penalty strength. In a representative example of graph sparsification with copy constraints, 2D-PT achieves near-ideal mixing, with Kullback-Leibler divergence decaying as O(1/t). When applied to sparsified Wishart instances, 2D-PT yields orders of magnitude speedup over conventional PT with the same number of replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.
comment: Added references in Introduction
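To make the idea of exchanging replicas along both a temperature axis and a penalty-strength axis more concrete, here is a minimal sketch of the standard Metropolis replica-exchange rule applied to two replicas that may differ in both parameters. It assumes an energy of the form cost + lam * penalty; that decomposition, the dictionary layout, and the names `total_energy`, `swap_log_ratio`, and `accept_swap` are illustrative assumptions, and the exact scheme used in the paper may differ.

```python
import math
import random


def total_energy(cost, penalty, lam):
    # Assumed form: objective plus a soft-constraint penalty weighted by lam.
    return cost + lam * penalty


def swap_log_ratio(a, b):
    """Log acceptance ratio for exchanging the states held by replicas a and b.

    Each replica is a dict with inverse temperature 'beta', penalty strength
    'lam', and the 'cost' and 'penalty' values of its current state.
    """
    before = (a["beta"] * total_energy(a["cost"], a["penalty"], a["lam"])
              + b["beta"] * total_energy(b["cost"], b["penalty"], b["lam"]))
    after = (a["beta"] * total_energy(b["cost"], b["penalty"], a["lam"])
             + b["beta"] * total_energy(a["cost"], a["penalty"], b["lam"]))
    return before - after


def accept_swap(a, b, rng=random):
    # Metropolis rule: accept with probability min(1, exp(log_ratio)).
    log_r = swap_log_ratio(a, b)
    return log_r >= 0 or rng.random() < math.exp(log_r)


# Same temperature, different penalty strengths.
weak = {"beta": 1.0, "lam": 1.0, "cost": 0.0, "penalty": 0.0}    # feasible state
strong = {"beta": 1.0, "lam": 5.0, "cost": 0.0, "penalty": 2.0}  # infeasible state
print(swap_log_ratio(weak, strong))  # positive: the swap is always accepted
```

Under these assumptions, a swap that moves a feasible state into the strongly penalized replica is always accepted, while the reverse move is exponentially suppressed, which matches the intuition that feasible states collect in the high-penalty replicas.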
♻ ☆ Deciphering interventional dynamical causality from non-intervention complex systems
Detecting and quantifying causality is a focal topic in science, engineering, and interdisciplinary studies. Causal studies on non-intervention systems attract much attention but remain extremely challenging, and the delay-embedding technique provides a promising approach. In this study, we propose a framework named Interventional Dynamical Causality (IntDC) in contrast to the traditional Constructive Dynamical Causality (ConDC). ConDC, including Granger causality, transfer entropy and convergent cross mapping, measures causality by constructing a dynamical model without considering interventions. A computational criterion, Interventional Embedding Entropy (IEE), is proposed to measure causal strengths in an interventional manner. IEE is an intervened causal information flow defined in the delay-embedding space. Further, IEE theoretically and numerically enables the deciphering of IntDC solely from observational (non-interventional) time-series data, without requiring any knowledge of dynamical models or real interventions in the considered system. In particular, IEE can be applied to rank causal effects according to their importance and to construct causal networks from data. We conducted numerical experiments to demonstrate that IEE can find causal edges accurately, eliminate the effects of confounding, and quantify causal strength more robustly than traditional indices. We also applied IEE to real-world tasks, where it performed as an accurate and robust tool for causal analyses solely from observational data. The IntDC framework and IEE algorithm provide an efficient approach to the study of causality from time series in diverse non-intervention complex systems.
Programming Languages
☆ Abstractions of Sequences, Functions and Operators
We present theoretical and practical results on the order theory of lattices of functions, focusing on Galois connections that abstract (sets of) functions - a topic known as higher-order abstract interpretation. We are motivated by the challenge of inferring closed-form bounds on functions which are defined recursively, i.e. as the fixed point of an operator or, equivalently, as the solution to a functional equation. This has multiple applications in program analysis (e.g. cost analysis, loop acceleration, declarative language analysis) and in hybrid systems governed by differential equations. Our main contribution is a new family of constraint-based abstract domains for abstracting numerical functions, B-bound domains, which abstract a function f by a conjunction of bounds from a preselected set of boundary functions. They allow inferring highly non-linear numerical invariants, which classical numerical abstract domains struggle with. We uncover a convexity property in the constraint space that simplifies, and, in some cases, fully automates, transfer function design. We also introduce domain abstraction, a functor that lifts arbitrary mappings in value space to Galois connections in function space. This supports abstraction from symbolic to numerical functions (i.e. size abstraction), and enables dimensionality reduction of equations. We base our constructions of transfer functions on a simple operator language, starting with sequences, and extending to more general functions, including multivariate, piecewise, and non-discrete domains.
comment: Under consideration for publication in STTT
♻ ☆ Place Capability Graphs: A General-Purpose Model of Rust's Ownership and Borrowing Guarantees
Rust's novel type system has proved an attractive target for verification and program analysis tools, due to the rich guarantees it provides for controlling aliasing and mutability. However, fully understanding, extracting and exploiting these guarantees is subtle and challenging: existing models for Rust's type checking either support a smaller idealised language disconnected from real-world Rust code, or come with severe limitations in terms of precise modelling of Rust borrows, composite types storing them, function signatures and loops. In this paper, we present a novel model of Rust's type-checking called Place Capability Graphs, which lifts these limitations, and which can be directly calculated from the Rust compiler's own programmatic representations and analyses. We demonstrate that our model supports over 97% of Rust functions in the most popular public crates, and show its suitability as a general-purpose basis for verification and program analysis tools by developing promising new prototype versions of the existing Flowistry and Prusti tools.
♻ ☆ The Algebra of Patterns (Extended Version)
Pattern matching is a popular feature in functional, imperative and object-oriented programming languages. Language designers should therefore invest effort in a good design for pattern matching. Most languages choose a first-match semantics for pattern matching; that is, clauses are tried in the order in which they appear in the program until the first one matches. As a consequence, the order in which the clauses appear cannot be arbitrarily changed, which results in a less declarative programming model. The declarative alternative to this is an order-independent semantics for pattern matching, which is not implemented in most programming languages since it requires more verbose patterns. The reason for this verbosity is that the syntax of patterns is usually not expressive enough to express the complement of a pattern. In this paper, we show a principled way to make order-independent pattern matching practical. Our solution consists of two parts: First, we introduce a boolean algebra of patterns which can express the complement of a pattern. Second, we introduce default clauses to pattern matches. These default clauses capture the essential idea of a fallthrough case without sacrificing the property of order-independence.
comment: This revision fixes typos in rules P-Inl, P-Inr and in theorem 13
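A small runtime model may help picture what order-independence with complements and default clauses means. The sketch below is only an analogy in plain Python, not the paper's calculus or type system: the class `Pat`, the function `match`, and the reading of a default clause as "the complement of the union of all other clauses" are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Pat:
    """A pattern modelled as a predicate, closed under and, or, and complement."""
    test: Callable[[Any], bool]

    def __and__(self, other):
        return Pat(lambda v: self.test(v) and other.test(v))

    def __or__(self, other):
        return Pat(lambda v: self.test(v) or other.test(v))

    def __invert__(self):
        return Pat(lambda v: not self.test(v))


def match(value, clauses, default):
    """Order-independent match: exactly one clause may apply to a value."""
    covered = Pat(lambda v: False)
    for pat, _ in clauses:
        covered = covered | pat
    # Assumed reading: the default clause matches the complement of the rest.
    all_clauses = list(clauses) + [(~covered, default)]
    hits = [body for pat, body in all_clauses if pat.test(value)]
    if len(hits) != 1:
        raise ValueError("clauses overlap on this value")
    return hits[0](value)


is_zero = Pat(lambda v: v == 0)
is_negative = Pat(lambda v: v < 0)
clauses = [(is_zero, lambda v: "zero"), (is_negative, lambda v: "negative")]

# Reordering the clauses does not change the result.
for cs in (clauses, list(reversed(clauses))):
    print([match(v, cs, lambda v: "positive") for v in (0, -2, 7)])
```

Because overlapping clauses are rejected rather than resolved by position, the result cannot depend on clause order; a static system such as the one in the paper would aim to establish this before the program runs.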
♻ ☆ Floating-Point Neural Networks Are Provably Robust Universal Approximators
The classical universal approximation (UA) theorem for neural networks establishes mild conditions under which a feedforward neural network can approximate a continuous function $f$ with arbitrary accuracy. A recent result shows that neural networks also enjoy a more general interval universal approximation (IUA) theorem, in the sense that the abstract interpretation semantics of the network using the interval domain can approximate the direct image map of $f$ (i.e., the result of applying $f$ to a set of inputs) with arbitrary accuracy. These theorems, however, rest on the unrealistic assumption that the neural network computes over infinitely precise real numbers, whereas their software implementations in practice compute over finite-precision floating-point numbers. An open question is whether the IUA theorem still holds in the floating-point setting. This paper introduces the first IUA theorem for floating-point neural networks that proves their remarkable ability to perfectly capture the direct image map of any rounded target function $f$, showing no limits exist on their expressiveness. Our IUA theorem in the floating-point setting exhibits material differences from the real-valued setting, which reflects the fundamental distinctions between these two computational models. This theorem also implies surprising corollaries, which include (i) the existence of provably robust floating-point neural networks; and (ii) the computational completeness of the class of straight-line programs that use only floating-point additions and multiplications for the class of all floating-point programs that halt.
comment: 70 pages, 4 figures. Appeared in CAV 2025
Programming Languages
☆ Composable Effect Handling for Programming LLM-integrated Scripts
Implementing LLM-integrated scripts introduces challenges in modularity and performance, as scripts are often coupled to specific LLM implementations and fail to exploit parallelization opportunities. This paper proposes using composable effect handling to separate workflow logic from effectful operations, such as LLM calls, I/O, and concurrency, enabling modularity without sacrificing the opportunity for performance optimization. By treating these operations as abstract interfaces and discharging them via effect handlers, this paper shows that scripts can achieve significant speedups (e.g., 10$\times$ in a Tree-of-Thoughts case study) without compromising modularity. This paper aims to promote composable effect handling as a programming style for LLM scripting.
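The separation of workflow logic from effectful operations can be approximated in Python with generators, where the workflow yields effect requests and a handler decides how to carry them out. This is a loose sketch under stated assumptions, not the paper's system: the effect names `"llm"` and `"llm_many"`, the functions `workflow`, `run`, and `fake_llm` are hypothetical, and generators give only a limited, one-shot form of effect handling.

```python
from concurrent.futures import ThreadPoolExecutor


def workflow(question):
    # Workflow logic only *requests* effects; it never calls a model directly.
    ideas = yield ("llm_many", [f"Idea {i} for: {question}" for i in range(3)])
    best = yield ("llm", "Pick the best of: " + "; ".join(ideas))
    return best


def run(gen, llm, parallel=False):
    """Handler: interprets effect requests using the supplied llm callable."""
    try:
        request = next(gen)
        while True:
            kind, payload = request
            if kind == "llm":
                result = llm(payload)
            elif kind == "llm_many":
                if parallel:
                    # Independent calls can be issued concurrently.
                    with ThreadPoolExecutor() as pool:
                        result = list(pool.map(llm, payload))
                else:
                    result = [llm(p) for p in payload]
            else:
                raise ValueError(f"unknown effect: {kind}")
            request = gen.send(result)
    except StopIteration as stop:
        return stop.value


def fake_llm(prompt):
    # Stand-in for a real model call, so the sketch runs offline.
    return prompt.upper()


sequential = run(workflow("q"), fake_llm)
concurrent = run(workflow("q"), fake_llm, parallel=True)
assert sequential == concurrent
```

The workflow is unchanged between the sequential and concurrent runs; only the handler differs, which is the modularity argument the abstract makes.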
☆ Fixed-Point-Oriented Programming: A Concise and Elegant Paradigm
Fixed-Point-Oriented Programming (FPOP) is an emerging paradigm designed to streamline the implementation of problems involving self-referential computations. These include graph algorithms, static analysis, parsing, and distributed computing, domains that traditionally require complex and tricky-to-implement work-queue algorithms. Existing programming paradigms lack direct support for these inherently fixed-point computations, leading to inefficient and error-prone implementations. This white paper explores the potential of the FPOP paradigm, which offers a high-level abstraction that enables concise and expressive problem formulations. By leveraging structured inference rules and user-directed optimizations, FPOP allows developers to write declarative specifications while the compiler ensures efficient execution. It not only reduces implementation complexity for programmers but also enhances adaptability, making it easier for programmers to explore alternative solutions and optimizations without modifying the core logic of their program. We demonstrate how FPOP simplifies algorithm implementation, improves maintainability, and enables rapid prototyping by allowing problems to be clearly and concisely expressed. For example, the graph distance problem can be expressed in only two executable lines of code with FPOP, while it takes an order of magnitude more code in other paradigms. By bridging the gap between theoretical fixed-point formulations and practical implementations, we aim to foster further research and adoption of this paradigm.
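For contrast with the two-line FPOP formulation mentioned above (which is not reproduced here), the following sketch shows what the graph distance problem looks like when the fixed point is computed by hand in Python. The function name `graph_distance` and the edge-list representation are assumptions for this example, and the loop assumes edge weights that admit a fixed point (for instance, non-negative weights).

```python
import math


def graph_distance(edges, source):
    """Least fixed point of: d[source] = 0 and d[v] = min(d[v], d[u] + w)."""
    nodes = {source} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    changed = True
    while changed:  # naive iteration: reapply the rule until nothing changes
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist


edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("d", "a", 1)]
print(graph_distance(edges, "a"))
```

Even this naive version has to manage the iteration, the change flag, and the initial values explicitly, which is the bookkeeping a fixed-point-oriented language would presumably take over.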
♻ ☆ Rule-Based Graph Programs Matching the Time Complexity of Imperative Algorithms
We report on recent advances in rule-based graph programming, which allow us to match the time complexity of some fundamental imperative graph algorithms. In general, achieving the time complexity of graph algorithms implemented in conventional languages using a rule-based graph-transformation language is challenging due to the cost of graph matching. Previous work demonstrated that with rooted rules, certain algorithms can be implemented in the graph programming language GP 2 such that their runtime matches the time complexity of imperative implementations. However, this required input graphs to have a bounded node degree and (for some algorithms) to be connected. In this paper, we overcome these limitations by enhancing the graph data structure generated by the GP 2 compiler and exploiting the new structure in programs. We present three case studies: the first program checks whether input graphs are connected, the second program checks whether input graphs are acyclic, and the third program solves the single-source shortest-paths problem for graphs with integer edge-weights. The first two programs run in linear time on (possibly disconnected) input graphs with arbitrary node degrees. The third program runs in time $O(nm)$ on arbitrary input graphs, matching the time complexity of imperative implementations of the Bellman-Ford algorithm. For each program, we formally prove its correctness and time complexity, and provide runtime experiments on various graph classes.
Programming Languages
☆ One Weird Trick to Untie Landin's Knot
In this work, we explore Landin's Knot, which is understood as a pattern for encoding general recursion, including non-termination, that is possible after adding higher-order references to an otherwise terminating language. We observe that this isn't always true -- higher-order references, by themselves, don't lead to non-termination. The key insight is that Landin's Knot relies not primarily on references storing functions, but on unrestricted quantification over a function's environment. We show this through a closure converted language, in which the function's environment is made explicit and hides the type of the environment through impredicative quantification. Once references are added, this impredicative quantification can be exploited to encode recursion. We conjecture that by restricting the quantification over the environment, higher-order references can be safely added to terminating languages, without resorting to more complex type systems such as linearity, and without restricting references from storing functions.
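The pattern itself is easy to show in any language with mutable cells that can hold functions. The Python sketch below is only meant to illustrate the shape of Landin's Knot; Python is untyped and already has recursion, so it says nothing about the termination results discussed above. The function name `landins_knot_factorial` and the use of a one-element list as a reference cell are choices made for this example.

```python
def landins_knot_factorial():
    # A mutable reference cell that stores a function (a higher-order reference).
    cell = [lambda n: 0]  # placeholder, replaced before any real call

    def fact(n):
        # The body never names itself: the recursive call goes through the cell.
        return 1 if n == 0 else n * cell[0](n - 1)

    cell[0] = fact  # tie the knot by storing the function in its own cell
    return cell[0]


print(landins_knot_factorial()(5))
```

Note that `fact` closes over `cell`, so the function's environment contains a reference that in turn contains the function; this environment is the ingredient the abstract identifies as central.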
☆ TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
Type inference for dynamic languages like Python is a persistent challenge in software engineering. While large language models (LLMs) have shown promise in code understanding, their type inference capabilities remain underexplored. We introduce TypyBench, a benchmark designed to evaluate LLMs' type inference across entire Python repositories. TypyBench features two novel metrics: TypeSim, which captures nuanced semantic relationships between predicted and ground truth types, and TypeCheck, which assesses type consistency across codebases. Our evaluation of various LLMs on a curated dataset of 50 high-quality Python repositories reveals that, although LLMs achieve decent TypeSim scores, they struggle with complex nested types and exhibit significant type consistency errors. These findings suggest that future research should shift focus from improving type similarity to addressing repository-level consistency. TypyBench provides a foundation for this new direction, offering insights into model performance across different type complexities and usage contexts. Our code and data are available at https://github.com/typybench/typybench.
☆ LLM-Based Repair of Static Nullability Errors
Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persists and can only be resolved via code changes. Manually addressing these errors is tedious and error-prone. Large language models (LLMs) offer a promising path toward automating these repairs, but naively-prompted LLMs often generate incorrect, contextually-inappropriate edits. Resolving a nullability error demands a deep understanding of how a symbol is used across the codebase, often spanning methods, classes, and packages. We present NullRepair, a system that integrates LLMs into a structured workflow for resolving the errors from a nullability checker. NullRepair's decision process follows a flowchart derived from manual analysis of 200 real-world errors. It leverages static analysis to identify safe and unsafe usage regions of symbols, using error-free usage examples to contextualize model prompts. Patches are generated through an iterative interaction with the LLM that incorporates project-wide context and decision logic. Our evaluation on 12 real-world Java projects shows that NullRepair resolves an average of 72% of the errors that remain after applying a state-of-the-art annotation inference technique. Unlike a naively-prompted LLM, NullRepair also largely preserves program semantics, with all unit tests passing in 10/12 projects after applying every edit proposed by NullRepair, and 98% or more tests passing in the remaining two projects.
☆ Program Analysis for High-Value Smart Contract Vulnerabilities: Techniques and Insights
A widespread belief in the blockchain security community is that automated techniques are only good for detecting shallow bugs, typically of small value. In this paper, we present the techniques and insights that have led us to repeatable success in automatically discovering high-value smart contract vulnerabilities. Our vulnerability disclosures have yielded 10 bug bounties, for a total of over $3M, on high-profile deployed code, as well as hundreds of bugs detected in pre-deployment or under-audit code. We argue that the elements of this surprising success are a) a very high-completeness static analysis approach that manages to maintain acceptable precision; b) domain knowledge, provided by experts or captured via statistical inference. We present novel techniques for automatically inferring domain knowledge from statistical analysis of a large corpus of deployed contracts, and discuss insights on the ideal precision and warning rate of a promising vulnerability detector. In contrast to academic literature in program analysis, which routinely expects false-positive rates below 50% for publishable results, we posit that a useful analysis for high-value real-world vulnerabilities will likely flag very few programs (under 1%) and will do so with a high false-positive rate (e.g., 95%, meaning that only one in twenty human inspections will yield an exploitable vulnerability).
Programming Languages
☆ The Power of Negation in Higher-Order Datalog
We investigate the expressive power of Higher-Order Datalog$^\neg$ under both the well-founded and the stable model semantics, establishing tight connections with complexity classes. We prove that under the well-founded semantics, for all $k\geq 1$, $(k+1)$-Order Datalog$^\neg$ captures k-EXP, a result that holds without explicit ordering of the input database. The proof of this fact can be performed either by using the powerful existential predicate variables of the language or by using partially applied relations and relation enumeration. Furthermore, we demonstrate that this expressive power is retained within a stratified fragment of the language. Under the stable model semantics, we show that $(k+1)$-Order Datalog$^\neg$ captures co-(k-NEXP) using cautious reasoning and k-NEXP using brave reasoning, again with analogous results for the stratified fragment augmented with choice rules. Our results establish a hierarchy of expressive power, highlighting an interesting trade-off between order and non-determinism in the context of higher-order logic programming: increasing the order of programs under the well-founded semantics can surpass the expressive power of lower-order programs under the stable model semantics.
♻ ☆ Semantics of Sets of Programs
Applications like program synthesis sometimes require proving that a property holds for all of the infinitely many programs described by a grammar - i.e., an inductively defined set of programs. Current verification frameworks overapproximate programs' behavior when sets of programs contain loops, including two Hoare-style logics that fail to be relatively complete when loops are allowed. In this work, we prove that compositionally verifying simple properties for infinite sets of programs requires tracking distinct program behaviors over unboundedly many executions. Tracking this information is both necessary and sufficient for verification. We prove this fact in a general, reusable theory of denotational semantics that can model the expressivity and compositionality of verification techniques over infinite sets of programs. We construct the minimal compositional semantics that captures simple properties of sets of programs and use it to derive the first sound and relatively complete Hoare-style logic for infinite sets of programs. Thus, our methods can be used to design minimally complex, compositional verification techniques for sets of programs.
comment: 47 pages, 8 Figures
♻ ☆ DisQ: A Model of Distributed Quantum Processors
The next generation of distributed quantum processors combines single-location quantum computing and quantum networking techniques to permit large entangled qubit groups to be established across remote processors, so that quantum algorithms can be executed distributively. We present DisQ, the first formal model of distributed quantum processors, which permits the analysis of distributed quantum programs in the new computation environment. The core of DisQ is a distributed quantum programming language that combines the concepts of the Chemical Abstract Machine (CHAM) and Markov Decision Processes (MDP) with the objective of clearly distinguishing quantum concurrent and distributed behaviors. Based on the DisQ language, we develop a simulation relation, built on classical simulation infrastructure, to check the equivalence of a quantum algorithm and its distributed versions, so that users can develop the distributed version of a sequential quantum program via a simulation check.
comment: Version 4