List of Publications

For our code releases, please see our code lists.

Early Drafts/Preprints

Fast and Slow Variational Continual Learning,
S. Paul, Y. Jung, M.E. Khan, S. Swaroop, T. Möllenhoff, M. Mundt [ ArXiv ]
SOAP-Bubbles: Structured Weight Uncertainty for Neural Networks,
A. R. Minut, N. Daheim, M. Miani, M.E. Khan, W. Lin, T. Möllenhoff [ ArXiv ]
Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior,
C. J. Anders, H. Da Silva Gameiro, N. Daheim, M.E. Khan [ ArXiv ]
Knowledge Adaptation as Posterior Correction,
M.E. Khan [ ArXiv ]
Improving LoRA with Variational Learning,
B. Cong, N. Daheim, Y. Shen, R. Yokota, M.E. Khan, T. Möllenhoff [ ArXiv ]
Variational Learning Induces Adaptive Label Smoothing,
S.-H. Yang, Z. Liu, G.M. Marconi, M.E. Khan [ ArXiv ]
How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging,
(Preprint) H. Monzón Maldonado, T. Möllenhoff, N. Daheim, I. Gurevych, M.E. Khan [arXiv]
Optimistic Estimation of Convergence in Markov Chains with the Average-Mixing Time,
(Preprint) G. Wolfer, P. Alquier [arXiv]

2026

SVRG and Beyond via Posterior Correction,
(ICML 2026) N. Daheim, T. Möllenhoff, M. L. Ang, M.E. Khan [ ArXiv ]
Accepted as oral, top 0.7% of all 23,918 submissions
Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks,
(ICML 2026) K. Nishida, E. M. Kıral, K. Bannai, M.E. Khan, T. Möllenhoff [ ArXiv ]
Joint Model and Data Sparsification via the Marginal Likelihood,
(ICML 2026) A. Timans, T. Möllenhoff, C. A. Naesseth, M.E. Khan, E. Nalisnick
Position: Agentic AI Systems should be making Bayes-Consistent Decisions,
(ICML 2026) With many authors [ SSRN ]
Variational Visual Question Answering,
(TMLR 2026) T.J. Wieczorek, N. Daun, M.E. Khan, M. Rohrbach [ ArXiv ]
Also presented at SaFeMM-AI at ICCV 2025
Federated ADMM from Bayesian Duality,
(ICLR 2026) T. Möllenhoff^*, S. Swaroop^*, F. Doshi-Velez, M. E. Khan. [OpenReview]
A Stein identity for q-Gaussians with bounded support,
(CPAL 2026) S. Sklaviadis, T. Möllenhoff, M. A. T. Figueiredo, A. Martins, M.E. Khan [ ArXiv ]
Natural Variational Annealing for Multimodal Optimization,
(Information and Inference) T. L. Minh, J. Arbel, T. Möllenhoff, M.E. Khan, F. Forbes [ ArXiv ]

2025

Information Geometry of Variational Bayes,
(Information Geometry) M.E. Khan [ arXiv ]
Variational Learning Finds Flatter Solutions at the Edge of Stability,
(NeurIPS 2025) A. Ghosh, B. Cong, R. Yokota, S. Ravishankar, R. Wang, M. Tao, M.E. Khan, T. Möllenhoff
[ArXiv version] Spotlight presentation (top 688 out of 21575 submissions)
Compact Memory for Continual Logistic Regression,
(NeurIPS 2025) Y. Jung, H. Lee, W. Chen, T. Möllenhoff, Y. Li, J. Lee, M. E. Khan [arXiv]
Also appeared at AABI 2025 [OpenReview]
Optimization Guarantees for Square-Root Natural-Gradient Variational Inference,
(TMLR 2025) N. Kumar, T. Möllenhoff, M.E. Khan, A. Lucchi [OpenReview]
Variance-Aware Estimation of Kernel Mean Embedding,
(JMLR) G. Wolfer, P. Alquier [ Published version ] [arXiv]
Estimating the Data-Influence of Latent Variable Models using Variational Bayes,
(AABI 2025) D. Tailor, M. E. Khan, E. Nalisnick [OpenReview]
Variational Learning Induces Adaptive Label Smoothing,
(AABI 2025) S.-H. Yang, Z. Liu, G. M. Marconi, M. E. Khan [arXiv]
Connecting Federated ADMM to Bayes,
(ICLR 2025) S. Swaroop, M.E. Khan, F. Doshi-Velez [OpenReview]
Uncertainty-Aware Decoding with Minimum Bayes’ Risk,
(ICLR 2025) N. Daheim, C. Meister, T. Möllenhoff, I. Gurevych. [OpenReview]

2024

Variational Low-Rank Adaptation Using IVON,
(Fine-Tuning in Modern ML (FITML) at NeurIPS 2024) B. Cong, N. Daheim, Y. Shen, D. Cremers, R. Yokota, M.E. Khan, T. Möllenhoff [OpenReview] [Code]
Geometric Aspects of Data-Processing of Markov Chains,
(Transactions of Mathematics and Its Applications) G. Wolfer, S. Watanabe [Published version] [arXiv]
Variational Learning is Effective for Large Deep Networks,
(ICML 2024) Y. Shen*, N. Daheim*, B. Cong, P. Nickl, G.M. Marconi, C. Bazan, R. Yokota, I. Gurevych, D. Cremers, M.E. Khan, T. Möllenhoff [arXiv] [Blog] [Code]
Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI,
(ICML 2024) T. Papamarkou, M. Skoularidou, K. Palla, L. Aitchison, J. Arbel, D. Dunson, M. Filippone, V. Fortuin, P. Hennig, J.M.H. Lobato, A. Hubin, A. Immer, T. Karaletsos, M.E. Khan, A. Kristiadi, Y. Li, S. Mandt, C. Nemeth, M.A. Osborne, T.G.J. Rudner, D. Rügamer, Y.W. Teh, M. Welling, A.G. Wilson, R. Zhang [arXiv]
Improved Estimation of Relaxation Time in Non-reversible Markov Chains,
(Annals of Applied Probability) G. Wolfer, A. Kontorovich [Published version] [arXiv]
Model Merging by Uncertainty-Based Gradient Matching,
(ICLR 2024) N. Daheim, T. Möllenhoff, E. M. Ponti, I. Gurevych, M.E. Khan [arXiv] [Code]
Conformal Prediction via Regression-as-Classification,
(ICLR 2024) E. K. Guha, S. Natarajan, T. Möllenhoff, M.E. Khan, E. Ndiaye [OpenReview] [ArXiv] [Code] [Package]

2023

Improving Continual Learning by Accurate Gradient Reconstructions of the Past,
(TMLR) E. Daxberger, S. Swaroop, K. Osawa, R. Yokota, R. Turner, J. M. Hernández-Lobato, M.E. Khan [ OpenReview ] [ Code ]
The Bayesian Learning Rule,
(JMLR) M.E. Khan, H. Rue [ Published version ] [ arXiv ] [ Tweet ]
The Memory Perturbation Equation: Understanding Model’s Sensitivity to Data,
(NeurIPS 2023) P. Nickl, L. Xu, D. Tailor, T. Möllenhoff, M.E. Khan [ arXiv ] [ SlidesLive Video ] [ Poster ] [ Code ]
Bridging the Gap Between Target Networks and Functional Regularization,
(TMLR) A. Piché, V. Thomas, R. Pardinas, J. Marino, G. M. Marconi, C. Pal, M.E. Khan [ OpenReview ]
Variational Bayes Made Easy,
(AABI 2023) M.E. Khan [arXiv]
Estimation of Copulas via Maximum Mean Discrepancy,
(JASA) P. Alquier, B.-E. Chérief-Abdellatif, A. Derumigny, J.-D. Fermanian [Journal version] [arXiv]
Empirical and Instance-Dependent Estimation of Markov Chain and Mixing Time,
(Scandinavian Journal of Statistics) G. Wolfer [arXiv] [Journal version]
Systematic Approaches to Generate Reversiblizations of Markov Chains,
(IEEE Transactions on Information Theory) M. C.H. Choi, G. Wolfer [arXiv] [Early Access]
Learning and Identity Testing of Markov Chains,
(Handbook of Statistics, Volume 49) G. Wolfer, A. Kontorovich [Journal version]
Exploiting Inferential Structure in Neural Processes,
(UAI 2023) D. Tailor, M.E. Khan, E. Nalisnick [Published version] [arXiv] [Poster]
Information Geometry of Markov Kernels: a Survey,
in "Advances in Information Geometry: Beyond the Conventional Approach",
(Front. Phys. Sec. Statistical and Computational Physics) G. Wolfer, S. Watanabe [Journal version]
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages,
(ACL 2023) C. M. Bamba, D. Adelani, P. Nabende, J. O. Alabi, T. Sindane, H. Buzaaba,... [arXiv]
Geometric Reduction for Identity Testing of Reversible Markov Chains,
(GSI 2023) G. Wolfer, S. Watanabe [Published version] [arXiv]
Oral presentation.
Simplifying Momentum-based Riemannian Submanifold Optimization,
(ICML 2023) W. Lin, V. Duruisseaux, M. Leok, F. Nielsen, M.E. Khan, M. Schmidt [ ArXiv ]
Memory-Based Dual Gaussian Processes for Sequential Learning,
(ICML 2023) P. E. Chang, P. Verma, S. T. John, A. Solin, M.E. Khan
Dimension-Free Empirical Entropy Estimation,
(IEEE Transactions on Information Theory) D. Cohen, A. Kontorovich, A. Koolyk, G. Wolfer [Journal version] [arXiv]
The Lie-Group Bayesian Learning Rule,
(AISTATS 2023) E. M. Kiral, T. Möllenhoff, M. E. Khan [arXiv] [Code]
SAM as an Optimal Relaxation of Bayes,
(ICLR 2023) T. Möllenhoff, M. E. Khan [arXiv] [Code]
Notable top-5% of all accepted papers.

2022

Sequential Learning in GPs with Memory and Bayesian Leverage Score,
(Continual Lifelong Workshop at ACML 2022) P. Verma, P. E. Chang, A. Solin, M.E. Khan [ OpenReview ]
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition,
(EMNLP 2022) D. Adelani, G. Neubig, S. Ruder, S. Rijhwani, M. Beukman, C. Palen-Michel, C. Lignos, J. Alabi, S. Muhammad, P. Nabende, B. Dione, A. Bukula, R. Mabuya, B. Dossou, B. Sibanda, H. Buzaaba, ..... [arXiv]
Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates,
(NeuReps Workshop at NeurIPS 2022) W. Lin, V. Duruisseaux, M. Leok, F. Nielsen, M.E. Khan, M. Schmidt [ OpenReview ]
Can Calibration Improve Sample Prioritization?,
(HITY Workshop at NeurIPS 2022) G. Tata, G. K. Gudur, G. Chennupati, M.E. Khan [ OpenReview ]
Exploiting Inferential Structure in Neural Processes,
(Workshop on Tractable Probabilistic Modeling at UAI 2022) D. Tailor, M.E. Khan, E. Nalisnick [ OpenReview ] [Video] [Poster]
Deviation Inequalities for Stochastic Approximation by Averaging,
(SPA) X. Fan, P. Alquier, P. Doukhan [Published version] [arXiv]
Understanding the Population Structure Correction Regression,
(ICSTA 2022) T. T. Mai, P. Alquier [Published version] [arXiv]
Approximate Bayesian Inference: Reprint of the Special Issue Published in Entropy,
(MDPI Books) P. Alquier (Editor) [Book page]
Tight Risk Bound for High Dimensional Time Series Completion,
(EJS) P. Alquier, N. Marie, A. Rosier [Published version] [arXiv]
Finite Sample Properties of Parametric MMD Estimation: Robustness to Misspecification and Dependence,
(Bernoulli) B.E. Chérief-Abdellatif, P. Alquier [Published version] [arXiv]

2021

Knowledge-Adaptation Priors,
(NeurIPS 2021) M.E. Khan, S. Swaroop [Published version] [arXiv] [Slides] [Tweet] [SlidesLive Video] [Code]
Dual Parameterization of Sparse Variational Gaussian Processes,
(NeurIPS 2021) P. Chang, V. Adam, M.E. Khan, A. Solin [Published version] [arXiv] [Code]
Meta-strategy for Learning Tuning Parameters with Guarantees,
(Entropy) D. Meunier, P. Alquier [Published version] [arXiv]
Subset-of-Data Variational Inference for Deep Gaussian-Process Regression,
(UAI 2021) A. Jain, P.K. Srijith, M.E. Khan [Published version] [arXiv] [Code]
Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning,
(ICML 2021) A. Immer, M. Bauer, V. Fortuin, G. Rätsch, M. E. Khan [Published version] [arXiv] [Code]
Tractable Structured Natural Gradient Descent Using Local Parameterizations,
(ICML 2021) W. Lin, F. Nielsen, M. E. Khan, M. Schmidt [Published version] [arXiv]
Non-Exponentially Weighted Aggregation: Regret Bounds for Unbounded Loss Functions,
(ICML 2021) P. Alquier [Published version] [arXiv]
Improving Predictions of Bayesian Neural Networks via Local Linearization,
(AISTATS 2021) A. Immer, M. Korzepa, M. Bauer [Published version] [arXiv] [Code]
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix,
(AISTATS 2021) T. Doan, M. Abbana Bennani, B. Mazoure, G. Rabusseau, P. Alquier [Published version] [arXiv]
Simultaneous Dimension Reduction and Clustering via the NMF-EM Algorithm,
(Advances in Data Analysis and Classification) L. Carel, P. Alquier [Published version] [arXiv]

2020

Continual Deep Learning by Functional Regularisation of Memorable Past,
(NeurIPS 2020) P. Pan^*, S. Swaroop^*, A. Immer, R. Eschenhagen, R. E. Turner, M.E. Khan [Published version] [ArXiv] [Code] [Poster]
Oral presentation, top 1% of all submissions (105 out of 9454 submissions).
Approximate Bayesian Inference,
(Entropy) P. Alquier [Paper]
Concentration of tempered posteriors and of their variational approximations,
(Annals of Statistics) P. Alquier, J. Ridgway [Published version] [arXiv]
Fast Variational Learning in State-Space Gaussian Process Models,
(MLSP 2020) P. E. Chang, W. J. Wilkinson, M.E. Khan, A. Solin [Published version] [arXiv]
High-dimensional VAR with low-rank transition,
(Statistics and Computing) P. Alquier, K. Bertin, P. Doukhan, R. Garnier [Published version] [arXiv]
AI for Social Good: Unlocking the Opportunity for Positive Impact,
(Nature Communications 2020) with Nenad Tomašev and many others
[Paper] [Declaration on AI2SG] [Dagstuhl AI4SG 2019] [Press Release]
Training Binary Neural Networks using the Bayesian Learning Rule,
(ICML 2020) X. Meng, R. Bachmann, M.E. Khan [Published version] [arXiv] [Code]
Handling the Positive-Definite Constraint in the Bayesian Learning Rule,
(ICML 2020) W. Lin, M. Schmidt, M.E. Khan [Published version] [arXiv] [Code]
VILD: Variational Imitation Learning with Diverse-quality Demonstrations,
(ICML 2020) V. Tangkaratt, B. Han, M.E. Khan, M. Sugiyama [Published version] [arXiv] [Code]
MMD-Bayes: Bayesian Estimation via Maximum Mean Discrepancy,
(AABI 2019) B.E. Chérief-Abdellatif, P. Alquier [Published version] [arXiv]
Exact Recovery of Low-rank Tensor Decomposition under Reshuffling,
(AAAI 2020) C. Li, M.E. Khan, Z. Sun, G. Niu, B. Han, S. Xie, Q. Zhao [arXiv]

2019

Practical Deep Learning with Bayesian Principles,
(NeurIPS 2019) K. Osawa, S. Swaroop, A. Jain, R. Eschenhagen, R.E. Turner, R. Yokota, M.E. Khan. [Published version] [arXiv] [Code]
Approximate Inference Turns Deep Networks into Gaussian Processes,
(NeurIPS 2019) M.E. Khan, A. Immer, E. Abedi, M. Korzepa. [Published version] [arXiv] [Code]
A Generalization Bound for Online Variational Inference (best paper award),
(ACML 2019) B.-E. Chérief-Abdellatif, P. Alquier, M.E. Khan. [Published version] [arXiv]
Matrix factorization for multivariate time series analysis,
(EJS) P. Alquier, N. Marie [Published version] [arXiv]
Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations,
(ICML 2019) W. Lin, M.E. Khan, M. Schmidt. [arXiv] [Published version] [Code]
Also appeared at the Symposium on Advances in Approximate Bayesian Inference at NeurIPS 2018 [Short Paper]
Stein's Lemma for the Reparameterization Trick with Exponential Family Mixtures,
(ICML workshop on Stein's Method in ML and Stats, 2019) W. Lin, M.E. Khan, M. Schmidt. [arXiv]
Scalable Training of Inference Networks for Gaussian-Process Models,
(ICML 2019) J. Shi, M.E. Khan, J. Zhu. [arXiv] [Published version] [Code]
TD-Regularized Actor-Critic Methods,
(Machine Learning, 2019. A short version appeared at EWRL 2018)
S. Parisi, V. Tangkaratt, J. Peters, M.E. Khan. [ Published version] [arXiv] [Short version at EWRL 2018]

2018

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient,
(NeurIPS 2018) A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, M.E. Khan. [Published version] [arXiv] [Poster] [3-min Video] [Code]
Natural Variational Continual Learning,
(Continual Learning Workshop at NIPS 2018)
H. Tseran, M.E. Khan, T. Harada, T. Bui [Paper]
Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models,
(ISITA 2018) M.E. Khan and D. Nielsen [arXiv] [IEEE Xplore] [Slides]
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam,
(ICML 2018) M.E. Khan, D. Nielsen, V. Tangkaratt, W. Lin, Y. Gal, and A. Srivastava [Published version] [arXiv] [Code] [Slides]
Variational Message Passing with Structured Inference Networks,
(ICLR 2018) W. Lin, N. Hubacher, and M.E. Khan [Paper] [arXiv Version] [Code]
Bayesian Nonparametric Poisson-Process Allocation for Time-Sequence Modeling,
(AISTATS 2018) H. Ding, M.E. Khan, I. Sato, M. Sugiyama [Published version] [Code]

2017

Vprop: Variational Inference using RMSprop,
(NIPS 2017, Workshop on Bayesian Deep Learning)
M.E. Khan, Z. Liu, V. Tangkaratt, and Y. Gal [Workshop version] [Poster]
Variational Adaptive-Newton Method for Explorative-Learning,
(NIPS 2017, Workshop on Advances in Approximate Bayesian Inference)
M.E. Khan, W. Lin, V. Tangkaratt, Z. Liu, and D. Nielsen [arXiv Version] [Poster]
Natural-Gradient Stochastic Variational Inference for Non-Conjugate Structured Variational Autoencoder,
(ICML 2017, Workshop on Deep Structure Prediction) W. Lin, M.E. Khan, N. Hubacher, and D. Nielsen [Paper]
Conjugate-Computation Variational Inference: Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models,
(AISTATS 2017) M.E. Khan and W. Lin [ Published version ] [ arXiv ] [Code for Logistic Reg + GPs] [Code for Correlated Topic Model]
SmarPer: Context-Aware and Automatic Runtime-Permissions for Mobile Devices,
(38th IEEE Symposium on Security and Privacy (S&P), San Jose, CA, USA, May 22-24, 2017)
K. Olejnik, I. I. Dacosta Petrocelli, J. C. Soares Machado, K. Huguenin, M.E. Khan, and J.-P. Hubaux [Published paper] [Code] [SmarPer Homepage]
Gaussian-Process-Based Emulators for Building Performance Simulation,
(Building Simulation 2017) P. Rastogi, M.E. Khan and M. Anderson [Paper] [Building Simulation Data] [Code]