Knowledge Adaptation as Posterior Correction,
M.E. Khan
[arXiv]
Improving LoRA with Variational Learning,
B. Cong, N. Daheim, Y. Shen, R. Yokota, M.E. Khan, T. Möllenhoff
[arXiv]
Variational Learning Induces Adaptive Label Smoothing,
S.-H. Yang, Z. Liu, G.M. Marconi, M.E. Khan
[arXiv]
How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging,
(Preprint) H. Monzón Maldonado, T. Möllenhoff, N. Daheim, I. Gurevych, M.E. Khan
[arXiv]
Optimistic Estimation of Convergence in Markov Chains with the Average-Mixing Time,
(Preprint) G. Wolfer, P. Alquier
[arXiv]
2026
SVRG and Beyond via Posterior Correction,
(ICML 2026) N. Daheim, T. Möllenhoff, M. L. Ang, M.E. Khan
[arXiv]
Accepted as an oral presentation (top 0.7% of all 23,918 submissions).
Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks,
(ICML 2026) K. Nishida, E. M. Kıral, K. Bannai, M.E. Khan, T. Möllenhoff
[arXiv]
Joint Model and Data Sparsification via the Marginal Likelihood,
(ICML 2026) A. Timans, T. Möllenhoff, C. A. Naesseth, M.E. Khan, E. Nalisnick
Position: Agentic AI Systems should be making Bayes-Consistent Decisions,
(ICML 2026) Many authors
[SSRN]
Variational Visual Question Answering,
(TMLR 2026) T.J. Wieczorek, N. Daun, M.E. Khan, M. Rohrbach [ ArXiv ]
Also presented at SaFeMM-AI at ICCV 2025.
Federated ADMM from Bayesian Duality,
(ICLR 2026) T. Möllenhoff*, S. Swaroop*, F. Doshi-Velez, M. E. Khan
[OpenReview]
A Stein identity for q-Gaussians with bounded support,
(CPAL 2026) S. Sklaviadis, T. Möllenhoff, M. A. T. Figueiredo, A. Martins, M.E. Khan
[arXiv]
Natural Variational Annealing for Multimodal Optimization,
(Information and Inference) T. L. Minh, J. Arbel, T. Möllenhoff, M.E. Khan, F. Forbes
[arXiv]
Variational Learning Finds Flatter Solutions at the Edge of Stability,
(NeurIPS 2025) A. Ghosh, B. Cong, R. Yokota, S. Ravishankar, R. Wang, M. Tao, M.E. Khan, T. Möllenhoff
[arXiv version]
Spotlight presentation (top 688 out of 21,575 submissions).
Compact Memory for Continual Logistic Regression,
(NeurIPS 2025) Y. Jung, H. Lee, W. Chen, T. Möllenhoff, Y. Li, J. Lee, M. E. Khan
[arXiv]
Also appeared at AABI 2025 [OpenReview]
Optimization Guarantees for Square-Root Natural-Gradient Variational Inference,
(TMLR 2025) N. Kumar, T. Moellenhoff, M.E. Khan, A. Lucchi
[OpenReview]
Variance-Aware Estimation of Kernel Mean Embedding,
(JMLR) G. Wolfer, P. Alquier
[Published version] [arXiv]
Estimating the Data-Influence of Latent Variable Models using Variational Bayes,
(AABI 2025) D. Tailor, M. E. Khan, E. Nalisnick
[OpenReview]
Variational Learning Induces Adaptive Label Smoothing,
(AABI 2025) S.-H. Yang, Z. Liu, G. M. Marconi, M. E. Khan
[arXiv]
Connecting Federated ADMM to Bayes,
(ICLR 2025) S. Swaroop, M.E. Khan, F. Doshi-Velez
[OpenReview]
Uncertainty-Aware Decoding with Minimum Bayes’ Risk,
(ICLR 2025) N. Daheim, C. Meister, T. Möllenhoff, I. Gurevych
[OpenReview]
2024
Variational Low-Rank Adaptation Using IVON,
(Fine-Tuning in Modern ML (FITML) at NeurIPS 2024) B. Cong, N. Daheim, Y. Shen, D. Cremers, R. Yokota, M.E. Khan, T. Möllenhoff
[OpenReview] [Code]
Variational Learning is Effective for Large Deep Networks,
(ICML 2024) Y. Shen*, N. Daheim*, B. Cong, P. Nickl, G.M. Marconi, C. Bazan, R. Yokota, I. Gurevych, D. Cremers, M.E. Khan, T. Möllenhoff
[arXiv] [Blog] [Code]
Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI,
(ICML 2024) T. Papamarkou, M. Skoularidou, K. Palla, L. Aitchison, J. Arbel, D. Dunson, M. Filippone, V. Fortuin, P. Hennig, J.M. Hernández-Lobato, A. Hubin, A. Immer, T. Karaletsos, M.E. Khan, A. Kristiadi, Y. Li, S. Mandt, C. Nemeth, M.A. Osborne, T.G.J. Rudner, D. Rügamer, Y.W.T., M. Welling, A.G. Wilson, R. Zhang
[arXiv]
Model Merging by Uncertainty-Based Gradient Matching,
(ICLR 2024) N. Daheim, T. Möllenhoff, E. M. Ponti, I. Gurevych, M.E. Khan
[arXiv] [Code]
Conformal Prediction via Regression-as-Classification,
(ICLR 2024) E. K. Guha, S. Natarajan, T. Möllenhoff, M.E. Khan, E. Ndiaye
[OpenReview] [ArXiv] [Code] [Package]
2023
Improving Continual Learning by Accurate Gradient Reconstructions of the Past,
(TMLR) E. Daxberger, S. Swaroop, K. Osawa, R. Yokota, R. Turner, J. M. Hernández-Lobato, M.E. Khan
[OpenReview] [Code]
The Memory Perturbation Equation: Understanding Model’s Sensitivity to Data,
(NeurIPS 2023) P. Nickl, L. Xu, D. Tailor, T. Möllenhoff, M.E. Khan
[arXiv] [SlidesLive Video] [Poster] [Code]
Bridging the Gap Between Target Networks and Functional Regularization,
(TMLR) A. Piché, V. Thomas, R. Pardinas, J. Marino, G. M. Marconi, C. Pal, M.E. Khan
[OpenReview]
Variational Bayes Made Easy,
(AABI 2023) M.E. Khan
[arXiv]
Estimation of Copulas via Maximum Mean Discrepancy,
(JASA) P. Alquier, B.-E. Chérief-Abdellatif, A. Derumigny, J.-D. Fermanian
[Journal version] [arXiv]
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages,
(ACL 2023) C. M. Bamba, D. Adelani, P. Nabende, J. O. Alabi, T. Sindane, H. Buzaaba, ...
[arXiv]
Geometric Reduction for Identity Testing of Reversible Markov Chains,
(GSI 2023) G. Wolfer, S. Watanabe
[Published version] [arXiv]
Oral presentation.
Simplifying Momentum-based Riemannian Submanifold Optimization,
(ICML 2023) W. Lin, V. Duruisseaux, M. Leok, F. Nielsen, M.E. Khan, M. Schmidt
[arXiv]
Memory-Based Dual Gaussian Processes for Sequential Learning,
(ICML 2023) P. E. Chang, P. Verma, S. T. John, A. Solin, M.E. Khan
The Lie-Group Bayesian Learning Rule,
(AISTATS 2023) E. M. Kiral, T. Möllenhoff, M. E. Khan
[arXiv] [Code]
SAM as an Optimal Relaxation of Bayes,
(ICLR 2023) T. Möllenhoff, M. E. Khan
[arXiv] [Code] Notable top-5% of all accepted papers.
2022
Sequential Learning in GPs with Memory and Bayesian Leverage Score,
(Continual Lifelong Workshop at ACML 2022) P. Verma, P. E. Chang, A. Solin, M.E. Khan
[OpenReview]
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition,
(EMNLP 2022) D. Adelani, G. Neubig, S. Ruder, S. Rijhwani, M. Beukman, C. Palen-Michel, C. Lignos, J. Alabi, S. Muhammad, P. Nabende, B. Dione, A. Bukula, R. Mabuya, B. Dossou, B. Sibanda, H. Buzaaba, ...
[arXiv]
Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates,
(NeuReps Workshop at NeurIPS 2022) W. Lin, V. Duruisseaux, M. Leok, F. Nielsen, M.E. Khan, M. Schmidt
[OpenReview]
Can Calibration Improve Sample Prioritization?,
(HITY Workshop at NeurIPS 2022) G. Tata, G. K. Gudur, G. Chennupati, M.E. Khan
[OpenReview]
Approximate Bayesian Inference: Reprint of the Special Issue Published in Entropy,
(MDPI Books) P. Alquier (Editor)
[Book page]
Tight Risk Bound for High Dimensional Time Series Completion,
(EJS) P. Alquier, N. Marie, A. Rosier
[Published version] [arXiv]
Finite Sample Properties of Parametric MMD Estimation: Robustness to Misspecification and Dependence,
(Bernoulli) B.-E. Chérief-Abdellatif, P. Alquier
[Published version] [arXiv]
Meta-strategy for Learning Tuning Parameters with Guarantees,
(Entropy) D. Meunier, P. Alquier
[Published version] [arXiv]
Subset-of-Data Variational Inference for Deep Gaussian-Process Regression,
(UAI 2021) A. Jain, P.K. Srijith, M.E. Khan
[Published version] [arXiv] [Code]
Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning,
(ICML 2021) A. Immer, M. Bauer, V. Fortuin, G. Rätsch, M. E. Khan
[Published version] [arXiv] [Code]
Tractable Structured Natural Gradient Descent Using Local Parameterizations,
(ICML 2021) W. Lin, F. Nielsen, M. E. Khan, M. Schmidt
[Published version] [arXiv]
Non-Exponentially Weighted Aggregation: Regret Bounds for Unbounded Loss Functions,
(ICML 2021) P. Alquier
[Published version] [arXiv]
Improving Predictions of Bayesian Neural Networks via Local Linearization,
(AISTATS 2021) A. Immer, M. Korzepa, M. Bauer
[Published version] [arXiv] [Code]
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix,
(AISTATS 2021) T. Doan, M. Abbana Bennani, B. Mazoure, G. Rabusseau, P. Alquier
[Published version] [arXiv]
Continual Deep Learning by Functional Regularisation of Memorable Past,
(NeurIPS 2020) P. Pan*, S. Swaroop*, A. Immer, R. Eschenhagen, R. E. Turner, M.E. Khan
[Published version] [ArXiv] [Code] [Poster]
Oral presentation, 1% of all submissions (105 out of 9454 submissions).
Fast Variational Learning in State-Space Gaussian Process Models,
(MLSP 2020) P. E. Chang, W. J. Wilkinson, M.E. Khan, A. Solin
[Published version] [arXiv]
Training Binary Neural Networks using the Bayesian Learning Rule,
(ICML 2020) X. Meng, R. Bachmann, M.E. Khan
[Published version] [arXiv] [Code]
Handling the Positive-Definite Constraint in the Bayesian Learning Rule,
(ICML 2020) W. Lin, M. Schmidt, M.E. Khan
[Published version] [arXiv] [Code]
VILD: Variational Imitation Learning with Diverse-quality Demonstrations,
(ICML 2020) V. Tangkaratt, B. Han, M.E. Khan, M. Sugiyama
[Published version] [arXiv] [Code]
MMD-Bayes: Bayesian Estimation via Maximum Mean Discrepancy,
(AABI 2019) B.-E. Chérief-Abdellatif, P. Alquier
[Published version] [arXiv]
Exact Recovery of Low-rank Tensor Decomposition under Reshuffling,
(AAAI 2020) C. Li, M.E. Khan, Z. Sun, G. Niu, B. Han, S. Xie, Q. Zhao
[arXiv]
2019
Practical Deep Learning with Bayesian Principles,
(NeurIPS 2019) K. Osawa, S. Swaroop, A. Jain, R. Eschenhagen, R.E. Turner, R. Yokota, M.E. Khan
[Published version] [arXiv] [Code]
Approximate Inference Turns Deep Networks into Gaussian Processes,
(NeurIPS 2019) M.E. Khan, A. Immer, E. Abedi, M. Korzepa
[Published version] [arXiv] [Code]
SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient,
(NeurIPS 2018) A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, M.E. Khan
[Published version] [arXiv] [Poster] [3-min Video] [Code]
Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models,
(ISITA 2018) M.E. Khan, D. Nielsen
[arXiv] [IEEE Xplore] [Slides]
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam,
(ICML 2018) M.E. Khan, D. Nielsen, V. Tangkaratt, W. Lin, Y. Gal, A. Srivastava
[Published version] [arXiv] [Code] [Slides]
Variational Message Passing with Structured Inference Networks,
(ICLR 2018) W. Lin, N. Hubacher, M.E. Khan
[Paper] [arXiv version] [Code]
Bayesian Nonparametric Poisson-Process Allocation for Time-Sequence Modeling,
(AISTATS 2018) H. Ding, M.E. Khan, I. Sato, M. Sugiyama
[Published version] [Code]