Chaitanya Malaviya

I am a Senior Research Scientist at Google DeepMind working on evaluation and post-training of language models. I received my PhD in computer science from the University of Pennsylvania, advised by Mark Yatskar and Dan Roth. During my PhD, I worked on the Semantic Scholar team at Ai2 and on the Gemini team at Google DeepMind.

My research interests are in 1) evaluation and benchmarking of language models in realistic scenarios, 2) aligning models to diverse users, especially domain experts, and 3) learning from human feedback. Broadly, I am interested in studying how to use human knowledge and expertise to improve, align and evaluate language models. I also take a keen interest in cognitive science and linguistics.

Previously, I was a predoctoral young investigator at the Allen Institute for Artificial Intelligence. I completed my master's at the Language Technologies Institute at Carnegie Mellon University and my bachelor's at Nanyang Technological University, Singapore.

Email / Google Scholar / Semantic Scholar / GitHub

Publications

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi, Mark Yatskar
ICLR, 2026

Code/Data

ResearchQA: Evaluating Scholarly Question Answering at Scale Across 75 Fields with Survey-Mined Questions and Rubrics
Li S. Yifei*, Allen Chang*, Chaitanya Malaviya, Mark Yatskar
Transactions of the Association for Computational Linguistics (TACL), 2026

Project Website

EvalAgent: Discovering Implicit Evaluation Criteria from the Web
Manya Wadhwa, Zayne Sprague, Chaitanya Malaviya, Philippe Laban, Junyi Jessy Li, Greg Durrett
COLM, 2025

Code

Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries
Chaitanya Malaviya, Joseph Chee Chang, Dan Roth, Mohit Iyyer, Mark Yatskar, Kyle Lo
Transactions of the Association for Computational Linguistics (TACL), 2025

Code / Data / Blogpost

On Reference (In-)Determinacy in Natural Language Inference
Sihao Chen, Chaitanya Malaviya, Alex Fabrikant, Hagai Taitelbaum, Tal Schuster, Senaka Buthpitiya, Dan Roth
Findings of NAACL, 2025

Code

DOLOMITES: Domain-Specific Long-Form Methodical Tasks
Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti
Transactions of the Association for Computational Linguistics (TACL), 2024

Project Website

AssistantBench: Can Web Agents Solve Realistic and Time-consuming Tasks?
Ori Yoran, Samuel Amouyal, Chaitanya Malaviya, Ben Bogin, Ofir Press, Jonathan Berant
EMNLP, 2024

Project Website

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini Team, Google.
2024

Blog

What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception
Chaitanya Malaviya, Subin Lee, Dan Roth, Mark Yatskar
NAACL, 2024

Code/Dataset

ExpertQA: Expert-Curated Questions and Attributed Answers
Chaitanya Malaviya, Subin Lee, Sihao Chen, Elizabeth Sieber, Mark Yatskar, Dan Roth
NAACL, 2024

Code/Dataset

QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations (Outstanding paper award)
Chaitanya Malaviya, Peter Shaw, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
ACL, 2023

Code/Dataset

Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models
Chaitanya Malaviya, Sudeep Bhatia, Mark Yatskar
EMNLP, 2022

Code

AmbiCoref: Evaluating Human and Model Sensitivity to Ambiguous Coreference
Yuewei Yuan, Chaitanya Malaviya, Mark Yatskar
Findings of EACL, 2023

Code

Generative Data Augmentation for Commonsense Reasoning
Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-Ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey
Findings of EMNLP, 2020

BibTeX / Project Page

Commonsense Knowledge Base Completion with Structural and Semantic Context
Chaitanya Malaviya, Chandra Bhagavatula, Antoine Bosselut, Yejin Choi
AAAI, 2020

BibTeX / Code

Abductive Commonsense Reasoning
Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
ICLR, 2020

BibTeX / Code / Leaderboard

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi
ACL, 2019

BibTeX / Code / Demo

A Simple Joint Model for Improved Contextual Neural Lemmatization
Chaitanya Malaviya*, Shijie Wu*, Ryan Cotterell
NAACL, 2019

BibTeX / Code / Slides / Talk

Neural Factor Graph Models for Cross-lingual Morphological Tagging
Chaitanya Malaviya, Matthew R. Gormley, Graham Neubig
ACL, 2018

BibTeX / Code / Slides / Talk

Sparse and Constrained Attention for Neural Machine Translation
Chaitanya Malaviya, Pedro Ferreira, André F.T. Martins
ACL, 2018

BibTeX / Code / Slides / Talk

Learning Language Representations for Typology Prediction
Chaitanya Malaviya, Graham Neubig, Patrick Littell
EMNLP, 2017

BibTeX / Code / Poster

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
Arya McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Miikka Silfverberg, Sabrina Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden
SIGMORPHON, 2019

BibTeX

Technical Reports

Building CMU Magnus from User Feedback
Shrimai Prabhumoye, Fadi Botros, Khyathi Chandu, Samridhi Choudhary, Esha Keni, Chaitanya Malaviya, Thomas Manzini, Rama Pasumarthi, Shivani Poddar, Abhilasha Ravichander, Zhou Yu, Alan Black
Alexa Prize Proceedings, 2017

BibTeX

DyNet: The Dynamic Neural Network Toolkit
Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
arXiv, 2017

BibTeX / Source

Website source from Jon Barron.