Rishabh Agrawal — Reinforcement Learning, USC

About

I am a Ph.D. candidate in Electrical & Computer Engineering at the University of Southern California, where I completed master's degrees in both Computer Science and Electrical Engineering in May 2025. My research, advised by Prof. Rahul Jain and Prof. Ashutosh Nayyar, centers on reinforcement learning (RL), with a particular focus on offline and robust imitation learning, behavior foundation models, and LLM post-training. I also collaborate closely with Prof. Paria Rashidinejad on RL for LLMs.

In May 2026, I joined Google as a Student Researcher. In summer 2025, I was an Applied Scientist intern at Amazon, working on reinforcement learning for agentic AI systems. Earlier, as a Research Engineer at Samsung Research, I applied deep RL to network resource problems in the 6G Lab. As an undergraduate, I worked with Prof. Sriparna Bandopadhyay at IIT Guwahati on data augmentation with r-cyclic matrices.

Before that, I spent summer 2018 with Prof. Richard James at the University of Minnesota modeling light-induced phase transitions, and summer 2017 with Prof. Frank Chung-Hoon Rhee at Hanyang University, estimating the fuzzifier parameter for alpha-planes of general type-2 fuzzy sets.

Reinforcement Learning · Imitation Learning · Behavior Foundation Models · Agentic AI · Large Language Models · Post-training

Selected Research

DistIL: distributional DAgger with future-aware credit assignment

NEW RLxF Workshop · ICML 2026

Reinforcement Learning from Rich Feedback with Distributional DAgger

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad

PDF · Project · Code

Abstract

The dominant RL-from-verifiable-rewards recipe rewards each response with a single correctness bit, yet many settings provide far richer feedback — execution traces, tool outputs, expert corrections, self-evaluations. We study how to use such feedback through DistIL, a distributional variant of DAgger that optimizes a forward cross-entropy objective. Unlike reverse-KL or Jensen–Shannon self-distillation, DistIL guarantees monotonic policy improvement and sublinear regret, performs future-aware credit assignment, and improves Pass@N across scientific reasoning, coding, and hard mathematics.

RBFM: robust task inference under dynamics shift

NEW arXiv · 2026

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

Rishabh Agrawal, Rahul Jain, Ashutosh Nayyar

arXiv

Abstract

Behavior Foundation Models (BFMs) enable scalable imitation learning but assume fixed dynamics, leaving them brittle to real-world shifts in friction, actuation, or sensor noise. We recast BFM task inference as a robust minimax problem and introduce RBFM-Light and RBFM-Heavy — two variants that add robustness only at inference, with no change to pretraining and using offline data from a single nominal environment. Both substantially outperform standard BFM and robust offline IL baselines under dynamics shifts.

ORAL NeurIPS 2025 E-SARS · L4DC 2026

Balance Equation-based Distributionally Robust Offline Imitation Learning

Rishabh Agrawal, Yusuf Alvi, Rahul Jain, Ashutosh Nayyar

arXiv · Oral talk

Abstract

Standard imitation learning implicitly assumes the environment stays fixed between training and deployment — an assumption that rarely holds. We learn robust policies from expert demonstrations alone by solving a distributionally robust optimization over an uncertainty set of transition models, and show the worst-case objective can be rewritten entirely in terms of the nominal data distribution, enabling tractable offline learning with stronger robustness under shifted dynamics.

AAAI 2025

Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning

Rishabh Agrawal, Nathan Dahlin, Rahul Jain, Ashutosh Nayyar

arXiv

Abstract

We study imitation in a strictly offline setting — no environment interaction, no auxiliary data, no transition model. Our method uses the Markov balance equation with a conditional density estimation framework, employing conditional normalizing flows for dynamics, and consistently outperforms many state-of-the-art IL algorithms across Classic Control and MuJoCo.


News

Jun 2026
Reinforcement Learning from Rich Feedback with Distributional DAgger (DistIL) accepted at the RLxF Workshop at ICML 2026.
Jun 2026
New preprint: Reinforcement Learning from Rich Feedback with Distributional DAgger is out on arXiv.
May 2026
Started as a Student Researcher at Google.
May 2026
New preprint: When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited is out on arXiv.
Jan 2026
Attended the AAAI 2026 Doctoral Consortium in Singapore to present my thesis research.
Jan 2026
BE-DROIL accepted at L4DC 2026.
Nov 2025
BE-DROIL accepted at the NeurIPS 2025 E-SARS Workshop for an oral presentation.
Nov 2025
Passed my Ph.D. Qualifying Exam — officially a Ph.D. Candidate.
Nov 2025
Selected for the AAAI 2026 Doctoral Consortium in Singapore.
Aug 2025
Wrapped up my Applied Scientist internship at Amazon (RL for agentic AI systems).
Jun 2025
Presented CKIL at L4DC 2025 in Ann Arbor, Michigan.
May 2025
Started as an Applied Scientist Intern at Amazon.
May 2025
Awarded two M.S. degrees — Computer Science and Electrical Engineering — at USC.
Apr 2025
Gave a talk on offline imitation learning at the 45th SoCal Control Workshop, UC San Diego.
Feb 2025
Conditional Kernel Imitation Learning accepted at L4DC 2025.
Feb 2025
Presented MBIL at AAAI 2025 in Philadelphia.
Dec 2024
Markov Balance Satisfaction accepted at AAAI 2025.
Dec 2024
Presented Policy Optimization for Strictly Batch IL at NeurIPS 2024, Vancouver.
Nov 2024
Received the Outstanding Poster Award at USC's 14th Annual Research Festival.
Sep 2024
Policy Optimization for Strictly Batch IL accepted at OPT-ML, NeurIPS 2024.
Jan 2024
Awarded the Graduate School Fellowship by USC.
Aug 2023
Began serving as Teaching Assistant for EE556: Stochastic Systems & Reinforcement Learning.
Aug 2023
CKIL preprint released on arXiv.
Dec 2022
Patent on radio-resource scheduling granted.
Jan 2022
Started my Ph.D. at USC with a broad focus on reinforcement learning.
Aug 2020
A RL Framework for QoS-Driven Radio Resource Scheduling accepted at IEEE Globecom 2020.
Sep 2019
Presented CoPASample at LOD 2019 in Siena, Italy.
Jun 2019
Joined Samsung Research as a Research Engineer in the 6G Lab.
May 2019
Graduated from IIT Guwahati with a B.Tech in Mathematics & Computing.
May 2018
Summer research at the University of Minnesota, Twin Cities.
Mar 2018
Optimal Fuzzifier Range for Alpha-Planes of GT2 Fuzzy Sets accepted at FUZZ-IEEE 2018.
May 2017
Summer research at Hanyang University, South Korea.
Jul 2015
Began undergraduate studies at IIT Guwahati (Mathematics, CS & Financial Engineering).

Contact

I'm always glad to talk research and open to new collaborations. The quickest way to reach me is by email; feel free to say hello.

Office

335 Hughes Aircraft Electrical Engineering Center
3740 McClintock Ave, Los Angeles, CA 90089