Experience — Rishabh Agrawal

Experience

Education

Ph.D., Electrical Engineering

2022 – 2026 (Expected)

University of Southern California · Los Angeles, CA

GPA 4.0 / 4.0 · Awarded the USC Graduate School Fellowship · Reviewer for ICML, ICLR, NeurIPS, AAAI, and CDC.

M.S., Computer Science & M.S., Electrical Engineering

2023 – 2025

University of Southern California · Los Angeles, CA

GPA 4.0 / 4.0.

B.Tech., Mathematics & Computing

2015 – 2019

Indian Institute of Technology Guwahati · Guwahati, India

GPA 8.80 / 10.0 · Institute Merit-cum-Means Scholarship for four consecutive years.

Professional Experience

Student Researcher

May 2026 – Present

Google · Sunnyvale, CA

Research on long-horizon agents.

Applied Scientist Intern

May 2025 – Aug 2025

Amazon · San Diego, CA

Built an end-to-end, RL-based multi-agent LLM system for critical fraud detection.

Designed a reward model and a novel Monte-Carlo Tree Search (MCTS) planner, guided by the reward model, to select top-k specialized LLM agents and generate reasoning that enriches context for multi-agent LLM debate.
Implemented an RL-based debate moderator that decides when to end the debate based on query complexity.
Designed retrieval-based planning at test time from pre-computed trees, cutting latency from ~10 minutes to ~20 seconds.
Improved AUC-ROC from 0.729 to 0.798 on a real-world dataset.

Research Assistant

Jan 2022 – Present

University of Southern California · with Prof. Rahul Jain & Prof. Ashutosh Nayyar

On-policy self-distillation (OPSD) for LLMs: proved limitations of existing OPSD methods in guaranteeing monotonic policy improvement and correct credit assignment, and developed a distributional DAgger algorithm with monotonic-improvement and regret guarantees that achieves superior performance on science, coding, and math benchmarks.
Robust offline imitation learning & foundation models: developed robust offline methods under transition uncertainty, namely an offline IL algorithm and a foundation model that uses data from a single environment to generalize to diverse, unseen DeepMind Control Suite environments without online interaction.
Strictly offline imitation learning: developed a normalizing-flow-based IL algorithm that learns optimal policies solely from expert demonstrations, reaching state-of-the-art performance on MuJoCo without environment interaction or auxiliary data.

Research Engineer

Jun 2019 – Dec 2021

Samsung R&D Institute · 6G Lab · Bangalore, India

Radio resource scheduler: applied reinforcement learning to radio-resource scheduling under Quality-of-Service and fairness constraints.
Autoencoder communication: studied autoencoder-based communication over AWGN and Rayleigh-fading channels, and devised an RL-based autoencoder that achieves detection accuracy comparable to traditional methods with less computation.

Teaching

Reinforcement Learning & Large Language Models

EE599 · Aug – Dec 2025

Teaching Assistant · University of Southern California

Held office hours to help students with their coursework and projects, and graded project proposals and reports.

Stochastic Systems & Reinforcement Learning

EE556 · Aug – Dec 2023

Teaching Assistant · University of Southern California

Created homework assignments and solutions, taught discussion sessions, graded exams, and held office hours to help students with coursework and projects.