Jaehun Jung

I'm a Ph.D. student in computer science at the University of Washington, advised by Yejin Choi. I am also a part-time student researcher at NVIDIA Research.

My research focuses on how to train and evaluate a model with a model, with minimal human supervision. I am particularly excited about

  • Data Synthesis and Data Selection with Language Models: How do we define good data? Can we use this insight to generate synthetic data that are diverse, correct, and helpful to model generalization?
  • Science of Automated Evaluation: Can we use a model to reliably evaluate other models? How can we guarantee that automated rubrics align with human judgment?

Previously, I was an undergraduate at Seoul National University, advised by Professors U Kang and Jinwook Seo. I was also a part-time researcher at Kakao Enterprise, where I worked on knowledge-grounded dialogue agents.

    Email  /  CV  /  Scholar  /  Twitter  /  Github


    Research

    Prismatic Synthesis & G-Vendi Score: How Data Diversification makes R1-32B a Better Teacher than R1-671B
    Jaehun Jung, Seungju Han*, Ximing Lu*, Skyler Hallinan*, Shrimai Prabhumoye, Mostafa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi
    preprint, 2025
    blog

    We show that data diversity, measured by our proposed metric G-Vendi, strongly predicts how a model generalizes after training. We leverage this finding to strategically diversify synthetic reasoning data. The resulting datasets, despite being generated by a 32B LLM, lead to better out-of-distribution performance than datasets generated by R1-671B and verified by humans.

    Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
    Ximing Lu*, Seungju Han*, David Acuna Marrero*, Hyunwoo Kim*, Jaehun Jung*, Shrimai Prabhumoye, Niklas Muennighoff, Mostafa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi
    preprint, 2025
    paper / bibtex

    Search-guided distillation mitigates both under-thinking and over-thinking in reasoning models, while simultaneously improving accuracy.

    Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
    Jaehun Jung, Faeze Brahman, Yejin Choi
    ICLR, 2025 (Oral, Top 1.8%)
    paper / bibtex

    We enhance LLM judges with a statistically rigorous guarantee of human agreement. We further extend this guarantee to propose Cascaded Selective Evaluation, which starts with a small, cost-effective model as the judge and escalates to a stronger model only when necessary, all while guaranteeing high agreement with humans.

    Information-Theoretic Distillation for Reference-less Summarization
    Jaehun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi
    COLM, 2024
    paper / bibtex

    Can small models excel at summarization without imitating LLMs or human-written references? We present InfoSumm, a framework for distilling a powerful summarizer that outperforms LLM summarizers an order of magnitude larger, based solely on an information-theoretic objective for summarization.

    Impossible Distillation for Paraphrasing and Summarization: How to Make High-quality Lemonade out of Small, Low-quality Models
    Jaehun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi
    NAACL, 2024
    paper / data / bibtex

    It is possible to generate a high-quality dataset for sentential paraphrasing and summarization directly from an off-the-shelf LM, even when the LM itself cannot reliably perform these tasks.

    Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models
    Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui, Yejin Choi
    NAACL, 2024 (Oral Presentation)
    paper / github / bibtex

    We introduce JamDec, an inference-time algorithm for authorship obfuscation that is domain-agnostic and controllable, yet requires no human supervision.

    Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
    Ximing Lu, Faeze Brahman, Peter West, Jaehun Jung, ..., Xiang Ren, Sean Welleck, Yejin Choi
    EMNLP, 2023
    paper / github / bibtex

    Can we adapt LLMs without fine-tuning? We propose using a lightweight adapter (e.g., GPT-2) at decoding time, efficiently tailoring even the strongest proprietary LLMs toward a user-defined reward.

    STEER: Unified Style Transfer with Expert Reinforcement
    Skyler Hallinan, Faeze Brahman, Ximing Lu, Jaehun Jung, Sean Welleck, Yejin Choi
    Findings of EMNLP, 2023
    paper / github / bibtex

    We propose a text style transfer framework that maps an arbitrary source style to many target styles, built on large-scale data generation with expert-guided decoding and offline RL.

    Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
    Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi
    EMNLP, 2022 (Oral Presentation)
    paper / github / bibtex

    We improve LM reasoning by generating abductive and recursive explanations from language models, then formulating inference as a satisfiability problem over these generations.

    Learning to Walk across Time for Interpretable Temporal Knowledge Graph Completion
    Jaehun Jung, Jinhong Jung, U Kang
    KDD, 2021
    paper / github / bibtex

    We propose a novel GNN for temporal knowledge graphs that encodes an interpretable graph substructure for knowledge graph completion.

    AttnIO: Knowledge Graph Exploration with In-and-Out Attention Flow for Knowledge-Grounded Dialogue
    Jaehun Jung, Bokyung Son, Sungwon Lyu
    EMNLP, 2020
    paper / video / bibtex

    We present a novel decoder based on attention flow that learns to explore a knowledge graph and retrieve a relevant knowledge path to ground a dialogue agent.

    DataHalo: A Customizable Notification Visualization System for Personalized and Longitudinal Interactions
    Guhyun Han, Jaehun Jung, Youngho Kim, Jinwook Seo
    CHI, 2023
    paper / bibtex

    DataHalo is a customizable notification visualization system for mobile devices that provides prolonged ambient visualizations based on a time-varying importance model, enabling longitudinal interaction with notifications.




    Website design by Jon Barron