Item Response Scaling Laws: A Measurement Theory Approach to Generalizable Neural Performance Prediction
Sang Truong*, Yuheng Tu*, Rylan Schaeffer, Sanmi Koyejo
Under Review
PDF / Code
|
|
Fantastic Bugs and Where to Find Them in AI Benchmarks
Sang Truong*, Yuheng Tu*, Michael Hardy*, Anka Reuel, Zeyu Tang, Jirayu Burapacheep, Jonathan Perera, Chibuike Uwakwe, Benjamin W. Domingue, Nick Haber, Sanmi Koyejo
NeurIPS 2025 D&B
PDF / Code / Data / PR to HELM
|
|
Reliable and Efficient Amortized Model-based Evaluation
Sang Truong, Yuheng Tu, Percy Liang, Bo Li, Sanmi Koyejo
ICML 2025
OpenReview / Code / Data / PR to HELM / HELM Blog / Stanford Report / Talk
|
|
AIR-BENCH 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
Yi Zeng*, Yu Yang*, Andy Zhou*, Jeffrey Ziwei Tan*, Yuheng Tu*, Yifan Mai*, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
ICLR 2025 Spotlight
OpenReview / Code / Data / Wired Article / Blog
|
|
NQFL: Nonuniform Quantization for Communication Efficient Federated Learning
Guojun Chen, Kaixuan Xie, Yuheng Tu, Tiecheng Song, Yinfei Xu, Jing Hu, Lun Xin
IEEE Communications Letters (COMML)
PDF / Code / COMML
|