https://arxiv.org/abs/2307.11088
L-Eval: Instituting Standardized Evaluation for Long Context Language Models (Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu)
A new long-context benchmark set has come out, along with test results.
#transformer #benchmark