Resources

- Research Paper Resources
Name (Year) Paper: Resource
LIU Chuang (2023) M3KE: [GitHub]
(2024) LHMKE: [GitHub]
(2024) OpenEval: [Website]
REN Yuqi (2021) CogAlign: [GitHub]
SHEN Tianhao (2024) RoleEval: [GitHub]
(2023) X-RiSAWOZ: [GitHub]
(2023) Large Language Model Alignment: [GitHub]
JIANG Bojian Automated Progressive Red Teaming: [GitHub]
SHI Dan (2024) CORECODE: [GitHub]
(2024) IRCAN: [GitHub]
ZHU Jingxiang (2024) Multilingual NMT Robustness: [GitHub]
JIN Renren (2022) Multilingual NMT: [GitHub]
(2024) LLM Quantization: [GitHub]
LI Zhigen (2025) ChatSOP: [GitHub]
ZHANG Shaowei (2025) BackMATH: [GitHub]
DONG Weilong (2024) ConTrans: [GitHub]
SUN Haoran (2023) Multilingual E2E Speech Translation: [GitHub]
(2024) FuxiTranyu: [GitHub]
HUANG Yufei (2024) CBBQ Dataset: [GitHub]
GUO Zishan (2023) LLM Evaluation Papers: [GitHub]
(2024) Chinese Spoken-to-Written: [GitHub]
YU Linhao (2023) LFED Dataset: [GitHub]
(2024) CMoralEval: [GitHub]
PAN Leiyu (2023) LLM Translation Robustness: [GitHub]
SHI Ling (2024) CRiskEval: [GitHub]
YANG Lei (2024) DCIS: [GitHub]
(2025) ProBench: [GitHub]
XU Shaoyang (2024) Multilingual Human Values: [GitHub]
(2025) Culture Alignment: [GitHub]


- Corpora
  1. TGEA 2.0: A Large-Scale Diagnostically Annotated Dataset with Benchmark Tasks for Text Generation of Pretrained Language Models
    • paper: [pdf]
    • dataset and code:
  2. BiPaR: A Bilingual Machine Reading Comprehension (MRC) Dataset on Novels [Jing et al. 2019]
  3. Dataset for Shallow Discourse Annotation for Chinese TED Talks
  4. A Test Suite for Evaluating Discourse Phenomena in Document-level Neural Machine Translation
  5. RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling
  6. TED-CDB: A Large-Scale Chinese Discourse Relation Dataset on TED Talks
  7. Chinese WPLC: A Chinese Dataset for Evaluating Pretrained Language Models on Word Prediction Given Long-Range Context