In this data descriptor, we present an open-source corpus of 6,086 records extracted from a database of 835 publications and labeled manually by expert curators in catalysis, following annotation guidelines defined specifically for this task. The corpus covers nine types of knowledge: material, regulation method, product, faradaic efficiency, cell setup, electrolyte, synthesis method, current density, and voltage. In addition, we release a silver corpus of entities automatically extracted from the full text of the same publications, which domain experts likewise evaluated and partially revised. The natural language processing and text mining tools used to collect and codify this information are included in the present repository.
Our Gold Standard Corpus and Silver Standard Corpus are available from Science Data Bank (ScienceDB) at https://www.scidb.cn/anonymous/NjdWeml5 and https://www.scidb.cn/anonymous/M3FNZjZi.