Starred repositories
Genome modeling and design across all domains of life
Predict author h-index and paper citation counts on the dataset underlying Semantic Scholar
Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
Code and steps used to generate the Data Citation Corpus dump file
Data annotation toolbox that supports image, audio, and video data.
Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, small language models, vision-language models).
OpenResearcher, an advanced Scientific Research Assistant
Scientific Large Language Models: A Survey on Biological & Chemical Domains
[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
A list of AI for Science papers accepted by top conferences
Artificial Intelligence Research for Science (AIRS)
MLCommons Science benchmarking working group
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
DataComp: In search of the next generation of multimodal datasets
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (EMNLP'24)
Repository for research in the field of Responsible NLP at Meta.
NOMAD lets you manage and share your materials science data in a way that makes it truly useful to you, your group, and the community.
Download datasets (MP, OQMD, AFLOW, JARVIS, etc.) using Matminer, RESTful APIs, and AFLUX
FAIR Chemistry's library of machine learning methods for chemistry
API Client for paperswithcode.com
GitHub repository for "Reduced, Reused and Recycled" (NeurIPS 2021 Best Paper, D&B Track)
A summary of existing representative LLM text datasets.
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)