This repository contains the data and evaluation scripts for the paper "CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence", accepted at NeurIPS 2024. CTIBench is a comprehensive suite of benchmark tasks and datasets designed to evaluate Large Language Models (LLMs) in the field of Cyber Threat Intelligence (CTI).
Dataset details are available on Hugging Face: https://huggingface.co/datasets/AI4Sec/cti-bench
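A minimal sketch of loading the benchmark from the Hugging Face Hub with the `datasets` library. The subset name `"cti-mcq"` and the `split` value below are illustrative assumptions; consult the dataset page for the actual subset and split names.

```python
# Sketch: loading a CTIBench subset from the Hugging Face Hub.
# Requires the `datasets` package (pip install datasets).

DATASET_REPO = "AI4Sec/cti-bench"


def load_cti_bench(subset="cti-mcq", split="test"):
    """Load one CTIBench subset.

    The subset/split names here are assumptions for illustration;
    see the dataset card for the available configurations.
    """
    from datasets import load_dataset  # imported lazily so the module loads without it

    return load_dataset(DATASET_REPO, name=subset, split=split)


if __name__ == "__main__":
    ds = load_cti_bench()
    print(ds)
```

The lazy import keeps the module importable even when `datasets` is not installed, which can be convenient in evaluation scripts that only touch local logs.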
`evaluation`
Contains scripts to evaluate model performance, along with the responses of five LLMs: ChatGPT-3.5, ChatGPT-4, Gemini-1.5, LLAMA3-70B, and LLAMA3-8B.
`logs`
Contains the unprocessed responses from ChatGPT-3.5, ChatGPT-4, and Gemini-1.5 for the tasks.