cti-bench

This repository contains the data and evaluation scripts for the paper CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence, accepted at NeurIPS 2024. CTIBench is a comprehensive suite of benchmark tasks and datasets designed to evaluate Large Language Models (LLMs) in the field of Cyber Threat Intelligence (CTI).

Dataset details are available on Hugging Face: https://huggingface.co/datasets/AI4Sec/cti-bench

The evaluation directory contains the scripts used to evaluate model performance, along with the responses from five LLMs: ChatGPT3.5, ChatGPT4, Gemini-1.5, LLAMA3-70B, and LLAMA3-8B.

The logs directory contains the unprocessed responses from ChatGPT3.5, ChatGPT4, and Gemini-1.5 for the tasks.
