AQtive Guard periodically evaluates the security and safety posture of popular LLMs. These benchmarks assess models across a large variety of adversarial scenarios in five key categories: Jailbreaks, Misuse, Toxicity, Security, and Robustness. The Security Score reflects a weighted aggregation of these dimensions. Models with higher scores are more resilient, meaning they are better at maintaining correct, safe, and aligned behavior under pressure. The results below rank models from best to worst based on their Security Score.
| Model | Source | Popularity | Security Score |
|---|---|---|---|
| microsoft/DialoGPT-small | Huggingface | Good | 79 |
| gpt-4o | OpenAI | Good | 78 |
| rinna/japanese-gpt-neox-small | Huggingface | Good | 78 |
| trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 | Huggingface | Good | 77 |
| distilbert/distilgpt2 | Huggingface | Good | 75 |
| trl-internal-testing/tiny-random-LlamaForCausalLM | Huggingface | Good | 73 |
| trl-internal-testing/tiny-LlamaForCausalLM-3.2 | Huggingface | Good | 73 |
| hmellor/tiny-random-LlamaForCausalLM | Huggingface | Good | 73 |
| openai-community/gpt2-medium | Huggingface | Good | 73 |
| gpt-4o-mini | OpenAI | Good | 72 |
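To illustrate what a "weighted aggregation" of the five categories could look like, here is a minimal sketch. The actual weights and per-category scales used by AQtive Guard are not given above, so the weight values and function name below are illustrative assumptions only.

```python
# Hypothetical weighted aggregation of per-category scores into a single
# Security Score. The weights are assumptions for illustration, not the
# weights AQtive Guard actually uses.
CATEGORY_WEIGHTS = {
    "jailbreaks": 0.25,
    "misuse": 0.20,
    "toxicity": 0.15,
    "security": 0.25,
    "robustness": 0.15,
}

def security_score(category_scores: dict) -> float:
    """Combine per-category scores (each on a 0-100 scale) into one score."""
    return sum(
        CATEGORY_WEIGHTS[name] * score
        for name, score in category_scores.items()
    )

# Example: a model that is strong on toxicity but weaker on jailbreaks.
example = {
    "jailbreaks": 70,
    "misuse": 80,
    "toxicity": 90,
    "security": 75,
    "robustness": 72,
}
print(security_score(example))
```

Because the weights sum to 1, the aggregate stays on the same 0-100 scale as the individual categories, which is consistent with the scores in the table above.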