Alignment Faking Explorer
Base Model:
Qwen/Qwen2.5-72B-Instruct-Turbo
Qwen/Qwen2.5-Coder-32B-Instruct
claude-3-5-sonnet-20240620
claude-3-5-sonnet-20241022
claude-3-7-sonnet-20250219
claude-3-opus-20240229
deepinfra:meta-llama/llama-3.1-405b-instruct
deepinfra:meta-llama/llama-3.1-8b-instruct
deepinfra:meta-llama/llama-3.3-70b-instruct
deepseek-ai/DeepSeek-R1
deepseek-ai/DeepSeek-V3
deepseek/deepseek-r1-zero:free
ft:gpt-4o-2024-08-06: 20K Claude Introspection ft:gpt-4o-2024-08-06:scale-safety-research-1::B3WFRUlg
ft:gpt-4o-2024-08-06:nyu-arg::BWFzO2oJ
ft:gpt-4o-2024-08-06:nyu-arg::BWG3dBE4
ft:gpt-4o-2024-08-06:scale-safety-research-1::BA9q6tsz
ft:gpt-4o-2024-08-06:scale-safety-research-1::BAVsNKTU
ft:gpt-4o-2024-08-06:scale-safety-research-1::BAWbmjjv
ft:gpt-4o-2024-08-06:scale-safety-research-1::BAriVAmS
ft:gpt-4o-2024-08-06:scale-safety-research-1::BCPVdEVV
ft:gpt-4o-2024-08-06:scale-safety-research-1::BDLn1xGn
ft:gpt-4o-2024-08-06:scale-safety-research-1::BDLu5UcD
ft:gpt-4o-2024-08-06:scale-safety-research-1::BDPyX9ix
ft:gpt-4o-2024-08-06:scale-safety-research-1::BEgGxvJy
ft:gpt-4o-2024-08-06:scale-safety-research-1::BEhvPCJU
ft:gpt-4o-2024-08-06:scale-safety-research-1::BHnNpEps
ft:gpt-4o-2024-08-06:scale-safety-research-1::BKAgseZw
ft:gpt-4o-2024-08-06:scale-safety-research-1::BNbsu0Qj
ft:gpt-4o-2024-08-06:scale-safety-research-1::BRUauxGo
ft:gpt-4o-2024-08-06:scale-safety-research-1::BRUo4zjZ
ft:gpt-4o-2024-08-06:scale-safety-research-1::BUuk1cBs
ft:gpt-4o-2024-08-06:scale-safety-research-1::BWDuAIbf
ft:gpt-4o-2024-08-06:scale-safety-research-1::BWEOrQWy
ft:gpt-4o-2024-08-06:scale-safety-research-1::BWFIVujc
ft:gpt-4o-2024-08-06:scale-safety-research-1::BWGF1fcT
ft:gpt-4o-2024-08-06:scale-safety-research-1::BWGgUDNt
ft:gpt-4o-2024-08-06:scale-safety-research-1::BWKLMRUS
ft:gpt-4o-2024-08-06:scale-safety-research-1::BWLzEXbu
ft:gpt-4o-2024-08-06:scale-safety-research-misc::BWF8gdL1
ft:gpt-4o-2024-08-06:scale-safety-research-misc::BWGDeue5
ft:gpt-4o-2024-08-06:scale-safety-research-misc::BWGqp4Il
google/gemini-2.0-flash-001
google/gemini-2.5-flash-preview
gpt-4-1106-preview
gpt-4.1-2025-04-14
gpt-4.1-mini-2025-04-14
gpt-4.5-preview-2025-02-27
gpt-4o-2024-08-06
gpt-4o-mini-2024-07-18
meta-llama/Llama-3.3-70B-Instruct-Turbo
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
meta-llama/Llama-4-Scout-17B-16E-Instruct
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
mistralai/Mixtral-8x22B-Instruct-v0.1
mistralai/mistral-large-2411
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
x-ai/grok-3-beta
Selected Configuration
Free Tier
Compliance Rate: -
Refusal Rate: -
Alignment Faking: -
Paid Tier
Compliance Rate: -
Refusal Rate: -
Alignment Faking: -
Compact Alignment Metrics