INDEX
Explanations
computational algorithms and circuits
text that describes or requests bypassing, compromising, or hijacking systems (jailbreaking/hacking-style instructions).
Negative Logits
Token              Value
throwIfNotFound    0.46
claims             0.46
animals            0.44
edition            0.43
महंगाई             0.43
ingredients        0.42
drug               0.42
rope               0.42
animali            0.42
smelly             0.41
Positive Logits
Token              Value
algorithms         1.06
algorit            0.99
subroutine         0.98
алгорит            0.97
algorith           0.95
computational      0.91
algorithm          0.89
Algorithms         0.89
algoritmo          0.88
circuits           0.88
Activations Density 0.223%
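
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "activations density" means the fraction of sampled token positions on which the feature has a non-zero activation. The page does not state this definition, so treat it as an assumption. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 223 of 100,000 token positions.
sample = [1.0] * 223 + [0.0] * 99_777
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.223%"
```

Under that reading, a density of 0.223% corresponds to roughly 2 active positions per 1,000 tokens.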