INDEX
Explanations
words or phrases related to clear and evident wrongdoing or deception
instances of the word "blatant" and its variations
Negative Logits
ulton     -0.93
oult      -0.76
erves     -0.76
encers    -0.75
ather     -0.73
ovember   -0.72
arios     -0.72
safely    -0.72
adal      -0.70
agine     -0.70
Positive Logits
iary           0.88
violation      0.86
misrepresent   0.84
violations     0.82
blatant        0.79
blatantly      0.79
disregard      0.78
hypocrisy      0.77
contradiction  0.76
glaring        0.76
Activations Density: 0.020%