Explanations
words related to something very obvious, clear, and often negative
instances of the word "blatant" and its variations, indicating obvious or clear wrongdoing
Negative Logits
agine      -0.87
ovember    -0.82
arios      -0.78
yip        -0.77
ulton      -0.77
nesota     -0.74
iership    -0.74
erves      -0.70
ivas       -0.69
safely     -0.68
Positive Logits
disregard       0.99
misrepresent    0.94
hypocrisy       0.93
iary            0.91
violations      0.88
violation       0.86
blatant         0.83
fals            0.79
disreg          0.79
falsehood       0.78
Activations Density: 0.040%