Explanations
phrases emphasizing negation or denial
Negative Logits
attente       -0.67
addObject     -0.67
vidia         -0.66
ionage        -0.66
orthand       -0.65
epiece        -0.65
stdc          -0.64
Descripció    -0.64
ạnh           -0.62
Pozdrawiam    -0.62
Positive Logits
never         2.89
never         2.67
Never         2.66
Never         2.61
NEVER         2.53
NEVER         2.41
nunca         1.94
Nunca         1.92
Nunca         1.85
nunca         1.75
Activations Density: 0.047%