INDEX
Explanations
tokens indicating negative sentiment or disagreement
the phrase "No" as a repeated expression of refusal or negation
Negative Logits
Token          Logit
equ            -0.68
wagon          -0.67
lingu          -0.66
related        -0.63
apan           -0.63
accounting     -0.62
barg           -0.62
particularly   -0.61
exempl         -0.60
around         -0.59
Positive Logits
Token          Logit
No             3.15
no             2.14
No             2.08
Yes            1.99
NO             1.72
Nothing        1.56
Never          1.53
Neither        1.52
Nobody         1.52
Not            1.45
Activations Density 0.015%
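
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only. It assumes density means "fraction of tokens with activation above a threshold", which the page itself does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 20,000 token positions.
sample = [0.0] * 19997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up numbers, 3 active positions out of 20,000 is the same proportion as the 0.015% reported above, which shows how sparse such a feature is.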