Explanations
the word "didn't" followed by personal reflections or statements
instances of negation or expressions showing what the subject did not do
Negative Logits
populated     -0.72
retirees      -0.68
couch         -0.66
protected     -0.65
shelves       -0.62
mutually      -0.62
adversaries   -0.61
disabled      -0.61
Anarchy       -0.60
retired       -0.58
Positive Logits
't            1.58
í             1.05
´             0.96
n             0.94
ned           0.91
nt            0.89
gered         0.88
iting         0.88
uts           0.88
ovan          0.86
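The page does not say how the two logit lists were produced. On sparse-autoencoder feature dashboards they are commonly the feature's direct effect on the output logits: the feature's decoder direction multiplied by the model's unembedding matrix, with the highest- and lowest-scoring tokens listed. The sketch below assumes exactly that, ignores any final layer norm, and uses hypothetical names (`logit_effects`, `decoder_dir`, `w_unembed`) and toy numbers rather than real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (top-k positive, top-k negative) tokens
    ranked by the feature's direct logit effect. No layer norm is applied."""
    d_model = len(decoder_dir)
    # Effect on token j = dot product of the decoder direction with column j of W_U.
    effects = [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = [1.0, -0.5]
w_unembed = [[0.8, -0.3, 0.1], [-1.0, 0.4, 0.2]]
vocab = ["'t", "retired", "couch"]
pos, neg = logit_effects(decoder_dir, w_unembed, vocab, k=2)
print(pos)
print(neg)
```

With `k=10` and a real decoder row and unembedding matrix, the two returned lists would have the same shape as the tables above; a positive score means the feature raises that token's logit when it fires, a negative score means it lowers it.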
Activation Density: 0.083%
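Activation density is read here as the share of tokens in the dashboard's sample on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the page states neither the threshold nor the dataset); under that reading, 0.083% means the feature is active on about 83 of every 100,000 tokens. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 83 firing tokens out of 100,000 gives the density shown above.
acts = [0.0] * 99_917 + [1.5] * 83
print(f"{activation_density(acts):.3f}%")  # prints 0.083%
```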