INDEX
Explanations
pronouns and modal verbs indicating possibilities or actions
pronouns and references to collective actions or experiences
Negative Logits
bats         -0.66
xit          -0.64
hend         -0.62
oner         -0.59
oling        -0.58
oward        -0.57
honorable    -0.57
idge         -0.57
prising      -0.56
Whatever     -0.56
Positive Logits
already      1.12
rarely       1.08
seldom       0.99
hadn         0.98
lacks        0.95
tends        0.93
cannot       0.93
hasn         0.92
never        0.91
lacked       0.91
Activations Density 0.467%