INDEX
Explanations
pronouns 'I' followed by a verb
the usage of the pronoun "I"
Negative Logits
tains              -0.72
pires              -0.63
Philipp            -0.63
heads              -0.62
tnc                -0.61
Mehran             -0.57
Gap                -0.56
excess             -0.55
indistinguishable  -0.54
Reverse            -0.53
POSITIVE LOGITS
'm       1.34
've      1.28
'll      1.17
suppose  1.04
'd       1.02
WI       1.01
nex      0.96
deals    0.93
dunno    0.91
vor      0.89
Activations Density: 0.206%