Explanations
contractions indicating negation
Negative Logits
’s        -0.18
’n        -0.16
not       -0.16
äºĭæĥħ    -0.15
hen       -0.15
es        -0.15
(“        -0.15
â         -0.15
ye        -0.15
�s        -0.15
Positive Logits
necessarily    0.34
'              0.24
anymore        0.22
ches           0.22
ori            0.20
even           0.20
ecessarily     0.19
/'             0.19
ÂĿ             0.18
quite          0.17
Activations Density: 0.195%