INDEX
Explanations
intensifying adverbs that express strong emotions or certainty
Negative Logits
is            -0.22
itself        -0.20
himself       -0.18
urus          -0.18
’s            -0.17
themselves    -0.17
entiful       -0.16
isn           -0.16
reck          -0.15
herself       -0.15
Positive Logits
've           0.22
have          0.20
don           0.20
’ve           0.19
دارم          0.18
haven         0.18
estamos       0.17
'm            0.16
iyim          0.16
jsem          0.16
Activations Density 0.085%