Explanations
possessive forms of words
Negative Logits
’s      -0.25
latter  -0.20
(“      -0.18
‘s      -0.17
一些    -0.17
/or     -0.16
情况    -0.16
声音    -0.16
’m      -0.15
-ed     -0.15
Positive Logits
been    0.30
got     0.25
gonna   0.24
not     0.24
gotta   0.22
been    0.20
Been    0.20
ÂĿ      0.20
BEEN    0.20
/'      0.19
Activations Density 0.300%