Explanations
pronouns and verbs that indicate action or state of being
Negative Logits
oproject    -0.16
ÑĪÑĮ        -0.15
upy         -0.14
Larger      -0.13
xCD         -0.13
zag         -0.13
celik       -0.13
à¤ľà¤Ĺ      -0.13
_chg        -0.13
ombo        -0.13
Positive Logits
under       0.20
Under       0.18
effective   0.18
its         0.17
under       0.17
Pref        0.17
Pre         0.17
UNDER       0.17
their       0.16
pre         0.16
Activations Density 0.014%