Explanations
pronouns indicating possession and ownership
Negative Logits
abox  -0.18
ルド  -0.16
ald  -0.15
ncia  -0.15
neau  -0.15
iller  -0.14
oard  -0.14
ايد  -0.14
antino  -0.14
/fw  -0.14
Positive Logits
amoto  0.14
準  0.14
قب  0.14
tor  0.13
rad  0.13
來  0.13
orz  0.13
vez  0.13
_IE  0.13
aming  0.13
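Several tokens in these lists are stored in the model's vocabulary as byte-level BPE strings, where each UTF-8 byte is shown as a printable stand-in character (for example, the raw string ãĥ«ãĥī encodes ルド). The sketch below assumes the model uses the GPT-2 byte-to-unicode table, which this page does not confirm. `decode_token` is a hypothetical helper, and passing through characters that fall outside the table is an assumption made only to handle the partially decoded strings seen on this page (such as اÙĬد).

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style reversible table: every byte 0-255 -> one printable character."""
    # Printable Latin-1 bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, 0x7F-0xA0, 0xAD) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level vocabulary string back into readable text."""
    buf = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            buf.append(BYTE_DECODER[ch])
        else:
            # Assumption: characters outside the table were already decoded
            # by the page, so re-encode them as UTF-8 and keep them as-is.
            buf.extend(ch.encode("utf-8"))
    return buf.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["ãĥ«ãĥī", "æºĸ", "ä¾Ĩ", "ÙĤب", "abox"]:
        print(raw, "->", decode_token(raw))
```

For example, `decode_token("æºĸ")` yields 準 and `decode_token("ÙĤب")` yields قب, while plain ASCII tokens such as abox pass through unchanged. Building the table from byte ranges rather than hard-coding it keeps the mapping reversible for all 256 byte values.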
Activation Density: 0.313%