INDEX
Explanations
words related to possession and belonging
Negative Logits
illac    -0.15
oglob    -0.15
اÙ    -0.15
tics    -0.15
arth    -0.14
ASE    -0.14
ekl    -0.14
yan    -0.14
ulk    -0.14
hana    -0.14
Positive Logits
azer    0.16
asco    0.15
سÙĪØ¨    0.15
/is    0.15
chaft    0.14
ahas    0.14
AndPassword    0.13
ÃĹ↵↵    0.13
actions    0.13
oming    0.13
Activations Density: 0.029%