Explanations
possessive forms indicating ownership or relation
Negative Logits
Token    Logit
swer     -0.16
eg       -0.16
ardy     -0.15
gin      -0.15
egin     -0.15
的大     -0.14
âr       -0.14
pmat     -0.14
yn       -0.14
oi       -0.14
Positive Logits
Token           Logit
うち            0.17
ÂĢÂĻ            0.17
ees             0.16
own             0.16
behalf          0.16
дот             0.15
contribution    0.14
ために          0.14
Own             0.14
ための          0.13
Activations Density 0.071%