INDEX
Explanations
actions related to teaching, assisting, and supporting others
Negative Logits
Token    Logit
ours     -0.07
arest    -0.07
rax      -0.07
ebin     -0.07
siz      -0.06
ший      -0.06
’S       -0.06
ieur     -0.06
irket    -0.06
ambre    -0.06
Positive Logits
Token    Logit
them     0.18
their    0.16
他们     0.14
they     0.14
ihnen    0.14
他們     0.13
их       0.13
their    0.13
loro     0.12
them     0.12
Activations Density 0.086%