Explanations
pronouns and questions related to identity and belonging
Negative Logits
Token         Logit
uzey          -0.16
uku           -0.16
isque         -0.15
ones          -0.15
pose          -0.14
uess          -0.14
deny          -0.14
one           -0.14
WithOptions   -0.14
é¡į           -0.13
Positive Logits
Token   Logit
she     0.16
THEY    0.16
HE      0.16
олÑİ    0.15
they    0.15
864     0.14
itis    0.14
830     0.14
WE      0.14
he      0.13
Activations Density 0.114%
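
As a rough illustration of what a density figure like the one above measures, the sketch below computes the fraction of token positions on which a feature's activation exceeds a threshold. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from this page; the sampling procedure and threshold behind the reported 0.114% are not stated here.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a position counts as "active" when its activation is
    greater than the threshold (0.0 by default).
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: 2 active positions out of 8.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.114% would mean the feature is active on roughly one sampled token in 900.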