INDEX
Explanations
statements related to personal or professional announcements and experiences
NEGATIVE LOGITS
His        -0.27
Himself    -0.26
His        -0.22
Him        -0.20
oneself    -0.19
إليه       -0.17
Jeho       -0.16
Його       -0.16
Его        -0.16
他的       -0.15
POSITIVE LOGITS
he         0.50
he         0.45
HE         0.35
hed        0.34
hc         0.32
h          0.30
_he        0.30
он         0.29
HE         0.29
hes        0.29
Activations Density 0.423%