INDEX
Explanations
instances of personal pronouns and expressions of status or characteristics
Negative Logits
ÏĦÏį          -0.18
éľĩ           -0.15
ilon          -0.15
awe           -0.15
reportedly    -0.15
Apparently    -0.15
apparently    -0.14
ÎŃ            -0.14
ymes          -0.14
åı·           -0.14
Positive Logits
somehow       0.20
stepped       0.20
someone       0.19
suddenly      0.17
transported   0.16
loh           0.15
váºŃy         0.15
tele          0.15
somebody      0.15
someone       0.15
Activations Density: 0.149%