Explanations
phrases related to various forms of involvement or participation
Negative Logits
all       -0.15
plist     -0.14
rees      -0.13
不了      -0.13
ienes     -0.13
ทย        -0.13
ož        -0.13
oons      -0.13
yz        -0.13
toutes    -0.12
Positive Logits
一个      0.34
an        0.34
sebuah    0.31
a         0.31
一個      0.30
someone   0.26
somebody  0.24
是一个    0.24
someone   0.23
一个人    0.22
Activations Density 0.324%