INDEX
Explanations
phrases emphasizing the characteristics or descriptions of various subjects
Negative Logits
itself      -0.41
的一个      -0.23
它          -0.20
是一个      -0.20
是个        -0.19
its         -0.18
一个        -0.18
которое     -0.18
一个人      -0.17
一個        -0.17
Positive Logits
themselves  0.50
ones        0.31
些          0.27
thems       0.23
are         0.23
nt          0.22
những       0.22
those       0.22
ones        0.21
mga         0.21
Activations Density 0.743%