Explanations
instances of conversation or dialogue
Negative Logits
それ	-0.16
quienes	-0.15
它们	-0.14
svých	-0.14
她们	-0.14
enco	-0.14
μένων	-0.14
leurs	-0.14
своих	-0.14
ifu	-0.13
Positive Logits
this	0.38
him	0.37
该	0.31
he	0.31
該	0.30
this	0.29
(this	0.29
这个	0.27
[this	0.27
对方	0.26
Activations Density 0.058%