Explanations
punctuation marks and expressions of discomfort or hesitation
Negative Logits
it      -0.27
它      -0.23
It      -0.22
It      -0.19
，它    -0.18
оно     -0.18
,it     -0.18
nó      -0.18
[it     -0.16
рати    -0.15
Positive Logits
if          0.23
personally  0.22
If          0.22
whenever    0.20
given       0.18
when        0.18
_if         0.18
jika        0.18
如果        0.17
anything    0.17
Activations Density 0.038%