INDEX
Explanations
expressions of doubt or questioning
Negative Logits
Token      Logit
bÃło       -0.15
ÃŃnh       -0.15
775        -0.15
ÑĥÑĩа     -0.14
ocaly      -0.14
Ìĥ         -0.14
phin       -0.14
unch       -0.14
gaard      -0.14
ilerine    -0.14
Positive Logits
Token      Logit
κε         0.16
.nano      0.15
878        0.15
umbo       0.15
786        0.15
ool        0.15
ven        0.14
Deutsch    0.14
echa       0.14
arian      0.14
Activations Density: 0.001%