Explanations
references to bias and its variations in context
Negative Logits
伴        -0.16
(.)       -0.15
bine      -0.14
agar      -0.14
bio       -0.14
ῦ         -0.14
lw        -0.13
flesh     -0.13
NW        -0.13
방        -0.13
Positive Logits
hetto     0.17
rif       0.16
ogg       0.15
рд        0.15
acz       0.15
emouth    0.15
成人      0.15
forme     0.15
odash     0.15
.desktop  0.14
Activations Density 0.015%