INDEX
Explanations
questions and expressions of curiosity or uncertainty
Negative Logits
I        -0.16
.pkg     -0.14
лки      -0.14
cre      -0.13
lod      -0.13
no       -0.13
itself   -0.13
WH       -0.13
anyl     -0.13
A        -0.13
Positive Logits
how      0.45
whether  0.41
how      0.32
whether  0.30
what     0.30
why      0.29
Whether  0.28
cómo     0.28
是否     0.27
Whether  0.26
Activations Density 0.179%