Explanations
Punctuation and inquiry-related phrases, especially questions and statements of wonder.
Negative Logits
ophon        -0.16
肥           -0.15
ーダ         -0.14
itou         -0.14
isha         -0.14
ух           -0.14
ritch        -0.13
GetCurrent   -0.13
caff         -0.13
oders        -0.13
Positive Logits
what         0.30
how          0.27
whom         0.26
why          0.25
who          0.24
How          0.23
What         0.23
what         0.22
which        0.22
where        0.20
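The two lists above rank tokens by how strongly the feature pushes their logits up or down. A common way to obtain such lists (assumed here, not stated by this dashboard) is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k largest and k smallest values. The sketch below uses tiny made-up vectors; `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, not a real library API.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: the effect is dot(decoder direction, token unembedding vector).
# The toy vectors below are made up and do not come from the real model.

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembedding vector)} for every token."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy 3-dimensional example (hypothetical weights).
decoder_dir = [0.5, -0.2, 0.1]
unembed = {
    "what": [0.6, 0.0, 0.0],
    "how": [0.4, -0.1, 0.2],
    "ritch": [-0.2, 0.3, 0.0],
    "caff": [-0.1, 0.2, -0.1],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

With real weights, the same ranking over the full vocabulary would yield ten-entry lists like the ones shown, though the exact procedure used to build this dashboard may differ.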
Activation density: 0.094%
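Activation density is usually read as the percentage of token positions on which the feature fires at all. The sketch below assumes "fires" means an activation strictly greater than zero; `activation_density` and the sample activations are hypothetical, chosen only to make the arithmetic easy to check.

```python
# Hypothetical sketch: activation density as the share of token positions
# whose activation is strictly positive, reported as a percentage.

def activation_density(activations):
    """Return the percentage of positions with activation > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Made-up activations: the feature fires on 2 of 2,000 token positions.
acts = [0.0] * 2000
acts[17] = 3.2
acts[1503] = 1.1
print(f"{activation_density(acts):.3f}%")  # 2 / 2000 -> prints 0.100%
```

Under this reading, the reported 0.094% means the feature is active on roughly 94 out of every 100,000 tokens, which fits a feature tied to relatively rare question words.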