INDEX
Explanations
the phrase "no idea"
expressions of uncertainty or lack of understanding
Negative Logits
Token       Logit
Reviewed    -0.67
wagen       -0.60
die         -0.60
jan         -0.59
Pers        -0.59
king        -0.59
ocally      -0.59
ouri        -0.58
istent      -0.58
ト          -0.56
Positive Logits
Token       Logit
why          1.30
how          1.19
WHY          1.10
why          1.09
whats        1.06
HOW          1.02
what         0.99
whence       0.96
whatsoever   0.94
how          0.91
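The dashboard does not say how these logit values were computed. A common convention for sparse-autoencoder features takes the dot product of the feature's decoder direction with each token's unembedding vector, then lists the most negative and most positive tokens. The sketch below assumes that convention. `direct_logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical, and the code uses plain Python so it runs without NumPy.

```python
# Hypothetical sketch (assumed convention, not the dashboard's confirmed method):
# rank tokens by how strongly a feature's decoder direction moves their logits.
from typing import Dict, List, Sequence, Tuple

def direct_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive (largest first) and k most negative (smallest first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = list(reversed(ranked[-k:]))
    return top, bottom

# Toy 2-dimensional example; the vectors are made up for illustration.
decoder_dir = [1.0, 0.0]
unembed = {
    "why": [1.3, 0.2],
    "how": [1.19, -0.5],
    "wagen": [-0.67, 0.9],
    "die": [-0.6, 0.1],
}
top, bottom = top_and_bottom(direct_logit_effects(decoder_dir, unembed), 2)
```

With these toy vectors, `top` ranks "why" above "how", and `bottom` ranks "wagen" below "die". That mirrors the shape of the two lists above, but the numbers here are illustrative only.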
Activation Density: 0.051%
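Activation density is usually the fraction of tokens in a sample on which the feature fires. At 0.051%, that is roughly one token in 2,000 (1 / 0.00051 ≈ 1,961). The sketch below assumes "fires" means an activation strictly greater than zero. The function name and the `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: density = (tokens with activation > threshold) / (all tokens).
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total

# One firing token out of 2,000 gives a density of 0.05%.
density = activation_density([0.0] * 1999 + [2.5])
```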