INDEX
Explanations
It detects question-word tokens that introduce a question.
Negative Logits
والإ	-0.08
防	-0.07
Slave	-0.07
الإ	-0.07
dokonce	-0.07
segue	-0.06
-four	-0.06
الإ	-0.06
стил	-0.06
oles	-0.06
Positive Logits
what	0.25
What	0.23
What	0.22
what	0.19
“What	0.19
"What	0.19
WHAT	0.19
.What	0.17
.what	0.14
WHAT	0.14
Activations Density 0.141%
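The density figure can be read as a simple ratio. The sketch below assumes that "activation density" means the fraction of token positions where the feature's activation is above zero, which is the usual convention for dashboards of this kind but is not stated on this page. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce the 0.141% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 141 of which are active.
sample = [1.0] * 141 + [0.0] * (100_000 - 141)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.141%
```

Under this reading, a density of 0.141% means the feature is active on roughly 1.4 of every 1,000 tokens, which fits a feature tied to a narrow set of question words.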