INDEX
Explanations
questions or statements related to seeking knowledge or information
Negative Logits
Token         Logit
onding        -0.68
interstitial  -0.67
pite          -0.67
ovie          -0.66
Stra          -0.61
ñ             -0.61
ortunately    -0.60
permitting    -0.59
edition       -0.59
onite         -0.59
Positive Logits
Token         Logit
WHY           1.26
why           1.20
how           1.11
whether       1.03
why           1.00
answers       0.99
ABOUT         0.97
HOW           0.96
WHERE         0.94
answered      0.89
Activations Density: 0.078%