INDEX
Explanations
questions or statements indicating confusion or requesting clarification
instances of the word "why" and its contextual usage suggesting reasoning or explanations
Negative Logits
Token     Logit
lator     -0.79
Roller    -0.77
ymph      -0.74
ãĤ¤ãĥĪ    -0.74
rop       -0.63
thus      -0.63
ãĤ¹       -0.61
opic      -0.61
aughed    -0.61
shaw      -0.61
Positive Logits
Token     Logit
soever    1.00
why       0.89
why       0.79
WHY       0.77
exactly   0.68
Origin    0.68
ihad      0.65
abl       0.65
abouts    0.65
eve       0.64
Activations Density 0.036%
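
As a rough illustration of what a density figure like this measures, the sketch below assumes density means the share of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample numbers are assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (active tokens / all tokens) * 100.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 9 active tokens out of 25,000 tokens in total.
sample = [0.0] * 24991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.036%"
```

Under this reading, a density of 0.036% would mean the feature fires on roughly 3.6 tokens in every 10,000.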