INDEX
Explanations
phrases indicating confusion or a need for understanding
references to the question "why."
Negative Logits
lator       -0.80
amps        -0.72
Roller      -0.72
ymph        -0.71
ãĤ¤ãĥĪ      -0.71
aughed      -0.68
phrine      -0.66
rop         -0.63
aire        -0.62
luck        -0.61
Positive Logits
why         0.98
WHY         0.89
soever      0.86
why         0.85
exactly     0.80
Why         0.69
Origin      0.67
justifying  0.63
bother      0.62
iterranean  0.62
Activations Density 0.032%
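
The negative logits entry `ãĤ¤ãĥĪ` looks like a byte-level BPE token string rather than corrupted text. The sketch below assumes the model uses the GPT-2 style byte-to-unicode mapping (the page does not confirm this) and shows how such a string could be turned back into readable text. The function names `gpt2_byte_decoder` and `decode_token` are hypothetical helpers written for this illustration.

```python
def gpt2_byte_decoder():
    """Build the inverse of the GPT-2 style byte-to-unicode table (assumed vocabulary format)."""
    # Bytes that are displayed as themselves: '!'..'~', 0xA1..0xAC, 0xAE..0xFF.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    decoder = {chr(b): b for b in keep}
    # Every remaining byte is displayed as code point 256, 257, ... in ascending byte order.
    offset = 0
    for b in range(256):
        if b not in keep:
            decoder[chr(256 + offset)] = b
            offset += 1
    return decoder


def decode_token(token):
    """Map each displayed character back to its byte, then decode the bytes as UTF-8."""
    table = gpt2_byte_decoder()
    raw = bytes(table[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¤ãĥĪ"))  # prints: イト
```

Under the same assumption, a leading `Ġ` in a token string stands for a space, which may be why `why` appears twice in the positive logits list with different values: one entry could be the token with a leading space and the other without.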