Explanations
questions starting with the word "Why"
the word "Why" and related questioning phrases
Negative Logits
Token          Logit
rop            -0.75
Roller         -0.72
wrapper        -0.68
ymph           -0.67
bows           -0.67
interstitial   -0.65
amps           -0.64
oys            -0.63
ãĤ¹            -0.63
ryu            -0.62
Positive Logits
Token          Logit
soever          1.14
why             0.92
WHY             0.90
why             0.88
Why             0.88
Why             0.73
ihad            0.72
icago           0.71
Does            0.70
iterranean      0.70
Activations Density 0.037%