Explanations
questions beginning with "Why" followed by a statement or action
questions or inquiries beginning with "why."
Negative Logits
Token          Logit
interstitial   -0.75
Laughs         -0.64
iece           -0.61
apsed          -0.60
marked         -0.60
eki            -0.60
tnc            -0.59
agonists       -0.59
spir           -0.57
engers         -0.57
Positive Logits
Token          Logit
bother          1.20
do              0.96
did             0.96
aren            0.91
does            0.90
shouldn         0.87
wouldn          0.83
brow            0.82
weren           0.81
should          0.80
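Lists like the two logit lists above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below illustrates that idea under stated assumptions: the function name `top_logit_effects`, the shapes, and the raw (un-normalized) projection are hypothetical choices for illustration, not necessarily how this dashboard computed its values.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the feature's direct logit effect.

    Assumes decoder_dir has shape (d_model,) and W_U has shape
    (d_model, vocab_size); no layer-norm folding or scaling is applied.
    Returns (top_positive, top_negative) as lists of (token, logit) pairs.
    """
    logits = decoder_dir @ W_U          # shape (vocab_size,)
    order = np.argsort(logits)          # indices sorted ascending by logit
    top_negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    top_positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top_positive, top_negative

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[1.2, -0.75, 0.3],
                [0.0,  0.0,  0.0]])
vocab = ["bother", "interstitial", "do"]
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
```

Here `pos` begins with `("bother", 1.2)` and `neg` begins with `("interstitial", -0.75)`; with a real model, `W_U` would be the unembedding weights and `vocab` the tokenizer's vocabulary.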
Activation Density: 0.034%
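Activation density is usually read as the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the hypothetical `threshold` parameter), looks like this:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    `acts` is assumed to be a flat collection of this feature's activations,
    one value per token position in the evaluation set.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# One active position out of four gives a density of 25%.
density = activation_density([0.0, 0.0, 1.5, 0.0])
```

Under this definition, a density of 0.034% means the feature is active on roughly 34 of every 100,000 tokens.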