INDEX
Explanations
statements or questions asking for reasons or explanations
instances of the word "why" indicating inquiries or explanations
Negative Logits
Roller    -0.74
ymph      -0.71
ages      -0.70
trop      -0.68
rop       -0.66
robe      -0.64
amps      -0.63
puck      -0.63
field     -0.62
Sailor    -0.62
Positive Logits
soever        1.16
why           1.14
why           1.08
WHY           1.01
iterranean    0.84
ihad          0.83
Why           0.82
exactly       0.77
tical         0.75
utterstock    0.74
Activations Density: 0.036%