Explanations
phrases associated with providing explanations or justifications
the phrase "the reason" and variations of it
Negative Logits
Token               Logit
KY                  -0.74
chron               -0.63
Carbuncle           -0.61
rog                 -0.60
helicop             -0.60
inav                -0.60
wana                -0.59
neighb              -0.57
borg                -0.57
ALTH                -0.57
Positive Logits
Token               Logit
why                 1.20
abl                 1.10
why                 0.97
WHY                 0.94
behind              0.86
ably                0.76
Why                 0.74
cited               0.73
Why                 0.73
quickShipAvailable  0.71
Activations Density 0.029%
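
The density figure above can be read as the share of token positions on which the feature fires at all. The sketch below assumes that reading (the page itself does not define the metric); the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. The dashboard does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
acts = [0.0] * 10_000
for position, value in [(5, 1.3), (4_200, 0.4), (9_999, 2.1)]:
    acts[position] = value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```

A density as low as 0.029% would mean the feature is active on roughly 3 tokens in every 10,000, which is consistent with a feature tied to a narrow family of phrases such as "the reason".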