INDEX
Explanations
reasons or explanations indicated in a sentence
phrases that explain reasons or justifications for various situations
Negative Logits
puck        -0.73
Carbuncle   -0.71
inav        -0.68
sweats      -0.66
helicop     -0.65
broom       -0.64
chron       -0.63
KY          -0.62
Samurai     -0.61
eg          -0.60
Positive Logits
why         1.16
abl         1.01
WHY         0.99
why         0.93
behind      0.78
forward     0.73
Why         0.72
Why         0.71
justifying  0.70
¿½          0.70
Activations Density 0.026%