INDEX
Explanations
reasons or causes for a certain situation
phrases that explain causes or justifications
Negative Logits
helicop      -0.69
KY           -0.66
Carbuncle    -0.66
chron        -0.66
puck         -0.62
mathemat     -0.61
livest       -0.60
borg         -0.60
sweats       -0.60
yss          -0.59
Positive Logits
why          1.20
abl          1.05
why          0.94
WHY          0.94
behind       0.82
forward      0.77
Origin       0.76
stadt        0.75
stay         0.74
Why          0.74
Activations Density 0.027%