INDEX
Explanations
phrases related to decisions and their implications
Head Attr Weights
0:0.04
1:0.01
2:0.06
3:0.12
4:0.02
5:0.06
6:0.02
7:0.04
8:0.02
9:0.01
10:0.53
11:0.02
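The head attribution weights above can be summarized with a short script. This is a minimal sketch: the names `head_weights` and `top_heads` are illustrative and not part of any dashboard API. The source does not say whether the weights are normalized. The displayed values sum to about 0.95, which may simply reflect rounding to two decimals.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.04, 1: 0.01, 2: 0.06, 3: 0.12, 4: 0.02, 5: 0.06,
    6: 0.02, 7: 0.04, 8: 0.02, 9: 0.01, 10: 0.53, 11: 0.02,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, descending."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the displayed (rounded) weights.
total = sum(head_weights.values())

# Head with the largest weight and its share of the displayed total.
dominant_head, dominant_weight = top_heads(head_weights, k=1)[0]
dominant_share = dominant_weight / total

print(f"dominant head: {dominant_head} (weight {dominant_weight:.2f})")
print(f"share of displayed total: {dominant_share:.2f}")
```

On these numbers, head 10 carries more than half of the displayed attribution, with head 3 a distant second.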
Negative Logits
stores       -1.93
aura         -1.91
dominates    -1.79
rench        -1.78
haunt        -1.77
ife          -1.75
cart         -1.75
world        -1.74
rome         -1.73
イト         -1.71
Positive Logits
because      3.24
because      3.23
precaution   2.95
purely       2.93
Because      2.88
due          2.78
solely       2.75
purpose      2.71
Because      2.65
intending    2.63
Activations Density 1.046%