INDEX
Explanations
actions and outcomes related to attempts, agreements, and failures in various contexts
Head Attr Weights
0:0.02
1:0.34
2:0.07
3:0.03
4:0.03
5:0.10
6:0.08
7:0.04
8:0.08
9:0.06
10:0.07
11:0.03
Negative Logits
Fre         -1.71
Masquerade  -1.53
ASA         -1.49
Beast       -1.48
addon       -1.35
Rebellion   -1.34
<-          -1.34
maxwell     -1.33
fre         -1.31
Lav         -1.31
Positive Logits
outweigh                          1.85
rawdownloadcloneembedreportprint  1.66
outwe                             1.55
aceutical                         1.53
entail                            1.49
})                                1.44
cium                              1.44
Downloadha                        1.41
transpl                           1.39
̶                                 1.38
Activations Density 0.320%