Explanations
negation or exclusivity in statements
Head Attr Weights
0:0.09
1:0.07
2:0.08
3:0.09
4:0.08
5:0.07
6:0.07
7:0.08
8:0.09
9:0.07
10:0.08
11:0.07
Negative Logits
Reincarn:-3.10
Canaver:-2.60
deceive:-2.54
endi:-2.50
contam:-2.49
prost:-2.45
actresses:-2.44
hypoc:-2.36
rans:-2.34
inaccur:-2.31
Positive Logits
AZ:2.77
Berry:2.76
Haw:2.63
Nik:2.62
QL:2.59
Jam:2.57
PF:2.54
ql:2.54
amaz:2.54
Dwell:2.54
Activations Density 0.000%