INDEX
Explanations
negative sentiment or criticism
Head Attr Weights
0:0.11
1:0.19
2:0.03
3:0.03
4:0.02
5:0.23
6:0.07
7:0.02
8:0.07
9:0.04
10:0.07
11:0.09
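The head weights above can be ranked with a few lines of Python. This is a minimal sketch that assumes each `index:weight` pair gives one attention head's share of the attribution for this feature. That reading, and the names `head_weights`, `ranked`, and `top_heads`, are illustrative assumptions, not part of the dashboard.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.11, 1: 0.19, 2: 0.03, 3: 0.03, 4: 0.02, 5: 0.23,
    6: 0.07, 7: 0.02, 8: 0.07, 9: 0.04, 10: 0.07, 11: 0.09,
}

# Rank heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_heads = [head for head, _ in ranked[:3]]

# The listed values are rounded, so the total need not be exactly 1.
total = round(sum(head_weights.values()), 2)

print(top_heads)  # [5, 1, 0]
print(total)      # 0.97
```

Under that assumption, heads 5 and 1 together carry 0.42 of the listed weight.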
Negative Logits
satur    -1.75
smokes   -1.62
nic      -1.61
ABS      -1.56
IC       -1.51
Stan     -1.50
wast     -1.50
GI       -1.50
het      -1.49
matt     -1.49
Positive Logits
iannopoulos   2.21
yssey         2.02
cffffcc       1.86
aeus          1.85
yrinth        1.83
18            1.81
3             1.71
1             1.71
16            1.71
2             1.69
Activations Density 0.008%
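One common reading of the density figure above is the fraction of token positions on which the feature has a nonzero activation. The sketch below assumes that definition. The helper `activation_density`, its `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 8 active positions out of 100,000.
toy = [0.0] * 99_992 + [1.5] * 8
density = activation_density(toy)

print(f"{density:.3%}")  # 0.008%
```

At this density the feature would fire on roughly 8 of every 100,000 tokens.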