Explanations
expressive or exaggerated language related to dissatisfaction
Head Attr Weights
0:0.07
1:0.07
2:0.07
3:0.07
4:0.09
5:0.07
6:0.07
7:0.09
8:0.09
9:0.08
10:0.09
11:0.08
Negative Logits
apply      -1.88
Located    -1.79
xtap       -1.78
ographs    -1.68
request    -1.63
olin       -1.62
ombs       -1.60
ogens      -1.59
mercial    -1.59
ograph     -1.59
Positive Logits
Cyr        1.70
Nile       1.54
Ferrari    1.50
polyg      1.49
meanwhile  1.49
dearly     1.48
Tsukuyomi  1.48
Fiat       1.47
electrom   1.46
regress    1.46
Activations Density 0.000%