INDEX
Explanations
negative sentiment or criticism
Head Attr Weights
0:0.10
1:0.24
2:0.02
3:0.03
4:0.02
5:0.23
6:0.07
7:0.02
8:0.04
9:0.06
10:0.06
11:0.05
Negative Logits
ADS          -1.50
radios       -1.48
Editor       -1.45
Ce           -1.45
tv           -1.44
Telesc       -1.43
grounded     -1.42
radio        -1.42
Corrections  -1.41
RF           -1.37
Positive Logits
��    2.07
�     2.05
ヴァ  1.98
�士   1.91
��    1.88
エル  1.83
�     1.80
�     1.75
ゴン  1.75
�     1.72
Activations Density 0.009%