INDEX
Explanations
phrases indicating emotional or subjective evaluations
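Head Attribution Weights (per attention head)
Head 0:  0.14
Head 1:  0.08
Head 2:  0.13
Head 3:  0.04
Head 4:  0.10
Head 5:  0.11
Head 6:  0.05
Head 7:  0.02
Head 8:  0.11
Head 9:  0.09
Head 10: 0.03
Head 11: 0.05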
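The dashboard does not say how the head weights are computed. Below is a minimal sketch that assumes one plausible definition for an attention-output SAE: split the feature's encoder row into equal per-head slices, take each slice's L2 norm, and normalize so the weights sum to 1. The function name `head_attr_weights` and this definition are assumptions for illustration. The displayed weights sum to 0.95. That is consistent with an exact sum of 1, since rounding twelve values to two decimals can shift the total by up to 0.06.

```python
import math

def head_attr_weights(w_enc_row, n_heads):
    """Hypothetical helper: per-head share of one feature's encoder weights.

    Assumes the encoder row is the concatenation of n_heads equal-size
    slices (one per attention head) and that a head's weight is the L2 norm
    of its slice divided by the sum of all slice norms.
    """
    d = len(w_enc_row)
    if d % n_heads != 0:
        raise ValueError("encoder row length must be divisible by n_heads")
    d_head = d // n_heads
    norms = []
    for h in range(n_heads):
        chunk = w_enc_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    # Normalize so the per-head weights sum to 1.
    return [n / total for n in norms]

# Toy example: 3 heads with 2 dimensions each; slice norms are 5, 0 and 1.
print(head_attr_weights([3.0, 4.0, 0.0, 0.0, 1.0, 0.0], n_heads=3))
```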
Negative Logits
0004           -1.60
scent          -1.59
UNCLASSIFIED   -1.50
�              -1.42
�              -1.40
inning         -1.40
ovych          -1.38
vibe           -1.38
CLASSIFIED     -1.35
nerv           -1.33
(� marks byte-fragment tokens that do not decode to valid UTF-8 on their own.)
Positive Logits
lambda   1.93
upload   1.88
trans    1.87
dr       1.87
their    1.87
suff     1.85
limits   1.83
split    1.80
fixed    1.79
them     1.79
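The two logit lists rank vocabulary tokens by how strongly the feature directly pushes their logits up or down. Below is a minimal sketch of that ranking. It assumes the feature's decoder direction has already been mapped into the residual stream (for an attention-output SAE this would involve the output projection), and it ignores the final LayerNorm. The function name `top_logits` and the list-of-rows layout of `unembed` are hypothetical choices for illustration.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by direct logit effect.

    decoder_dir: feature direction in the residual stream (length d_model).
    unembed: d_model rows, each of length len(vocab) (the unembedding matrix).
    Returns (positive, negative) lists of (token, score) pairs.
    """
    vocab_size = len(vocab)
    # Score for token t is the dot product of the direction with column t.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]
    ranked = sorted(range(vocab_size), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in ranked[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(ranked[-k:])]
    return positive, negative

# Toy example: 2-dimensional residual stream, 3-token vocabulary.
pos, neg = top_logits([1.0, 0.0], [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(pos, neg)
```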
Activation Density: 0.011%
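Activation density is the fraction of token positions on which the feature fires. A density of 0.011% means roughly 11 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero. The helper name `activation_density` and the `threshold` parameter are illustrative assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# 11 firing positions out of 100,000 tokens, reported as a percentage.
acts = [0.0] * 99_989 + [0.7] * 11
print(f"{activation_density(acts) * 100:.3f}%")
```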