INDEX
Explanations
the presence of subjective experiences or perceptions about actions and states
Head Attr Weights
0:0.04
1:0.01
2:0.07
3:0.04
4:0.04
5:0.09
6:0.02
7:0.02
8:0.37
9:0.05
10:0.12
11:0.07
Negative Logits
Caucasus        -1.64
Dude            -1.52
Seym            -1.50
conflicted      -1.48
dar             -1.48
�               -1.44
disqualified    -1.40
iously          -1.39
rodu            -1.38
Ard             -1.32
Positive Logits
sequ            1.79
LOD             1.75
,,,,            1.72
Decay           1.69
flation         1.67
EPS             1.66
sequ            1.65
shown           1.64
gain            1.64
enz             1.63
Activations Density 0.514%