INDEX
Explanations
words related to opinions and personal identity
Head Attribution Weights
Head    0     1     2     3     4     5     6     7     8     9     10    11
Weight  0.02  0.02  0.09  0.32  0.12  0.03  0.02  0.11  0.05  0.04  0.06  0.05
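The head weights can be read as a ranking of attention heads. The sketch below copies the twelve values and lists the heads with the largest weight. It assumes each weight is that head's share of the feature's attribution, which the dashboard does not define. `top_heads` is a hypothetical helper name.

```python
# Head attribution weights copied from the dashboard (heads 0-11).
# Assumption: each value is that head's share of the feature's attribution.
head_attr = {
    0: 0.02, 1: 0.02, 2: 0.09, 3: 0.32, 4: 0.12, 5: 0.03,
    6: 0.02, 7: 0.11, 8: 0.05, 9: 0.04, 10: 0.06, 11: 0.05,
}

def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight, highest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(top_heads(head_attr))  # [(3, 0.32), (4, 0.12), (7, 0.11)]
```

Head 3 stands out. Its weight is roughly 2.7 times that of the next-largest head, head 4.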
Negative Logits
Token             Logit
////////////////  -1.64
reluct            -1.46
uish              -1.42
Ong               -1.40
iliate            -1.37
Canaver           -1.37
thereum           -1.36
NetMessage        -1.36
ometime           -1.35
Published         -1.34
Positive Logits
Token     Logit
!",       2.08
?",       2.02
anymore   1.94
!),       1.84
or        1.76
>,        1.75
,'        1.70
'),       1.67
?!"       1.67
?'        1.62
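Dashboards of this kind typically build the two logit lists the same way. They project the feature's output direction through the model's unembedding matrix and report the most negative and most positive tokens. This page does not state its method, so the sketch below assumes that convention. The two-dimensional direction and four-token vocabulary are toy values, not the model's, and `logit_effects` and `top_and_bottom` are hypothetical helper names.

```python
def logit_effects(direction, unembed):
    """Dot a (hypothetical) feature output direction with each token's unembedding column.

    direction: list of d_model floats.
    unembed:   dict mapping token -> list of d_model floats (one column of W_U).
    """
    return {tok: sum(d * u for d, u in zip(direction, col)) for tok, col in unembed.items()}

def top_and_bottom(effects, k):
    """Return (k most positive, highest first) and (k most negative, lowest first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]

# Toy values for illustration only; not taken from the model behind this dashboard.
direction = [1.0, -0.5]
unembed = {
    '!"': [2.0, 0.0],
    "anymore": [1.0, -1.0],
    "reluct": [-1.0, 1.0],
    "the": [0.0, 0.0],
}

positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
print(positive)  # [('!"', 2.0), ('anymore', 1.5)]
print(negative)  # [('reluct', -1.5), ('the', 0.0)]
```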
Activation Density: 0.009%
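Activation density is usually the fraction of sampled tokens on which the feature fires. Suppose "fires" means an activation above zero. The dashboard does not specify the threshold, so that is an assumption here. Under it, 0.009% corresponds to about 9 activating tokens per 100,000. The sketch below checks the arithmetic on a synthetic activation list. `activation_density` is a hypothetical helper name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Synthetic sample: 9 activating tokens out of 100,000.
acts = [0.0] * 99_991 + [1.0] * 9
print(f"{activation_density(acts):.3%}")  # 0.009%
```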