INDEX
Explanations
questions and discussions surrounding societal issues and philosophy
Head Attr Weights
0:0.05
1:0.02
2:0.02
3:0.09
4:0.48
5:0.04
6:0.03
7:0.02
8:0.05
9:0.11
10:0.01
11:0.02
Negative Logits
���         -2.63
ouble       -2.29
ilan        -2.17
iHUD        -2.12
aples       -2.10
inery       -1.93
mone        -1.89
cffff       -1.88
udicrous    -1.86
aternity    -1.85
Positive Logits
?           5.17
?:          4.99
?)          4.71
?),         4.66
?]          4.64
?).         4.55
?,          4.39
?"          4.22
!?          4.21
?!          4.18
Activations Density 0.372%