INDEX
Explanations
critical discussions around ethics and morality, particularly in the context of societal issues and government actions
Head Attr Weights
0:0.08
1:0.03
2:0.13
3:0.07
4:0.06
5:0.20
6:0.07
7:0.03
8:0.07
9:0.10
10:0.08
11:0.03
Negative Logits
orsi        -1.01
ussions     -1.00
ificate     -0.98
alde        -0.98
incerity    -0.96
ustration   -0.90
ussion      -0.89
Timbers     -0.88
yip         -0.86
Downing     -0.85
Positive Logits
)</         0.97
ュ          0.97
pmwiki      0.94
!).         0.91
).[         0.90
cffffcc     0.88
stellar     0.86
ニ          0.86
】          0.85
sonian      0.84
Activations Density 0.139%