INDEX
Explanations
concepts related to societal problems and moral issues
Head Attr Weights
0:0.06
1:0.02
2:0.04
3:0.10
4:0.04
5:0.13
6:0.02
7:0.03
8:0.07
9:0.21
10:0.17
11:0.07
Negative Logits
assad                    -1.14
inav                     -1.00
trak                     -0.97
uania                    -0.95
luaj                     -0.94
yip                      -0.88
idated                   -0.86
Tus                      -0.85
orest                    -0.85
BuyableInstoreAndOnline  -0.84
Positive Logits
spoiler     1.22
paraph      1.04
Spoiler     1.02
spoilers    1.00
commenter   0.99
Gawker      0.96
commenters  0.96
oiler       0.95
Hume        0.94
rebutt      0.91
Activations Density 3.560%