Explanations
Instances of negative commentary or criticism about societal issues, particularly gender and violence.
Head Attribution Weights
Head 0: 0.02
Head 1: 0.02
Head 2: 0.10
Head 3: 0.07
Head 4: 0.17
Head 5: 0.02
Head 6: 0.17
Head 7: 0.14
Head 8: 0.03
Head 9: 0.03
Head 10: 0.08
Head 11: 0.09
Negative Logits
BuyableInstoreAndOnline: -1.71
isSpecialOrderable: -1.55
okane: -1.52
Reviewed: -1.47
enthal: -1.38
Spot: -1.37
izons: -1.36
Parkway: -1.34
zeb: -1.31
eele: -1.31
Positive Logits
nonsense: 1.54
tricks: 1.49
syll: 1.47
backwards: 1.47
Bastard: 1.44
notation: 1.40
]}: 1.36
amn: 1.36
.'": 1.35
misc: 1.35
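Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token weights. The dashboard does not say how it computes them, so the following is only a minimal sketch under that assumption; the function names `logit_weights` and `top_bottom_tokens` and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Assumed method: dot the feature's decoder direction with each
    vocabulary column of the unembedding matrix.

    decoder_dir has length d_model; w_u has d_model rows and vocab_size columns.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_bottom_tokens(weights: Sequence[float], vocab: Sequence[str], k: int = 10
                      ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, weight) pairs."""
    ranked = sorted(zip(vocab, weights), key=lambda tw: tw[1])  # ascending by weight
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, -0.5]
w_u = [[0.2, -1.0, 0.5, 0.0],
       [0.4, 0.0, -2.0, 1.0]]
vocab = ["a", "b", "c", "d"]

weights = logit_weights(decoder_dir, w_u)
neg, pos = top_bottom_tokens(weights, vocab, k=2)
print(weights)  # [0.0, -1.0, 1.5, -0.5]
print(neg)      # [('b', -1.0), ('d', -0.5)]
print(pos)      # [('c', 1.5), ('a', 0.0)]
```

With a real model, `vocab` would be the tokenizer's vocabulary and `k=10` would reproduce ten-entry lists like the ones shown here.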
Activation Density: 0.001%
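A density of 0.001% means the feature fires on roughly one token in every 100,000. The sketch below assumes density is the percentage of tokens whose activation exceeds a threshold of zero; the dashboard does not state its exact threshold or sample, and the function name `activation_density_pct` is hypothetical.

```python
from typing import Sequence


def activation_density_pct(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`
    (assumed definition of activation density)."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# One firing among 100,000 tokens corresponds to a density of 0.001%.
acts = [0.0] * 99_999 + [3.2]
print(activation_density_pct(acts))
```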