INDEX
Explanations
topics related to raising awareness about various social issues
Head Attr Weights
0:0.01
1:0.05
2:0.07
3:0.06
4:0.01
5:0.05
6:0.04
7:0.13
8:0.09
9:0.20
10:0.08
11:0.17
Negative Logits
onward     -1.26
onwards    -1.22
aults      -1.20
osure      -1.15
xes        -1.14
gui        -1.14
ornia      -1.13
robe       -1.12
inion      -1.11
eper       -1.11
Positive Logits
tattoo          1.24
Pengu           1.17
broadcasters    1.12
redes           1.10
PLA             1.08
STAR            1.07
newcom          1.06
tattoos         1.05
agnetic         1.04
paren           1.03
Activations Density: 0.005%